
"Caching ... semantically similar queries."
I imagine the phrase "semantically similar" is doing some unusually heavy lifting in this context.
How semantics ("meaning") are assigned to a query (which is essentially a well-formed string of symbols conforming to some syntax) is an interesting question, even leaving pragmatics aside.
I suspect the semantics of a tokenized query are defined as the output the LLM produces for that query - a cyclic, self-referential definition which I imagine presumes a least fixed point?
Measuring the degree of semantic closeness of two queries for caching purposes precludes using the full output of the target LLM, so I assume some simpler(-minded) metric is used, possibly one computed by a much smaller model trained for that purpose.
I sometimes wonder whether the current AI/LLM mania doesn't fall foul of the first two commandments†, not that the world wasn't already awash with venal false gods.
† Thou shalt have no other gods before me. Thou shalt not make unto thee any graven image... Thou shalt not bow down thyself to them, nor serve them. (KJV Exodus 20:3-5)