Caching large language model (LLM) responses using hybrid retrieval and reciprocal rank fusion

US12259913B1 · US · B1

Patent metadata
Field	Value
Publication number	US-12259913-B1
Application number	US-202418441863-A
Country	US
Kind code	B1
Filing date	Feb 14, 2024
Priority date	Feb 14, 2024
Publication date	Mar 25, 2025
Grant date	Mar 25, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method for improving computer functionality by retrieving answers/responses to questions/input from a cache such as those used with chatbots and generative AI systems. Disclosed is a multi-layered caching strategy that focuses on the relevance of a cache hit by improving the quality of the answer. The approach demonstrates that response latency is significantly reduced when using caching and how a caching strategy could be applied in various layers of increasing relevance for a simple Question-and-Answer system with the possibility of extending to more complex generative AI interactions.
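
The cache-first flow the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the patented method: the function and parameter names (`answer_with_cache`, `cache_lookup`, `generate`, `cache_store`, `threshold`) are hypothetical, the exact-match dictionary stands in for the hybrid search described in the claims, and storing a freshly generated answer back into the cache is an assumption the abstract does not state.

```python
from typing import Callable, Dict, List, Optional, Tuple


def answer_with_cache(
    question: str,
    cache_lookup: Callable[[str], Optional[Tuple[str, float]]],
    generate: Callable[[str], str],
    cache_store: Callable[[str, str], None],
    threshold: float = 0.8,  # assumed "settable value"; not taken from the patent
) -> Tuple[str, bool]:
    """Return (answer, served_from_cache)."""
    hit = cache_lookup(question)
    if hit is not None:
        answer, relevance = hit
        if relevance >= threshold:
            # Relevant enough: skip the slow generative call entirely.
            return answer, True
    # Cache miss or low relevance: fall back to the generator.
    answer = generate(question)
    cache_store(question, answer)  # assumption: new answers are cached
    return answer, False


# Toy stand-ins: an exact-match cache and a fake generator.
store: Dict[str, str] = {}
llm_calls: List[str] = []


def lookup(q: str) -> Optional[Tuple[str, float]]:
    cached = store.get(q.strip().lower())
    return (cached, 1.0) if cached is not None else None


def fake_llm(q: str) -> str:
    llm_calls.append(q)
    return f"generated answer to: {q}"


def save(q: str, a: str) -> None:
    store[q.strip().lower()] = a


first = answer_with_cache("What is RRF?", lookup, fake_llm, save)
second = answer_with_cache("  what is RRF? ", lookup, fake_llm, save)
```

In this sketch the second call is served from the cache, so the generator runs only once. That avoided call is the source of the latency reduction the abstract refers to.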

First claim

Opening claim text (preview).

What is claimed is:

1. A method for improving computer functionality by retrieving answers to questions from a cache, the method comprising: using a hardware processor communicatively coupled to memory to perform accessing a question stored in primary storage communicatively coupled to a cache, in a text format; accessing metadata associated with the question in the text format; vectorizing the question in the text format into a high dimensional vector using a text embedding algorithm, wherein the high dimensional vector is greater than or equal to 1024 dimensions; using the high dimensional vector to search a question portion of the cache using a plurality of retriever types to create a hybrid search, in which the hybrid search combines one or more text format queries using the metadata with one or more high dimensional vector queries in a single search request; performing query filtering with the metadata associated with the question to provide a semantic layer set of semantic answers in a text format with metadata associated with an answer and semantic relevance values; using the question in the text format to search an answer portion of the cache and performing query filtering with the metadata associated with the question to provide a lexical layer set of lexical answers in the text format with the metadata associated with the answer and lexical relevance values; using the semantic layer set in order of the semantic relevance values from highest to lowest and the lexical layer set from highest to lowest; and applying a reciprocal rank fusion algorithm to compute a combined ranking set for the semantic answers in the text format and the lexical answers in the text format to provide an identified answer.

2. The method of claim 1, further comprising: in response to the semantic relevance values being above a settable value, returning the semantic answers in the text format with the highest semantic relevance values and, otherwise, sending the question in the text format to create a prompt.

3. The method of claim 1, further comprising: in response to the combined ranking set being above a settable value, returning the identified answer and, otherwise, sending the question in the text format to create a prompt.

4. The method of claim 1, wherein the performing query filtering with the metadata associated with the question provides an exact match result, wherein the high dimensional vector to search the question portion of the cache provides an approximate match ranked by the semantic relevance values.

5. The method of claim 1, wherein the performing query filtering with Q-metadata provides an exact match result, wherein the question in the text format to search a question portion of the cache provides an approximate match ranked by the lexical relevance values.

6. The method of claim 1, further comprising: in response to a subsequent question is received, the cache is first checked to see if a similar request has already been made and, in response, retrieving the answer from the cache.

7. The method of claim 1, wherein the accessing a question in a text format includes accessing a question that originated from a human user or from a computer process.

8. A method for improving relevancy of a cache hit, the method comprising: operating a cache communicatively coupled to primary storage in an information retrieval system; accessing a question in a text format; accessing metadata associated with the question in text format; vectorizing the question in the text format into a high dimensional vector using a text embedding algorithm, wherein the high dimensional vector is greater than or equal to 1024 dimensions; using the high dimensional vector to search a question portion of the cache using a plurality of retriever types to create a hybrid search, in which the hybrid search combines one or more text format queries using the metadata with one or more high dimensional vector queries in a single search request; performing query filtering with the metadata associated with the question to provide a semantic layer set of semantic answers in a text format with metadata associated with an answer and semantic relevance values; using question in the text format to search an answer portion of the cache and performing query filtering with the metadata associated with the question to provide a lexical layer set of lexical answers in the text format with the metadata associated with the answer and lexical relevance values; using the semantic layer set in order of the semantic relevance values from highest to lowest and the lexical layer set from highest to lowest; applying a reciprocal rank fusion algorithm to compute a combined ranking set for the semantic answers in the text format and the lexical answers in the text format to provide an identified answer; and in response to the combined ranking set being above a settable value, returning the identified answer and, otherwise, sending the question in the text format to create a prompt.

9. The method of claim 8, wherein the performing query filtering with the metadata associated with the question provides an exact match result, wherein the high dimensional vector to search the question portion of the cache provides an approximate match ranked by the semantic relevance values.

10. The method of claim 9, wherein the performing query filtering with the metadata associated with the question provides an exact match result, wherein the high dimensional vector to search a question portion of the cache provides an approximate match ranked by the lexical relevance values.

11. A system for improving computer functionality by retrieving answers to questions from a cache, the system comprising the cache communicatively coupled to primary storage in an information retrieval system; memory; at least one processor communicatively coupled to memory and the information retrieval system, programmed to perform; accessing metadata associated with the question in text format; vectorizing the question in the text format into a high dimensional vector using a text embedding algorithm, wherein the high dimensional vector is greater than or equal to 1024 dimensions; using the high dimensional vector to search a question portion of the cache using a plurality of retriever types to create a hybrid search, in which the hybrid search combines one or more text format queries using the metadata with one or more high dimensional vector queries in a single search request; performing query filtering with the metadata associated with the question to provide a semantic layer set of semantic answers in a text format with metadata associated with an answer and semantic relevance values; using the question in the text format to search an answer portion of the cache and performing query filtering with the metadata associated with the question to provide a lexical layer set of lexical answers in the text format with the metadata associated with the answer and lexical relevance values; using the semantic layer set in order of the semantic relevance values from highest to lowest and the lexical layer set from highest to lowest; and applying a reciprocal rank fusion algorithm to compute a combined ranking set for the semantic answers in the text format and the lexical answers in the text format to provide an identified answer.

12. The system of claim 11, further comprising: in response to the semantic relevance values being above a settable value, returning the semantic answers in the text format with the highest semantic relevance values and, otherwise, sending the question in the text format to create a prompt.

13. The sys
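
The claims name reciprocal rank fusion (RRF) as the step that merges the semantic and lexical result lists, but the preview gives no formula or constant. The sketch below uses the commonly published form, in which each item scores the sum of 1 / (k + rank) over every list it appears in. The smoothing constant k = 60 is the conventional default and an assumption here, and the function name and answer IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]],
    k: int = 60,  # conventional default; the claim text does not specify it
) -> List[Tuple[str, float]]:
    """Fuse ranked lists of answer IDs (best first) into one ranking.

    Each list is assumed to contain no duplicate IDs.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, answer_id in enumerate(ranking, start=1):
            # Higher-ranked items contribute more; k damps the top ranks.
            scores[answer_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical layer outputs, already ordered from highest to lowest relevance.
semantic_layer = ["a2", "a1", "a3"]
lexical_layer = ["a1", "a4", "a2"]

fused = reciprocal_rank_fusion([semantic_layer, lexical_layer])
fused_ids = [answer_id for answer_id, _ in fused]
```

RRF uses only rank positions, so the semantic and lexical relevance values do not need to share a scale. In this example `a1` ranks first in the fused list because it sits near the top of both input lists.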

Assignees

Inventus Holdings LLC

Inventors

Classifications

G06F16/3326 (Primary): using relevance feedback from the user, e.g. relevance feedback on documents, documents sets, document terms or passages
G06F16/335: Filtering based on additional data, e.g. user or group profiles (filtering in web context G06F16/9535, G06F16/9536)
G06F16/3347: using vector based model
G06F16/38: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
G06F16/3329 (Primary): Natural language query formulation

Patent family

Related publications grouped by family.

View patent family 95069730

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12259913B1 cover?: A system and method for improving computer functionality by retrieving answers/responses to questions/input from a cache such as those used with chatbots and generative AI systems. Disclosed is a multi-layered caching strategy that focuses on the relevance of a cache hit by improving the quality of the answer. The approach demonstrates that response latency is significantly reduced when using c…
Who is the assignee on this patent?: Inventus Holdings LLC
What technology area does this patent fall under?: Primary CPC classification G06F16/3326. Mapped technology areas include Physics.
When was this patent published?: Publication date Mar 25, 2025 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).