Natural language processing
US-11947912-B1 · Apr 2, 2024 · US
US12511487B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12511487-B2 |
| Application number | US-202318226303-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 26, 2023 |
| Priority date | Jul 26, 2023 |
| Publication date | Dec 30, 2025 |
| Grant date | Dec 30, 2025 |
Official abstract text for this publication.
Large language models (LLMs) are versatile in responding to user questions on a wide variety of topics. However, LLMs suffer from several drawbacks, such as hallucinations, incomplete information, and inability to cite original sources of information. Disclosed herein are systems and methods for using an LLM in a restricted manner to respond to queries regarding document corpora, e.g., documents related to a set of products, such that the impact of these drawbacks is minimized. Information retrieval is coupled with LLMs to build a question and answer (Q&A) system on the text corpora. Complex retrieved information, incorporating human feedback, and recommendations in the Q&A system are provided.
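The coupling of information retrieval with an LLM described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the bag-of-words `embed` function is a toy stand-in for a learned latent semantic embedding, and the names `embed`, `cosine`, `nearest_segments`, `build_prompt`, and the sample `SEGMENTS` corpus are all hypothetical. The sketch stops at prompt construction and does not call any LLM.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (assumed stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_segments(query, segments, k=2):
    """Return the k (segment_id, text) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(
        segments.items(),
        key=lambda item: cosine(q, embed(item[1])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, segments, k=2):
    """Build a prompt that restricts the LLM to retrieved segments and asks for a citation."""
    context = "\n".join(
        f"[{seg_id}] {text}" for seg_id, text in nearest_segments(query, segments, k)
    )
    return (
        "Answer using only the segments below and cite the segment id you used.\n"
        f"{context}\n"
        f"Question: {query}"
    )


# Hypothetical product-document segments keyed by a segment id.
SEGMENTS = {
    "manual-1": "The router supports firmware updates over USB.",
    "manual-2": "Reset the router by holding the power button for ten seconds.",
    "faq-1": "Warranty claims require the original receipt.",
}

prompt = build_prompt("How do I reset the router?", SEGMENTS, k=2)
```

Keeping the segment id next to each retrieved passage is what lets the model's answer point back to its source, which is the citation drawback the abstract mentions.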
Opening claim text (preview).
What is claimed is:
1. A method, comprising: receiving a query from a user device; mapping the query to a latent semantic embedding space; modeling a number of document segments of a number of documents, wherein each of the number of document segments comprises document content, and wherein a distance between any two of the number of document segments in the latent semantic embedding space is proportional to a degree of similarity between the document content thereof; generating a prompt comprising the query and a set of nearest document segments in the latent semantic embedding space matching the query; submitting the prompt to a large language model (LLM) and receiving a response therefrom, wherein the response comprises an answer to the query and indicia of a document segment of the number of document segments comprising document content matching the answer; providing the response to the user device; wherein the document content of at least one document of the number of documents comprises table data, the table data comprising a number of cells having a cell value, the cell value being further associated with at least one of a column heading, a row heading, or a table heading of the table data; and segmenting the table data into at least one of the number of document segments comprises: linearizing the table data to comprise a linearized cell value comprising the cell value, the column, and the heading; and segmenting at topic breakpoints between topics of the linearized table data.
2. The method of claim 1, further comprising formatting the response wherein the indicia of the document segment of the number of document segments comprising the document content matching the answer, comprises indicia of one of the number of documents comprising the document content matching the answer.
3. The method of claim 1, further comprising: accessing a document corpus having the number of documents; segmenting each document of the number of documents into the number of document segments, each document segment comprising the document content; and plotting each of the number of document segments into the latent semantic embedding space.
4. The method of claim 3, wherein segmenting each document of the number of documents into the number of document segments comprises, for at least one document of the number of documents, segmenting at formatting breakpoints of the document content.
5. The method of claim 3, wherein segmenting each document of the number of documents into the number of document segments comprises: segmenting at topic breakpoints within the document content.
6. The method of claim 5, wherein determining the topic breakpoints within the document content comprises: segmenting the document content into a first number of segments; computing a number of tokens for each of the first number of segments, wherein each token of the number of tokens represents a single word of the document content; accessing a context size of the LLM; estimating a high threshold number of tokens from the context size of the LLM; tokenizing the document content of a segment into a number of tokens; and upon determining the number of tokens is greater than the high threshold number of tokens, resegmenting the document content into a second number of segments that is greater than the first number of segments.
7. The method of claim 5, wherein determining the topic breakpoints within the document content comprises: segmenting the document content into a first number of segments; computing a number of tokens for each of the first number of segments, wherein each token of the number of tokens represents a single word of the document content; accessing a context size of the LLM; estimating a low threshold number of tokens from the context size of the LLM; tokenizing the document content of a segment into a number of tokens; and upon determining the number of tokens is less than the low threshold number of tokens, resegmenting the document content into a second number of segments that is less than the first number of segments.
8. The method of claim 3, wherein: the document content of at least one document of the number of documents comprises visual data; and the method further comprises extracting at least one textual description from metadata of the at least one document and segmenting the number of document segments of the at least one textual description.
9. The method of claim 3, wherein: the document content of at least one document of the number of documents comprises visual data, the visual data further comprising a number of video frames; and the method further comprises extracting at least one textual description from metadata of the at least one of the number of video frames and segmenting the number of document segments of the at least one textual description.
10. The method of claim 9, wherein the metadata comprises digital images of text.
11. A system, comprising: a server, comprising at least one microprocessor coupled to a computer memory storing machine-readable instructions therein; the instructions causing the server to perform: receiving a query from a user device; mapping the query to a latent semantic embedding space, modeling a number of document segments of a number of documents, wherein each of the number of document segments comprises document content, and wherein a distance between any two of the number of document segments in the latent semantic embedding space is proportional to a degree of similarity between the document content thereof; generating a prompt comprising the query and a set of nearest document segments in the latent semantic embedding space matching the query; submitting the prompt to a large language model (LLM) and receiving a response therefrom, wherein the response comprises an answer to the query and indicia of a document segment of the number of document segments comprising document content matching the answer; providing the response to the user device; wherein the document content of at least one document of the number of documents comprises table data, the table data comprising a number of cells having a cell value, the cell value being further associated with at least one of a column heading, a row heading, or a table heading of the table data; and segmenting the table data into at least one of the number of document segments, comprises: linearizing the table data to comprise a linearized cell value comprising the cell value, the column, and the heading; and segmenting at topic breakpoints between topics of the linearized table data.
12. The system of claim 11, further comprising formatting the response, wherein the indicia of the document segment of the number of document segments comprising the document content matching the answer, comprises indicia of one of the number of documents comprising the document content matching the answer.
13. The system of claim 11, further comprising: accessing a document corpus having the number of documents; segmenting each document of the number of documents into the number of document segments, each document segment comprising the document content; and plotting each of the number of document segments into the latent semantic embedding space.
14. The system of claim 13, wherein segmenting each document of the number of documents into the number of document segments comprises, for at least one document of the number of documents, segmenting at formatting breakpoints of the document content.
15. The system of claim 13, wherein segmenting
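The table-linearization step recited in claims 1 and 11 can be pictured with a short sketch. This is one illustrative reading under stated assumptions, not the method as implemented by the patentee: the output string format, the function names `linearize_table` and `segment_by_row`, the sample table, and the choice to treat each table row as a topic boundary are all hypothetical.

```python
def linearize_table(table_heading, column_headings, rows):
    """Turn each cell into a self-describing string that carries its headings.

    rows is a list of (row_heading, [cell values]) pairs. The
    "table; row; column: value" format is an illustrative assumption.
    """
    lines = []
    for row_heading, values in rows:
        for column_heading, value in zip(column_headings, values):
            lines.append(f"{table_heading}; {row_heading}; {column_heading}: {value}")
    return lines


def segment_by_row(lines, cells_per_row):
    """Group linearized cells into one segment per table row.

    Using row boundaries as topic breakpoints is an assumed simplification.
    """
    return [
        " | ".join(lines[i:i + cells_per_row])
        for i in range(0, len(lines), cells_per_row)
    ]


# Hypothetical two-row, two-column product table.
lines = linearize_table(
    "Router specs",
    ["Ports", "Weight"],
    [("Model A", ["4", "300 g"]), ("Model B", ["8", "450 g"])],
)
segments = segment_by_row(lines, cells_per_row=2)
```

Attaching the headings to every cell value means a segment remains interpretable after it is separated from the rest of the table, which matters once segments are embedded and retrieved independently.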
Semantic analysis · CPC title