Semantic graphing of heterogeneous documents for automated decision making and resource allocation using reinforcement learning
US-2023244990-A1 · Aug 3, 2023 · US
US12361209B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12361209-B2 |
| Application number | US-202418668959-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 20, 2024 |
| Priority date | Jul 27, 2022 |
| Publication date | Jul 15, 2025 |
| Grant date | Jul 15, 2025 |
Official abstract text for this publication.
Systems and methods of the present disclosure enable database search. The systems and/or methods may include receiving a search query that includes an input document having text. Word embeddings are generated within the input document, where the word embeddings include vector representations of words in the text of the input document. An average input document word embedding vector is determined for the word embeddings of the input document. A set of stored documents is accessed, where each stored document includes a stored text and has a particular average stored document word embedding vector. A similarity model is used to determine a similarity metric measuring the similarity between the input document and each stored document based on the average input document word embedding vector and the particular average stored document word embedding vector of each stored document.
Opening claim text (preview).
What is claimed is:
1. A method comprising: accessing, by at least one processor, a training set of stored documents; wherein the training set of stored documents comprise: at least one existing pair of stored documents representing at least one pair of stored documents that are similar to each other, and at least one non-existing pair of stored documents representing at least one pair of stored documents that are not similar to each other; generating, by the at least one processor, a plurality of initial stored document word embeddings within each stored document of the set of stored documents; wherein the plurality of initial stored document word embeddings comprise a plurality of stored document vector representations of a plurality of words in text of each stored document; determining, by the at least one processor, an average stored document word embedding vector for the plurality of initial stored document word embeddings for each stored document; utilizing, by the at least one processor, a similarity model to determine a similarity metric of a similarity between a first stored document and a second stored document of each candidate pair of a plurality of candidate pairs of stored documents in the set of stored documents based at least in part on the average stored document word embedding vector of each of the first stored document and the second stored document; generating, by the at least one processor, a refined average stored document word embedding by backpropagating an error of the similarity metric of each candidate pair, wherein the error is based at least in part on the at least one existing pair and the at least one non-existing pair; and returning, by the at least one processor, in response to a search query comprising a search document, at least one stored document of the set of stored documents based at least in part on a comparison of the search document to the plurality of refined stored document word embeddings for each stored document of the set of stored documents.
2. The method of claim 1, wherein the similarity model comprises a cosine similarity determination.
3. The method of claim 1, further comprising: utilizing, by the at least one processor, a word vectorization model to generate the plurality of initial stored document word embeddings for the plurality of stored documents; receiving, by the at least one processor, a user selection confirming or denying the similarity metric of at least one stored document in the plurality of stored documents; determining, by the at least one processor, a similarity error based at least in part on a difference according to an optimization function between: i) the user selection confirming or denying the similarity metric of the at least one stored document in the plurality of stored documents, and ii) a ranked position of the at least one stored document within the plurality of stored documents; and training, by the at least one processor, parameters of the word vectorization model based at least in part on the similarity error.
4. The method of claim 1, further comprising: receiving, by the at least one processor, a user selection confirming or denying the similarity metric of at least one stored document in the plurality of stored documents; determining, by the at least one processor, a similarity error based at least in part on a difference according to an optimization function between: i) the user selection confirming or denying the similarity metric of the at least one stored document in the plurality of stored documents, and ii) a ranked position of the at least one stored document within the plurality of stored documents; and training, by the at least one processor, parameters of the similarity model based at least in part on the similarity error.
5. The method of claim 1, wherein the similarity model comprises an optimization objective to maximize the similarity metric between the plurality of stored documents and the training set of stored documents.
6. The method of claim 5, wherein the similarity model comprises at least one clustering model.
7. The method of claim 1, further comprising: generating, by the at least one processor, a k-d tree of the set of stored documents; and determining, by the at least one processor, the plurality of stored documents by using the similarity model to traverse the k-d tree.
8. The method of claim 1, further comprising: receiving, by at least one processor, a new document having new text; generating, by the at least one processor, a plurality of new word embeddings for the new document; determining, by the at least one processor, a new average word embedding vector of the plurality of new word embeddings for the new document; and storing, by the at least one processor, the new document in the set of stored documents; wherein storing the new document in the set of stored documents comprises adding the new average word embedding vector to a cache of the stored average word embedding associated with the stored text of each stored document.
9. The method of claim 1, wherein the average of the plurality of stored document word embeddings comprises a weighted average based at least in part on a section of the text in which each word is located.
10. The method of claim 1, further comprising: generating, by the at least one processor, a similarity alert based at least in part on the similarity metric of the stored document to at least one stored document in the set of stored documents exceeding a predetermined similarity threshold; and causing, by the at least one processor, a computing device to produce the similarity alert to a user to alert the user of the at least one stored document.
11. A system comprising: at least one processor; and at least one storage medium communicating with the at least one processor and having encoded thereon executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method comprising: accessing a training set of stored documents; wherein the training set of stored documents comprise: at least one existing pair of stored documents representing at least one pair of stored documents that are similar to each other, and at least one non-existing pair of stored documents representing at least one pair of stored documents that are not similar to each other; generating a plurality of initial stored document word embeddings within each stored document of the set of stored documents; wherein the plurality of initial stored document word embeddings comprise a plurality of stored document vector representations of a plurality of words in text of each stored document; determining an average stored document word embedding vector for the plurality of initial stored document word embeddings for each stored document; utilizing a similarity model to determine a similarity metric of a similarity between a first stored document and a second stored document of each candidate pair of a plurality of candidate pairs of stored documents in the set of stored documents based at least in part on the average stored document word embedding vector of each of the first stored document and the second stored document; generating a refined average stored document word embedding by backpropagating an error of the similarity metric of each candidate pair, wherein the error is based at least in part on the at least one existing pair and the at least one non-existing pair; and returning in response to a search query comprising a search document, at least one stored document of the set of stored documents based at least in part on a comparison of the search documen
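The refinement step recited in the independent claims (adjusting the averaged document vectors by backpropagating a similarity error derived from similar and non-similar pairs) might look roughly like the sketch below. Several details are assumptions and not taken from the claims: the squared-error loss, the targets of 1.0 for an existing pair and 0.0 for a non-existing pair, the learning rate and epoch count, and the choice to apply the gradient directly to the document vectors with no intermediate network. All names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cosine_grad(a, b):
    """Gradient of cosine(a, b) with respect to a."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    c = cosine(a, b)
    return [y / (norm_a * norm_b) - c * x / (norm_a * norm_a) for x, y in zip(a, b)]


def refine(vectors, similar_pairs, dissimilar_pairs, lr=0.1, epochs=200):
    """Gradient descent on squared similarity error over labelled pairs.

    Assumed targets: 1.0 for similar pairs, 0.0 for dissimilar pairs.
    """
    vecs = {key: list(vec) for key, vec in vectors.items()}
    labelled = [(p, 1.0) for p in similar_pairs] + [(p, 0.0) for p in dissimilar_pairs]
    for _ in range(epochs):
        for (i, j), target in labelled:
            err = cosine(vecs[i], vecs[j]) - target
            # Compute both gradients before updating either vector.
            grad_i = cosine_grad(vecs[i], vecs[j])
            grad_j = cosine_grad(vecs[j], vecs[i])
            # d/dv of err**2 is 2 * err * d(cosine)/dv.
            vecs[i] = [x - lr * 2 * err * g for x, g in zip(vecs[i], grad_i)]
            vecs[j] = [x - lr * 2 * err * g for x, g in zip(vecs[j], grad_j)]
    return vecs


# Toy averaged document vectors (hypothetical values).
initial = {"A": [1.0, 0.2], "B": [0.6, 0.8], "C": [0.9, 0.3]}
refined = refine(initial, similar_pairs=[("A", "B")], dissimilar_pairs=[("A", "C")])

before_similar = cosine(initial["A"], initial["B"])
after_similar = cosine(refined["A"], refined["B"])
before_dissimilar = cosine(initial["A"], initial["C"])
after_dissimilar = cosine(refined["A"], refined["C"])
```

In this toy run the labelled similar pair ends up with a higher cosine similarity than it started with and the labelled non-similar pair with a lower one, which is the effect the refined embeddings are meant to have on later search comparisons.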
Presentation of query results · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title
using vector based model · CPC title
Calculation of difference between files · CPC title