What technology area does this patent fall under?

Primary CPC classification G06F40/194. Mapped technology areas include Physics.

When was this patent published?

Publication date May 21, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems for database searching and database schemas management and methods of use thereof

US11989506B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11989506-B2
Application number	US-202217874855-A
Country	US
Kind code	B2
Filing date	Jul 27, 2022
Priority date	Jul 27, 2022
Publication date	May 21, 2024
Grant date	May 21, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods of the present disclosure enable database search. The systems and/or methods may include receiving a search query that includes an input document having text. Word embeddings are generated within the input document, where the word embeddings include vector representations of words in the text of the input document. An average input document word embedding vector is determined for the word embeddings of the input document. A set of stored documents is accessed, where each stored document includes a stored text has a particular average stored document word embedding vector. A similarity model is used to determine a similarity metric measuring the similarity between the input document and each stored document based on the average input document word embedding vector and the particular average stored document word embedding vector of each stored document.
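The averaging-and-comparison idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the toy embedding table, the function names, and the sample documents are all hypothetical, and cosine similarity is used only because claim 2 names it as one possible similarity model.

```python
import math

# Hypothetical toy embedding table; a real system would use a trained
# word vectorization model. Values are made up for illustration.
TOY_EMBEDDINGS = {
    "loan": [0.9, 0.1, 0.0],
    "credit": [0.8, 0.2, 0.1],
    "risk": [0.5, 0.5, 0.0],
    "garden": [0.0, 0.1, 0.9],
    "flower": [0.1, 0.0, 0.8],
}


def average_embedding(text, table=TOY_EMBEDDINGS):
    """Average the vectors of the known words in `text`."""
    vectors = [table[w] for w in text.lower().split() if w in table]
    if not vectors:
        raise ValueError("no known words in text")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_text, stored):
    """Return (doc_id, similarity) pairs, most similar first.

    `stored` maps a document id to its precomputed average vector.
    """
    query_vector = average_embedding(query_text)
    scored = [(doc_id, cosine_similarity(query_vector, vec))
              for doc_id, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Stored documents are reduced to one average vector each, ahead of time.
stored_vectors = {
    "credit-policy": average_embedding("credit risk loan"),
    "gardening-guide": average_embedding("garden flower"),
}
ranking = rank_documents("loan risk", stored_vectors)
```

In this sketch each stored document is represented by a single precomputed vector, so a query costs one averaging pass over the input text plus one comparison per stored document.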

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising:
accessing, by at least one processor, a training set of stored documents; wherein the training set of stored documents comprise: at least one existing pair of stored documents representing at least one pair of stored documents that are similar to each other, and at least one non-existing pair of stored documents representing at least one pair of stored documents that are not similar to each other;
generating, by the at least one processor, a plurality of initial stored document word embeddings within each stored document of the set of stored documents; wherein the plurality of initial stored document word embeddings comprise a plurality of stored document vector representations of a plurality of words in text of each stored document;
determining, by the at least one processor, an average stored document word embedding vector for the plurality of initial stored document word embeddings for each stored document;
utilizing, by the at least one processor, a similarity model to determine a similarity metric of a similarity between a first stored document and a second stored document of each candidate pair of a plurality of candidate pairs of stored documents in the set of stored documents based at least in part on the average stored document word embedding vector of each of the first stored document and the second stored document;
generating, by the at least one processor, a plurality of refined stored document word embeddings for each stored document in the set of stored documents by backpropagating an error of the similarity metric of each candidate pair, wherein the error is based at least in part on the at least one existing pair and the at least one non-existing pair;
generating, by the at least one processor, a refined average stored document word embedding vector for the plurality of refined stored document word embeddings for each stored document;
receiving, by the at least one processor, a search query from a computing device associated with a user; wherein the search query comprises an input document having text;
generating, by the at least one processor, a plurality of input document word embeddings within the input document; wherein the plurality of input document word embeddings comprise a plurality of vector representations of a plurality of words in the text of the input document;
determining, by the at least one processor, an average input document word embedding vector for the plurality of input document word embeddings for the input document;
utilizing, by the at least one processor, the similarity model to determine an input document similarity metric of an input document similarity between the input document and each stored document in the set of stored documents based at least in part on the average input document word embedding vector and the refined average stored document word embedding vector of each stored document; and
instructing, by the at least one processor, the computing device to display a ranked list of stored documents in response to the search query.

2. The method of claim 1, wherein the similarity model comprises a cosine similarity determination.

3. The method of claim 1, further comprising: utilizing, by the at least one processor, a word vectorization model to generate the plurality of input document word embeddings for the input document; receiving, by the at least one processor, a user selection confirming or denying the similarity metric of at least one stored document in the ranked list of stored documents; determining, by the at least one processor, a similarity error based at least in part on a difference according to an optimization function between: i) the user selection confirming or denying the similarity metric of the at least one stored document in the ranked list of stored documents, and ii) a ranked position of the at least one stored document within the ranked list of the stored documents; and training, by the at least one processor, parameters of the word vectorization model based at least in part on the similarity error.

4. The method of claim 1, further comprising: receiving, by the at least one processor, a user selection confirming or denying the similarity metric of at least one stored document in the ranked list of stored documents; determining, by the at least one processor, a similarity error based at least in part on a difference according to an optimization function between: i) the user selection confirming or denying the similarity metric of the at least one stored document in the ranked list of stored documents, and ii) a ranked position of the at least one stored document within the ranked list of the stored documents; and training, by the at least one processor, parameters of the similarity model based at least in part on the similarity error.

5. The method of claim 1, wherein the similarity model comprises an optimization objective to maximize the similarity metric between the input document and the set of stored documents.

6. The method of claim 5, wherein the similarity model comprises at least one clustering model.

7. The method of claim 1, further comprising: generating, by the at least one processor, a k-d tree of the set of stored documents; and determining, by the at least one processor, the ranked list of stored documents by using the similarity model to traverse the k-d tree.

8. The method of claim 1, further comprising: receiving, by at least one processor, a new document having new text; generating, by the at least one processor, a plurality of new word embeddings for the new document; determining, by the at least one processor, a new average word embedding vector of the plurality of new word embeddings for the new document; and storing, by the at least one processor, the new document in the set of stored documents; wherein storing the new document in the set of stored documents comprises adding the new average word embedding vector to a cache of the stored average word embedding associated with the stored text of each stored document.

9. The method of claim 1, wherein the average of the plurality of input document word embeddings comprises a weighted average based at least in part on a section of the text in which each word is located.

10. The method of claim 1, further comprising: generating, by the at least one processor, a similarity alert based at least in part on the similarity metric of the input document to at least one stored document in the set of stored documents exceeding a predetermined similarity threshold; and causing, by the at least one processor, the computing device to produce the similarity alert to the user to alert the user of the at least one stored document.

11. The method of claim 1, wherein the input document comprises a regulatory requirement document and the set of stored documents comprises a set of business controls documents.

12. The method of claim 1, further comprising instructing at least one activity execution device, by the at least one processor, to execute at least one activity associated with the input document according to a highest ranked stored document in the ranked list of stored documents.

13. A system comprising: at least one processor configured to execute software instructions that cause the at least one processor to perform steps to: access a training set of stored documents; wherein the training set of stored documents comprise: at least one existing pair of stored documents representing at least one pair of stored documents that are similar to each other, and at least one non-existing pair of stored documents represe
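Two of the dependent claims lend themselves to a short illustration: the section-weighted average of claim 9 and the threshold-based similarity alert of claim 10. The sketch below is an assumption-laden example rather than the claimed implementation; the section names, the weight values, the default threshold, and the function names are all hypothetical, since the claim text shown here does not specify them.

```python
# Hypothetical section weights: words in a title count three times as much
# as words in the body. The claims do not give concrete values.
SECTION_WEIGHTS = {"title": 3.0, "body": 1.0}


def weighted_average(tagged_vectors, weights=SECTION_WEIGHTS):
    """Weighted mean of word vectors.

    `tagged_vectors` is a list of (section, vector) pairs, one per word,
    where `section` names the part of the text the word came from.
    """
    if not tagged_vectors:
        raise ValueError("no word vectors supplied")
    total = sum(weights[section] for section, _ in tagged_vectors)
    if total == 0:
        raise ValueError("section weights sum to zero")
    dim = len(tagged_vectors[0][1])
    return [
        sum(weights[section] * vec[i] for section, vec in tagged_vectors) / total
        for i in range(dim)
    ]


def similarity_alerts(scores, threshold=0.9):
    """Return the ids of stored documents whose similarity exceeds `threshold`."""
    return [doc_id for doc_id, score in scores.items() if score > threshold]


# One title word and one body word, each with a 2-dimensional toy vector.
doc_vector = weighted_average([("title", [1.0, 0.0]), ("body", [0.0, 1.0])])

# Similarity scores would come from the similarity model; these are made up.
alerts = similarity_alerts({"control-a": 0.95, "control-b": 0.40})
```

With these toy inputs the title word pulls the document vector toward its own direction, and only the stored document scoring above the threshold triggers an alert.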

Assignees

Capital One Services, LLC

Inventors

Classifications

CPC code	CPC title	Role
G06F40/194	Calculation of difference between files	Primary
G06F16/3347	using vector based model	Primary
G06F16/338	Presentation of query results	
G06F40/284	Lexical analysis, e.g. tokenisation or collocates	

Patent family

Related publications grouped by family.

View patent family 89664374

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11989506B2 cover? Systems and methods of the present disclosure enable database search. The systems and/or methods may include receiving a search query that includes an input document having text. Word embeddings are generated within the input document, where the word embeddings include vector representations of words in the text of the input document. An average input document word embedding vector is determine…
Who is the assignee on this patent? Capital One Services, LLC