What technology area does this patent fall under?

Primary CPC classification G06F16/3347. Mapped technology areas include Physics.

When was this patent published?

Publication date Jul 15, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (cited publications found in our corpus, or other publications sharing the same primary CPC classification).

Systems for database searching and database schemas management and methods of use thereof

US12361209B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12361209-B2
Application number	US-202418668959-A
Country	US
Kind code	B2
Filing date	May 20, 2024
Priority date	Jul 27, 2022
Publication date	Jul 15, 2025
Grant date	Jul 15, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods of the present disclosure enable database search. The systems and/or methods may include receiving a search query that includes an input document having text. Word embeddings are generated within the input document, where the word embeddings include vector representations of words in the text of the input document. An average input document word embedding vector is determined for the word embeddings of the input document. A set of stored documents is accessed, where each stored document includes a stored text [and] has a particular average stored document word embedding vector. A similarity model is used to determine a similarity metric measuring the similarity between the input document and each stored document based on the average input document word embedding vector and the particular average stored document word embedding vector of each stored document.
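To make the abstract's pipeline concrete, here is a minimal sketch of averaged word embeddings used for search: embed each word, average the vectors per document, and rank stored documents by similarity to the input document. It uses cosine similarity (the measure named in claim 2). Everything else is an illustrative assumption rather than the patented implementation: the 2-D `TOY_TABLE` embeddings, whitespace tokenization, skipping unknown words, and the hypothetical helper names `average_embedding`, `cosine_similarity`, and `rank_stored_documents`.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical toy word-embedding table. The patent does not name a word
# vectorization model; any model mapping a word to a fixed-length vector fits.
TOY_TABLE: Dict[str, Vector] = {
    "loan": [1.0, 0.0],
    "credit": [0.8, 0.2],
    "card": [0.6, 0.4],
    "weather": [0.0, 1.0],
    "rain": [0.1, 0.9],
}


def average_embedding(words: Sequence[str], table: Dict[str, Vector]) -> Vector:
    """Element-wise mean of the embeddings of the words found in `table`.

    Out-of-vocabulary words are skipped (an assumption; the abstract does not say).
    """
    vectors = [table[w] for w in words if w in table]
    if not vectors:
        raise ValueError("document contains no known words")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def rank_stored_documents(
    query_text: str, stored_averages: Dict[str, Vector], table: Dict[str, Vector]
) -> List[Tuple[str, float]]:
    """Score every stored document against the input document, most similar first."""
    # Whitespace tokenization is a simplifying assumption.
    query_vec = average_embedding(query_text.lower().split(), table)
    scores = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in stored_averages.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Precompute (cache) one average vector per stored document, then search.
stored = {
    "doc_a": average_embedding(["credit", "card"], TOY_TABLE),
    "doc_b": average_embedding(["weather", "rain"], TOY_TABLE),
}
ranking = rank_stored_documents("Loan credit", stored, TOY_TABLE)
print(ranking[0][0])  # doc_a
```

Because each stored document is reduced to a single vector ahead of time, a query costs one averaging step plus one similarity computation per stored document. Only the query needs to be embedded at search time.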

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: accessing, by at least one processor, a training set of stored documents; wherein the training set of stored documents comprise: at least one existing pair of stored documents representing at least one pair of stored documents that are similar to each other, and at least one non-existing pair of stored documents representing at least one pair of stored documents that are not similar to each other; generating, by the at least one processor, a plurality of initial stored document word embeddings within each stored document of the set of stored documents; wherein the plurality of initial stored document word embeddings comprise a plurality of stored document vector representations of a plurality of words in text of each stored document; determining, by the at least one processor, an average stored document word embedding vector for the plurality of initial stored document word embeddings for each stored document; utilizing, by the at least one processor, a similarity model to determine a similarity metric of a similarity between a first stored document and a second stored document of each candidate pair of a plurality of candidate pairs of stored documents in the set of stored documents based at least in part on the average stored document word embedding vector of each of the first stored document and the second stored document; generating, by the at least one processor, a refined average stored document word embedding by backpropagating an error of the similarity metric of each candidate pair, wherein the error is based at least in part on the at least one existing pair and the at least one non-existing pair; and returning, by the at least one processor, in response to a search query comprising a search document, at least one stored document of the set of stored documents based at least in part on a comparison of the search document to the plurality of refined stored document word embeddings for each stored document of the set of stored documents.

2. The method of claim 1, wherein the similarity model comprises a cosine similarity determination.

3. The method of claim 1, further comprising: utilizing, by the at least one processor, a word vectorization model to generate the plurality of initial stored document word embeddings for the plurality of stored documents; receiving, by the at least one processor, a user selection confirming or denying the similarity metric of at least one stored document in the plurality of stored documents; determining, by the at least one processor, a similarity error based at least in part on a difference according to an optimization function between: i) the user selection confirming or denying the similarity metric of the at least one stored document in the plurality of stored documents, and ii) a ranked position of the at least one stored document within the plurality of stored documents; and training, by the at least one processor, parameters of the word vectorization model based at least in part on the similarity error.

4. The method of claim 1, further comprising: receiving, by the at least one processor, a user selection confirming or denying the similarity metric of at least one stored document in the plurality of stored documents; determining, by the at least one processor, a similarity error based at least in part on a difference according to an optimization function between: i) the user selection confirming or denying the similarity metric of the at least one stored document in the plurality of stored documents, and ii) a ranked position of the at least one stored document within the plurality of stored documents; and training, by the at least one processor, parameters of the similarity model based at least in part on the similarity error.

5. The method of claim 1, wherein the similarity model comprises an optimization objective to maximize the similarity metric between the plurality of stored documents and the training set of stored documents.

6. The method of claim 5, wherein the similarity model comprises at least one clustering model.

7. The method of claim 1, further comprising: generating, by the at least one processor, a k-d tree of the set of stored documents; and determining, by the at least one processor, the plurality of stored documents by using the similarity model to traverse the k-d tree.

8. The method of claim 1, further comprising: receiving, by at least one processor, a new document having new text; generating, by the at least one processor, a plurality of new word embeddings for the new document; determining, by the at least one processor, a new average word embedding vector of the plurality of new word embeddings for the new document; and storing, by the at least one processor, the new document in the set of stored documents; wherein storing the new document in the set of stored documents comprises adding the new average word embedding vector to a cache of the stored average word embedding associated with the stored text of each stored document.

9. The method of claim 1, wherein the average of the plurality of stored document word embeddings comprises a weighted average based at least in part on a section of the text in which each word is located.

10. The method of claim 1, further comprising: generating, by the at least one processor, a similarity alert based at least in part on the similarity metric of the stored document to at least one stored document in the set of stored documents exceeding a predetermined similarity threshold; and causing, by the at least one processor, a computing device to produce the similarity alert to a user to alert the user of the at least one stored document.

11. A system comprising: at least one processor; and at least one storage medium communicating with the at least one processor and having encoded thereon executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method comprising: accessing a training set of stored documents; wherein the training set of stored documents comprise: at least one existing pair of stored documents representing at least one pair of stored documents that are similar to each other, and at least one non-existing pair of stored documents representing at least one pair of stored documents that are not similar to each other; generating a plurality of initial stored document word embeddings within each stored document of the set of stored documents; wherein the plurality of initial stored document word embeddings comprise a plurality of stored document vector representations of a plurality of words in text of each stored document; determining an average stored document word embedding vector for the plurality of initial stored document word embeddings for each stored document; utilizing a similarity model to determine a similarity metric of a similarity between a first stored document and a second stored document of each candidate pair of a plurality of candidate pairs of stored documents in the set of stored documents based at least in part on the average stored document word embedding vector of each of the first stored document and the second stored document; generating a refined average stored document word embedding by backpropagating an error of the similarity metric of each candidate pair, wherein the error is based at least in part on the at least one existing pair and the at least one non-existing pair; and returning in response to a search query comprising a search document, at least one stored document of the set of stored documents based at least in part on a comparison of the search documen…
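Claim 1 refines the averaged document vectors by backpropagating the error of a pairwise similarity metric, using known similar ("existing") and dissimilar ("non-existing") pairs. The claims do not specify the loss, targets, or optimizer. The sketch below therefore rests on stated assumptions: cosine similarity as the metric, a target of 1.0 for similar pairs and 0.0 for dissimilar pairs, squared error, and plain per-pair gradient descent applied directly to the average vectors. The helper names `cosine`, `cosine_grad`, and `refine_embeddings` are hypothetical, and the gradient is written out by hand so that no autodiff library is needed.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Pair = Tuple[str, str]


def _norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    return sum(x * y for x, y in zip(a, b)) / (_norm(a) * _norm(b))


def cosine_grad(a: Vector, b: Vector) -> Vector:
    """Gradient of cosine(a, b) with respect to a: (b/|b| - cos * a/|a|) / |a|."""
    na, nb = _norm(a), _norm(b)
    c = cosine(a, b)
    return [(bi / nb - c * ai / na) / na for ai, bi in zip(a, b)]


def refine_embeddings(
    vectors: Dict[str, Vector],
    similar_pairs: Sequence[Pair],
    dissimilar_pairs: Sequence[Pair],
    lr: float = 0.1,
    epochs: int = 100,
) -> Dict[str, Vector]:
    """Hypothetical refinement of per-document average vectors.

    Assumptions (not stated in the claims): targets of 1.0 for similar pairs and
    0.0 for dissimilar pairs, squared-error loss, and plain per-pair gradient
    descent applied directly to the average vectors.
    """
    refined = {doc_id: list(v) for doc_id, v in vectors.items()}  # leave inputs untouched
    labelled = [(p, 1.0) for p in similar_pairs] + [(p, 0.0) for p in dissimilar_pairs]
    for _ in range(epochs):
        for (i, j), target in labelled:
            a, b = refined[i], refined[j]
            error = cosine(a, b) - target  # loss = error**2, so dloss/dcos = 2 * error
            grad_a, grad_b = cosine_grad(a, b), cosine_grad(b, a)  # cosine is symmetric
            refined[i] = [x - lr * 2 * error * g for x, g in zip(a, grad_a)]
            refined[j] = [x - lr * 2 * error * g for x, g in zip(b, grad_b)]
    return refined


docs = {"d1": [1.0, 0.2], "d2": [0.6, 0.8], "d3": [0.9, -0.3]}
refined = refine_embeddings(docs, similar_pairs=[("d1", "d2")], dissimilar_pairs=[("d1", "d3")])
print(round(cosine(docs["d1"], docs["d2"]), 3), "->", round(cosine(refined["d1"], refined["d2"]), 3))
print(round(cosine(docs["d1"], docs["d3"]), 3), "->", round(cosine(refined["d1"], refined["d3"]), 3))
```

After refinement, the similar pair d1/d2 scores higher and the dissimilar pair d1/d3 scores lower than before. The refined vectors would then serve as the stored-document vectors that a search document is compared against. A production system might instead backpropagate into the word vectorization model itself (compare claim 3). This sketch does not attempt that.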

Assignees

Capital One Services LLC

Inventors

Classifications

G06F16/338
Presentation of query results · CPC title
G06F40/284
Lexical analysis, e.g. tokenisation or collocates · CPC title
G06F16/3347 · Primary
using vector based model · CPC title
G06F40/194 · Primary
Calculation of difference between files · CPC title

Patent family

Related publications grouped by family.

Patent family 89664374

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12361209B2 cover?: Systems and methods of the present disclosure enable database search. The systems and/or methods may include receiving a search query that includes an input document having text. Word embeddings are generated within the input document, where the word embeddings include vector representations of words in the text of the input document. An average input document word embedding vector is determine…
Who is the assignee on this patent?: Capital One Services LLC