What technology area does this patent fall under?

Primary CPC classification G06F16/2358. Mapped technology areas include Physics.

When was this patent published?

Publication date: Sep 16, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for scalable dataset content embedding for improved database searchability

US12417218B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12417218-B2
Application number	US-202318458676-A
Country	US
Kind code	B2
Filing date	Aug 30, 2023
Priority date	Aug 30, 2023
Publication date	Sep 16, 2025
Grant date	Sep 16, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems for scalable dataset content embedding for improved searchability. For example, the system may retrieve a first dataset from a first data source. The system may generate a first data profile of the first dataset. The system may generate a latent index of the first data profile based on processing the first data profile using a first embedding algorithm. The system may receive, via a user interface, a first request for a first text string. The system may generate an embedded request corresponding to the first request based on processing the first text string using the first embedding algorithm. The system may process the embedded request using the latent index. The system may generate for display, in the user interface, a result based on processing the embedded request using the latent index.
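The flow in the abstract (profile the dataset, embed the profile into a latent index, embed the request with the same algorithm, then compare) can be sketched in a few lines. This is only an illustration under stated assumptions, not the patented implementation: the patent describes a trained embedding algorithm, whereas the sketch substitutes a fixed-vocabulary token-count vector, and `tokenize`, `make_embedder`, `build_profile`, `search`, and the two sample datasets are hypothetical names invented for the example.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Split on whitespace and underscores so "signup_date" matches "signup date".
    return text.lower().replace("_", " ").split()


def make_embedder(vocabulary: list[str]):
    """Return a toy embedding function: unit-length token counts over a fixed vocabulary."""
    position = {token: i for i, token in enumerate(vocabulary)}

    def embed(text: str) -> list[float]:
        vec = [0.0] * len(vocabulary)
        for token, count in Counter(tokenize(text)).items():
            if token in position:  # out-of-vocabulary tokens are ignored
                vec[position[token]] = float(count)
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    return embed


def build_profile(name: str, columns: list[str]) -> str:
    """Hypothetical data profile: the dataset name and its column names as text."""
    return " ".join([name, *columns])


def search(latent_index, embed, request: str):
    """Embed the request with the same function, then rank by cosine similarity."""
    query = embed(request)
    scored = [
        (sum(a * b for a, b in zip(vec, query)), name)  # vectors are unit length
        for name, vec in latent_index.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical datasets, described only by their metadata.
datasets = {
    "customers": ["customer_id", "email", "signup_date"],
    "payments": ["payment_id", "amount", "currency"],
}
profiles = {name: build_profile(name, cols) for name, cols in datasets.items()}
vocabulary = sorted({tok for text in profiles.values() for tok in tokenize(text)})
embed = make_embedder(vocabulary)
latent_index = {name: embed(text) for name, text in profiles.items()}

results = search(latent_index, embed, "customer email signup date")
print(results[0][1])
```

In this toy setup the request shares four tokens with the `customers` profile and none with `payments`, so `customers` ranks first. The point of the design is that the request and the profiles pass through one and the same embedding function, which is what makes their vectors comparable.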

First claim

Opening claim text (preview).

What is claimed is:

1. A system for scalable dataset content embedding for improved searchability, the system comprising: one or more processors; and a non-transitory, computer-readable medium comprising instructions that when executed by the one or more processors cause operations comprising: retrieving a first dataset from a first data source; generating a first data profile based on metadata of the first dataset, wherein the first data profile is updated based on a length of time since the first dataset was used to generate the first data profile; generating a latent index of the first data profile based on processing the first data profile using a first embedding algorithm, wherein the first embedding algorithm is trained by: retrieving a previous version of the latent index; and determining an accuracy of a previous response generated based on processing a previous embedded request using the previous version of the latent index, wherein the previous embedded request comprises a request that was processed using the previous version of the latent index and was generated using the first embedding algorithm; receiving, via a user interface, a first request for a first text string, wherein receiving the first request for the first text string further comprises receiving a first function to perform on the first dataset; generating an embedded request corresponding to the first request based on processing the first text string using the first embedding algorithm; processing the embedded request using the latent index, and wherein processing the embedded request using the latent index further comprises performing the first function on the latent index using the embedded request; and generating for display, in the user interface, a result based on processing the embedded request using the latent index.

2. A method for scalable dataset content embedding for improved searchability, the method comprising: retrieving a first dataset from a first data source; generating a first data profile of the first dataset; generating a latent index of the first data profile based on processing the first data profile using a first embedding algorithm, wherein the first data profile is updated based on a number of requests compared against the latent index, wherein the first embedding algorithm is trained by: retrieving a previous version of the latent index; and determining an accuracy of a previous response generated based on processing a previous embedded request using the previous version of the latent index, wherein the previous embedded request comprises a request that was processed using the previous version of the latent index and was generated using the first embedding algorithm; receiving, via a user interface, a first request for a first text string; generating an embedded request corresponding to the first request based on processing the first text string using the first embedding algorithm; processing the embedded request using the latent index; and generating for display, in the user interface, a result based on processing the embedded request using the latent index.

3. The method of claim 2, wherein generating the first data profile of the first dataset further comprises: retrieving first metadata for the first data profile; and generating a metadata sample based on the first metadata.

4. The method of claim 3, wherein generating the metadata sample based on the first metadata further comprises: retrieving a required metadata category; determining a portion of the first metadata corresponding to the required metadata category; and using the portion to generate the metadata sample.

5. The method of claim 4, wherein retrieving the required metadata category further comprises: receiving a user input of a required search category for the latent index; and determining the required metadata category based on the required search category.

6. The method of claim 2, wherein processing the first data profile using the first embedding algorithm comprises: generating a first feature input based on the first dataset; inputting the first feature input into the first embedding algorithm, wherein the first embedding algorithm is trained on previous versions of the latent index and search results of previous search requests on the previous versions of the latent index; and receiving a first output from the first embedding algorithm, wherein the first output comprises the latent index.

7. The method of claim 2, wherein retrieving the first dataset from the first data source further comprises: determining a number of changes between the first dataset and a previous dataset, wherein the previous dataset was used to generate a previous data profile; comparing the number of changes to a threshold number of changes; and determining to retrieve the first dataset in response to the number of changes exceeding the threshold number of changes.

8. The method of claim 2, wherein generating the first data profile further comprises: determining a length of time since a previous data profile was generated; comparing the length of time to a threshold length of time; and determining to generate the first data profile in response to the length of time exceeding the threshold length of time.

9. The method of claim 2, wherein generating the first data profile further comprises: determining a number of changes between the first dataset and a previous dataset, wherein the previous dataset was used to generate a previous data profile; comparing the number of changes to a threshold number of changes; and determining to generate the first data profile in response to the number of changes exceeding the threshold number of changes.

10. The method of claim 2, wherein processing the embedded request using the latent index further comprises: retrieving a first vector corresponding to the latent index; retrieving a second vector corresponding to the embedded request; and determining a similarity between the first vector and the second vector.

11. The method of claim 2, wherein processing the embedded request using the latent index further comprises: retrieving a first value from a first vector, wherein the first vector corresponds to the latent index; retrieving a second value from a second vector, wherein the second vector corresponds to the embedded request; and determining whether the first value matches the second value.

12. The method of claim 2, wherein generating the embedded request corresponding to the first request based on processing the first text string using the first embedding algorithm comprises: generating a second feature input based on the first dataset; inputting the second feature input into the first embedding algorithm, wherein the first embedding algorithm is trained on previous versions of the latent index and search results of previous search requests on the previous versions of the latent index; and receiving a second output from the first embedding algorithm, wherein the second output comprises the embedded request.

13. The method of claim 2, wherein generating for display the result based on processing the embedded request using the latent index further comprises: determining whether the first dataset comprises the first text string; and determining the result based on whether the first dataset comprises the first text string.

14. The method of claim 2, wherein receiving the first request for the first text string further comprises receiving a first function to perform on the first dataset, and wherein processing the embedded request using the latent index further com
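Claims 7 through 9 describe threshold checks that decide when a dataset is re-retrieved or its data profile regenerated: a count of changes against a threshold number, and an elapsed time against a threshold length of time. The following is a minimal sketch of such a check, not the claimed method itself. The names `RefreshPolicy`, `count_changes`, and `should_regenerate`, the dictionary-based snapshot format, and the choice to combine the two thresholds with a logical OR are all assumptions made for illustration; the claims state the two conditions separately.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

_MISSING = object()  # sentinel for "record absent from this snapshot"


@dataclass(frozen=True)
class RefreshPolicy:
    max_age: timedelta   # threshold length of time (claim 8)
    max_changes: int     # threshold number of changes (claims 7 and 9)


def count_changes(previous: dict, current: dict) -> int:
    """Count records added, removed, or modified between two snapshots."""
    keys = previous.keys() | current.keys()
    return sum(
        1 for key in keys
        if previous.get(key, _MISSING) != current.get(key, _MISSING)
    )


def should_regenerate(policy, last_profiled, now, previous, current) -> bool:
    """Regenerate when either threshold is strictly exceeded (assumed OR)."""
    too_old = (now - last_profiled) > policy.max_age
    too_changed = count_changes(previous, current) > policy.max_changes
    return too_old or too_changed


# Hypothetical policy and snapshots, keyed by record ID.
policy = RefreshPolicy(max_age=timedelta(days=7), max_changes=2)
now = datetime(2025, 9, 16)
old_rows = {"r1": "a", "r2": "b", "r3": "c"}
new_rows = {"r1": "a", "r2": "B", "r4": "d"}  # one edit, one delete, one insert

changed = should_regenerate(policy, now - timedelta(days=1), now, old_rows, new_rows)
stale = should_regenerate(policy, now - timedelta(days=8), now, old_rows, old_rows)
fresh = should_regenerate(policy, now - timedelta(days=7), now, old_rows, old_rows)
```

The comparisons are strict because the claims speak of a value "exceeding" its threshold, so a profile that is exactly at the age limit with no changes is left alone.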

Assignees

Capital One Services, LLC

Inventors

Classifications

G06F16/242
Query formulation · CPC title
G06F16/2358 · Primary
Change logging, detection, and notification (replication G06F16/27) · CPC title
G06F16/2228 · Primary
Indexing structures · CPC title
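The "Physics" technology area reported for this patent follows from the first letter of the CPC symbol, which names the CPC section. As a rough sketch of how a symbol such as G06F16/2358 breaks down, the snippet below splits it into section, class, subclass, main group, and subgroup. The function name `parse_cpc`, the regular expression, and the returned dictionary layout are assumptions for illustration, and the title for section Y is abbreviated.

```python
import re

# CPC section letters and their titles (Y is abbreviated here).
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General Tagging of New Technological Developments",
}

# Section letter, two-digit class, subclass letter, main group, "/", subgroup.
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,})$")


def parse_cpc(symbol: str) -> dict:
    """Split a CPC symbol like 'G06F16/2358' into its hierarchical parts."""
    match = CPC_PATTERN.match(symbol.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a recognizable CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    return {
        "section": section,
        "section_title": CPC_SECTIONS[section],
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": main_group,
        "subgroup": subgroup,
    }


info = parse_cpc("G06F16/2358")
print(info["section_title"], info["subclass"], info["main_group"], info["subgroup"])
```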

Patent family

Related publications grouped by family.

View patent family 94774532

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12417218B2 cover?: Methods and systems for scalable dataset content embedding for improved searchability. For example, the system may retrieve a first dataset from a first data source. The system may generate a first data profile of the first dataset. The system may generate a latent index of the first data profile based on processing the first data profile using a first embedding algorithm. The system may receiv…
Who is the assignee on this patent?: Capital One Services, LLC

Related patents

Profile data extensions

Detection of matching datasets using encode values

Data quality analysis

Systems and methods for a data search engine based on data profiles
