Systems and methods for scalable dataset content embedding for improved database searchability

US2026003844A1 · US · A1

Patent metadata
Publication number: US-2026003844-A1
Application number: US-202519319697-A
Country: US
Kind code: A1
Filing date: Sep 4, 2025
Priority date: Aug 30, 2023
Publication date: Jan 1, 2026
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems for scalable dataset content embedding for improved searchability. For example, the system may retrieve a first dataset from a first data source. The system may generate a first data profile of the first dataset. The system may generate a latent index of the first data profile based on processing the first data profile using a first embedding algorithm. The system may receive, via a user interface, a first request for a first text string. The system may generate an embedded request corresponding to the first request based on processing the first text string using the first embedding algorithm. The system may process the embedded request using the latent index. The system may generate for display, in the user interface, a result based on processing the embedded request using the latent index.
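The abstract describes a profile-then-embed pipeline in which one embedding algorithm encodes both the dataset profile and the user's search text, so the two can be compared. The Python sketch below illustrates that flow under stated assumptions. The patent does not name an embedding algorithm, so feature hashing of tokens stands in for it here. The profile format, `DIM`, the toy datasets, and every function name are hypothetical.

```python
import hashlib
import math
import re
from typing import Dict, List, Tuple

DIM = 256  # hypothetical embedding width

Vector = List[float]


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens; underscores and punctuation split tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str, dim: int = DIM) -> Vector:
    """Stand-in for the unnamed embedding algorithm: feature hashing.

    Each token increments one bucket chosen by an MD5 hash (deterministic across
    runs, unlike Python's built-in hash for str); the vector is then L2-normalized.
    """
    vec = [0.0] * dim
    for tok in tokenize(text):
        bucket = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def profile(name: str, rows: List[Dict[str, str]], sample_size: int = 3) -> str:
    """Hypothetical data profile: dataset name, column names, and a few sample values."""
    columns = sorted({col for row in rows for col in row})
    samples = [row[c] for row in rows[:sample_size] for c in columns if c in row]
    return " ".join([name, *columns, *samples])


def build_latent_index(datasets: Dict[str, List[Dict[str, str]]]) -> Dict[str, Vector]:
    """Embed each dataset's profile; the dict of vectors plays the role of the latent index."""
    return {name: embed(profile(name, rows)) for name, rows in datasets.items()}


def cosine(a: Vector, b: Vector) -> float:
    # Both vectors are unit length (or all zeros), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def search(index: Dict[str, Vector], text: str, top_k: int = 3) -> List[Tuple[str, float]]:
    """Embed the request with the same algorithm used for the index, then rank by similarity."""
    query = embed(text)
    scored = [(name, cosine(vec, query)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Usage with two toy datasets (hypothetical names and values).
datasets = {
    "card_transactions": [{"merchant": "coffee shop", "amount": "4.50"}],
    "branch_locations": [{"city": "richmond", "zip": "23219"}],
}
index = build_latent_index(datasets)
for name, score in search(index, "merchant amount", top_k=2):
    print(f"{name}: {score:.3f}")
```

Because the profile and the query both pass through `embed`, the index and the request share one vector space, which is what lets the embedded request be processed against the latent index. Here `card_transactions` receives a positive score because its profile contains both query tokens. A production system would more plausibly use a learned text-embedding model; the patent leaves that choice open.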

First claim

Opening claim text (preview).

What is claimed is:

1. A system for scalable dataset content embedding for improved searchability, the system comprising: one or more processors; and a non-transitory, computer-readable medium comprising instructions that when executed by the one or more processors cause operations comprising: receiving a latent index, wherein the latent index is generated by processing a dataset using an embedding algorithm; receiving, via a user interface, a request for a text string, wherein receiving the request for the text string further comprises receiving a function to perform on the dataset; generating an embedded request corresponding to the request based on processing the text string using the embedding algorithm; processing the embedded request using the latent index, and wherein processing the embedded request using the latent index further comprises performing the function on the latent index using the embedded request; and generating for display, in the user interface, a result based on processing the embedded request using the latent index.

2. A method for scalable dataset content embedding for improved searchability, the method comprising: receiving a latent index, wherein the latent index is generated by processing a dataset using a first embedding algorithm; receiving, via a user interface, a first request for a first text string; generating an embedded request corresponding to the first request based on processing the first text string using the first embedding algorithm; processing the embedded request using the latent index; and generating for display, in the user interface, a result based on processing the embedded request using the latent index.

3. The method of claim 2, wherein the latent index is generated by: retrieving first metadata for the dataset; and generating a metadata sample based on the first metadata.

4. The method of claim 3, wherein generating the metadata sample based on the first metadata further comprises: retrieving a required metadata category; determining a portion of the first metadata corresponding to the required metadata category; and using the portion to generate the metadata sample.

5. The method of claim 4, wherein retrieving the required metadata category further comprises: receiving a user input of a required search category for the latent index; and determining the required metadata category based on the required search category.

6. The method of claim 2, wherein processing the dataset using the first embedding algorithm comprises: generating a first feature input based on the dataset; inputting the first feature input into the first embedding algorithm, wherein the first embedding algorithm is trained on previous versions of the latent index and search results of previous search requests on the previous versions of the latent index; and receiving a first output from the first embedding algorithm, wherein the first output comprises the latent index.

7. The method of claim 2, wherein the latent index is generated by: determining a number of changes between the dataset and a previous dataset, wherein the previous dataset was used to generate a previous latent index; comparing the number of changes to a threshold number of changes; and determining to retrieve the dataset in response to the number of changes exceeding the threshold number of changes.

8. The method of claim 2, wherein the latent index is generated by: determining a length of time since a previous latent index was generated; comparing the length of time to a threshold length of time; and determining to generate the latent index in response to the length of time exceeding the threshold length of time.

9. The method of claim 2, wherein the latent index is generated by: determining a number of changes between the dataset and a previous dataset, wherein the previous dataset was used to generate a previous latent index; comparing the number of changes to a threshold number of changes; and determining to generate the latent index in response to the number of changes exceeding the threshold number of changes.

10. The method of claim 2, wherein processing the embedded request using the latent index further comprises: retrieving a first vector corresponding to the latent index; retrieving a second vector corresponding to the embedded request; and determining a similarity between the first vector and the second vector.

11. The method of claim 2, wherein processing the embedded request using the latent index further comprises: retrieving a first value from a first vector, wherein the first vector corresponds to the latent index; retrieving a second value from a second vector, wherein the second vector corresponds to the embedded request; and determining whether the first value matches the second value.

12. The method of claim 2, wherein generating the embedded request corresponding to the first request based on processing the first text string using the first embedding algorithm comprises: generating a second feature input based on the dataset; inputting the second feature input into the first embedding algorithm, wherein the first embedding algorithm is trained on previous versions of the latent index and search results of previous search requests on the previous versions of the latent index; and receiving a second output from the first embedding algorithm, wherein the second output comprises the embedded request.

13. The method of claim 2, wherein generating for display the result based on processing the embedded request using the latent index further comprises: determining whether the dataset comprises the first text string; and determining the result based on whether the dataset comprises the first text string.

14. The method of claim 2, wherein receiving the first request for the first text string further comprises receiving a first function to perform on the dataset, and wherein processing the embedded request using the latent index further comprises performing the first function on the latent index using the embedded request.

15. A non-transitory, computer-readable medium comprising instructions that when executed by one or more processors cause operations comprising: receiving a latent index, wherein the latent index is generated by processing a dataset using a first embedding algorithm; receiving, via a user interface, a first request for a first text string; generating an embedded request corresponding to the first request based on processing the first text string using the first embedding algorithm; processing the embedded request using the latent index; and generating for display, in the user interface, a result based on processing the embedded request using the latent index.

16. The non-transitory, computer-readable medium of claim 15, wherein the latent index is generated by: retrieving first metadata for the dataset; and generating a metadata sample based on the first metadata.

17. The non-transitory, computer-readable medium of claim 16, wherein generating the metadata sample based on the first metadata further comprises: retrieving a required metadata category; determining a portion of the first metadata corresponding to the required metadata category; and using the portion to generate the metadata sample.

18. The non-transitory, computer-readable medium of claim 17, wherein retrieving the required metadata category further comprises: receiving a user input of a required search category for the
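Claims 3 through 5 build the latent index from a metadata sample restricted to a required metadata category, which is in turn derived from a user's required search category. The sketch below is one possible reading. The mapping table, the category names, and the function names are hypothetical; the claims do not say how a search category maps to metadata categories.

```python
from typing import Dict, List

# Hypothetical mapping from a user's required search category (claim 5)
# to the required metadata categories it implies (claim 4).
SEARCH_TO_METADATA: Dict[str, List[str]] = {
    "ownership": ["owner", "steward"],
    "schema": ["columns", "types"],
}


def required_metadata_categories(search_category: str) -> List[str]:
    """Claim 5 reading: derive required metadata categories from the user's search category."""
    return SEARCH_TO_METADATA.get(search_category, [])


def metadata_sample(metadata: Dict[str, object], search_category: str) -> Dict[str, object]:
    """Claims 3-4 reading: keep only the portion of the metadata in the required categories."""
    required = required_metadata_categories(search_category)
    return {key: metadata[key] for key in required if key in metadata}


metadata = {"owner": "data-eng", "steward": "alice", "columns": ["id", "amount"], "row_count": 10}
print(metadata_sample(metadata, "ownership"))  # {'owner': 'data-eng', 'steward': 'alice'}
```

Under this reading, the resulting sample, rather than the full metadata, is what would be passed to the embedding algorithm, which keeps the index focused on the categories users actually search.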
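Claims 7 and 9 tie index refresh to a count of changes between the current and previous dataset, and claim 8 ties it to the age of the previous latent index; in each case the trigger fires only when the threshold is exceeded. A minimal sketch, assuming records keyed by ID and counting a change whenever a record is added, removed, or modified. The thresholds are hypothetical defaults, and collapsing claim 7's "retrieve the dataset" and claim 9's "generate the latent index" into one regenerate flag, with the two triggers combined by OR, is a design choice, since the claims present them as separate alternatives.

```python
from datetime import datetime, timedelta
from typing import Dict


def count_changes(current: Dict[str, str], previous: Dict[str, str]) -> int:
    """Hypothetical change measure: records added, removed, or modified, keyed by record ID."""
    keys = current.keys() | previous.keys()
    return sum(1 for key in keys if current.get(key) != previous.get(key))


def should_regenerate(
    current: Dict[str, str],
    previous: Dict[str, str],
    last_built: datetime,
    now: datetime,
    max_changes: int = 100,  # hypothetical threshold number of changes
    max_age: timedelta = timedelta(days=1),  # hypothetical threshold length of time
) -> bool:
    # Claims 7 and 9: the number of changes must exceed the threshold.
    too_many_changes = count_changes(current, previous) > max_changes
    # Claim 8: the time since the previous latent index must exceed the threshold.
    too_old = (now - last_built) > max_age
    # Design choice (not from the claims): either trigger alone is enough.
    return too_many_changes or too_old


previous = {"1": "alice", "2": "bob"}
current = {"1": "alice", "2": "robert", "3": "carol"}
built = datetime(2025, 9, 1, 12, 0)
print(count_changes(current, previous))  # 2
print(should_regenerate(current, previous, built, built + timedelta(hours=1), max_changes=1))  # True
```

Gating regeneration this way avoids re-embedding a large dataset on every minor edit while still bounding how stale the index can become.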
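Claims 10, 11, and 14 describe three ways of processing the embedded request against the latent index: a similarity between two vectors, a check on whether a value from each vector matches, and a function supplied with the request and performed on the index. The sketch below is one possible reading under stated assumptions: cosine similarity stands in for the unspecified similarity measure, "matches" is taken to mean equal within a float tolerance at the same vector position, and the functions `rank` and `count_similar` (with its 0.8 cutoff) are hypothetical, since the claims enumerate no functions.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
LatentIndex = Dict[str, Vector]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Claim 10 reading: similarity between an index vector and the request vector."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def values_match(a: Vector, b: Vector, position: int, tol: float = 1e-9) -> bool:
    """Claim 11 reading (assumed): compare the values at the same position, with a float tolerance."""
    return abs(a[position] - b[position]) <= tol


def rank(index: LatentIndex, request: Vector) -> List[str]:
    """Hypothetical function: index entries ordered from most to least similar."""
    return sorted(index, key=lambda k: cosine_similarity(index[k], request), reverse=True)


def count_similar(index: LatentIndex, request: Vector, cutoff: float = 0.8) -> int:
    """Hypothetical function: number of entries at or above a similarity cutoff."""
    return sum(1 for vec in index.values() if cosine_similarity(vec, request) >= cutoff)


# Claim 14 reading: the request names a function, which is then performed on the latent index.
FUNCTIONS: Dict[str, Callable[[LatentIndex, Vector], object]] = {
    "rank": rank,
    "count_similar": count_similar,
}


def process_request(index: LatentIndex, request: Vector, function_name: str = "rank") -> object:
    if function_name not in FUNCTIONS:
        raise KeyError(f"unsupported function: {function_name}")
    return FUNCTIONS[function_name](index, request)


index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.6, 0.8]}
print(process_request(index, [1.0, 0.0], "rank"))  # ['a', 'c', 'b']
print(process_request(index, [1.0, 0.0], "count_similar"))  # 1
```

A registry of named functions keeps the set of operations a user may request explicit and easy to validate, which matters when the function arrives as part of a user-interface request.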

Assignees

Capital One Services LLC

Inventors

Classifications

  • Query formulation · CPC title

  • Change logging, detection, and notification (replication G06F16/27) · CPC title

  • Indexing structures · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2026003844A1 cover?
Methods and systems for scalable dataset content embedding for improved searchability. The system retrieves a dataset from a data source, generates a data profile of it, and builds a latent index by processing the profile with an embedding algorithm. When a user submits a text string through a user interface, the system embeds the string with the same algorithm, processes the embedded request using the latent index, and displays the result.
Who is the assignee on this patent?
Capital One Services LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/2358. Mapped technology areas include Physics.
When was this patent published?
Publication date: January 1, 2026 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).