Profile data extensions
US-2022358100-A1 · Nov 10, 2022 · US
US12417218B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12417218-B2 |
| Application number | US-202318458676-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 30, 2023 |
| Priority date | Aug 30, 2023 |
| Publication date | Sep 16, 2025 |
| Grant date | Sep 16, 2025 |
Official abstract text for this publication.
Methods and systems for scalable dataset content embedding for improved searchability. For example, the system may retrieve a first dataset from a first data source. The system may generate a first data profile of the first dataset. The system may generate a latent index of the first data profile based on processing the first data profile using a first embedding algorithm. The system may receive, via a user interface, a first request for a first text string. The system may generate an embedded request corresponding to the first request based on processing the first text string using the first embedding algorithm. The system may process the embedded request using the latent index. The system may generate for display, in the user interface, a result based on processing the embedded request using the latent index.
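The flow in the abstract (profile a dataset, embed the profile into an index, embed the request with the same algorithm, compare) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `ToyEmbedder`, `build_profile`, `search`, and the sample datasets are hypothetical, and the bag-of-words vector is a simple stand-in for the "first embedding algorithm", which the abstract does not specify. Cosine similarity is likewise one possible way to "process the embedded request using the latent index".

```python
import math
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyEmbedder:
    """Hypothetical stand-in for the embedding algorithm: normalised bag of words."""

    def __init__(self, corpus: list[str]) -> None:
        vocab = sorted({tok for text in corpus for tok in tokenize(text)})
        self.positions = {tok: i for i, tok in enumerate(vocab)}

    def embed(self, text: str) -> list[float]:
        vec = [0.0] * len(self.positions)
        for tok in tokenize(text):
            i = self.positions.get(tok)
            if i is not None:  # tokens outside the vocabulary are ignored
                vec[i] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec


def build_profile(name: str, columns: dict[str, str]) -> str:
    """Flatten dataset metadata (name, column names, types) into a text profile."""
    return name + " " + " ".join(f"{col} {dtype}" for col, dtype in columns.items())


def search(index: dict[str, list[float]], query_vec: list[float]) -> list[tuple[str, float]]:
    """Rank indexed profiles by cosine similarity (vectors are already unit length)."""
    scored = [(name, sum(a * b for a, b in zip(vec, query_vec))) for name, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Assumed sample metadata, not taken from the patent.
datasets = {
    "customer_accounts": {"customer_id": "int", "email": "str", "signup_date": "date"},
    "sensor_readings": {"device_id": "int", "temperature": "float", "timestamp": "datetime"},
}
profiles = {name: build_profile(name, cols) for name, cols in datasets.items()}
embedder = ToyEmbedder(list(profiles.values()))
latent_index = {name: embedder.embed(text) for name, text in profiles.items()}

# The request is embedded with the same embedder that produced the index.
results = search(latent_index, embedder.embed("customer email"))
print(results[0][0])  # customer_accounts
```

The point of the sketch is the shared embedder: because the profile and the request pass through the same function, their vectors live in the same space and can be compared directly.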
Opening claim text (preview).
What is claimed is:
1. A system for scalable dataset content embedding for improved searchability, the system comprising: one or more processors; and a non-transitory, computer-readable medium comprising instructions that when executed by the one or more processors cause operations comprising: retrieving a first dataset from a first data source; generating a first data profile based on metadata of the first dataset, wherein the first data profile is updated based on a length of time since the first dataset was used to generate the first data profile; generating a latent index of the first data profile based on processing the first data profile using a first embedding algorithm, wherein the first embedding algorithm is trained by: retrieving a previous version of the latent index; and determining an accuracy of a previous response generated based on processing a previous embedded request using the previous version of the latent index, wherein the previous embedded request comprises a request that was processed using the previous version of the latent index and was generated using the first embedding algorithm; receiving, via a user interface, a first request for a first text string, wherein receiving the first request for the first text string further comprises receiving a first function to perform on the first dataset; generating an embedded request corresponding to the first request based on processing the first text string using the first embedding algorithm; processing the embedded request using the latent index, and wherein processing the embedded request using the latent index further comprises performing the first function on the latent index using the embedded request; and generating for display, in the user interface, a result based on processing the embedded request using the latent index.
2. A method for scalable dataset content embedding for improved searchability, the method comprising: retrieving a first dataset from a first data source; generating a first data profile of the first dataset; generating a latent index of the first data profile based on processing the first data profile using a first embedding algorithm, wherein the first data profile is updated based on a number of requests compared against the latent index, wherein the first embedding algorithm is trained by: retrieving a previous version of the latent index; and determining an accuracy of a previous response generated based on processing a previous embedded request using the previous version of the latent index, wherein the previous embedded request comprises a request that was processed using the previous version of the latent index and was generated using the first embedding algorithm; receiving, via a user interface, a first request for a first text string; generating an embedded request corresponding to the first request based on processing the first text string using the first embedding algorithm; processing the embedded request using the latent index; and generating for display, in the user interface, a result based on processing the embedded request using the latent index.
3. The method of claim 2, wherein generating the first data profile of the first dataset further comprises: retrieving first metadata for the first data profile; and generating a metadata sample based on the first metadata.
4. The method of claim 3, wherein generating the metadata sample based on the first metadata further comprises: retrieving a required metadata category; determining a portion of the first metadata corresponding to the required metadata category; and using the portion to generate the metadata sample.
5. The method of claim 4, wherein retrieving the required metadata category further comprises: receiving a user input of a required search category for the latent index; and determining the required metadata category based on the required search category.
6. The method of claim 2, wherein processing the first data profile using the first embedding algorithm comprises: generating a first feature input based on the first dataset; inputting the first feature input into the first embedding algorithm, wherein the first embedding algorithm is trained on previous versions of the latent index and search results of previous search requests on the previous versions of the latent index; and receiving a first output from the first embedding algorithm, wherein the first output comprises the latent index.
7. The method of claim 2, wherein retrieving the first dataset from the first data source further comprises: determining a number of changes between the first dataset and a previous dataset, wherein the previous dataset was used to generate a previous data profile; comparing the number of changes to a threshold number of changes; and determining to retrieve the first dataset in response to the number of changes exceeding the threshold number of changes.
8. The method of claim 2, wherein generating the first data profile further comprises: determining a length of time since a previous data profile was generated; comparing the length of time to a threshold length of time; and determining to generate the first data profile in response to the length of time exceeding the threshold length of time.
9. The method of claim 2, wherein generating the first data profile further comprises: determining a number of changes between the first dataset and a previous dataset, wherein the previous dataset was used to generate a previous data profile; comparing the number of changes to a threshold number of changes; and determining to generate the first data profile in response to the number of changes exceeding the threshold number of changes.
10. The method of claim 2, wherein processing the embedded request using the latent index further comprises: retrieving a first vector corresponding to the latent index; retrieving a second vector corresponding to the embedded request; and determining a similarity between the first vector and the second vector.
11. The method of claim 2, wherein processing the embedded request using the latent index further comprises: retrieving a first value from a first vector, wherein the first vector corresponds to the latent index; retrieving a second value from a second vector, wherein the second vector corresponds to the embedded request; and determining whether the first value matches the second value.
12. The method of claim 2, wherein generating the embedded request corresponding to the first request based on processing the first text string using the first embedding algorithm comprises: generating a second feature input based on the first dataset; inputting the second feature input into the first embedding algorithm, wherein the first embedding algorithm is trained on previous versions of the latent index and search results of previous search requests on the previous versions of the latent index; and receiving a second output from the first embedding algorithm, wherein the second output comprises the embedded request.
13. The method of claim 2, wherein generating for display the result based on processing the embedded request using the latent index further comprises: determining whether the first dataset comprises the first text string; and determining the result based on whether the first dataset comprises the first text string.
14. The method of claim 2, wherein receiving the first request for the first text string further comprises receiving a first function to perform on the first dataset, and wherein processing the embedded request using the latent index further com
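Claims 8 and 9 describe two triggers for regenerating a data profile: elapsed time above a threshold, and a change count above a threshold. A minimal sketch of such a check follows. The names `RefreshPolicy`, `count_changes`, and `should_regenerate` are hypothetical, datasets are modelled as plain dictionaries, and combining the two triggers with a logical OR is an assumption of this sketch (the claims state each trigger as a separate dependent claim).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

_MISSING = object()  # sentinel so a stored None differs from an absent key


@dataclass
class RefreshPolicy:
    """Hypothetical thresholds; the claims do not give concrete values."""
    max_age: timedelta
    max_changes: int


def count_changes(previous: dict, current: dict) -> int:
    """Count keys that were added, removed, or whose value differs."""
    keys = previous.keys() | current.keys()
    return sum(
        1 for key in keys
        if previous.get(key, _MISSING) != current.get(key, _MISSING)
    )


def should_regenerate(
    policy: RefreshPolicy,
    last_profiled: datetime,
    now: datetime,
    previous: dict,
    current: dict,
) -> bool:
    """Return True when either threshold is exceeded (strictly greater than)."""
    too_old = (now - last_profiled) > policy.max_age
    too_changed = count_changes(previous, current) > policy.max_changes
    return too_old or too_changed


# Assumed example values for illustration only.
policy = RefreshPolicy(max_age=timedelta(days=7), max_changes=2)
previous = {"a": 1, "b": 2, "c": 3}
current = {"a": 1, "b": 20, "d": 4}  # b changed, c removed, d added
print(should_regenerate(policy, datetime(2024, 1, 8), datetime(2024, 1, 10), previous, current))  # True
```

The strict comparison mirrors the claim wording "exceeding the threshold"; a deployment that wanted to refresh at the threshold itself would use `>=` instead.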
Query formulation · CPC title
Change logging, detection, and notification (replication G06F16/27) · CPC title
Indexing structures · CPC title