Techniques for data type detection with learned metadata

US12282829B2 · US · B2

Patent metadata
Publication number: US-12282829-B2
Application number: US-202117412034-A
Country: US
Kind code: B2
Filing date: Aug 25, 2021
Priority date: Aug 25, 2021
Publication date: Apr 22, 2025
Grant date: Apr 22, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various embodiments are generally directed to techniques for creating and utilizing multidimensional embedding spaces for data objects, such as to condition the data for input to a neural network, for instance. Some embodiments are particularly directed to detecting data types for structured data based on learned metadata. In many embodiments, an embedding space for a set of data objects may be customized with a set of dimensions that correspond to various characteristics of the set of data objects. For example, a set of data objects may correspond to a table with each data object corresponding to a data entry in the table. In such examples, correlations between different columns and/or within a column of data may be identified and utilized as metadata to improve classifications by the neural network.
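The abstract's picture of a table as a set of data objects, with within-column characteristics used as metadata, can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the names `DataObject`, `table_to_objects`, and `column_length_range` are hypothetical, and the sample table is invented for the example. The length range is chosen as the column characteristic because dependent claim 3 names it explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    """One table cell (hypothetical structure for illustration)."""
    column: int  # column index in the table
    row: int     # row index in the table
    value: str   # raw cell contents


def table_to_objects(table: list[list[str]]) -> list[DataObject]:
    """Flatten a row-major table into one DataObject per cell."""
    return [
        DataObject(column=c, row=r, value=cell)
        for r, row in enumerate(table)
        for c, cell in enumerate(row)
    ]


def column_length_range(objects: list[DataObject], column: int) -> tuple[int, int]:
    """Return the (min, max) string length of the values in one column."""
    lengths = [len(o.value) for o in objects if o.column == column]
    return min(lengths), max(lengths)


# Invented sample data: a name column, a ZIP-like column, a date column.
table = [
    ["Alice", "94105", "2021-08-25"],
    ["Bob", "10001", "2025-04-22"],
]
objects = table_to_objects(table)
name_range = column_length_range(objects, 0)  # (3, 5): lengths vary
zip_range = column_length_range(objects, 1)   # (5, 5): fixed length
```

A fixed length range such as `(5, 5)` is the kind of column-level signal that could help a downstream classifier tell a ZIP-like field from a free-text field, which is the role the abstract assigns to learned metadata.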

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus, the apparatus comprising: a processor; and memory comprising instructions that when executed by the processor cause the processor to: identify a set of data objects, the set of data objects comprising an array of values, wherein each data object in the set of data objects comprises a column value, a row value, and a data value; determine a first group of data objects in the set of data objects, the first group of data objects corresponding to a first column in the array of values; determine a second group of data objects in the set of data objects, the second group of data objects corresponding to a second column in the array of values; concatenate data values from the first group of data objects with the data values from the second group of data objects in a row-wise manner to produce a concatenated group of data values; determine at least one of a set of a plurality of embedding space parameters based on the concatenated group of data values, wherein the set of embedding space parameters define an embedding space comprising a plurality of dimensions; and generate a set of object vectors, the set of object vectors comprising an object vector for each data object in the set of data objects, each object vector in the set of object vectors to include a set of dimension values and each dimension value in the set of dimension values to correspond to one of the plurality of dimensions in the embedding space, wherein a respective object vector for a respective data object is generated based on a respective column value of the respective data object, a respective row value of the respective data object, and the set of embedding space parameters.

2. The apparatus of claim 1, wherein the instructions, when executed by the processor, further cause the processor to: analyze each data value in the first group of data objects to determine one or more characteristics of the first column in the array of values; and determine at least one of the set of embedding space parameters based on the one or more characteristics of the first column in the array of values.

3. The apparatus of claim 2, wherein the one or more characteristics of the first column comprises a range of lengths for data values in the first column.

4. The apparatus of claim 1, wherein the first column in the array of values is adjacent to the second column in the array of values.

5. The apparatus of claim 1, wherein the instructions, when executed by the processor, further cause the processor to: provide the set of object vectors as input to a machine learning algorithm; and determine a classification of each data object in the set of data objects based on output of the machine learning algorithm in response to input of the set of object vectors.

6. The apparatus of claim 1, wherein the instructions, when executed by the processor, further cause the processor to: determine a third group of data objects in the set of data objects, the third group of data objects corresponding to a third column in the array of values that is adjacent to the first column or the second column in the array of values; concatenate data values from the first group of data objects with the data values from one or more of the second group of data objects and the third group of data objects in a row-wise manner to produce a second concatenated group of data values; and determine at least one of the set of embedding space parameters based on the second concatenated group of data values.

7. The apparatus of claim 6, wherein the instructions, when executed by the processor, further cause the processor to: determine a fourth group of data objects in the set of data objects, the fourth group of data objects corresponding to a fourth column in the array of values that is adjacent to the first column, the second column, or the third column in the array of values; concatenate data values from the first group of data objects with the data values from the second group of data objects, the third group of data objects, and the fourth group of data objects in a row-wise manner to produce a third concatenated group of data values; and determine at least one of the set of embedding space parameters based on the third concatenated group of data values.

8. The apparatus of claim 1, wherein the plurality of dimensions that define the embedding space comprise two or more of a row dimension, a column dimension, a data value dimension, a column statistic dimension, and a concatenation dimension.

9. The apparatus of claim 1, wherein the plurality of dimensions that define the embedding space comprise a plurality of concatenation dimensions and at least one column statistic dimension.

10. At least one non-transitory computer-readable medium comprising a set of instructions that, in response to execution by a processor circuit, cause the processor circuit to: identify a set of data objects, the set of data objects comprising an array of values, wherein each data object in the set of data objects comprises a column value, a row value, and a data value; determine a first group of data objects in the set of data objects, the first group of data objects corresponding to a first column in the array of values; determine a second group of data objects in the set of data objects, the second group of data objects corresponding to a second column in the array of values that is adjacent to the first column in the array of values; concatenate data values from the first group of data objects with the data values from the second group of data objects in a row-wise manner to produce a concatenated group of data values; determine at least one of a set of a plurality of embedding space parameters based on the concatenated group of data values, wherein the set of embedding space parameters define an embedding space comprising a plurality of dimensions; and generate a set of object vectors, the set of object vectors comprising an object vector for each data object in the set of data objects, each object vector in the set of object vectors to include a set of dimension values and each dimension value in the set of dimension values to correspond to one of the plurality of dimensions in the embedding space, wherein a respective object vector for a respective data object is generated based on a respective column value of the respective data object, a respective row value of the respective data object, and the set of embedding space parameters.

11. The at least one non-transitory computer-readable medium of claim 10, wherein the set of instructions, in response to execution by the processor circuit, further cause the processor circuit to: analyze each data value in the first group of data objects to determine one or more characteristics of the first column in the array of values; and determine at least one of the set of embedding space parameters based on the one or more characteristics of the first column in the array of values.

12. The non-transitory computer-readable medium of claim 11, wherein the one or more characteristics of the first column comprises a range of lengths for data values in the first column.

13. The non-transitory computer-readable medium of claim 10, wherein the set of instructions, in response to execution by the processor circuit, further cause the processor circuit to: determine a third group of data objects in the set of data objects, the third group of data objects corresponding to a third column in the array of values that is adjacent to the first column on an opposite side as the second column in the array of values; concatenate data values from the first group of data object…
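The core steps of claim 1 (row-wise concatenation of two columns, deriving an embedding space parameter from the concatenated values, and generating one object vector per data object) can be sketched as follows. This is a rough sketch under stated assumptions, not the method the patent actually discloses: the function names are hypothetical, the single parameter `max_concat_len` is an invented stand-in for the claimed "set of embedding space parameters", and the three dimensions (row, column, concatenation) are picked from the list in claim 8.

```python
def concat_rowwise(table: list[list[str]], first_col: int, second_col: int) -> list[str]:
    """Join the values of two columns row by row."""
    return [row[first_col] + row[second_col] for row in table]


def embedding_params(concatenated: list[str]) -> dict[str, int]:
    """Derive one illustrative parameter from the concatenated values.

    Assumption: the longest concatenated string is used as a scale factor.
    The claim only says parameters are 'based on' the concatenated group.
    """
    longest = max((len(s) for s in concatenated), default=0)
    return {"max_concat_len": longest or 1}  # avoid division by zero


def object_vectors(
    table: list[list[str]], first_col: int, second_col: int
) -> dict[tuple[int, int], list[float]]:
    """Build one vector per cell: [row, column, scaled concatenation length]."""
    concatenated = concat_rowwise(table, first_col, second_col)
    scale = embedding_params(concatenated)["max_concat_len"]
    vectors = {}
    for r, row in enumerate(table):
        for c in range(len(row)):
            vectors[(r, c)] = [float(r), float(c), len(concatenated[r]) / scale]
    return vectors


# Invented sample data: two adjacent columns.
table = [
    ["Alice", "94105"],
    ["Bob", "10001"],
]
concatenated = concat_rowwise(table, 0, 1)  # ["Alice94105", "Bob10001"]
vectors = object_vectors(table, 0, 1)
# vectors[(0, 0)] == [0.0, 0.0, 1.0]
# vectors[(1, 1)] == [1.0, 1.0, 0.8]
```

Each vector depends on the cell's row value, its column value, and the derived parameter, mirroring the three inputs the claim lists. Vectors of this kind would then be passed to a machine learning algorithm for classification, as claim 5 describes; that step is omitted here.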

Assignees

Capital One Services LLC

Inventors

Classifications

  • G06N20/00 · Primary

    Machine learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12282829B2 cover?
Various embodiments are generally directed to techniques for creating and utilizing multidimensional embedding spaces for data objects, such as to condition the data for input to a neural network, for instance. Some embodiments are particularly directed to detecting data types for structured data based on learned metadata. In many embodiments, an embedding space for a set of data objects may be…
Who is the assignee on this patent?
Capital One Services LLC
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 22, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).