Processing data sets in a big data repository
US-2017139746-A1 · May 18, 2017 · US
US2017017678A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2017017678-A1 |
| Application number | US-201514799293-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jul 14, 2015 |
| Priority date | Jul 14, 2015 |
| Publication date | Jan 19, 2017 |
| Grant date | — |
Official abstract text for this publication.
The process includes receiving a data set comprising a plurality of rows and a plurality of columns, and applying a first rule based decisioning to the data set to generate a first layer of metadata that comprises at least one of a key, a type indicator, a categorical indicator, or a continuous indicator. The first layer of metadata may be descriptive of the data set. The process may further include applying a second rule based decisioning to the first layer of metadata to generate a second layer that includes at least one of the key, the type indicator, the categorical indicator, or the continuous indicator. The second layer may be descriptive of the first layer of metadata. The process may also include generating an output file from at least one of the first layer of metadata or the second layer.
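As an illustration only, the two-pass process in the abstract might be sketched in Python as follows. The concrete rules (uniqueness for a key, a distinct-value threshold for a categorical column), the `categorical_max` parameter, and the function names `first_layer`, `second_layer`, and `write_output` are assumptions made for this sketch; the text shown here does not specify them.

```python
import io
import json


def _is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def first_layer(header, rows, categorical_max=3):
    """First rule-based pass: one metadata record per column of the data set.

    The rules and the categorical_max threshold are hypothetical.
    """
    layer = []
    for i, name in enumerate(header):
        values = [row[i] for row in rows]
        distinct = len(set(values))
        numeric = all(_is_number(v) for v in values)
        is_key = len(values) > 0 and distinct == len(values)  # all values unique
        is_categorical = not is_key and distinct <= categorical_max
        layer.append({
            "column": name,
            "type": "numeric" if numeric else "string",
            "key": is_key,
            "categorical": is_categorical,
            "continuous": numeric and not is_key and not is_categorical,
        })
    return layer


def second_layer(layer1):
    """Second rule-based pass: metadata that describes the first layer."""
    types = {record["type"] for record in layer1}
    return {
        "key": [r["column"] for r in layer1 if r["key"]],
        "type": types.pop() if len(types) == 1 else "mixed",
        "categorical": [r["column"] for r in layer1 if r["categorical"]],
        "continuous": [r["column"] for r in layer1 if r["continuous"]],
    }


def write_output(layer1, layer2, fp):
    """Serialize both layers as JSON to an open text file object."""
    json.dump({"first_layer": layer1, "second_layer": layer2}, fp, indent=2)


# Small made-up data set: rows of string values under a header.
header = ["id", "state", "amount"]
rows = [
    ["1", "NY", "10.5"],
    ["2", "CA", "20.0"],
    ["3", "NY", "7.25"],
    ["4", "TX", "99.9"],
    ["5", "CA", "10.5"],
]
layer1 = first_layer(header, rows)
layer2 = second_layer(layer1)
buffer = io.StringIO()  # stands in for an output file
write_output(layer1, layer2, buffer)
print(buffer.getvalue())
```

Keeping each layer as plain dictionaries means the second pass can apply the same kind of rules to the first pass's output, which is one way to read "descriptive of the first layer of metadata".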
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving, by a processor, a data set comprising a plurality of rows and a plurality of columns; applying, by the processor, a first rule based decisioning to the data set to generate a first layer of metadata, wherein the first layer of metadata comprises at least one of a key, a type indicator, a categorical indicator, or a continuous indicator, wherein the first layer of metadata is descriptive of the data set; applying, by the processor, a second rule based decisioning to the first layer of metadata to generate a second layer, wherein the second layer comprises at least one of the key, the type indicator, the categorical indicator, or the continuous indicator, wherein the second layer is descriptive of the first layer of metadata; and generating, by the processor, an output file from at least one of the first layer of metadata or the second layer.
2. The method of claim 1, further comprising running, by the processor, a regular expression on the first layer of metadata.
3. The method of claim 1, further comprising computing, by the processor, percentile calculations for a column of the plurality of columns.
4. The method of claim 1, further comprising formatting, by the processor, the first layer of metadata and the second layer for recursive decisioning.
5. The method of claim 1, wherein the data set is stored on a distributed storage.
6. The method of claim 5, further comprising communicating, by the processor, with the distributed storage across a network.
7. The method of claim 5, wherein the processor is in a node of the distributed storage.
8. A computer-based system, comprising: a processor, a tangible, non-transitory memory configured to communicate with the processor, the tangible, non-transitory memory having instructions stored thereon that, in response to execution by the processor, cause the processor to perform operations comprising: receiving, by the processor, a data set comprising a plurality of rows and a plurality of columns; applying, by the processor, a first rule based decisioning to the data set to generate a first layer of metadata, wherein the first layer of metadata comprises at least one of a key, a type indicator, a categorical indicator, or a continuous indicator, wherein the first layer of metadata is descriptive of the data set; applying, by the processor, a second rule based decisioning to the first layer of metadata to generate a second layer, wherein the second layer comprises at least one of the key, the type indicator, the categorical indicator, or the continuous indicator, wherein the second layer is descriptive of the first layer of metadata; and generating, by the processor, an output file from at least one of the first layer of metadata or the second layer.
9. The computer-based system of claim 8, further comprising running, by the processor, a regular expression on the first layer of metadata.
10. The computer-based system of claim 8, further comprising computing, by the processor, percentile calculations for a column of the plurality of columns.
11. The computer-based system of claim 8, further comprising formatting, by the processor, the first layer of metadata and the second layer for recursive decisioning.
12. The computer-based system of claim 8, wherein the data set is stored on a distributed storage.
13. The computer-based system of claim 12, further comprising communicating, by the processor, with the distributed storage across a network.
14. The computer-based system of claim 12, wherein the processor is in a node of the distributed storage.
15. An article of manufacture including a non-transitory, tangible computer readable storage medium having instructions stored thereon that, in response to execution by a computer-based system, cause the computer-based system to perform operations comprising: receiving, by a processor, a data set comprising a plurality of rows and a plurality of columns; applying, by the processor, a first rule based decisioning to the data set to generate a first layer of metadata, wherein the first layer of metadata comprises at least one of a key, a type indicator, a categorical indicator, or a continuous indicator, wherein the first layer of metadata is descriptive of the data set; applying, by the processor, a second rule based decisioning to the first layer of metadata to generate a second layer, wherein the second layer comprises at least one of the key, the type indicator, the categorical indicator, or the continuous indicator, wherein the second layer is descriptive of the first layer of metadata; and generating, by the processor, an output file from at least one of the first layer of metadata or the second layer.
16. The computer-based system of claim 8, further comprising running, by the processor, a regular expression on the first layer of metadata.
17. The computer-based system of claim 8, further comprising computing, by the processor, percentile calculations for a column of the plurality of columns.
18. The computer-based system of claim 8, further comprising formatting, by the processor, the first layer of metadata and the second layer for recursive decisioning.
19. The computer-based system of claim 8, wherein the data set is stored on a distributed storage system.
20. The computer-based system of claim 12, further comprising communicating, by the processor, with the distributed storage across a network.
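The dependent claims mention running a regular expression on the first layer of metadata and computing percentile calculations for a column. A minimal Python sketch of both steps follows. The nearest-rank percentile definition, the name pattern `(?:^|_)id$`, and the function names `percentile` and `columns_matching` are assumptions chosen for illustration; the claims do not state which percentile method or which expressions are used.

```python
import math
import re


def percentile(values, p):
    """Nearest-rank percentile of a list of numbers, with p in [0, 100].

    Nearest-rank is one common definition; it is an assumption here.
    """
    if not values:
        raise ValueError("percentile of an empty column is undefined")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def columns_matching(layer1, pattern=r"(?:^|_)id$"):
    """Return column names in first-layer metadata whose name matches pattern.

    layer1 is assumed to be a list of dicts with a "column" entry;
    the default pattern (names ending in "id") is hypothetical.
    """
    compiled = re.compile(pattern, re.IGNORECASE)
    return [rec["column"] for rec in layer1 if compiled.search(rec["column"])]


# Made-up column values and metadata records.
amounts = [10.5, 20.0, 7.25, 99.9, 10.5]
print([percentile(amounts, p) for p in (25, 50, 75, 100)])

metadata = [{"column": "customer_id"}, {"column": "paid"}, {"column": "ID"}]
print(columns_matching(metadata))
```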
Physics · mapped topic
Design, administration or maintenance of databases · CPC title