Lineage data for data records

US11169959B2 · US · B2

Patent metadata
Publication number: US-11169959-B2
Application number: US-201816036326-A
Country: US
Kind code: B2
Filing date: Jul 16, 2018
Priority date: Nov 18, 2015
Publication date: Nov 9, 2021
Grant date: Nov 9, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system may read source data corresponding to a source variable and apply a transformation to the source variable to generate an output variable. The transformation may include logic, and the output variable may be configured for ingestion into a big data storage format. The system may record lineage data of the output variable that comprises the transformation and/or the source variable. The system may also receive a request to generate a requested output variable. The requested output variable may be generated from a second transformation that is the same as the first transformation. The system may thus match the first transformation to the second transformation using the lineage data. In response to matching the first transformation to the second transformation, the system may deny the request. The original output variable may be returned in response to the matching the first transformation to the second transformation.
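The abstract describes a flow in which the system records lineage for a transformation, matches a later request against that lineage, and then either denies the request or returns the existing output. The minimal in-memory sketch below illustrates that flow under several assumptions. The class and function names (`LineageRecord`, `LineageRegistry`, `request_output`) are hypothetical. Each transformation is identified by a plain string, and matches are keyed on the (transformation, source variable) pair. The patent does not specify any data structures or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical lineage entry linking a transformation and source variable to the output it produced."""
    transformation: str   # assumed: a stable identifier for the transformation logic
    source_variable: str  # the field in the source record that was transformed
    output_variable: str  # the variable generated by the transformation


@dataclass
class LineageRegistry:
    """In-memory stand-in for a lineage store (claim 8 stores lineage on a distributed file system)."""
    records: Dict[Tuple[str, str], LineageRecord] = field(default_factory=dict)
    outputs: Dict[str, List[object]] = field(default_factory=dict)

    def record(self, rec: LineageRecord, values: List[object]) -> None:
        self.records[(rec.transformation, rec.source_variable)] = rec
        self.outputs[rec.output_variable] = values

    def match(self, transformation: str, source_variable: str) -> Optional[LineageRecord]:
        return self.records.get((transformation, source_variable))


def request_output(registry: LineageRegistry, name: str, transformation: str,
                   source_variable: str, source_values: List[object],
                   fn: Callable[[object], object]) -> Tuple[bool, str, List[object]]:
    """Return (accepted, output_variable, values) for a request to generate `name`."""
    existing = registry.match(transformation, source_variable)
    if existing is not None:
        # Same transformation already recorded: deny the request and return the original output.
        return False, existing.output_variable, registry.outputs[existing.output_variable]
    values = [fn(v) for v in source_values]
    registry.record(LineageRecord(transformation, source_variable, name), values)
    return True, name, values
```

Suppose `request_output` is called twice with the same `"truncate:5"` transformation on `zip_code`. The first call generates `zip5`. The second call is denied and returns `zip5` and its stored values instead of creating a duplicate variable.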

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: applying, by a processor, a first transformation process to a source variable to generate a first output variable, wherein the source variable identifies a field in a record that includes source data that is transformed to output data corresponding to the first output variable, wherein the first transformation process uses a logic map to generate the first output variable by applying data transformations to the source variable and intermediate variables generated from the source data, wherein the logic map depicts a sequence of data transformations applied to input variables that were performed to generate the first output variable, wherein the input variables comprise at least the source variable and the intermediate variables generated from the source data; ingesting, by the processor, the first output variable into a data storage format by using a control node to distribute tasks among nodes for processing; and recording, by the processor, first lineage data of the first output variable, wherein the first lineage data includes a history of one or more transformation processes performed on the source variable to produce the first output variable such that forward and backward transformation linkages can be re-created for use in analytics, wherein the first lineage data includes at least one of the first transformation process or the source variable.

2. The method of claim 1, wherein the first transformation process comprises at least one of executing an action against the source data or generating a transformed value for an intermediate variable or the first output variable in response to an evaluation of a logical statement from the logic map against a value of the source variable, wherein the logical statement describes one or more transformation process steps to be applied to the source variable.

3. The method of claim 1, wherein the first transformation process comprises data formatting including at least one of stripping white space or truncating numbers to a predetermined length.

4. The method of claim 1, further comprising: receiving, by the processor, a request to generate a second output variable, wherein the second output variable is generated from a second transformation process, and wherein the second transformation process is the same as the first transformation process; and matching, by the processor, the first transformation process to the second transformation process using the first lineage data.

5. The method of claim 4, further comprising denying, by the processor, the request in response to the matching the first transformation process to the second transformation process.

6. The method of claim 4, further comprising returning, by the processor, the first output variable in response to the matching the first transformation process to the second transformation process.

7. The method of claim 1, wherein the first lineage data is stored in a tuple comprising at least one of the source variable or the first output variable.

8. The method of claim 1, wherein the first lineage data is stored on a distributed file system.

9. The method of claim 1, wherein the source data includes raw data files including a plurality of records, and wherein each of the plurality of records include data obtained from at least one of history of purchase transactions over time, web registrations, social media, records of charge (ROC), summaries of charges (SOC), personally identifying information (PII) or internal data.

10. The method of claim 1, further comprising reading, by a processor, the source data corresponding to the source variable.

11. The method of claim 1, further comprising applying, by the processor, data transformations to intermediate variables generated from the source data.

12. The method of claim 1, wherein the first output variable is configured for ingestion into the data storage format.

13. The method of claim 1, wherein the nodes process the first output variable in parallel with a second output variable to expedite the processing.

14. The method of claim 1, further comprising: detecting duplicative data transformations in order to reduce duplicative data transformations using the first lineage data.

15. A computer-based system, comprising: a processor; a tangible, non-transitory memory configured to communicate with the processor, the tangible, non-transitory memory having instructions stored thereon that, in response to execution by the processor, cause the computer-based system to perform operations comprising: applying a first transformation process to a source variable to generate a first output variable, wherein the source variable identifies a field in a record that includes source data that is transformed to output data corresponding to the first output variable, wherein the first transformation process uses a logic map to generate the first output variable by applying data transformations to the source variable and intermediate variables generated from the source data, wherein the logic map depicts a sequence of data transformations applied to input variables that were performed to generate the first output variable, wherein the input variables comprise at least the source variable and the intermediate variables generated from the source data; ingesting the first output variable into a data storage format by using a control node to distribute tasks among nodes for processing; and recording, by the processor, first lineage data of the first output variable, wherein the first lineage data includes a history of one or more transformation processes performed on the source variable to produce the first output variable such that forward and backward transformation linkages can be re-created for use in analytics, wherein the first lineage data includes at least one of the first transformation process or the source variable.

16. The system of claim 15, wherein the operations further comprise: receiving a request to generate a second output variable, wherein the second output variable is generated from a second transformation process, and wherein the second transformation process is the same as the first transformation process; and matching the first transformation process to the second transformation process using the first lineage data.

17. The system of claim 16, wherein the operations further comprise denying the request in response to the matching the first transformation process to the second transformation process.

18. The system of claim 15, wherein the first transformation process comprises at least one of executing an action against the source data or generating a transformed value for an intermediate variable or the first output variable in response to an evaluation of a logical statement from the logic map against a value of the source variable, wherein the logical statement describes one or more transformation process steps to be applied to the source variable.

19. The system of claim 15, wherein the first transformation process comprises data formatting including at least one of stripping white space or truncating numbers to a predetermined length.

20. The system of claim 15, wherein the source data includes raw data files including a plurality of records, and wherein each of the plurality of records include data obtained from at least one of history of purchase transactions over time, web registrations, social media, records of charge (ROC), summaries of charges (SOC), personally identifying information (PII) or
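Claims 1 to 3 and claim 7 cover three related ideas. The first is a logic map, an ordered sequence of transformations over a source variable and its intermediate variables. The second is a set of formatting steps, such as stripping white space or truncating numbers to a predetermined length. The third is lineage stored as tuples, from which forward and backward transformation linkages can be re-created. The minimal sketch below shows one way to model these ideas. Several parts of it are hypothetical choices, not taken from the patent: the step-list representation, the intermediate variable naming scheme (`<output>__stepN`), the `(input, step, output)` tuple layout, and every function name.

```python
from typing import Callable, List, Tuple

# Hypothetical representations (not from the patent):
#   a logic map is an ordered list of (step_name, function) pairs;
#   a lineage tuple is (input_variable, step_name, output_variable).
Step = Tuple[str, Callable[[str], str]]
LineageTuple = Tuple[str, str, str]


def strip_white_space(value: str) -> str:
    """Formatting step: remove leading and trailing white space."""
    return value.strip()


def truncate(length: int) -> Callable[[str], str]:
    """Formatting step factory: keep at most `length` characters."""
    def step(value: str) -> str:
        return value[:length]
    return step


def apply_logic_map(source_variable: str, value: str, logic_map: List[Step],
                    output_variable: str) -> Tuple[str, List[LineageTuple]]:
    """Apply each step in order, naming intermediate variables and recording one lineage tuple per step."""
    lineage: List[LineageTuple] = []
    current_name, current_value = source_variable, value
    for i, (step_name, fn) in enumerate(logic_map):
        is_last = i == len(logic_map) - 1
        # Intermediate variables get a generated name; the final step produces the output variable.
        next_name = output_variable if is_last else f"{output_variable}__step{i + 1}"
        current_value = fn(current_value)
        lineage.append((current_name, step_name, next_name))  # forward link: input -> output
        current_name = next_name
    return current_value, lineage


def trace_back(lineage: List[LineageTuple], output_variable: str) -> Tuple[str, List[str]]:
    """Follow lineage tuples backward; return (source_variable, step names in forward order)."""
    produced_by = {out: (inp, step) for inp, step, out in lineage}
    steps: List[str] = []
    name = output_variable
    seen = set()
    while name in produced_by and name not in seen:  # `seen` guards against cyclic lineage
        seen.add(name)
        inp, step = produced_by[name]
        steps.append(step)
        name = inp
    return name, list(reversed(steps))
```

Consider the input `"  1234567890 "` with a strip step followed by a five-character truncation. The logic map produces `"12345"` and two lineage tuples. `trace_back` then walks from the output variable back to the source variable through both steps.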

Assignees

American Express Travel Related Services Co Inc

Inventors

Classifications

  • Distributed queries (CPC title)

  • G06F16/116 (primary): Details of conversion of file system types or formats (CPC title)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11169959B2 cover?
A system may read source data corresponding to a source variable and apply a transformation to the source variable to generate an output variable. The transformation may include logic, and the output variable may be configured for ingestion into a big data storage format. The system may record lineage data of the output variable that comprises the transformation and/or the source variable. The …
Who is the assignee on this patent?
American Express Travel Related Services Co Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/116. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 9, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).