Data lineage across multiple marketplaces

US10089335B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10089335-B2
Application number: US-201213545398-A
Country: US
Kind code: B2
Filing date: Jul 10, 2012
Priority date: Jul 10, 2012
Publication date: Oct 2, 2018
Grant date: Oct 2, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Tracking lineage of data. A method may be practiced in a network computing environment including a plurality of interconnected systems where data is shared between the systems. A method includes accessing a dataset. The dataset is associated with lineage metadata. The lineage metadata includes data indicating the original source of the data, one or more intermediary entities that have performed operations on the dataset, and the nature of operations performed on the dataset. A first entity performs an operation on the dataset. As a result of performing a first operation on the dataset, the method includes updating the lineage metadata to indicate that the first entity performed the operation on the dataset. The method further includes providing functionality for determining if the lineage metadata has been compromised in that the lineage metadata has been at least one of removed from association with the dataset, is corrupted, or is incomplete.
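The flow described in the abstract (lineage metadata that records the original source and each operation, is updated when an entity operates on the dataset, and can be checked for removal, corruption, or incompleteness) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Operation` and `LineageMetadata` classes, the `digest` field, and the use of SHA-256 over a JSON serialization. A bare checksum such as this only detects accidental or unsophisticated changes, because anyone who alters the data can also recompute it.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Operation:
    entity: str        # identity of the entity that performed the operation
    nature: str        # what kind of operation was performed
    performed_at: str  # ISO-8601 timestamp of the operation


@dataclass
class LineageMetadata:
    original_source: str
    operations: List[Operation] = field(default_factory=list)
    digest: str = ""   # hypothetical integrity field (not named in the patent)


def compute_digest(dataset: bytes, lineage: LineageMetadata) -> str:
    """Hash the dataset together with the lineage, excluding the digest itself."""
    body = asdict(lineage)
    body.pop("digest")
    canonical = json.dumps(body, sort_keys=True).encode("utf-8")
    h = hashlib.sha256()
    h.update(hashlib.sha256(dataset).digest())
    h.update(canonical)
    return h.hexdigest()


def record_operation(dataset: bytes, lineage: LineageMetadata, entity: str,
                     nature: str, when: Optional[datetime] = None) -> LineageMetadata:
    """Append an operation entry and refresh the digest over the new state."""
    when = when or datetime.now(timezone.utc)
    lineage.operations.append(Operation(entity, nature, when.isoformat()))
    lineage.digest = compute_digest(dataset, lineage)
    return lineage


def is_compromised(dataset: bytes, lineage: Optional[LineageMetadata]) -> bool:
    """True if the lineage is removed, incomplete, or no longer matches."""
    if lineage is None:              # removed from association with the dataset
        return True
    if not lineage.original_source:  # incomplete: no original source recorded
        return True
    return lineage.digest != compute_digest(dataset, lineage)  # corrupted
```

In this sketch, an entity that transforms the dataset calls `record_operation` with the transformed bytes, and any later recipient calls `is_compromised` before relying on the data.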

First claim

Opening claim text (preview).

What is claimed is:

1. In a network computing environment comprising a plurality of interconnected systems where data is shared between the systems, a method of tracking the source, lineage, and integrity of data, the method comprising: accessing a dataset, the dataset having been signed by a first authority to ensure that the dataset has not been compromised; accessing lineage metadata associated with the dataset, the lineage metadata comprising data indicating the original source of the data and information about one or more operations which have been performed on the dataset, the information for each of the one or more operations including when the each operation was performed, an identity of an entity which performed the each operation, and the nature of the each operation, wherein the lineage metadata is signed by a second authority using a cryptographic certificate which allows dataset users to determine whether the lineage metadata has been compromised and whether to trust the second authority; determining a validity for the dataset by analyzing at least the signature of the first authority; determining a validity for the lineage metadata by analyzing at least the signature of the second authority; determining a trust level for the second authority; and based upon the determined validity of the dataset, the validity of the lineage metadata, and the determined trust level of the authority, performing an action that is associated with the dataset and the determined validity of the dataset, validity of the lineage metadata, and the trust level for the second authority.

2. The method of claim 1, further comprising determining that the lineage metadata has been compromised including performing a checksum on the dataset and the lineage metadata.

3. The method of claim 1, further comprising determining that the lineage metadata has been compromised including determining that embedded lineage metadata has been removed from the dataset.

4. The method of claim 1, wherein invalidating the dataset comprises making the dataset generally unavailable.

5. The method of claim 1, wherein invalidating the dataset comprises marking the dataset as invalid, but nonetheless allowing entities to obtain the dataset.

6. In a network computing environment comprising a plurality of interconnected systems where data is shared between the systems, a method of tracking lineage of data, the method comprising: accessing a dataset, the dataset having been signed by a first authority to ensure that the dataset has not been compromised; at a first entity, performing an operation on the dataset; accessing lineage metadata associated with the dataset, the lineage metadata comprising data indicating the original source of the data and information about one or more operations which have been performed on the dataset, the information for each of the one or more operations including when the each operation was performed, an identity of an entity which performed the each operation, and the nature of the each operation, wherein the lineage metadata is signed by a second authority using a cryptographic certificate which allows dataset users to determine whether the lineage metadata has been compromised and whether to trust the second authority; as a result of performing an operation on the dataset at the first entity, updating the lineage metadata to indicate that the first entity performed the operation on the dataset such that the lineage metadata includes when the operation performed at the first entity was performed, information about the first entity, an indication that the operation performed at the first entity was performed at the first entity, and the nature of the operation performed at the first entity; and computing a value which another entity can use to determine validity of the dataset and the lineage metadata.

7. The method of claim 6, further comprising providing functionality for determining if the lineage metadata has been compromised including performing a checksum on the dataset and the lineage metadata.

8. The method of claim 6, further comprising providing functionality for determining if the lineage metadata has been compromised including signing the dataset and the lineage metadata using an encryption key.

9. The method of claim 8, wherein the encryption key is part of a chain of keys used by various entities to add lineage metadata as the result of previous operations.

10. The method of claim 6, wherein the lineage metadata is associated with the dataset by a user manually creating the lineage metadata based on information the user has about the dataset, and the user manually associating the lineage metadata with the dataset.

11. The method of claim 6, wherein the lineage metadata is associated with the dataset by a system automatically parsing logged operations on data in the dataset.

12. The method of claim 6, wherein the lineage metadata is associated with the dataset by a system searching database repositories to determine the ultimate source of the dataset.

13. The method of claim 6, wherein the lineage metadata is associated with the dataset and updated by a central governance entity that has API's allowing other entities to manage lineage metadata.

14. The method of claim 6, wherein the lineage metadata is configured to be tracked on a database level, table level, row level, and cell level.

15. In a network computing environment comprising a plurality of interconnected systems where data is shared between the systems, a system for tracking lineage of data, the system comprising one or more processors having access to computer executable instructions that, when executed by the one or more processors, enable the processors to: access a dataset, the dataset having been signed by a first authority to ensure that the dataset has not been compromised; at a first entity, perform an operation on the dataset; access lineage metadata associated with the dataset, the lineage metadata comprising data indicating the original source of the data and information about one or more operations which have been performed on the dataset, the information for each of the one or more operations including when the each operation was performed, an identity of an entity which performed the each operation, and the nature of the each operation, wherein the lineage metadata is signed by a second authority using a cryptographic certificate which allows dataset users to determine whether the lineage metadata has been compromised and whether to trust the second authority; as a result of performing a first operation on the dataset, update the lineage metadata to indicate that the first entity performed the operation on the dataset such that the lineage metadata includes when the operation performed at the first entity was performed, information about the first entity, an indication that the operation performed at the first entity was performed at the first entity, and the nature of the operation performed at the first entity; computing a value which another entity can use to determine validity of the dataset and the lineage metadata; manage the lineage metadata at a central governance entity to allow the lineage metadata to enter and leave various repositories while still providing consistent management of the lineage metadata.

16. The system of claim 15, further comprising providing functionality for determining if the lineage metadata has been compromised including performing a checksum on the dataset and the lineage metadata.

17. The system of claim 15, further comprising providing function
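The decision recited in claim 1 (check the first authority's signature on the dataset, check the second authority's signature on the lineage metadata, determine a trust level for the second authority, then act on all three results) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation. In particular, HMAC with a shared key is swapped in for the certificate-based signatures the claim describes, so the example runs on the standard library alone. The `TRUST_LEVELS` table, the authority names, the `min_trust` threshold, and the policy that maps results to actions are all hypothetical. The two non-serving actions loosely mirror the options named in claims 4 and 5.

```python
import hashlib
import hmac
from enum import Enum


class Action(Enum):
    SERVE = "serve the dataset"
    MARK_INVALID = "mark invalid but still obtainable"  # compare claim 5
    WITHDRAW = "make generally unavailable"             # compare claim 4


def sign(key: bytes, payload: bytes) -> bytes:
    """Stand-in signature: HMAC-SHA256 (assumption, not a certificate signature)."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def is_valid(key: bytes, payload: bytes, signature: bytes) -> bool:
    """Constant-time comparison of the expected and presented signatures."""
    return hmac.compare_digest(sign(key, payload), signature)


# Hypothetical trust table keyed by the name of the second authority.
TRUST_LEVELS = {"governance-hub": 2, "unknown-reseller": 0}


def decide(dataset: bytes, dataset_sig: bytes, first_key: bytes,
           lineage: bytes, lineage_sig: bytes, second_key: bytes,
           second_authority: str, min_trust: int = 1) -> Action:
    """Combine dataset validity, lineage validity, and trust into one action."""
    dataset_ok = is_valid(first_key, dataset, dataset_sig)
    lineage_ok = is_valid(second_key, lineage, lineage_sig)
    trust = TRUST_LEVELS.get(second_authority, 0)

    if not dataset_ok:
        return Action.WITHDRAW       # the data itself cannot be trusted
    if not lineage_ok or trust < min_trust:
        return Action.MARK_INVALID   # data intact, provenance unverified
    return Action.SERVE
```

A real deployment would verify asymmetric signatures against a certificate chain instead of sharing secret keys, and the mapping from results to actions would be a policy choice of the marketplace operator.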

Assignees

Inventors

Classifications

  • G06F16/219 (Primary)

    Managing data history or versioning (querying versioned data G06F16/2474; querying temporal data G06F16/2477) · CPC title

  • Ensuring data consistency and integrity · CPC title

  • Change logging, detection, and notification (replication G06F16/27) · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10089335B2 cover?
Tracking lineage of data. A method may be practiced in a network computing environment including a plurality of interconnected systems where data is shared between the systems. A method includes accessing a dataset. The dataset is associated with lineage metadata. The lineage metadata includes data indicating the original source of the data, one or more intermediary entities that have performed…
Who is the assignee on this patent?
Liensberger Christian, Bouw Rene J, Kashi Ori, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F16/219. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 2, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).