Database system for triggering event notifications based on updates to database records
US-2024419652-A1 · Dec 19, 2024 · US
US9659042B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9659042-B2 |
| Application number | US-201213494449-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 12, 2012 |
| Priority date | Jun 12, 2012 |
| Publication date | May 23, 2017 |
| Grant date | May 23, 2017 |
Official abstract text for this publication.
A data lineage tracking system may include a memory storing a module comprising machine readable instructions to obtain trace log entries representing an interaction with, a manipulation of, and/or a creation of a data value. The data lineage tracking system may further include machine readable instructions to select the trace log entries that are associated with commands performed by an application, cluster similar trace log entries from the selected trace log entries, and analyze mappings between the clustered trace log entries to determine data lineage flow associated with the data value.
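The select and cluster steps named in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `TraceEntry` record, its field names, and the `issuer == "application"` filter are hypothetical. The grouping key uses all three of command type, table name, and attribute name, which claim 2 lists as possible clustering bases.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEntry:
    """Hypothetical trace log record; the field names are illustrative."""
    timestamp: float
    issuer: str        # who ran the command, e.g. "application" or "system"
    command_type: str  # e.g. "INSERT", "UPDATE"
    table: str
    attribute: str


def select_application_entries(entries):
    """Keep only the entries for commands performed by an application."""
    return [e for e in entries if e.issuer == "application"]


def cluster_entries(entries):
    """Group entries that share command type, table name, and attribute name."""
    clusters = defaultdict(list)
    for entry in entries:
        key = (entry.command_type, entry.table, entry.attribute)
        clusters[key].append(entry)
    return dict(clusters)


# Made-up log: three application commands and one system command.
log = [
    TraceEntry(0.0, "application", "INSERT", "orders", "total"),
    TraceEntry(0.5, "system", "VACUUM", "orders", ""),
    TraceEntry(2.0, "application", "UPDATE", "invoices", "amount"),
    TraceEntry(10.0, "application", "INSERT", "orders", "total"),
]
clusters = cluster_entries(select_application_entries(log))
for key, members in clusters.items():
    print(key, [m.timestamp for m in members])
```

Each resulting cluster carries the timestamps of its member commands, which is the input a later mapping step between clusters would need.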
Opening claim text (preview).
What is claimed is:

1. A data lineage tracking system comprising:

   a processor; and

   a memory storing machine readable instructions that when executed by the processor cause the processor to:

   obtain trace log entries representing at least one of an interaction with a data value, a manipulation of the data value, and a creation of the data value;

   select, from the obtained trace log entries, trace log entries that are associated with commands performed by an application;

   cluster similar trace log entries from the selected trace log entries;

   measure variability of temporal differences between the trace log entries in cluster pairs by calculating entropy of the temporal differences to determine a consistency of the temporal differences, wherein the entropy represents a measure of uncertainty associated with the temporal differences, a relatively high entropy score represents a high variation in the temporal differences, and a relatively low entropy score represents a low variation in the temporal differences;

   map a command-timestamp pair, $(s_1, t_1)$, for a cluster $c_1$ to another command-timestamp pair, $(s_2, t_2)$, for a cluster $c_2$, when there does not exist a $s_1'$ in cluster $c_1$ such that $|t_1' - t_2| < |t_1 - t_2|$, and there does not exist a $s_2'$ in cluster $c_2$ such that $|t_1' - t_2| < |t_1 - t_2|$, wherein the $s_1$ is a trace log entry command from the cluster $c_1$ and the $t_1$ is a timestamp for the trace log entry command $s_1$, the $s_1'$ is a trace log entry command from the cluster $c_1$ and the $t_1'$ is a timestamp for the trace log entry command $s_1'$, the $s_2$ is a trace log entry command from the cluster $c_2$ and the $t_2$ is a timestamp for the trace log entry command $s_2$, and the $s_2'$ is a trace log entry command from the cluster $c_2$;

   analyze the mappings between the clustered trace log entries to determine data lineage flow associated with the data value by identifying each cluster of a plurality of clusters for which an entropy falls below a predetermined entropy threshold, wherein entropies below the predetermined entropy threshold represent a low entropy, and constructing a cluster chain including clusters with the low entropies to generate the data lineage flow;

   determine data value lineage by determining a first command associated with at least one of an interaction with, a manipulation of, and a creation of the data value, determining a second command associated with at least one of an interaction with and a manipulation of the data value, and linking the second command to the first command;

   determine, based on the data value lineage associated with the data value, whether the data value is authentic; and

   in response to a determination that the data value is authentic, generate, based on the data value, a report with respect to different systems associated with the data value and the application.

2. The data lineage tracking system of claim 1, wherein the similar trace log entries are clustered based on at least one of a command type, a table name, and an attribute name.

3. The data lineage tracking system of claim 1, wherein the machine readable instructions to determine the data value lineage further comprise machine readable instructions that when executed by the processor further cause the processor to: link the second command to the first command by setting a reference value for the second command to a unique identification (ID) for the first command.

4. The data lineage tracking system of claim 1, further comprising machine readable instructions that when executed by the processor further cause the processor to: determine a reason for a command of the commands based on an analysis of an asset, a resource and the application registered with the data lineage tracking system, wherein the reason for the command is based on a historical analysis of interactions with the asset, the resource and the application.

5. The data lineage tracking system of claim 1, further comprising machine readable instructions that when executed by the processor further cause the processor to: identify an anomaly in the data value lineage based on a determination of whether a change in the data value exceeds a predetermined percentage.

6. The data lineage tracking system of claim 1, further comprising machine readable instructions that when executed by the processor further cause the processor to: generate a graph illustrating the data lineage flow identifying at least one of an asset, a resource and the application that have interacted with the data value.

7. The data lineage tracking system of claim 1, further comprising machine readable instructions that when executed by the processor further cause the processor to: receive calls from data sources, wherein the calls include structured query language (SQL) queries and NoSQL inserts and updates.

8. The data lineage tracking system of claim 1, further comprising machine readable instructions that when executed by the processor further cause the processor to: poll data sources for structured query language (SQL) queries and NoSQL inserts and updates.

9. A data lineage tracking system comprising:

   a processor; and

   a memory storing machine readable instructions that when executed by the processor cause the processor to:

   obtain trace log entries representing at least one of an interaction with a data value, a manipulation of the data value, and a creation of the data value;

   select, from the obtained trace log entries, trace log entries that are associated with commands performed by an application;

   cluster similar trace log entries from the selected trace log entries;

   measure variability of temporal differences between the trace log entries in cluster pairs by calculating entropy of the temporal differences to determine a consistency of the temporal differences, wherein the entropy represents a measure of uncertainty associated with the temporal differences, a relatively high entropy score represents a high variation in the temporal differences, and a relatively low entropy score represents a low variation in the temporal differences;

   map a command-timestamp pair, $(s_1, t_1)$, for a cluster $c_1$ to another command-timestamp pair, $(s_2, t_2)$, for a cluster $c_2$, when there does not exist a $s_1'$ in cluster $c_1$ such that $|t_1' - t_2| < |t_1 - t_2|$, and there does not exist a $s_2'$ in cluster $c_2$ such that $|t_1' - t_2| < |t_1 - t_2|$, wherein the $s_1$ is a trace log entry command from the cluster $c_1$ and the $t_1$ is a timestamp for the trace log entry command $s_1$, the $s_1'$ is a trace log entry command from the cluster $c_1$ and the $t_1'$ is a timestamp for the trace log entry command $s_1'$, the $s_2$ is a trace log entry command from the cluster $c_2$ and the $t_2$ is a timestamp for the trace log entry command $s_2$, and the $s_2'$ is a trace log entry command from the cluster $c_2$;

   analyze the mappings between the clustered trace log entries to determine data lineage flow associated with the data value by identifying each cluster of a plurality of clusters for which an entropy falls below a predetermined entropy threshold, wherein entropies below the predetermined entropy threshold represent a low entropy, and constructing a cluster chain including clusters with the low entropies to generate the data lineage flow;

   determine data value lineage by determining a first command associated with at least one of an interaction with, a manipulation of, and a creation of the data value, determining a second command associated with at least one of an interaction with and a manipulation of the data value, and linking the second command to the first command by setting a reference value for the second command to a unique identification (ID) for the
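
The mapping and entropy steps recited in the claims can be sketched as below, under several assumptions that the claim text does not fix. The two "does not exist" conditions are read here as a mutual nearest-timestamp test (no other command in either cluster is strictly closer to its partner), which is an interpretation. Entropy is taken to be Shannon entropy over temporal differences rounded into bins of a hypothetical `bin_width`. The function names, cluster names, timestamps, and the `0.5` threshold are all illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_nearest_pairs(times_c1, times_c2):
    """Pair timestamps (t1, t2) when neither cluster holds a strictly closer one."""
    pairs = []
    for t1 in times_c1:
        for t2 in times_c2:
            gap = abs(t1 - t2)
            closer_in_c1 = any(abs(other - t2) < gap for other in times_c1)
            closer_in_c2 = any(abs(t1 - other) < gap for other in times_c2)
            if not closer_in_c1 and not closer_in_c2:
                pairs.append((t1, t2))
    return pairs


def entropy_of_differences(pairs, bin_width=1.0):
    """Shannon entropy (bits) of the binned differences t2 - t1 (assumed binning)."""
    if not pairs:
        raise ValueError("no mapped pairs to measure")
    bins = Counter(round((t2 - t1) / bin_width) for t1, t2 in pairs)
    total = sum(bins.values())
    return sum(-(n / total) * math.log2(n / total) for n in bins.values())


def low_entropy_links(cluster_times, threshold, bin_width=1.0):
    """Return the cluster pairs whose temporal differences are consistent."""
    links = []
    for a, b in combinations(sorted(cluster_times), 2):
        pairs = mutual_nearest_pairs(cluster_times[a], cluster_times[b])
        if pairs and entropy_of_differences(pairs, bin_width) < threshold:
            links.append((a, b))
    return links


# Made-up timestamps: c2 always follows c1 by 2 time units; c3 is unrelated.
cluster_times = {
    "c1": [0, 10, 20, 30],
    "c2": [2, 12, 22, 32],
    "c3": [1, 17, 23, 39],
}
links = low_entropy_links(cluster_times, threshold=0.5)
print(links)
```

In this toy data only the `c1`/`c2` pair has a constant lag, so it is the only pair whose entropy falls under the threshold; chaining such low-entropy links is one way to read the "cluster chain" that the claims describe. The pairwise search is quadratic or worse and is meant for illustration only.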
Managing data history or versioning (querying versioned data G06F16/2474; querying temporal data G06F16/2477) · CPC title
Clustering; Classification · CPC title
Change logging, detection, and notification (replication G06F16/27) · CPC title
Physics · mapped topic