Systems and methods for traceability of data changes
US-2024289310-A1 · Aug 29, 2024 · US
US2025272213A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025272213-A1 |
| Application number | US-202418584687-A |
| Country | US |
| Kind code | A1 |
| Filing date | Feb 22, 2024 |
| Priority date | Feb 22, 2024 |
| Publication date | Aug 28, 2025 |
| Grant date | — |
Official abstract text for this publication.
A method and system for tracing data streamed across different system platforms are disclosed. The method includes providing and storing context data corresponding to a data event published to a streaming service, extracting a data classifier block from the stored context data, and extracting a lineage tracer block from the stored context data. The method further includes converting the lineage tracer block into a linked lineage triple, and generating a lineage graph using the linked lineage triple for visualization.
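As a rough illustration of the flow the abstract describes, the Python sketch below takes hypothetical context-data records, pulls out each lineage tracer block, converts it into a triple of two nodes joined by an edge, and builds an adjacency map that a visualizer could render. The key names (`lineage_tracer`, `origin`, `transform`, `destination`), the sample records, and the helper functions are assumptions made for this example; the publication text shown here does not specify a schema or an implementation.

```python
from collections import defaultdict


def extract_tracer_block(context_data: dict) -> dict:
    """Return the lineage tracer block from one stored context record."""
    return context_data["lineage_tracer"]  # hypothetical key name


def to_triple(block: dict) -> tuple:
    """Convert a tracer block into a (source node, edge, target node) triple."""
    return (block["origin"], block["transform"], block["destination"])


def build_graph(triples) -> dict:
    """Build an adjacency map: node -> list of (edge label, neighbor node)."""
    graph = defaultdict(list)
    for source, edge, target in triples:
        graph[source].append((edge, target))
    return dict(graph)


# Hypothetical context data, as two devices might store it for two events.
records = [
    {
        "event_id": "e1",
        "lineage_tracer": {
            "origin": "orders_db.orders",
            "transform": "mask_card_number",
            "destination": "topic.orders",
        },
    },
    {
        "event_id": "e2",
        "lineage_tracer": {
            "origin": "topic.orders",
            "transform": "aggregate_daily",
            "destination": "warehouse.daily_orders",
        },
    },
]

triples = [to_triple(extract_tracer_block(record)) for record in records]
lineage_graph = build_graph(triples)
```

Following the edges from `orders_db.orders` reaches `warehouse.daily_orders` through `topic.orders`, which is the kind of end-to-end path a lineage graph is meant to expose.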
Opening claim text (preview).
What is claimed is:

1. A method for tracing data streamed across different system platforms, the method comprising: providing, by each of a plurality of devices and to a database, context data corresponding to a data event published to a streaming service including a streaming pipeline; storing, in the database, the context data provided by each of the plurality of devices; reading, by a data classifier, the context data stored in the database and extracting a data classifier block from the stored context data; gathering, by the data classifier and from the extracted data classifier block, personal identification information and sensitive data elements; reading, by a lineage processor, the context data stored in the database and extracting a lineage tracer block from the stored context data; converting, by the lineage processor, the lineage tracer block into a linked lineage triple; processing, by the lineage processor, the linked lineage triple by tokenizing and deduplicating the linked lineage triple; and generating, by the data trace builder, a lineage graph using the tokenized and deduplicated linked lineage triple for visualization.

2. The method according to claim 1, wherein the lineage tracer block includes one or more of an origin data object, a transform data object and a destination data object.

3. The method according to claim 2, wherein the origin data object includes information related to an entity being sourced.

4. The method according to claim 2, wherein the destination data object includes information related to the data event being published.

5. The method according to claim 2, wherein the transform data object includes one or more transformations that occurred.

6. The method according to claim 5, wherein at least one of the one or more transformations is performed offline from the streaming pipeline.

7. The method according to claim 5, wherein the one or more transformations include a transformation at an entity level or a transformation at a column level.

8. The method according to claim 1, wherein the plurality of devices includes a data publisher device that is configured as a dedicated data publisher.

9. The method according to claim 1, wherein the plurality of devices includes a data publisher device that is configured to jointly operate as a data publisher and a data consumer.

10. The method according to claim 1, wherein the plurality of devices includes a data consumer device that is configured as a dedicated data consumer.

11. The method according to claim 1, wherein the lineage tracer block includes a mode type.

12. The method according to claim 11, wherein the mode type includes one of a streaming type and a batch type.

13. The method according to claim 11, wherein the lineage tracer block further includes a mode sub-type.

14. The method according to claim 13, wherein the sub-mode type includes one of a system of record and derived.

15. The method according to claim 1, wherein the linked lineage triple includes at least two nodes and an edge that connects the at least two nodes.

16. The method according to claim 1, further comprising: deriving at least one insight specific to a node by applying a graphic machine learning algorithm on the lineage graph.

17. The method according to claim 1, wherein the lineage tracer block is a JSON object qualified with prove ontology.

18. The method according to claim 5, wherein at least one of the one or more transformations is determined based on property attributes on nodes present in the lineage graph.

19. A system for tracing data streamed across different system platforms, the system comprising: a memory; and a processor, wherein the system is configured to perform: providing, by each of a plurality of devices and to a database, context data corresponding to a data event published to a streaming service including a streaming pipeline; storing, in the database, the context data provided by each of the plurality of devices; reading, by a data classifier, the context data stored in the database and extracting a data classifier block from the stored context data; gathering, by the data classifier and from the extracted data classifier block, personal identification information and sensitive data elements; reading, by a lineage processor, the context data stored in the database and extracting a lineage tracer block from the stored context data; converting, by the lineage processor, the lineage tracer block into a linked lineage triple; processing, by the lineage processor, the linked lineage triple by tokenizing and deduplicating the linked lineage triple; and generating, by the data trace builder, a lineage graph using the tokenized and deduplicated linked lineage triple for visualization.

20. A non-transitory computer readable storage medium that stores a computer program for tracing data streamed across different system platforms, the computer program, when executed by a processor, causing a system to perform a plurality of processes comprising: providing, by each of a plurality of devices and to a database, context data corresponding to a data event published to a streaming service including a streaming pipeline; storing, in the database, the context data provided by each of the plurality of devices; reading, by a data classifier, the context data stored in the database and extracting a data classifier block from the stored context data; gathering, by the data classifier and from the extracted data classifier block, personal identification information and sensitive data elements; reading, by a lineage processor, the context data stored in the database and extracting a lineage tracer block from the stored context data; converting, by the lineage processor, the lineage tracer block into a linked lineage triple; processing, by the lineage processor, the linked lineage triple by tokenizing and deduplicating the linked lineage triple; and generating, by the data trace builder, a lineage graph using the tokenized and deduplicated linked lineage triple for visualization.
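The claims recite two processing steps without defining them in the text shown here: gathering personal identification information and sensitive data elements from a data classifier block, and tokenizing and deduplicating linked lineage triples. The Python sketch below shows one plausible reading. It assumes that "tokenizing" means normalizing each node or edge label into a canonical token, and that a classifier block lists fields with `pii` and `sensitive` flags. Both interpretations, along with every function and key name, are hypothetical and not taken from the publication.

```python
def tokenize(value: str) -> str:
    """Normalize a label into a canonical token (assumed meaning of 'tokenizing')."""
    return "_".join(value.strip().lower().split())


def tokenize_and_dedupe(triples):
    """Tokenize every element of each triple, then drop repeated triples.

    First-seen order is preserved so the output is deterministic.
    """
    seen = set()
    result = []
    for triple in triples:
        tokens = tuple(tokenize(part) for part in triple)
        if tokens not in seen:
            seen.add(tokens)
            result.append(tokens)
    return result


def gather_sensitive_elements(classifier_block: dict) -> list:
    """Collect names of fields flagged as PII or sensitive (hypothetical schema)."""
    return [
        field["name"]
        for field in classifier_block["fields"]
        if field.get("pii") or field.get("sensitive")
    ]


# Two devices report the same hop with slightly different spelling.
raw_triples = [
    ("Orders DB", "Mask Card", "topic.orders"),
    ("orders db ", "mask  card", "topic.orders"),
    ("topic.orders", "aggregate", "warehouse.daily"),
]
clean_triples = tokenize_and_dedupe(raw_triples)

classifier_block = {
    "fields": [
        {"name": "email", "pii": True},
        {"name": "order_total"},
        {"name": "card_number", "sensitive": True},
    ]
}
flagged = gather_sensitive_elements(classifier_block)
```

Normalizing before deduplicating matters in this reading: the first two triples differ only in case and whitespace, so they collapse into a single edge and the resulting graph does not show the same hop twice.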
Data logging (G06F11/14, G06F11/2205 take precedence) · CPC title
Visualisation of programs or trace data · CPC title
where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems (multiprogramming arrangements G06F9/46; allocation of resources G06F9/50) · CPC title
Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation {; Recording or statistical evaluation of user activity, e.g. usability assessment} · CPC title
with visual {or acoustical} indication of the functioning of the machine · CPC title