System and method for capture of change data from distributed data sources, for use with heterogeneous targets

US11762836B2 · US · B2

Patent metadata
Publication number: US-11762836-B2
Application number: US-201816145707-A
Country: US
Kind code: B2
Filing date: Sep 28, 2018
Priority date: Sep 29, 2017
Publication date: Sep 19, 2023
Grant date: Sep 19, 2023

Abstract

In accordance with an embodiment, described herein is a system and method for capture of change data from a distributed data source system, for example a distributed database or a distributed data stream, and preparation of a canonical format output, for use with one or more heterogeneous targets, for example a database or message queue. The change data capture system can include support for features such as distributed source topology-awareness, initial load, deduplication, and recovery. A technical purpose of the systems and methods described herein includes determination and communication of changes performed to data at a distributed data source that includes a large amount of data across a plurality of nodes, to one or more target computer systems.
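
The abstract describes turning captured changes into a canonical format output that heterogeneous targets, such as a database or a message queue, can consume. The patent text on this page does not define that format. The Python sketch below therefore assumes a hypothetical `CanonicalChange` record and hypothetical adapter classes (`JsonMessageAdapter`, `SqlAdapter`, `publish`). It only illustrates the separation between one source-neutral record and per-target conversion.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional, Protocol


@dataclass(frozen=True)
class CanonicalChange:
    """Hypothetical source-neutral change record (field names are assumptions)."""
    op: str                                  # "INSERT", "UPDATE" or "DELETE"
    table: str                               # source table name
    key: Dict[str, Any]                      # primary-key columns of the changed row
    after: Optional[Dict[str, Any]] = None   # new column values; None for DELETE


class TargetAdapter(Protocol):
    """Pluggable adapter: turns a canonical record into one target's format."""

    def convert(self, change: CanonicalChange) -> str: ...


class JsonMessageAdapter:
    """Message-queue style target: one JSON document per change."""

    def convert(self, change: CanonicalChange) -> str:
        return json.dumps(
            {"op": change.op, "table": change.table,
             "key": change.key, "after": change.after},
            sort_keys=True,
        )


class SqlAdapter:
    """Database style target: renders a SQL statement (no escaping; illustration only)."""

    def convert(self, change: CanonicalChange) -> str:
        where = " AND ".join(f"{k} = {v!r}" for k, v in sorted(change.key.items()))
        if change.op == "DELETE":
            return f"DELETE FROM {change.table} WHERE {where}"
        after = change.after or {}
        if change.op == "UPDATE":
            sets = ", ".join(f"{k} = {v!r}" for k, v in sorted(after.items()))
            return f"UPDATE {change.table} SET {sets} WHERE {where}"
        cols = {**change.key, **after}      # INSERT: key and value columns together
        names = sorted(cols)
        values = ", ".join(repr(cols[c]) for c in names)
        return f"INSERT INTO {change.table} ({', '.join(names)}) VALUES ({values})"


def publish(change: CanonicalChange, adapters: Dict[str, TargetAdapter]) -> Dict[str, str]:
    """Fan one canonical change out to every registered target."""
    return {name: adapter.convert(change) for name, adapter in adapters.items()}
```

For example, an UPDATE of row `id = 7` in `orders` setting `status` to `"shipped"` becomes `UPDATE orders SET status = 'shipped' WHERE id = 7` for the SQL adapter and a JSON document for the message-queue adapter. Supporting another target means registering one more adapter, which is in the spirit of the pluggable adapter component mentioned in claim 5.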

First claim

Opening claim text (preview).

What is claimed is:

1. A system for capture of change data from a distributed data source, for use with heterogeneous targets, comprising: a computer that includes a processor, and operates to capture change data from a distributed data source comprising a plurality of nodes using a capture process, for use with one or more targets, wherein the nodes store data records as rows of data, and wherein changes to the rows of data are committed to source change trace entities; wherein each node of the plurality of nodes in the distributed data source is associated with a source change trace entity that records data changes that are processed at that node; wherein each data record is located within a partition within the nodes, and when extracted from the distributed data source said record is associated with a token indicative of a partition and node within the distributed data source providing that record; and wherein the computer: determines a distributed source topology associated with the plurality of nodes in the distributed data source, including monitoring for a presence of new nodes or unavailability of one or more of the nodes within the distributed data source, and accesses the source change trace entities at one or more nodes or replica nodes, to determine the data changes at the distributed data source, for use with the one or more targets, including: maintaining a cache of data records extracted from the distributed data source wherein the cache includes, for each record within the cache, the token indicative of the node within the distributed data source providing that record, and determining, based on comparison of the tokens in the cache, to capture a data change associated with a particular record, as provided by a particular node of the distributed data source, wherein a determination is made that the node indicated by the token in the cache matches the source node for the particular record then passing the particular record to the capture process; wherein if a source node is determined as being unavailable, a recovery process selects, from within the plurality of replica nodes at the distributed data source, based on the token indicative of partition within the distributed data source providing a record, a replica node from which to obtain and replay change data records; wherein if more than one replica node is associated with the record, a history queue that includes a set of last records read from one or more source nodes is used to select based on record history which replica node to provide the record.

2. The system of claim 1, wherein the distributed data source is one of a distributed database, or a distributed data stream, or other distributed data source, and wherein the one or more targets include one or more of a database, message queue, or other target.

3. The system of claim 1, wherein the computer performs a change data capture process that converts the change data read from the distributed data source, into a canonical format output of the change data, for consumption by the one or more targets.

4. The system of claim 3, whereupon based on a target system to which the change data will be communicated, the canonical format output of the change data is converted to a format used by the target system.

5. The system of claim 3, wherein the computer enables support for a new target system to be provided by a pluggable adapter component that reads the canonical format output of the change data and converts it to a format used by the new target system.

6. The system of claim 1, wherein the computer performs a deduplication process that provides automatic deduplication of the data provided by the distributed data source, whereupon a change to the distributed source topology associated with the distributed data source system, including one or more nodes being added to or removed from the distributed source topology, the deduplication process detects the change to the distributed source topology.

7. The system of claim 1, wherein the computer performs automatic discovery of the distributed source topology associated with the distributed data source system, and provides access to the source change trace entity at each node of the plurality of nodes of the distributed data source system.

8. The system of claim 1, wherein the distributed data source includes nodes that provide records, and wherein the computer upon determining that a particular node in the distributed data source system providing the records becomes unavailable, performs a recovery process that selects a replica node at which to obtain the records.

9. The system of claim 1, wherein the system includes a history queue that includes a set of last records read from one or more source nodes, wherein upon a node becoming unavailable, if the system determines there is more than one replica node with a matching last record, a replica with the maximum record history as determined by the history queue is selected to feed a partition token found in the last record processed by the unavailable node.

10. A method for capture of change data from a distributed data source, for use with heterogeneous targets, comprising: capturing change data from a distributed data source comprising a plurality of nodes, using a capture process, for use with one or more targets, wherein the nodes store data records as rows of data, and wherein changes to the rows of data are committed to source change trace entities; wherein each node of the plurality of nodes in the distributed data source is associated with a source change trace entity that records data changes that are processed at that node; wherein each data record is located within a partition within the nodes, and when extracted from the distributed data source said record is associated with a token indicative of a partition and node within the distributed data source providing that record; and wherein the method: determines a distributed source topology associated with the plurality of nodes in the distributed source, including monitoring for a presence of new nodes or unavailability of one or more of the nodes within the distributed data source, and accesses the source change trace entities at one or more nodes or replica nodes, to determine the data changes at the distributed data source, for use with the one or more targets, including: maintaining a cache of data records extracted from the distributed data source wherein the cache includes, for each record within the cache, the token indicative of the node within the distributed data source providing that record, and determining, based on comparison of the tokens in the cache, to capture a data change associated with a particular record, as provided by a particular node of the distributed data source, wherein a determination is made that the node indicated by the token in the cache matches the source node for the particular record then passing the particular record to the capture process; wherein if a source node is determined as being unavailable, selecting, from within the plurality of replica nodes at the distributed data source, based on the token indicative of partition within the distributed data source providing a record, a replica node from which to obtain and replay change data records; wherein if more than one replica node is associated with the record, a history queue that includes a set of last records read from one or more source nodes is used to select based on record history which replica node to provide the record.

11. The method of claim 10, wherein the distributed data source is one of a distributed database, or a distributed data stream, or other distributed data source, and wherein the one or mo
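
Claims 1 and 9 describe two mechanisms: a cache that records, per partition token, which node supplies that partition, so that only records from the matching node reach the capture process; and a history queue of last-read records, used to choose a replica when a source node becomes unavailable. The Python sketch below is one possible reading of that claim language, not the patented implementation. `Record`, `TokenCache`, `change_id`, the queue size, and the scoring rule used for "maximum record history" are all assumptions; topology monitoring and replay positioning are left out.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional, Tuple


@dataclass(frozen=True)
class Record:
    token: str      # partition token carried by the extracted record
    node: str       # node that supplied this copy of the record
    change_id: str  # hypothetical change identity, identical on every replica


class TokenCache:
    """Token-based deduplication plus replica fail-over (illustrative sketch)."""

    def __init__(self, history_size: int = 4) -> None:
        self.owner: Dict[str, str] = {}  # partition token -> node feeding capture
        self.history_size = history_size
        # History queue per node: the last (token, change_id) pairs read from it.
        self.history: Dict[str, Deque[Tuple[str, str]]] = {}

    def offer(self, rec: Record) -> bool:
        """Return True if the record should be passed to the capture process."""
        queue = self.history.setdefault(rec.node, deque(maxlen=self.history_size))
        queue.append((rec.token, rec.change_id))
        # The first node seen for a token becomes its source; replica copies
        # of the same change (same token, different node) are dropped.
        return self.owner.setdefault(rec.token, rec.node) == rec.node

    def fail_over(self, failed: str) -> Optional[str]:
        """Pick a replica to feed the token of the failed node's last record."""
        failed_hist = self.history.get(failed)
        if not failed_hist:
            return None
        last = failed_hist[-1]
        # Candidates: other nodes whose history queue holds that same last record.
        candidates = sorted(
            n for n, q in self.history.items() if n != failed and last in q
        )
        if not candidates:
            return None
        seen = set(failed_hist)
        # Assumed reading of "maximum record history": the replica sharing the
        # most recent records with the failed node wins (ties: first by name).
        chosen = max(candidates, key=lambda n: sum(r in seen for r in self.history[n]))
        self.owner[last[0]] = chosen
        return chosen
```

For example, if node-a supplies changes c1 and c2 for token p1, node-b supplies both, and node-c supplies only c2, then only node-a's copies are passed on. When node-a becomes unavailable, `fail_over("node-a")` returns `"node-b"`: both replicas hold the last record c2, but node-b shares more of node-a's history, and from then on node-b's records for p1 are captured.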

Classifications

  • G06F16/27 (primary)

    Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title

  • Change logging, detection, and notification (replication G06F16/27) · CPC title

  • Ensuring data consistency and integrity · CPC title

Frequently asked questions

What does patent US11762836B2 cover?
In accordance with an embodiment, described herein is a system and method for capture of change data from a distributed data source system, for example a distributed database or a distributed data stream, and preparation of a canonical format output, for use with one or more heterogeneous targets, for example a database or message queue. The change data capture system can include support for fe…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F16/27. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 19, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).