US11762836B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11762836-B2 |
| Application number | US-201816145707-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 28, 2018 |
| Priority date | Sep 29, 2017 |
| Publication date | Sep 19, 2023 |
| Grant date | Sep 19, 2023 |
Official abstract text for this publication.
In accordance with an embodiment, described herein is a system and method for capture of change data from a distributed data source system, for example a distributed database or a distributed data stream, and preparation of a canonical format output, for use with one or more heterogeneous targets, for example a database or message queue. The change data capture system can include support for features such as distributed source topology-awareness, initial load, deduplication, and recovery. A technical purpose of the systems and methods described herein includes determination and communication of changes performed to data at a distributed data source that includes a large amount of data across a plurality of nodes, to one or more target computer systems.
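The canonical-format idea in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `ChangeRecord` fields, the `ADAPTERS` registry, the `adapter` decorator, and the two example target formats. The sketch only shows the general shape: captured changes are held in one target-neutral form, and each target is served by a small adapter that converts that form into the target's own format.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, Dict


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical canonical form of one captured change."""
    table: str
    op: str        # "insert", "update", or "delete"
    key: str
    values: dict


# Hypothetical registry: target name -> function that formats a canonical record.
ADAPTERS: Dict[str, Callable[[ChangeRecord], str]] = {}


def adapter(name: str):
    """Register a formatter for a target without touching the capture code."""
    def register(fn):
        ADAPTERS[name] = fn
        return fn
    return register


@adapter("json_queue")
def to_json_message(rec: ChangeRecord) -> str:
    # A message-queue style target that takes JSON payloads.
    return json.dumps(asdict(rec), sort_keys=True)


@adapter("kv_line")
def to_kv_line(rec: ChangeRecord) -> str:
    # A line-oriented target; the layout is invented for this example.
    fields = " ".join(f"{k}={rec.values[k]}" for k in sorted(rec.values))
    return f"{rec.op.upper()} {rec.table}/{rec.key} {fields}".rstrip()


def deliver(rec: ChangeRecord, target: str) -> str:
    """Convert a canonical record into the format of the named target."""
    return ADAPTERS[target](rec)
```

Under these assumptions, supporting a new target means registering one more function with `@adapter(...)`; the capture side and the canonical record stay unchanged.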
Opening claim text (preview).
What is claimed is:

1. A system for capture of change data from a distributed data source, for use with heterogeneous targets, comprising: a computer that includes a processor, and operates to capture change data from a distributed data source comprising a plurality of nodes using a capture process, for use with one or more targets, wherein the nodes store data records as rows of data, and wherein changes to the rows of data are committed to source change trace entities; wherein each node of the plurality of nodes in the distributed data source is associated with a source change trace entity that records data changes that are processed at that node; wherein each data record is located within a partition within the nodes, and when extracted from the distributed data source said record is associated with a token indicative of a partition and node within the distributed data source providing that record; and wherein the computer: determines a distributed source topology associated with the plurality of nodes in the distributed data source, including monitoring for a presence of new nodes or unavailability of one or more of the nodes within the distributed data source, and accesses the source change trace entities at one or more nodes or replica nodes, to determine the data changes at the distributed data source, for use with the one or more targets, including: maintaining a cache of data records extracted from the distributed data source wherein the cache includes, for each record within the cache, the token indicative of the node within the distributed data source providing that record, and determining, based on comparison of the tokens in the cache, to capture a data change associated with a particular record, as provided by a particular node of the distributed data source, wherein a determination is made that the node indicated by the token in the cache matches the source node for the particular record then passing the particular record to the capture process; wherein if a source node is determined as being unavailable, a recovery process selects, from within the plurality of replica nodes at the distributed data source, based on the token indicative of partition within the distributed data source providing a record, a replica node from which to obtain and replay change data records; wherein if more than one replica node is associated with the record, a history queue that includes a set of last records read from one or more source nodes is used to select based on record history which replica node to provide the record.

2. The system of claim 1, wherein the distributed data source is one of a distributed database, or a distributed data stream, or other distributed data source, and wherein the one or more targets include one or more of a database, message queue, or other target.

3. The system of claim 1, wherein the computer performs a change data capture process that converts the change data read from the distributed data source, into a canonical format output of the change data, for consumption by the one or more targets.

4. The system of claim 3, whereupon based on a target system to which the change data will be communicated, the canonical format output of the change data is converted to a format used by the target system.

5. The system of claim 3, wherein the computer enables support for a new target system to be provided by a pluggable adapter component that reads the canonical format output of the change data and converts it to a format used by the new target system.

6. The system of claim 1, wherein the computer performs a deduplication process that provides automatic deduplication of the data provided by the distributed data source, whereupon a change to the distributed source topology associated with the distributed data source system, including one or more nodes being added to or removed from the distributed source topology, the deduplication process detects the change to the distributed source topology.

7. The system of claim 1, wherein the computer performs automatic discovery of the distributed source topology associated with the distributed data source system, and provides access to the source change trace entity at each node of the plurality of nodes of the distributed data source system.

8. The system of claim 1, wherein the distributed data source includes nodes that provide records, and wherein the computer upon determining that a particular node in the distributed data source system providing the records becomes unavailable, performs a recovery process that selects a replica node at which to obtain the records.

9. The system of claim 1, wherein the system includes a history queue that includes a set of last records read from one or more source nodes, wherein upon a node becoming unavailable, if the system determines there is more than one replica node with a matching last record, a replica with the maximum record history as determined by the history queue is selected to feed a partition token found in the last record processed by the unavailable node.

10. A method for capture of change data from a distributed data source, for use with heterogeneous targets, comprising: capturing change data from a distributed data source comprising a plurality of nodes, using a capture process, for use with one or more targets, wherein the nodes store data records as rows of data, and wherein changes to the rows of data are committed to source change trace entities; wherein each node of the plurality of nodes in the distributed data source is associated with a source change trace entity that records data changes that are processed at that node; wherein each data record is located within a partition within the nodes, and when extracted from the distributed data source said record is associated with a token indicative of a partition and node within the distributed data source providing that record; and wherein the method: determines a distributed source topology associated with the plurality of nodes in the distributed source, including monitoring for a presence of new nodes or unavailability of one or more of the nodes within the distributed data source, and accesses the source change trace entities at one or more nodes or replica nodes, to determine the data changes at the distributed data source, for use with the one or more targets, including: maintaining a cache of data records extracted from the distributed data source wherein the cache includes, for each record within the cache, the token indicative of the node within the distributed data source providing that record, and determining, based on comparison of the tokens in the cache, to capture a data change associated with a particular record, as provided by a particular node of the distributed data source, wherein a determination is made that the node indicated by the token in the cache matches the source node for the particular record then passing the particular record to the capture process; wherein if a source node is determined as being unavailable, selecting, from within the plurality of replica nodes at the distributed data source, based on the token indicative of partition within the distributed data source providing a record, a replica node from which to obtain and replay change data records; wherein if more than one replica node is associated with the record, a history queue that includes a set of last records read from one or more source nodes is used to select based on record history which replica node to provide the record.

11. The method of claim 10, wherein the distributed data source is one of a distributed database, or a distributed data stream, or other distributed data source, and wherein the one or mo
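The token comparison and the replica selection recited in the claims can be pictured with a small sketch. This is one possible reading, not the patented implementation. The class name `TokenDeduplicator`, its methods, the per-node bounded history, and the `replicas_for` callback are all hypothetical. The sketch assumes that the first node seen for a partition token becomes the node that feeds it, that copies of the same record arriving from replica nodes are dropped, and that on failure the replacement is the replica whose history contains the last passed record and is longest. Replaying change data from the chosen replica is not modeled.

```python
from collections import deque


class TokenDeduplicator:
    """Hypothetical token cache plus history queue, for illustration only."""

    def __init__(self, history_size=4):
        self.owner = {}        # partition token -> node currently feeding it
        self.last_passed = {}  # partition token -> id of last record passed on
        self.history = {}      # node -> bounded queue of last record ids read
        self.history_size = history_size

    def observe(self, node, token, record_id):
        """Return True if this record should go to the capture process."""
        queue = self.history.setdefault(node, deque(maxlen=self.history_size))
        queue.append(record_id)
        # The first node seen for a token becomes its feeding node.
        owner = self.owner.setdefault(token, node)
        if owner != node:
            return False       # same partition served by a replica: duplicate
        self.last_passed[token] = record_id
        return True

    def fail_over(self, dead_node, replicas_for):
        """Reassign every token fed by dead_node to a surviving replica.

        replicas_for(token) is an assumed topology lookup that returns the
        nodes holding a copy of the partition identified by the token.
        """
        for token, owner in list(self.owner.items()):
            if owner != dead_node:
                continue
            last = self.last_passed.get(token)
            candidates = [n for n in replicas_for(token) if n != dead_node]
            # Prefer replicas that have already read the last passed record.
            matching = [n for n in candidates if last in self.history.get(n, ())]
            pool = matching or candidates
            if pool:
                # Among those, take the replica with the longest record history.
                self.owner[token] = max(
                    pool, key=lambda n: len(self.history.get(n, ())))
            else:
                del self.owner[token]
```

A short usage example under the same assumptions: if `n1` feeds token `t1` and then fails, calling `fail_over("n1", lambda token: ["n1", "n2", "n3"])` hands `t1` to whichever of `n2` and `n3` has read the last passed record and holds the longer history.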
Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title
Change logging, detection, and notification (replication G06F16/27) · CPC title
Ensuring data consistency and integrity · CPC title