Unsupervised round robin catch up algorithm

US10567499B1 · US · B1

Patent metadata
FieldValue
Publication numberUS-10567499-B1
Application numberUS-201514957477-A
CountryUS
Kind codeB1
Filing dateDec 2, 2015
Priority dateDec 2, 2015
Publication dateFeb 18, 2020
Grant dateFeb 18, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Data replication groups may be used to store data in a distributed computing environment. The data replication groups may include a set of nodes executing a consensus protocol to maintain data durably. During the execution of the set of nodes various nodes may become stale or otherwise obtain a state that is inconsistent with at least one other node of the data replication group. A catch up algorithm may be employed in which a set of teachers is initialized, the various node which may be stale may select a teacher from the set of teachers and perform learning operations. This process may be repeated until the state of the various nodes is current with at least one other node of the data replication group.

First claim

Opening claim text (preview).

What is claimed is: 1. A non-transitory computer-readable storage medium having stored thereon executable instructions that, if executed by one or more processors of a computer system, cause the computer system to: initialize a set of nodes of a data replication group; determine that one or more nodes of the set of nodes are stale; initialize a set of teachers with information associated with the set of nodes that are current; for each node of the determined one or more nodes, selecting a teacher from the set of teachers corresponding to a particular node of the set of nodes; and until determining that the one or more nodes of the set of nodes are current, iteratively perform a learning operation to obtain state information from the set of teachers. 2. The non-transitory computer-readable storage medium of claim 1 , wherein the executable instructions that cause the computer system to perform the learning operation further include instructions that cause the computer system to copy a state of the particular node of the data replication group associated with the teacher. 3. The non-transitory computer-readable storage medium of claim 2 , wherein the executable instructions that cause the computer system to copy the state of the particular node further include instructions that cause the computer system to copy the state by at least obtaining a snapshot of the particular node and copying the snapshot to a memory. 4. The non-transitory computer-readable storage medium of claim 1 , wherein the executable instructions that cause the computer system to perform the learning operation further include instructions that cause the computer system to obtain a log from a particular node of the data replication group associated with the teacher, where the log indicates operations performed by the data replication group. 5. The non-transitory computer-readable storage medium of claim 1 , wherein the executable instructions further comprise instructions that, if executed by the one or more processors, cause the computer system to determine one or more additional nodes of the set of nodes are stale and for each node of the determined one or more additional nodes selecting a second teacher from the set of teachers and performing a second learning operation. 6. The non-transitory computer-readable storage medium of claim 1 , wherein the executable instructions further comprise instructions that, if executed by the one or more processors, cause the computer system to remove the teacher from the set of teachers. 7. The non-transitory computer-readable storage medium of claim 1 , wherein the executable instructions that cause the computer system to initialize the set of teachers further include instructions that cause the computer system to initialize the set of teachers with information corresponding to a second set of nodes of the data replication group, where the second set of nodes includes at least one node that is not a member of the set of nodes and was added to the data replication group after initialization. 8. The non-transitory computer-readable storage medium of claim 1 , wherein the executable instructions further comprise instructions that, if executed by the one or more processors, cause the computer system to: add a new node to the data replication group; select a new teacher from the set of teachers; and perform a new learning operation with the new teacher and the new node. 9. A system, comprising: one or more processors; and memory that includes instructions that, if executed by the one or more processors, cause the system to: detect that a state of the system is stale based at least in part on information obtained from at least one node of a set of nodes of a data replication group; initialize a set of teachers corresponding to a set of other nodes of the data replication group, where the set of teachers includes information that enables the system to obtain state information from the set of teachers; and until determining that the state of the system is current, iteratively perform a non-redundant catch up algorithm to update the state of the system by at least: selecting a first teacher from the set of teachers corresponding to a particular node of the set of nodes; and obtaining state information from the first teacher to update the state of the system. 10. The system of claim 9 , wherein the memory further includes instructions that, if executed by the one or more processors, cause the system to cause the data replication group to implement a consensus protocol. 11. The system of claim 10 , wherein the consensus protocol is a Paxos consensus protocol. 12. The system of claim 9 , wherein the memory further includes instructions that, if executed by the one or more processors, cause the system to remove the first teacher from the set of teachers after obtaining the state information from the first teacher. 13. The system of claim 9 , wherein detecting that the state of the system is stale further comprises determining that a heartbeat message from the at least one other node of the set of nodes of the data replication group has not been received after an expiration of an interval of time. 14. The system of claim 9 , wherein initializing the set of teachers further comprises initializing the set of teachers to an initial set of nodes of the data replication group, where the initial set of nodes corresponds to nodes of the data replication group included in the data replication group during generation of the data replication group. 15. The system of claim 9 , wherein initializing the set of teachers further comprises initializing the set of teachers to a current set of nodes of the data replication group, where the current set of nodes correspond to nodes of the data replication group included in the data replication group at a point in time of initializing the set of teachers. 16. A computer-implemented method, comprising: initializing a data replication group with a plurality of nodes implementing a consensus protocol; detecting that a state of a node of the plurality of nodes has yet to advance to a state of at least one other node of the plurality of nodes of the data replication group; initializing, by the node, a set of teachers, wherein the set of teachers includes information that enable the node to obtain information from the set of teachers; and until determining that the state of the node has advanced to the state of the at least one other node of the plurality of nodes of the data replication group, iteratively performing, by the node, a non-redundant catch up algorithm to update the state of the node comprising: selecting a teacher from the set of teachers, the teacher corresponding to a particular node of the plurality of nodes; obtaining update information from the teacher; and updating the state of the node in accordance with the update information. 17. The computer-implemented method of claim 16 , wherein performing the non-redundant catch up algorithm to update the state of the node further comprises: detecting that the state of the node has yet to advance to the state of the at least one other node of the plurality of nodes of the data replication group; removing the teacher from the set of teachers such that the particular node is no longer included in the set of teachers; selecting a second teacher from the set of teachers; obtaining second update information from the second teacher; and updating the state of the node in accordance with the second update information. 18. The computer-implemented method of c

Assignees

Inventors

Classifications

  • for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title

  • Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes · CPC title

  • Interfaces specially adapted for storage systems · CPC title

  • in relation to data integrity, e.g. data losses, bit errors · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10567499B1 cover?
Data replication groups may be used to store data in a distributed computing environment. The data replication groups may include a set of nodes executing a consensus protocol to maintain data durably. During the execution of the set of nodes various nodes may become stale or otherwise obtain a state that is inconsistent with at least one other node of the data replication group. A catch up alg…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification H04L67/1095. Mapped technology areas include Electricity.
When was this patent published?
Publication date Tue Feb 18 2020 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).