Locality based quorums

US11507480B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11507480-B2
Application numberUS-201816185423-A
CountryUS
Kind codeB2
Filing dateNov 9, 2018
Priority dateDec 14, 2010
Publication dateNov 22, 2022
Grant dateNov 22, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed are various embodiments for distributing data items within a plurality of nodes. A data item that is subject to a data item update request is updated from a master node to a plurality of slave notes. The update of the data item is determined to be locality-based durable based at least in part on acknowledgements received from the slave nodes. Upon detection that the master node has failed, a new master candidate is determined via an election among the plurality of slave nodes.

First claim

Opening claim text (preview).

Therefore, the following is claimed: 1. A method, comprising: replicating by a master node for data stored in a distributed data store, a write to the data received by the master node to a plurality of other nodes of the distributed data store that replicate the data, wherein individual ones of the plurality of other nodes are in different respective data centers, and wherein performance of replication of the write to the data is based at least in part on a number of the plurality of other nodes that acknowledge writing to the data at the different respective data centers; detecting, by at least one of the plurality of other nodes, that the master node has failed based at least in part on a number of the plurality of other nodes that have acknowledged writing to the data at the different respective data centers; determining, among the plurality of other nodes, which one of the plurality of other nodes is to transition to being a new master node for the data; and performing a second write to the data using the new master node. 2. The method of claim 1 , wherein the write to the data is received from a client of the distributed data store. 3. The method of claim 1 , further comprising: after replicating the write, confirming, by the master node, to a client of the distributed data store that the write to the data is durable. 4. The method of claim 1 , wherein determining, among the plurality of other nodes, which one of the plurality of other nodes is to transition to being the new master node for the data comprises: determining a new master candidate through an election among the plurality of other nodes in response to detecting that the master node has failed; and transitioning the new master candidate to become the new master node when a number of the plurality of other nodes that select the new master candidate meets a failover quorum based at least in part upon a respective location of the plurality of other nodes that select the new master candidate. 5. The method of claim 4 , wherein N is a total number of the different respective data centers, wherein K is a durability requirement, and wherein the method further comprises waiting, by the new master candidate, to perform recovery from the failover quorum, wherein the failover quorum is met when recovery is performed by the plurality of other nodes that reside within N-K+ 1 of the different respective data centers. 6. The method of claim 4 , wherein the election employs a Paxos election scheme. 7. The method of claim 1 , wherein detecting that the master node has failed is based on a message received from the master node at the at least one node. 8. A method, comprising: determining, via at least one of a plurality of nodes that replicate data stored in a distributed data store, a new master candidate through an election among the plurality of nodes in response to detecting, by at least one of the plurality of nodes, that a master node for the data has failed based at least in part on a number of the plurality of nodes that have acknowledged writing to the data subject to a prior write request replicated by the master node to the plurality of nodes, wherein individual ones of the plurality of nodes are in respective data centers; and transitioning, via at least one of the plurality of nodes, the new master candidate to become a new master node for the data after waiting to recover data from a number of the plurality of nodes that meets a failover quorum based at least in part on a respective location of the plurality of nodes from which the data is recovered. 9. The method of claim 8 , further comprising writing, via at least one of the plurality of nodes, to the data that is subject to the prior write request from the master node to the plurality of nodes. 10. The method of claim 8 , wherein the election employs a Paxos election scheme. 11. A system, comprising: a plurality of computing devices that implement a distributed data store comprising a master node for data stored in the distributed data store and a plurality of other nodes that replicate the data, wherein individual ones of the plurality of other nodes are in different respective data centers; wherein the distributed data store is configured to: replicate a write to the data received by the master node to the plurality of other nodes, wherein performance of replication of the write to the data is based on a number of the plurality of other nodes that acknowledge writing to the data at the different respective data centers; detect, by at least one of the plurality of other nodes, that the master node has failed based at least in part on a number of the plurality of other nodes that have acknowledged writing to the data subject to write request at the different respective data centers; determine, among the plurality of other nodes, which one of the plurality of other nodes is to transition to being a new master node for the data; and perform a second write to the data using the new master node. 12. The system of claim 11 , wherein to determine, among the plurality of other nodes, which one of the plurality of other nodes is to transition to being the new master node for the data, the distributed data store is configured to: determine a new master candidate through an election among the plurality of other nodes in response to detecting that the master node has failed; and transition the new master candidate to become the new master node when a number of the plurality of other nodes that select the new master candidate meets a failover quorum based at least in part upon a respective location of the plurality of other nodes that select the new master candidate. 13. The system of claim 12 , wherein the election employs a Paxos election scheme. 14. The system of claim 12 , wherein to replicate the write, the distributed data store is configured to: determine whether the write to the data is locality-based durable based at least in part on a respective location of the plurality of other nodes that have acknowledged replicating the write to the master node. 15. The system of claim 14 , wherein the write to the data is determined to be locality-based durable when at least a subset of the plurality of other nodes residing in K data centers out of N data centers have acknowledged that the data item has been updated, wherein N is a total number of the different data centers. 16. The system of claim 14 , wherein the master node is detected as failed after the write to the data is determined not to be locality-based durable. 17. The system of claim 14 , wherein the master node: receives the write to the data from a client of the distributed data store; and confirms to the client that the write to the data is durable in response to determining that the write request to the data is locality-based durable. 18. The system of claim 14 , wherein the master node sends an error code to a client of the distributed data store when the write to the data is not determined to be locality-based durable after a predefined amount of time. 19. The system of claim 14 , wherein N is a total number of the different respective data centers, wherein K is a durability requirement, and wherein the distributed data store is further configured to: wait, by the new master candidate, to perform recovery from the failover quorum, wherein the failover quorum is met when recovery is performed by the plurality of other nodes that reside within N−K+1 of the different respective data centers.

Assignees

Inventors

Classifications

  • using centralised failover control functionality · CPC title

  • with more than one idle spare processing component · CPC title

  • where the redundant components share persistent storage (G06F11/2043 takes precedence) · CPC title

  • Error detection; Error correction; Monitoring (error detection, correction or monitoring in information storage based on relative movement between record carrier and transducer G11B20/18; monitoring, i.e. supervising the progress of recording or reproducing G11B27/36; in static stores G11C29/00) · CPC title

  • eliminating a faulty processor or activating a spare · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11507480B2 cover?
Disclosed are various embodiments for distributing data items within a plurality of nodes. A data item that is subject to a data item update request is updated from a master node to a plurality of slave notes. The update of the data item is determined to be locality-based durable based at least in part on acknowledgements received from the slave nodes. Upon detection that the master node has fa…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F11/2025. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Nov 22 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).