System and method for rapid fault detection and repair in a shared nothing distributed database
US-2022114192-A1 · Apr 14, 2022 · US
US11599421B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11599421-B2 |
| Application number | US-202017137745-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 30, 2020 |
| Priority date | Oct 14, 2020 |
| Publication date | Mar 7, 2023 |
| Grant date | Mar 7, 2023 |
Official abstract text for this publication.
A shared-nothing database system is provided in which parallelism and workload balancing are increased by assigning the rows of each table to “slices”, and storing multiple copies (“duplicas”) of each slice across the persistent storage of multiple nodes of the shared-nothing database system. When the data for a table is distributed among the nodes of a shared-nothing system in this manner, requests to read data from a particular row of the table may be handled by any node that stores a duplica of the slice to which the row is assigned. For each slice, a single duplica of the slice is designated as the “primary duplica”. All DML operations (e.g. inserts, deletes, updates, etc.) that target a particular row of the table are performed by the node that has the primary duplica of the slice to which the particular row is assigned. The changes made by the DML operations are then propagated from the primary duplica to the other duplicas (“secondary duplicas”) of the same slice.
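The slice and duplica layout described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The hash-based row assignment, the round-robin placement, the node names, and the `SliceMap` class are all assumptions made for illustration, since the abstract does not say how rows are assigned to slices or how duplicas are placed.

```python
import zlib
from dataclasses import dataclass, field

NUM_SLICES = 4                       # assumption: a small fixed slice count
NODES = ["node0", "node1", "node2"]  # hypothetical node names
REPLICATION = 2                      # assumption: duplicas stored per slice


def slice_of(row_key: str) -> int:
    """Assign a row to exactly one slice (hash partitioning is an assumption)."""
    return zlib.crc32(row_key.encode()) % NUM_SLICES


@dataclass
class SliceMap:
    # slice id -> nodes holding a duplica; index 0 is treated as the primary duplica
    duplicas: dict = field(default_factory=dict)

    @classmethod
    def build(cls) -> "SliceMap":
        m = cls()
        for s in range(NUM_SLICES):
            # Round-robin placement (an assumption) puts each duplica on a distinct node.
            m.duplicas[s] = [NODES[(s + i) % len(NODES)] for i in range(REPLICATION)]
        return m

    def primary(self, slice_id: int) -> str:
        return self.duplicas[slice_id][0]

    def node_for_write(self, row_key: str) -> str:
        """DML on a row goes to the node holding the primary duplica of its slice."""
        return self.primary(slice_of(row_key))

    def nodes_for_read(self, row_key: str) -> list:
        """A read may be served by any node holding a duplica of the row's slice."""
        return list(self.duplicas[slice_of(row_key)])
```

In this sketch, `SliceMap.build().node_for_write("order-42")` always returns one of the nodes listed by `nodes_for_read("order-42")`, which mirrors the rule that the primary duplica is itself one of the slice's copies.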
Opening claim text (preview).
What is claimed is:
1. A method comprising: assigning rows of a table to a plurality of slices; wherein each row of the table is assigned to a single slice of the plurality of slices; for each slice of the plurality of slices, storing a plurality of duplicas; wherein each duplica of each slice contains rows of the table that belong to the slice; wherein, for each slice of the plurality of slices, storing the plurality of duplicas includes: storing a primary duplica in one persistent storage of a plurality of persistent storages; and storing one or more secondary duplicas in one or more persistent storages of the plurality of persistent storages; wherein each persistent storage of the plurality of persistent storages is local to and directly accessible only by a single engine instance of a plurality of engine instances; wherein each engine instance of the plurality of engine instances includes logic for accessing rows stored in the table; wherein the plurality of persistent storages includes a first persistent storage and a second persistent storage; wherein the first persistent storage stores the primary duplica for a particular slice that includes data for a particular row of the table; wherein the second persistent storage stores a particular secondary duplica for the particular slice; receiving a request to perform a transaction that affects data in the particular row of the table; initiating execution of the transaction at a first engine instance that is local to the first persistent storage; in response to the first engine instance ceasing to function prior to committing the transaction: establishing the particular secondary duplica on the second persistent storage as a new primary duplica for the particular slice; and causing the transaction to be resumed and committed by a second engine instance that is local to the second persistent storage.
2. The method of claim 1 wherein: the transaction is a multi-statement transaction; the request to perform the transaction includes a series of statement-level requests; and each statement-level request is a request to perform a corresponding statement of the multi-statement transaction.
3. The method of claim 1 wherein: the particular secondary duplica is one of a plurality of secondary duplicas of the particular slice; and establishing the particular secondary duplica as the new primary duplica is performed based on the particular secondary duplica having more transaction log records for the transaction than any other of the plurality of secondary duplicas of the particular slice.
4. The method of claim 3 further comprising, prior to the second engine instance resuming the transaction, transmitting to each secondary duplica, of the other of the plurality of secondary duplicas, any transaction log records that are missing at the secondary duplica.
5. The method of claim 1 further comprising: while the first engine instance is executing the transaction causing a particular change made to data in the primary duplica to be propagated to the particular secondary duplica; wherein causing the transaction to be resumed by the second engine instance is performed without the second engine instance repeating execution of the particular change.
6. The method of claim 5 wherein: the request to perform the transaction is received from a client; the method further comprises: after initiating propagation of the particular change to the particular secondary duplica, the first engine instance communicates to the client that the particular change was successfully executed; the client stores transaction status data that indicates that the particular change was successfully executed; causing the transaction to be resumed by the second engine instance includes the client sending to the second engine instance requests to only perform portions of the transaction that the first engine instance did not report as successfully executed.
7. The method of claim 5 wherein: the request to perform the transaction is received from a client; the method further comprises: after initiating propagation of the particular change to the secondary duplica, the first engine instance communicates to the client that the particular change was successfully executed; the client stores transaction status data that indicates that the particular change was successfully executed; after the first engine instance ceases to function, the client sends the transaction status data to the second engine instance; and based on the transaction status data, the second engine instance does not repeat the particular change.
8. The method of claim 7 wherein the transaction status data sent from the client to the second engine instance identifies a highest statement number that was confirmed-executed by the first engine instance.
9. The method of claim 8 wherein causing the particular change to be propagated to the particular secondary duplica includes sending the particular change to a Network Interface Card (NIC) that is local to the first engine instance and, in response to the NIC successfully putting a message containing the change on transmission media, communicating to the client that the particular change was propagated to the particular secondary duplica without waiting for acknowledgement that the particular change was successfully propagated to the particular secondary duplica.
10. The method of claim 9 further comprising, prior to the second engine instance resuming the transaction: determining whether the particular secondary duplica has all log records of the transaction up to and including log records for the highest statement number that was confirmed-executed by the first engine instance; and if the particular secondary duplica does not have all log records of the transaction up to and including log records for the highest statement number that was confirmed-executed by the first engine instance, then prior to the second engine instance resuming the transaction, storing in the particular secondary duplica any missing log records of the transaction up to and including log records for the highest statement number that was confirmed-executed by the first engine instance.
11. The method of claim 9 wherein the NIC is connected to a host that is executing the second engine instance by at least two distinct networks.
12. A method for resuming a transaction, comprising: using a first engine instance to coordinate execution of the transaction; wherein the transaction involves making updates at a plurality of participants; after all statements in the transaction have been executed, the first engine instance sending, directly or indirectly, prepare request messages to each participant of a plurality participants in the transaction; upon receiving prepare acknowledgement messages, directly or indirectly, from each participant of the plurality of participants, the first engine instance selecting a candidate commit time; initiating transmission of the candidate commit time to one or more backup coordinators; wherein the one or more backup coordinators include a second engine instance; in response to the first engine instance failing after selecting a candidate commit time and before the second engine instance has received the candidate commit time, establishing the second engine instance as a new transaction coordinator for the transaction; the second engine instance resuming the transaction by sending, directly or indirectly, prepare messages to each participant of the plurality of participants; upon receiving prepare acknowledgement messages, directly or indirectly from each participant of the plurality of partic
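The failover steps recited in claims 3, 4, and 8 can be sketched as follows. This is a simplified model under stated assumptions, not the claimed system: the `Duplica` class, the `(txn_id, stmt_no)` log record shape, and both function names are hypothetical. When several secondaries hold the same number of records, the first one listed is chosen, which is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Duplica:
    node: str
    # Assumption: each log record is modeled as a (txn_id, stmt_no) tuple.
    log: list = field(default_factory=list)

    def records_for(self, txn_id):
        return [r for r in self.log if r[0] == txn_id]


def promote_secondary(secondaries, txn_id):
    """Choose the secondary with the most log records for the transaction
    (claim 3), then copy any records the other secondaries lack (claim 4)."""
    new_primary = max(secondaries, key=lambda d: len(d.records_for(txn_id)))
    for d in secondaries:
        if d is new_primary:
            continue
        have = set(d.records_for(txn_id))
        missing = [r for r in new_primary.records_for(txn_id) if r not in have]
        d.log.extend(missing)
    return new_primary


def statements_to_resume(all_stmt_numbers, highest_confirmed):
    """Given the highest statement number the failed engine instance confirmed
    to the client (claim 8), return only the statements still to be executed."""
    return [n for n in all_stmt_numbers if n > highest_confirmed]
```

For example, with one secondary holding records for statements 1 and 2 and another holding only statement 1, `promote_secondary` returns the first and copies the record for statement 2 to the second. A client that recorded statement 2 as the highest confirmed would then resend only statements 3 onward.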
Data partitioning, e.g. horizontal or vertical partitioning · CPC title
Column-oriented storage; Management thereof · CPC title
Ensuring data consistency and integrity · CPC title
Using snapshots, i.e. a logical point-in-time copy of the data · CPC title
Management of the backup or restore process · CPC title