System and method for rapid fault detection and repair in a shared nothing distributed database
US-2022114192-A1 · Apr 14, 2022 · US
US11550771B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11550771-B2 |
| Application number | US-202017070277-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 14, 2020 |
| Priority date | Oct 14, 2020 |
| Publication date | Jan 10, 2023 |
| Grant date | Jan 10, 2023 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A shared-nothing database system is provided in which parallelism and workload balancing are increased by assigning the rows of each table to “slices”, and storing multiple copies (“duplicas”) of each slice across the persistent storage of multiple nodes of the shared-nothing database system. When the data for a table is distributed among the nodes of a shared-nothing system in this manner, requests to read data from a particular row of the table may be handled by any node that stores a duplica of the slice to which the row is assigned. For each slice, a single duplica of the slice is designated as the “primary duplica”. All DML operations (e.g. inserts, deletes, updates, etc.) that target a particular row of the table are performed by the node that has the primary duplica of the slice to which the particular row is assigned. The changes made by the DML operations are then propagated from the primary duplica to the other duplicas (“secondary duplicas”) of the same slice.
Opening claim text (preview).
What is claimed is: 1. A method comprising: assigning rows of a table to a plurality of slices; wherein each row of the table is assigned to a single slice of the plurality of slices; for each slice of the plurality of slices, storing a plurality of duplicas; wherein each duplica of each slice contains rows of the table that belong to the slice; wherein, for each slice of the plurality of slices, storing the plurality of duplicas includes: storing a primary duplica in one persistent storage of a plurality of persistent storages; and storing one or more secondary duplicas in one or more persistent storages of the plurality of persistent storages; wherein each persistent storage of the plurality of persistent storages is local to and directly accessible only by a single engine instance of a plurality of engine instances; wherein each engine instance of the plurality of engine instances includes logic for accessing rows stored in the table; receiving, at a particular engine instance of the plurality of engine instances, a Data Manipulation Language (DML) request that involves a particular row of the table; the particular engine instance responding to the DML request by: determining that the particular row belongs to a particular slice; determining that the particular slice has a primary duplica that is stored on a particular persistent storage of the plurality of persistent storages; if the particular persistent storage is local to the particular engine instance, the particular engine instance executing a DML operation requested by the DML request, and if the particular persistent storage is local to a different engine instance, of the plurality of engine instances, than the particular engine instance, then the particular engine instance causing the different engine instance to execute the DML operation requested by the DML request. 2. The method of claim 1 wherein each engine instance in a particular set of engine instances, of the plurality of engine instances, has access to a respective duplica of the particular slice, the method further comprising: receiving a read request to read data that belongs to the particular slice as of a particular snapshot time; selecting a target engine instance from among the engine instances in the particular set of engine instances; and causing the target engine instance to perform a read operation to service the read request. 3. The method of claim 2 wherein the target engine instance performs the read operation by reading a particular version of data from a secondary duplica of the particular slice. 4. The method of claim 2 wherein: the read request is to read the particular row; the particular snapshot time is prior to a time associated with the DML operation; and the read operation involves obtaining a version of the particular row that does not reflect any changes made by the DML operation. 5. The method of claim 1 wherein: the DML request is part of a transaction that requires multiple DML operations, and the multiple DML operations are performed atomically such that processes either see all changes specified by the transaction or no changes specified by the transaction. 6. The method of claim 1 wherein the one or more persistent storages used to store the one or more secondary duplicas do not include the one persistent storage that stores the primary duplica. 7. The method of claim 1 wherein: the particular persistent storage is local to the particular engine instance; the DML request is from a particular client; and the method further comprises: the particular engine instance initiating a propagation operation to propagate a change made by the DML operation to one or more secondary duplicas stored in persistent storages that are local to one or more other engine instances; and after initiating the propagation operation and before receiving any acknowledgement that the change was received by the one or more other engine instances, the particular engine instance communicating to the particular client that the change has been made. 8. The method of claim 1 wherein: the DML operation is an update operation that makes a change to the particular row; and the particular persistent storage includes an existing version of the particular row that does not include the change; and executing the update operation includes creating a new version of the particular row that includes the change while retaining the existing version of the particular row that does not include the change. 9. A method comprising: assigning rows of a table to a plurality of slices; wherein each row of the table is assigned to a single slice of the plurality of slices; for each slice of the plurality of slices, storing at least one copy of the slice on persistent storage; wherein each copy of each slice contains rows of the table that belong to the slice; wherein the plurality of slices includes a particular slice that includes a particular set of rows of the table; receiving a request to read a particular row that belongs to the particular set of rows as of a particular snapshot time; responding to the request by: locating a tail of a particular chronological entry chain that corresponds to the particular row; wherein the particular chronological entry chain: is stored in a particular copy of the particular slice; and includes a plurality of entries for the particular row; wherein the plurality of entries are linked in chronological order based on times at which updates reflected in the respective entries were made to the row; based on times at which updates reflected in the entries were made to the particular row, determining which values from the plurality of entries belong to the particular snapshot time; and returning one or more values, obtained from the particular chronological entry chain, that belong to a snapshot defined by the particular snapshot time. 10. The method of claim 9 wherein locating the tail of the particular chronological entry chain includes: applying a hash function to a primary key associated with the particular row to produce a hash value; based on the hash value, locating a bucket within a hash table; using information in the bucket to locate a first pointer; and following the first pointer to an entry at the tail of the particular chronological entry chain. 11. The method of claim 9 wherein: the particular copy of the particular slice includes a delta log and a row heap; the particular chronological entry chain includes no delta log entries in the delta log and a set of one or more row heap entries in the row heap. 12. The method of claim 9 wherein: the particular copy of the particular slice includes a delta log and a row heap; the particular chronological entry chain includes a set of one or more delta log entries in the delta log and a set of one or more row heap entries in the row heap; and the set of one or more delta log entries reflect changes that are more recent than changes reflected in the set of one or more row heap entries. 13. The method of claim 12 wherein: each entry in the particular chronological entry chain is associated with either a respective timestamp or an indication that the changes contained therein were made by a transaction that has not committed; and determining which values from the plurality of entries belong to the particular snapshot time includes: skipping values from entries that include the indication; and returning a most recent version of values from entries whose respective timestamps are at least as old as the particular snapshot time. 14. The method of claim
Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409) · CPC title
Hash tables · CPC title
Data partitioning, e.g. horizontal or vertical partitioning · CPC title
using timestamps · CPC title
Details of file system snapshots on the file-level, e.g. snapshot creation, administration, deletion (error detection or correction of the data by redundancy in operations or in hardware G06F11/14, G06F11/16) · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.