System and method for an ultra highly available, high performance, persistent memory optimized, scale-out database

US11971869B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11971869-B2
Application numberUS-202217974152-A
CountryUS
Kind codeB2
Filing dateOct 26, 2022
Priority dateOct 14, 2020
Publication dateApr 30, 2024
Grant dateApr 30, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A shared-nothing database system is provided in which parallelism and workload balancing are increased by assigning the rows of each table to “slices”, and storing multiple copies (“duplicas”) of each slice across the persistent storage of multiple nodes of the shared-nothing database system. When the data for a table is distributed among the nodes of a shared-nothing system in this manner, requests to read data from a particular row of the table may be handled by any node that stores a duplica of the slice to which the row is assigned. For each slice, a single duplica of the slice is designated as the “primary duplica”. All DML operations (e.g. inserts, deletes, updates, etc.) that target a particular row of the table are performed by the node that has the primary duplica of the slice to which the particular row is assigned. The changes made by the DML operations are then propagated from the primary duplica to the other duplicas (“secondary duplicas”) of the same slice.

First claim

Opening claim text (preview).

What is claimed is: 1. A method comprising: assigning rows of a table to a plurality of slices; wherein each row of the table is assigned to a single slice of the plurality of slices; for each slice of the plurality of slices, storing a plurality of duplicas; wherein each duplica of each slice contains rows of the table that belong to the slice; wherein the plurality of slices includes a particular slice; wherein the particular slice has a corresponding plurality of duplicas; wherein each engine instance in a particular set of engine instances has access to a corresponding duplica of the corresponding plurality of duplicas; receiving a first read request to read a particular row that belongs to the particular slice as of a first snapshot time; wherein the particular row has been updated by a first Data Manipulation Language (DML) operation before the first read request was received; wherein the first snapshot time is prior to a time associated with the first DML operation; selecting a first engine instance from among the particular set of engine instances; and causing the first engine instance to perform a read operation to service the first read request by reading a first version of data from a first duplica of the particular slice; receiving a second read request to read the particular row as of a second snapshot time; wherein the particular row has been updated by a second DML operation before the second read request was received; wherein the second snapshot time is prior to a time associated with the second DML operation; selecting a second engine instance from among the particular set of engine instances; and causing the second engine instance to perform a read operation to service the second read request by reading a second version of data from a second duplica of the particular slice; wherein the first engine instance is different from the second engine instance and the first duplica is different from the second duplica. 2. The method of claim 1 wherein: the first DML operation was part of a DML request that was part of a transaction that requires multiple DML operations, and the multiple DML operations are performed atomically such that processes either see all changes specified by the transaction or no changes specified by the transaction. 3. The method of claim 1 wherein: the first version does not include one or more changes made by the first DML operation to the particular row; and the second version does not include one or more changes made by the second DML operation to the particular row. 4. A method comprising: assigning rows of a table to a plurality of slices; wherein each row of the table is assigned to a single slice of the plurality of slices; for each slice of the plurality of slices, storing a plurality of duplicas in one or more persistent storages of a plurality of persistent storages; wherein each duplica of each slice contains rows of the table that belong to the slice; receiving, from a particular client a Data Manipulation Language (DML) request that involves a particular row of the table; in response to the DML request, a particular engine instance causing an update operation requested by the DML request to be performed on a particular duplica of a particular slice to which the particular row belongs; the particular engine instance initiating a propagation operation to propagate a change made by the update operation to one or more duplicas of the particular slice that are stored in persistent storages that are local to one or more other engine instances; and after initiating the propagation operation and before receiving any acknowledgement that the change was received by the one or more other engine instances, the particular engine instance communicating to the particular client that the change has been made. 5. The method of claim 4 wherein: the particular duplica is on a particular persistent storage; the update operation makes a change to the particular row; and the particular persistent storage includes an existing version of the particular row that does not include the change; and executing the update operation includes creating a new version of the particular row that includes the change while retaining the existing version of the particular row that does not include the change. 6. A method comprising: receiving, from a transaction with a particular start time, a request to read a particular row that belongs to a particular set of rows as of a particular snapshot time that is earlier than the particular start time; responding to the request by: locating a tail of a particular chronological entry chain that corresponds to the particular row; wherein the particular chronological entry chain: is stored in a particular persistent storage; and includes a plurality of entries for the particular row; wherein the plurality of entries are linked in chronological order based on times at which updates reflected in the respective entries were made to the row; based on times at which updates reflected in the entries were made to the particular row, determining which values from the plurality of entries belong to the particular snapshot time; and returning one or more values, obtained from the particular chronological entry chain, that belong to a snapshot defined by the particular snapshot time. 7. The method of claim 6 wherein: returning one or more values comprises returning a plurality of values; and returning the plurality of values includes: returning a first value obtained from a first entry in the particular chronological entry chain; and returning a second value from a second entry in the particular chronological entry chain; wherein the first entry is different from the second entry. 8. The method of claim 7 wherein the first entry is associated with a different update time than the second entry. 9. The method of claim 7 wherein: the particular row has values for a particular set of columns; and the first entry has values for a subset of the particular set of columns. 10. The method of claim 6 wherein locating the tail of the particular chronological entry chain includes: applying a hash function to a primary key associated with the particular row to produce a hash value; based on the hash value, locating a bucket within a hash table; using information in the bucket to locate a first pointer; and following the first pointer to an entry at the tail of the particular chronological entry chain. 11. The method of claim 6 wherein: the particular persistent storage includes a delta log and a row heap; and the particular chronological entry chain includes no delta log entries in the delta log and a set of one or more row heap entries in the row heap. 12. The method of claim 6 wherein: the particular persistent storage includes a delta log and a row heap; the particular chronological entry chain includes a set of one or more delta log entries in the delta log and a set of one or more row heap entries in the row heap; and the set of one or more delta log entries reflect changes that are more recent than changes reflected in the set of one or more row heap entries. 13. The method of claim 6 wherein: each entry in the particular chronological entry chain is associated with either a respective timestamp or an indication that changes contained therein were made by a transaction that has not committed; and determining which values from the plurality of entries belong to the particular snapshot time includes: skipping values from entries that include the indication; and returning a most recent version of values from entries whose

Assignees

Inventors

Classifications

  • Hash tables · CPC title

  • Details of file system snapshots on the file-level, e.g. snapshot creation, administration, deletion (error detection or correction of the data by redundancy in operations or in hardware G06F11/14, G06F11/16) · CPC title

  • based on delta files · CPC title

  • Transactional file systems · CPC title

  • Database tuning (G06F16/2282 takes precedence; database performance monitoring G06F11/3409) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11971869B2 cover?
A shared-nothing database system is provided in which parallelism and workload balancing are increased by assigning the rows of each table to “slices”, and storing multiple copies (“duplicas”) of each slice across the persistent storage of multiple nodes of the shared-nothing database system. When the data for a table is distributed among the nodes of a shared-nothing system in this manner, req…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F16/2255. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Apr 30 2024 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).