Quorum-based scalable database system

US2025036654A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2025036654-A1
Application numberUS-202418779287-A
CountryUS
Kind codeA1
Filing dateJul 22, 2024
Priority dateJul 26, 2023
Publication dateJan 30, 2025
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are disclosed relating to a database system. The database system includes multiple coordinator nodes storing replicas of a partition. Each partition describes the state of locks and transactions for keys covered by that partition of keys. Each partition is, in turn, replicated. The multiple coordinator nodes receive, from multiple worker nodes, requests to grant a lock for a key to permit a worker node to write a record for the key as part of executing a transaction. A given coordinator node of the multiple coordinator nodes sends an approval response for the lock to at most one of the worker nodes. A single worker node acquires the lock in response to receiving approval responses from a majority of the multiple coordinator nodes, and none of the multiple worker nodes acquire the lock in response to none of them receiving approval responses from a majority of the multiple coordinator nodes.

First claim

Opening claim text (preview).

What is claimed is: 1 . A method, comprising: storing, by multiple coordinator nodes of a database system, replicas of a particular one of a plurality of partitions partitioned by key range, wherein a given one of the replicas includes information about granted locks and records produced by a set of a plurality of worker nodes operable to perform database transactions, wherein the multiple coordinator nodes are operable to ensure transactional consistency for database transactions; receiving, by the multiple coordinator nodes from multiple worker nodes, requests to grant a lock for a key to permit a worker node to write a record for the key as part of executing a database transaction; and sending, by a given one of the multiple coordinator nodes, an approval response for the lock to at most one of the multiple worker nodes, wherein a single one of the multiple worker nodes acquires the lock in response to receiving approval responses from a majority of the multiple coordinator nodes, and wherein none of the multiple worker nodes acquire the lock in response to none of the multiple worker nodes receiving approval responses from a majority of the multiple coordinator nodes. 2 . The method of claim 1 , further comprising: determining, by a first coordinator node of the multiple coordinator nodes, to relocate, to a second coordinator node of the database system, at least a portion of the first coordinator node's replica of the particular partition; and copying, by the first coordinator node, the at least a portion of the first coordinator node's replica to the second coordinator node, wherein a given one of the plurality of worker nodes is operable to, during the copying, issue requests to both the first and second coordinator nodes for permission to perform a specified action associated with the particular partition. 3 . The method of claim 2 , wherein the multiple coordinator nodes form a first quorum for the particular partition, and the second coordinator node and the multiple coordinator nodes without the first coordinator node form a second quorum for the particular partition, wherein, to perform the specified action, the given worker node has to acquire approval responses from a majority of coordinator nodes in at least one of the first and second quorums. 4 . The method of claim 2 , wherein the given worker node is operable to issue requests to the second coordinator node for locks and to store new uncommitted work for the particular partition in response to the first coordinator node determining to relocate the at least a portion of the first coordinator node's replica. 5 . The method of claim 1 , further comprising: determining, by a first coordinator node of the multiple coordinator nodes based on a set of characteristics of the first coordinator node's replica of the particular partition, to locally split the first coordinator node's replica of the particular partition into a plurality of replicas corresponding to a plurality of subpartitions representing a splitting of the particular partition; and in response to the determining, the first coordinator node splitting the first coordinator node's replica of the particular partition into the plurality of replicas. 6 . The method of claim 5 , further comprises: informing, by the first coordinator node, one or more of remaining ones of the multiple coordinator nodes about the splitting to cause the one or more coordinator nodes to split their replicas of the particular partition. 7 . The method of claim 5 , wherein the plurality of replicas include fewer replicas than a number of replicas into which a second coordinator node of the multiple coordinator nodes has split the second coordinator node's replica of the particular partition. 8 . The method of claim 1 , further comprising: determining, by a first coordinator node of the multiple coordinator nodes based on a set of characteristics of the first coordinator node's replica of the particular partition, to locally merge two or more replicas of adjacent partitions into a single replica corresponding to a single partition representing a merging of two or more partitions; and in response to the determining, the first coordinator node merging the two or more replicas of the adjacent partitions into the single replica corresponding to a single partition. 9 . The method of claim 8 , further comprising: after the merging, the first coordinator node splitting the single partition into a different number of partitions than a number of partitions of the adjacent partitions. 10 . The method of claim 8 , wherein key ranges of the two or more replicas are adjacent. 11 . A non-transitory computer-readable medium having program instructions stored thereon that are capable of causing a computer system to implement a coordinator node that perform operations comprising: storing a replica of a particular one of a plurality of partitions partitioned by key range, wherein the replica includes information about granted locks and records produced by a set of a plurality of worker nodes operable to perform database transactions for a database system, and wherein the coordinator node is one of multiple coordinator nodes are operable to ensure transactional consistency for database transactions; receiving, from multiple worker nodes, requests to grant a lock for a key to permit a worker node to write a record for the key as part of executing a database transaction; and sending an approval response for the lock to at most one of the multiple worker nodes, wherein a single one of the multiple worker nodes acquires the lock in response to receiving approval responses from a majority of the multiple coordinator nodes, and wherein none of the multiple worker nodes acquire the lock in response to none of the multiple worker nodes receiving approval responses from a majority of the multiple coordinator nodes. 12 . The non-transitory computer-readable medium of claim 11 , wherein the operations further comprise: determining to relocate, to a different coordinator node of the database system, at least a portion of the replica of the particular partition; and coping the at least a portion of the replica to the different coordinator node, wherein the coordinator node and the different coordinator node are associated with different quorums of coordinator nodes, and wherein a given one of the plurality of worker nodes is operable to, during the coping, issue requests to the different quorums to obtain a majority approval from coordinator of the different quorums to perform a specified action associated with the particular partition. 13 . The non-transitory computer-readable medium of claim 11 , wherein the operations further comprise: performing a split operation on the replica to logically split the replica into two or more replicas corresponding to two or more subpartitions that represent a splitting of the particular partition; and performing a merge operation on the replica to logically merge the replica and another replica into a single replica corresponding to a single partition representing a merging of two partitions, wherein the split and merge operations are performed independent of other ones of the multiple coordinator nodes. 14 . The non-transitory computer-readable medium of claim 11 , wherein the operations further comprise: removing committed records from the replica based on committed records associated with the replica being persisted in a persistent store. 15 . The non-transitory computer-readable medium of claim 11 , wherein the operations further comprise: receiving a different rep

Assignees

Inventors

Classifications

  • Locking methods, e.g. distributed locking or locking implementation details · CPC title

  • Database cache management · CPC title

  • Ensuring data consistency and integrity · CPC title

  • Trees, e.g. B+trees · CPC title

  • G06F16/278Primary

    Data partitioning, e.g. horizontal or vertical partitioning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025036654A1 cover?
Techniques are disclosed relating to a database system. The database system includes multiple coordinator nodes storing replicas of a partition. Each partition describes the state of locks and transactions for keys covered by that partition of keys. Each partition is, in turn, replicated. The multiple coordinator nodes receive, from multiple worker nodes, requests to grant a lock for a key to p…
Who is the assignee on this patent?
Salesforce Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/278. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Jan 30 2025 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).