Distributed database systems and structures

US11520780B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11520780-B2
Application numberUS-202117316451-A
CountryUS
Kind codeB2
Filing dateMay 10, 2021
Priority dateMay 18, 2016
Publication dateDec 6, 2022
Grant dateDec 6, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and techniques are described for efficient, general-purpose, and potentially decentralized databases, distributed storage systems, version control systems, and/or other types of data repositories. Data is represented in a database system in such a way that any value is represented by a unique identifier which is derived from the value itself. Any database peer in the system will derive an identical identifier from the same logical value. The identifier for a value may be derived using a variety of mechanisms, including, without limitation, a hash function known to all peers in the system. The values may be organized hierarchically as a tree of nodes. Any two peers storing the same logical value will deterministically represent that value with a graph, such as the described “Prolly” tree, having the same topology and hash value, irrespective of possibly differing sequences of mutations which caused each to arrive at the same final value.

First claim

Opening claim text (preview).

What is claimed is: 1. A method comprising: storing, at a server, a first version of a structured data set using a first tree of first nodes, the first nodes comprising a first root node, first leaf nodes, and one or more levels of intermediary nodes that link the first root node to the first leaf nodes, each of the first leaf nodes storing one or more values of the structured data set; receiving a commit request from a client to store a new version of the structured data set, the commit request indicating a subgraph of new nodes, including at least one new leaf node that stores one or more new values of the structured data set, a new root node, and one or more new intermediary nodes that link the new leaf node to the new root node; storing the new version of the structured data set using a second tree comprising the new nodes and a set of the previously-stored first nodes that are referenced by the new nodes; responding to requests from clients to return the structured data set by, when a current version of the structured data set is requested, reading the second tree starting from the new root node, and, when the first version is requested, reading the first tree starting from the first root node. 2. The method of claim 1 , wherein the commit request indicates a location at which the server stored the first root node, wherein storing the new version comprises, prior to writing the new nodes, reading the first root node and ensuring that any references in the new root node to nodes that are not new nodes are also found in the first root node, the references including both location and type information. 3. The method of claim 1 , further comprising, responsive to the commit request, storing a new commit object that points to the new root node in a sequence of commit objects for the structured data set, each of the commit objects pointing to a root node of a tree that represents a version of the structured data set as of a particular time, each of the versions including at least some of the same nodes, the commit objects also including at least a first commit object that points to the first root. 4. The method of claim 1 , further comprising writing each node of the first nodes and the new nodes at an address calculated from a set of values that the node stores, wherein when the node is a leaf node the set of values that the node stores includes one or more values from the structured data set, wherein when the node is a root node or an intermediate node the set of values that the node stores includes one or more references to addresses of other nodes immediately below the node in one or both of the first tree or the second tree. 5. The method of claim 1 , wherein the commit request specifies values for the new nodes, the method further comprising: calculating, at the server, locations at which to store the new nodes based on the values specified for the new nodes in the commit request; prior to the server writing the new nodes, verifying that those of the specified values that are references to descendent nodes match the locations that the server calculated for those descendent nodes. 6. The method of claim 1 , further comprising: using a rolling hash function to divide the first version of the structured set of data into a first set of chunks, the one or more values for a given leaf node of the first leaf nodes being those of a corresponding one of the chunks that the given leaf node represents; using the rolling hash function to divide a portion of the second version that includes the one or more new values into one or new chunks, a particular chunk of the one or more chunks comprising the one or more new values. 7. The method of claim 1 , wherein the structured data set is an ordered list of objects, each leaf node corresponding to a different object in the list, wherein the new leaf node corresponds to a new object in the second version of the list, and a new intermediary node of the one or more new intermediary nodes links to both the new leaf node and at least one previously-existing leaf node of the first tree that corresponds to a previously-existing object before or after which the new object was inserted. 8. The method of claim 1 , wherein the subgraph forms a spine of the second tree. 9. A system comprising: storage media configured to store a first version of a structured data set using a first tree of first nodes, the first nodes comprising a first root node, first leaf nodes, and one or more levels of intermediary nodes that link the first root node to the first leaf nodes, each of the first leaf nodes storing one or more values of the structured data set; logic configured to receive, from a client, a commit request from a client to store a new version of the structured data set, the commit request indicating a subgraph of new nodes, including at least one new leaf node that stores one or more new values of the structure data set, a new root node, and one or more new intermediary nodes that link the new leaf node to the new root node; logic configured to store the new version of the structured data set in the storage media by writing, to the storage media, a second tree comprising the new nodes and a set of the previously-stored first nodes that are referenced by the new nodes; logic configured to respond to requests from clients to return the structured data set by, when a current version of the structured data set is requested, reading the second tree starting from the new root node, and, when the first version is requested, reading the first tree starting from the first root node. 10. The system of claim 9 , wherein the commit request indicates a location at which the server stored the first root node, wherein writing the new version comprises, prior to writing the new nodes, reading the first root node and ensuring that any references in the new root node to nodes that are not new nodes are also found in the first root node, the references including both location and type information. 11. The system of claim 9 , further comprising logic configured to, responsive to the commit request, store a new commit object that points to the new root node in a sequence of commit objects for the structured data set, each of the commit objects pointing to a root node of a tree that represents a version of the structured data set as of a particular time, each of the versions including at least some of the same nodes, the commit objects also including at least a first commit object that points to the first root. 12. The system of claim 9 , further comprising logic configured to write each node of the first nodes and the new nodes at an address calculated from a set of values that the node stores, wherein when the node is a leaf node the set of values that the node stores includes one or more values from the structured data set, wherein when the node is a root node or an intermediate node the set of values that the node stores includes one or more references to addresses of other nodes immediately below the node in one or both of the first tree or the second tree. 13. The system of claim 9 , wherein the commit request specifies values for the new nodes, the system further comprising: logic configured to calculate locations at which to store the new nodes in the storage media based on the values specified for the new nodes in the commit request; logic configured to verify, prior to the server writing the new nodes, that those of the specified values that are references to descendent nodes match the locations that were calculated for those descendent nodes. 14. The system of claim 9 , further comprising: chunking logic configured to usin

Assignees

Inventors

Classifications

  • Update request formulation · CPC title

  • Clustering or classification · CPC title

  • Trees, e.g. B+trees · CPC title

  • for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title

  • Trees · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11520780B2 cover?
Systems and techniques are described for efficient, general-purpose, and potentially decentralized databases, distributed storage systems, version control systems, and/or other types of data repositories. Data is represented in a database system in such a way that any value is represented by a unique identifier which is derived from the value itself. Any database peer in the system will derive …
Who is the assignee on this patent?
Salesforce Com Inc, Salesforce Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2246. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Dec 06 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).