Selectively replicating changes to hierarchial data structures
US-10671639-B1 · Jun 2, 2020 · US
US11709809B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11709809-B1 |
| Application number | US-202117216359-A |
| Country | US |
| Kind code | B1 |
| Filing date | Mar 29, 2021 |
| Priority date | Mar 29, 2021 |
| Publication date | Jul 25, 2023 |
| Grant date | Jul 25, 2023 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Techniques for using tree data structures to maintain a transactionally consistent set with support for time-travel queries are described. When a transaction commits, a new version of the tree data structure is created using a copy-on-write based method such that the tree shares internal nodes with previous trees to save space. This approach may be used in the implementation of a transactional data catalog in which the files that make up a table are stored in a transactional set.
Opening claim text (preview).
What is claimed is: 1. A computer-implemented method comprising: receiving, at a transactional access manager of a data lake management service, a request to begin an atomic transaction involving a metadata table of a data catalog, wherein the metadata table references data objects stored at a separate object storage service that make up the table; receiving, at the transactional access manager, one or more commands to update metadata of the table as part of the transaction; based at least in part on the one or more commands to update metadata of the table as part of the transaction: generating, based on a first tree data structure storing metadata associated with a first version of the table, a second tree data structure associated with a second version of the table resulting from the one or more commands, wherein the first tree data structure includes one or more nodes that do not exist in the second tree data structure, wherein the second tree data structure includes one or more nodes that do not exist in the first tree data structure, and wherein the second tree data structure references one or more nodes of the first tree data structure; and updating a version history data structure associated with the table to include a second version node associated with the second version, wherein the second version node references the second tree data structure and further references a first version node associated with the first version; receiving, at the transactional access manager, a request to commit the transaction; committing the transaction, comprising at least further updating the version history data structure: receiving a query to be executed using the table, wherein the query includes or is associated with a time value indicating a point in time of the table that the query is to be executed against; identifying the second version node within the version history data structure based on use of the time value or a transaction identifier provided with the query; and executing the query using the second tree data structure referenced by the second version node. 2. The computer-implemented method of claim 1 , wherein the updating of the version history data structure associated with the table to include the second version node includes setting a status of the second version node to be uncommitted; and the committing of the transaction includes setting the status of the second version node to be committed. 3. The computer-implemented method of claim 1 , wherein the second tree data structure comprises a b-tree data structure that maps one or more partition keys to identifiers of data objects or data object storage locations of the object storage service. 4. A computer-implemented method comprising: receiving, at a transactional access manager, one or more commands to update a metadata table of a data catalog via an atomic transaction; based on the one or more commands: generating, based on a first tree data structure storing metadata associated with a first version of the table, a second tree data structure associated with a second version of the table resulting from the one or more commands being performed upon the first version, wherein the first tree data structure includes one or more nodes that do not exist in the second tree data structure and the second tree data structure includes one or more nodes that do not exist in the first tree data structure, and wherein the second tree data structure references one or more nodes of the first tree data structure; and updating a version history data structure associated with the table to include a second version node associated with the second version, wherein the second version node references the second tree data structure and further references a first version node associated with the first version; receiving a request to commit the transaction; and committing the transaction, comprising at least further updating the version history data structure. 5. The computer-implemented method of claim 4 , wherein: the version history data structure further includes a third version node associated with a third version of the table corresponding to an earlier point in time than either the first version or the second version, wherein the third version node references a third tree data structure; and the first tree data structure and the second tree data structure each have fewer or different nodes than the third tree data structure. 6. The computer-implemented method of claim 4 , further comprising: receiving a query to be executed using the table, wherein the query includes or is associated with a time value indicating a point in time of the table that the query is to be executed against; identifying the second version node within the version history data structure based on use of the time value and a commit time value of the second version node; and executing the query using the second tree data structure referenced by the second version node. 7. The computer-implemented method of claim 4 , wherein the transaction includes updates to data of the table and of a second table, the method further comprising: via an atomic procedure, along with the updating of the version history data structure associated with the table, also updating a second version history data structure associated with the second table to include a third version node, wherein the third version node references a third tree data structure and further references a fourth version node of the second version history data structure. 8. The computer-implemented method of claim 4 , wherein the generating of the second tree data structure occurs prior to the receiving of the request to commit the transaction. 9. The computer-implemented method of claim 8 , wherein the updating of the version history data structure associated with the table to include the second version node associated with the second version includes setting a status of the second version node to be uncommitted, and wherein committing the transaction includes setting the status of the second version node to be committed. 10. The computer-implemented method of claim 9 , further comprising: receiving a query to be executed using the table as part of the transaction; identifying the second version node based on use of a transaction identifier; and executing the query using the second tree data structure referenced by the second version node. 11. The computer-implemented method of claim 9 , further comprising: after a beginning of the transaction but prior to the committing of the transaction: receiving a query to be executed using the table, wherein the query is not to be part of the transaction; identifying the first version node from the version history data structure; and executing the query using the first tree data structure referenced by the first version node. 12. The computer-implemented method of claim 4 , wherein the generating of the second tree data structure occurs after the receiving of the request to commit the transaction. 13. The computer-implemented method of claim 4 , further comprising: receiving a second request to commit a second transaction including an update to metadata of the table; determining, as part of a commit process for the second transaction, that a third transaction involving the table was committed after a start of the second transaction, wherein the third transaction is associated with a third version node of the version history data structure that references a third tree data structure; generating a fourth tree data structure based on the third tree data structure, the generating including reapplying updates from a tra
Managing data history or versioning (querying versioned data G06F16/2474; querying temporal data G06F16/2477) · CPC title
Trees, e.g. B+trees · CPC title
using versioning · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.