Projections for big database systems

US11620280B2 · US · B2

Patent metadata
Publication number: US-11620280-B2
Application number: US-202117444715-A
Country: US
Kind code: B2
Filing date: Aug 9, 2021
Priority date: Aug 19, 2020
Publication date: Apr 4, 2023
Grant date: Apr 4, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A database system comprised of a decoupled compute layer and storage layer is implemented to store, build, and maintain a canonical dataset, a temporary buffer, and projection datasets. The canonical dataset is a set of batch updated data. The data is appended in chunks to the canonical dataset such that the canonical dataset becomes a historical dataset over time. The buffer is a write ahead log that contains the most recent chunks of data and provides atomicity and durability for the database system. The projection datasets are indexes of the canonical dataset and/or the buffer that may have single or multiple column sort-orders and/or particular data formats. The writes to the canonical dataset, projection datasets, and buffer may be asynchronous and therefore the database system is advantageously less resource constrained.
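The write path described in the abstract can be illustrated with a small in-memory sketch. This is not the patented implementation: the class name `ProjectionStore`, the rule format (a tuple of columns plus a sort column), and the use of a thread pool to stand in for asynchronous writes are all assumptions made for illustration. A chunk is first recorded in the buffer, then applied to the canonical dataset and to each projection in the background, and only then flushed from the buffer.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ProjectionStore:
    """Hypothetical in-memory model: canonical dataset, buffer, projections."""

    def __init__(self, projection_rules):
        # projection_rules (assumed format): name -> (columns, sort_column)
        self.rules = projection_rules
        self.canonical = []   # append-only list of chunks (historical dataset)
        self.buffer = {}      # chunk_id -> rows; stands in for the write-ahead log
        self.projections = {name: [] for name in projection_rules}
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._next_id = 0

    def write(self, rows):
        """Store the chunk in the buffer, then schedule the background update."""
        with self._lock:
            chunk_id = self._next_id
            self._next_id += 1
            self.buffer[chunk_id] = rows
        return self._pool.submit(self._apply, chunk_id)

    def _apply(self, chunk_id):
        """Update canonical dataset and projections, then flush the chunk."""
        with self._lock:
            rows = self.buffer[chunk_id]
            self.canonical.append(rows)
            for name, (columns, sort_column) in self.rules.items():
                # Keep only the columns this projection needs, then re-sort.
                projected = [{c: r[c] for c in columns} for r in rows]
                merged = self.projections[name] + projected
                self.projections[name] = sorted(merged, key=lambda r: r[sort_column])
            del self.buffer[chunk_id]  # flush only after both updates succeed


store = ProjectionStore({"by_user": (("user", "amount"), "user")})
future = store.write([
    {"user": "bob", "amount": 5, "ts": 1},
    {"user": "alice", "amount": 7, "ts": 2},
])
future.result()  # wait for the background update to finish
print(store.projections["by_user"])
```

Re-sorting the whole projection on every chunk is a simplification; a real system would more likely write sorted runs and merge them during compaction.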

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: one or more non-transitory computer readable storage mediums configured to store: program instructions; a canonical dataset; at least a first projection dataset; and a buffer; and one or more processors configured to execute the program instructions to cause the system to: receive a first data chunk comprising an edit to the canonical dataset; temporarily store the first data chunk in the buffer; asynchronously: update the canonical dataset based on the first data chunk; and update the first projection dataset based on at least a part of the first data chunk; and flush the first data chunk from the buffer.

2. The system of claim 1, wherein the first projection dataset comprises a first sorted subset of data of the canonical dataset.

3. The system of claim 2, wherein the one or more non-transitory computer readable storage mediums are further configured to store: a second projection dataset comprising a second sorted subset of data of the canonical dataset, wherein the second sorted subset of data is different from the first sorted subset of data.

4. The system of claim 3, wherein the first and second sorted subsets of data are selected and sorted based on different projection rules.

5. The system of claim 3, wherein the one or more processors are configured to execute the program instructions to further cause the system to: asynchronously update the second projection dataset based on the first data chunk.

6. The system of claim 5, wherein asynchronously updating the first and second projection datasets includes selecting, based on projection rules applicable to the respective first and second projection datasets, portions of the first data chunk needed to update the respective first and second projection datasets.

7. The system of claim 6, wherein the one or more processors are configured to execute the program instructions to further cause the system to: periodically compacting the first and second projection datasets.

8. The system of claim 3, wherein the one or more processors are configured to execute the program instructions to further cause the system to: in response to receipt of a query of the canonical dataset: select, based on the query, at least one of the first or second projection datasets; execute the query on the selected at least one of the first or second projection datasets; and return a result of the query.

9. The system of claim 8, wherein the one or more processors are configured to execute the program instructions to further cause the system to: receive a second data chunk comprising another edit to the canonical dataset; temporarily store the second data chunk in the buffer; and in response to receipt of a second query of the canonical dataset: select, based on the second query, at least another one of the first or second projection datasets; join the buffer to the selected at least another one of the first or second projection datasets to form a joined projection dataset; execute the second query on the joined projected dataset; and return a result of the second query.

10. The system of claim 8, wherein the selected at least one of the first or second projection datasets includes both the first projection dataset and the second projection dataset.

11. The system of claim 10, wherein the one or more processors are configured to execute the program instructions to further cause the system to: further in response to receipt of the query of the canonical dataset: join the first and second projection datasets to form a joined projection dataset, wherein the query is executed on the joined projected dataset.

12. A computer-implemented method comprising: receiving a first data chunk comprising an edit to a canonical dataset; temporarily storing the first data chunk in a buffer; asynchronously: updating the canonical dataset based on the first data chunk; and updating at least one projection dataset based on at least a part of the first data chunk; and flushing the first data chunk from the buffer.

13. The method of claim 12, wherein the at least one projection dataset comprises a first projection dataset, and wherein the first projection dataset comprises a first sorted subset of data of the canonical dataset.

14. The method of claim 13, wherein the at least one projection dataset further comprises a second projection dataset, wherein the second projection dataset comprises a second sorted subset of data of the canonical dataset, and wherein the second sorted subset of data is different from the first sorted subset of data.

15. The method of claim 14, wherein the first and second sorted subsets of data are selected and sorted based on different projection rules.

16. The method of claim 14, wherein the second projection dataset is asynchronously updated based on the first data chunk.

17. The method of claim 16, wherein asynchronously updating the first and second projection datasets includes selecting, based on projection rules applicable to the respective first and second projection datasets, portions of the first data chunk needed to update the respective first and second projection datasets.

18. The method of claim 14, further comprising periodically compacting the first and second projection datasets.

19. Non-transitory computer-readable media including computer-executable instructions that, when executed by a computing system, cause the computing system to perform operations comprising: receiving a first data chunk comprising an edit to a canonical dataset; temporarily storing the first data chunk in a buffer; asynchronously: updating the canonical dataset based on the first data chunk; and updating at least one projection dataset based on at least a part of the first data chunk; and flushing the first data chunk from the buffer.

20. The non-transitory computer-readable media of claim 19, wherein the at least one projection dataset comprises a first projection dataset, and wherein the first projection dataset comprises a first sorted subset of data of the canonical dataset.
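The query path in claims 8 and 9 (select a projection based on the query, join the buffer to it, execute the query) might look roughly like the sketch below. The dictionary layout for projections, the function names `select_projection` and `run_query`, and the selection heuristic (prefer a projection sorted on the filter column) are assumptions for illustration; the claims do not specify how selection is done.

```python
def select_projection(projections, needed_columns, filter_column):
    """Pick a projection that covers the query, preferring a matching sort order."""
    required = set(needed_columns) | {filter_column}
    candidates = [p for p in projections if required <= set(p["columns"])]
    if not candidates:
        return None
    # False sorts before True, so a projection sorted on the filter column wins.
    candidates.sort(key=lambda p: p["sort_column"] != filter_column)
    return candidates[0]


def run_query(projections, buffer_rows, needed_columns, filter_column, value):
    """Answer an equality filter from a projection joined with unflushed rows."""
    projection = select_projection(projections, needed_columns, filter_column)
    if projection is None:
        raise LookupError("no projection covers this query")
    # Join the buffer: shape pending rows like the projection, then append them.
    pending = [{c: r[c] for c in projection["columns"]} for r in buffer_rows]
    joined = projection["rows"] + pending
    # A linear scan keeps the sketch short; it does not exploit the sort order.
    return [
        {c: r[c] for c in needed_columns}
        for r in joined
        if r[filter_column] == value
    ]


projections = [
    {"name": "by_user", "columns": ("user", "amount"), "sort_column": "user",
     "rows": [{"user": "alice", "amount": 7}, {"user": "bob", "amount": 5}]},
    {"name": "by_ts", "columns": ("ts", "amount"), "sort_column": "ts",
     "rows": [{"ts": 1, "amount": 5}, {"ts": 2, "amount": 7}]},
]
buffer_rows = [{"user": "alice", "amount": 3, "ts": 9}]  # chunk not yet flushed

result = run_query(projections, buffer_rows, ["amount"], "user", "alice")
print(result)
```

Joining the buffer at query time is what lets a reader see recent chunks before the asynchronous projection update has completed.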

Assignees

Palantir Technologies Inc

Inventors

Classifications

  • Tablespace storage structures; Management thereof

  • Presentation of query results

  • Query execution

  • Unary operations; Data partitioning operations

  • Change logging, detection, and notification (replication G06F16/27)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11620280B2 cover?
A database system comprised of a decoupled compute layer and storage layer is implemented to store, build, and maintain a canonical dataset, a temporary buffer, and projection datasets. The canonical dataset is a set of batch updated data. The data is appended in chunks to the canonical dataset such that the canonical dataset becomes a historical dataset over time. The buffer is a write ahead l…
Who is the assignee on this patent?
Palantir Technologies Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2282. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 4, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).