Write optimized, distributed, scalable indexing store
US-2021089407-A1 · Mar 25, 2021 · US
US11880290B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11880290-B2 |
| Application number | US-202318165257-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 6, 2023 |
| Priority date | Oct 30, 2020 |
| Publication date | Jan 23, 2024 |
| Grant date | Jan 23, 2024 |
Official abstract text for this publication.
A method for processing data exactly once using transactional stream writes includes receiving, from a client, a batch of data blocks for storage on memory hardware in communication with the data processing hardware. The batch of data blocks is associated with a corresponding sequence number and represents a number of rows of a table stored on the memory hardware. The method also includes partitioning the batch of data blocks into a plurality of sub-batches of data blocks. For each sub-batch of data blocks, the method further includes assigning the sub-batch of data blocks to a buffered stream; writing, using the assigned buffered stream, the sub-batch of data blocks to the memory hardware; updating a storage log with an intent to commit the sub-batch of data blocks using the assigned buffered stream; and committing the sub-batch of data blocks to the memory hardware.
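The write path in the abstract can be sketched in a few lines. This is an illustrative in-memory model only, not the patented implementation: `BufferedStream`, `Store`, `partition`, and `process_batch` are hypothetical names, the storage log is a plain list, and the contiguous near-equal split is an assumption, since the abstract does not say how a batch is partitioned.

```python
from dataclasses import dataclass, field


@dataclass
class BufferedStream:
    """Hypothetical buffered stream: holds written blocks until committed."""
    stream_id: int
    buffered: list = field(default_factory=list)

    def write(self, blocks):
        self.buffered.extend(blocks)


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the memory hardware and storage log."""
    log: list = field(default_factory=list)        # (stream_id, sequence_number, state)
    committed: list = field(default_factory=list)  # blocks visible to readers

    def record_intent(self, stream, sequence_number):
        self.log.append((stream.stream_id, sequence_number, "INTENT"))

    def commit(self, stream, sequence_number):
        self.committed.extend(stream.buffered)
        stream.buffered = []
        self.log.append((stream.stream_id, sequence_number, "COMMITTED"))


def partition(batch, num_sub_batches):
    """Split a batch into contiguous sub-batches of near-equal size (an assumption)."""
    size = max(1, -(-len(batch) // num_sub_batches))  # ceiling division, at least 1
    return [batch[i:i + size] for i in range(0, len(batch), size)]


def process_batch(store, batch, sequence_number, num_sub_batches=2):
    """For each sub-batch: assign a stream, write, log intent to commit, commit."""
    for stream_id, sub_batch in enumerate(partition(batch, num_sub_batches)):
        stream = BufferedStream(stream_id)            # assign the sub-batch to a stream
        stream.write(sub_batch)                       # write through the buffered stream
        store.record_intent(stream, sequence_number)  # update the storage log
        store.commit(stream, sequence_number)         # make the sub-batch visible


store = Store()
process_batch(store, ["row0", "row1", "row2", "row3", "row4"], sequence_number=7)
print(store.committed)
print(store.log)
```

Logging the intent before the commit gives a recovery process a record of which sub-batches were fully written, which is one plausible reason for the ordering described in the abstract.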
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method executed by data processing hardware that causes the data processing hardware to perform operations comprising: receiving, from a client, a batch of data blocks for storage on memory hardware in communication with the data processing hardware; partitioning the batch of data blocks into a plurality of sub-batches of data blocks; assigning each sub-batch of data blocks of the plurality of sub-batches of data blocks to a respective buffered stream, each respective buffered stream configured to write the assigned sub-batches to the memory hardware; determining that a particular sub-batch of data blocks of the plurality of sub-batches of data blocks failed to be written to the memory hardware; in response to determining that the particular sub-batch of data blocks of the plurality of sub-batches of data blocks failed to be written to the memory hardware, assigning the particular sub-batch of data blocks to a new respective buffered stream; and receiving an intent to commit the particular sub-batch of data blocks using the new respective buffered stream, the intent to commit indicating that the particular sub-batch of data blocks is successfully written to the memory hardware.
2. The method of claim 1, wherein the batch of data blocks are associated with a corresponding sequence number and representing a number of rows of a table stored on the memory hardware.
3. The method of claim 1, wherein the operations further comprise, in response to determining that the particular sub-batch of data blocks of the plurality of sub-batches of data blocks failed to be written to the memory hardware, retrying, using the respective assigned buffered stream, to write the particular sub-batch of data blocks to the memory hardware.
4. The method of claim 3, wherein the operations further comprise determining that retrying, using the respective assigned buffered stream, to write the particular sub-batch of data blocks to the memory hardware has failed to complete before committing the particular sub-batch of data blocks to the memory hardware.
5. The method of claim 4, wherein the operations further comprise removing, from the memory hardware, the particular sub-batch of data blocks from the respective assigned buffered stream.
6. The method of claim 5, wherein removing, from the memory hardware, the particular sub-batch of data blocks from the respective assigned buffered stream comprises performing garbage-collection on the particular sub-batch of data blocks from the respective assigned buffered stream.
7. The method of claim 1, wherein the operations further comprise: in response to receiving the intent to commit the particular sub-batch of data blocks to the memory hardware, determining a current timestamp; and associating the particular sub-batch of data blocks with the current timestamp.
8. The method of claim 7, wherein the operations further comprise converting the particular sub-batch of data blocks into a read-optimized format based on the current timestamp.
9. The method of claim 7, wherein the operations further comprise: receiving a query request at a snapshot timestamp, the query request requesting return of data blocks stored on the memory hardware that match query parameters; and returning any data blocks of the particular sub-batch of data blocks that match the query parameters when the snapshot timestamp is later than the current timestamp associated with the particular sub-batch of data blocks.
10. The method of claim 1, wherein the intent to commit indicates committing the particular sub-batch of data blocks to the memory hardware using a flush transform.
11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising: receiving, from a client, a batch of data blocks for storage on memory hardware in communication with the data processing hardware; partitioning the batch of data blocks into a plurality of sub-batches of data blocks; assigning each sub-batch of data blocks of the plurality of sub-batches of data blocks to a respective buffered stream, each respective buffered stream configured to write the assigned sub-batches to the memory hardware; determining that a particular sub-batch of data blocks of the plurality of sub-batches of data blocks failed to be written to the memory hardware; in response to determining that the particular sub-batch of data blocks of the plurality of sub-batches of data blocks failed to be written to the memory hardware, assigning the particular sub-batch of data blocks to a new respective buffered stream; and receiving an intent to commit the particular sub-batch of data blocks using the new respective buffered stream, the intent to commit indicating that the particular sub-batch of data blocks is successfully written to the memory hardware.
12. The system of claim 11, wherein the batch of data blocks are associated with a corresponding sequence number and representing a number of rows of a table stored on the memory hardware.
13. The system of claim 11, wherein the operations further comprise, in response to determining that the particular sub-batch of data blocks of the plurality of sub-batches of data blocks failed to be written to the memory hardware, retrying, using the respective assigned buffered stream, to write the particular sub-batch of data blocks to the memory hardware.
14. The system of claim 13, wherein the operations further comprise determining that retrying, using the respective assigned buffered stream, to write the particular sub-batch of data blocks to the memory hardware has failed to complete before committing the particular sub-batch of data blocks to the memory hardware.
15. The system of claim 14, wherein the operations further comprise removing, from the memory hardware, the particular sub-batch of data blocks from the respective assigned buffered stream.
16. The system of claim 15, wherein removing, from the memory hardware, the particular sub-batch of data blocks from the respective assigned buffered stream comprises performing garbage-collection on the particular sub-batch of data blocks from the respective assigned buffered stream.
17. The system of claim 11, wherein the operations further comprise: in response to receiving the intent to commit the particular sub-batch of data blocks to the memory hardware, determining a current timestamp; and associating the particular sub-batch of data blocks with the current timestamp.
18. The system of claim 17, wherein the operations further comprise converting the particular sub-batch of data blocks into a read-optimized format based on the current timestamp.
19. The system of claim 17, wherein the operations further comprise: receiving a query request at a snapshot timestamp, the query request requesting return of data blocks stored on the memory hardware that match query parameters; and returning any data blocks of the particular sub-batch of data blocks that match the query parameters when the snapshot timestamp is later than the current timestamp associated with the particular sub-batch of data blocks.
20. The system of claim 11, wherein the intent to commit indicates committing the particular sub-batch of data blocks to the memory hardware using a flush transform.
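Two behaviors in the claims lend themselves to a small sketch: reassigning a failed sub-batch to a new buffered stream (claim 1) and making a committed sub-batch visible only to queries whose snapshot timestamp is later than its commit timestamp (claims 7 and 9). Everything below is a hypothetical illustration: `FlakyStream`, `write_with_reassignment`, `commit`, and `query` are invented names, failures are simulated with a flag, and a logical counter stands in for real timestamps. The same-stream retry of claim 3 is omitted for brevity.

```python
import itertools

_stream_ids = itertools.count()


class FlakyStream:
    """Hypothetical buffered stream whose write can be forced to fail."""

    def __init__(self, fail=False):
        self.stream_id = next(_stream_ids)
        self.fail = fail
        self.buffered = []

    def write(self, blocks):
        if self.fail:
            raise OSError(f"stream {self.stream_id} failed to write")
        self.buffered = list(blocks)


def write_with_reassignment(sub_batch, stream, new_stream_factory):
    """Write a sub-batch; on failure, reassign it to a new stream (claim 1)."""
    try:
        stream.write(sub_batch)
        return stream
    except OSError:
        # Simplified cleanup: a real system would remove any partially
        # written data from the failed stream (compare claims 5 and 6).
        stream.buffered = []
        replacement = new_stream_factory()
        replacement.write(sub_batch)
        return replacement


def commit(stream, committed, clock):
    """On intent to commit, associate the sub-batch with a current timestamp (claim 7)."""
    timestamp = next(clock)
    committed.append((timestamp, stream.buffered))
    return timestamp


def query(committed, snapshot_timestamp, predicate):
    """Return matching blocks whose commit timestamp precedes the snapshot (claim 9)."""
    return [
        block
        for timestamp, blocks in committed
        if snapshot_timestamp > timestamp
        for block in blocks
        if predicate(block)
    ]


clock = itertools.count(start=100)  # logical clock standing in for real time
committed = []
stream = write_with_reassignment([1, 2, 3], FlakyStream(fail=True), FlakyStream)
commit_ts = commit(stream, committed, clock)

print(query(committed, commit_ts, lambda b: b > 1))      # snapshot not later than commit
print(query(committed, commit_ts + 1, lambda b: b > 1))  # snapshot later than commit
```

The strict comparison mirrors the wording "later than" in claims 9 and 19: a query at exactly the commit timestamp does not see the sub-batch in this sketch.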
Saving, restoring, recovering or retrying · CPC title
where the computing system component is a storage system, e.g. DASD based or network based (digital input from or digital output to record carriers G06F3/06; digital recording or reproducing G11B20/18; for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS], H04L67/1097) · CPC title
Prefetch instructions; cache control instructions · CPC title
Transactional memory (G06F9/528 takes precedence) · CPC title
by exceeding a time limit, i.e. time-out, e.g. watchdogs · CPC title