Managing storage of data for range-based searching
US-9811570-B2 · Nov 7, 2017 · US
US9613104B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9613104-B2 |
| Application number | US-201213399467-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 17, 2012 |
| Priority date | Feb 17, 2012 |
| Publication date | Apr 4, 2017 |
| Grant date | Apr 4, 2017 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A method and system for building a point-in-time snapshot of an eventually-consistent data store. The data store includes key-value pairs stored on a plurality of storage nodes. In one embodiment, the data store is implemented as an Apache® Cassandra database running in the “cloud.” The data store includes a journaling mechanism that stores journals (i.e., inconsistent snapshots) of the data store on each node at various intervals. In Cassandra, these snapshots are sorted string tables that may be copied to a back-up storage location. A cluster of processing nodes may retrieve and resolve the inconsistent snapshots to generate a point-in-time snapshot of the data store corresponding to a lagging consistency point. In addition, the point-in-time snapshot may be updated as any new inconsistent snapshots are generated by the data store such that the lagging consistency point associated with the updated point-in-time snapshot is more recent.
Opening claim text (preview).
We claim: 1. A computer-implemented method for building a point-in-time snapshot of an eventually-consistent data store distributed among a plurality of nodes connected by a network, the method comprising: receiving a plurality of inconsistent snapshots, wherein each inconsistent snapshot includes one or more rows of key-value pairs associated with the data store and reflects contents of at least a portion of the data store stored on a particular node of the plurality of nodes; and generating the point-in-time snapshot by resolving the one or more rows of the key-value pairs to remove any inconsistent values, wherein the point-in-time snapshot includes a subset of the key-value pairs included in the plurality of inconsistent snapshots, wherein generating the point-in-time snapshot comprises: dividing the one or more rows of the key-value pairs from the plurality of inconsistent snapshots into one or more processing tasks, wherein each processing task includes a different portion of the key-value pairs; distributing each processing task to one of a plurality of processing nodes configured to perform a reduce operation; receiving a number of results from the plurality of processing nodes corresponding to a number of distributed processing tasks; and combining the number of results to generate the point-in-time snapshot. 2. The computer-implemented method of claim 1 , wherein the one or more rows of the key-value pairs include a key value and one or more columns associated with the key value, each column being associated with a column identifier, a corresponding column value, and a timestamp value that represents a particular time at which the column was added to the data store. 3. The computer-implemented method of claim 2 , wherein the reduce operation comprises performing a compaction operation configured to remove duplicate rows included in the one or more rows of the key-value pairs from the one or more processing tasks and generate a single row associated with each unique key value included in the one or more processing tasks, wherein the single row includes one or more columns associated with the unique key value, each column in the single row being associated with a unique column identifier for that single row selected from the column in a processing task included in the one or more processing tasks that is associated with the unique key value and the unique column identifier as well as the most recent timestamp value. 4. The computer-implemented method of claim 1 , wherein the plurality of processing nodes is implemented via an application framework configured to provide at least one of a distributed file system and a framework for processing large data sets on the plurality of processing nodes. 5. The computer-implemented method of claim 1 , wherein the data store is implemented via a database configured to automatically generate the inconsistent snapshots at periodic intervals, and wherein each inconsistent snapshot comprises a sorted string table (SSTable). 6. The computer-implemented method of claim 1 , further comprising converting the point-in-time snapshot to a JavaScript Object Notation (JSON) format. 7. The computer-implemented method of claim 1 , further comprising: receiving one or more additional inconsistent snapshots generated by the data store subsequent to generating the point-in-time snapshot; and generating an updated point-in-time snapshot by resolving the one or more rows of the key-value pairs in the one or more additional inconsistent snapshots as well as the point-in-time snapshot to remove any inconsistent values. 8. A system for building a point-in-time snapshot of an eventually-consistent data store, comprising: a plurality of slave processors connected by a network and storing the data store; and a master processor connected to the data store via the network, and, when executing a first software application stored in a memory, the master processor is configured to: receive a plurality of inconsistent snapshots, wherein each inconsistent snapshot includes one or more rows of key-value pairs associated with the data store and reflects contents of at least a portion of the data store stored on a first slave processor included in the plurality of slave processors, and generate the point-in-time snapshot by resolving the one or more rows of the key-value pairs to remove any inconsistent values, wherein the point-in-time snapshot includes a subset of the key-value pairs included in the plurality of inconsistent snapshots, wherein generating the point-in-time snapshot comprises: dividing the one or more rows of the key-value pairs from the plurality of inconsistent snapshots into one or more processing tasks, wherein each processing task includes a different portion of the key-value pairs; distributing each processing task to a slave processor included in the plurality of slave processors configured to perform a reduce operation; receiving a number of results from the plurality of slave processors corresponding to a number of distributed processing tasks; and combining the number of results to generate the point-in-time snapshot. 9. The system of claim 8 , wherein the one or more rows of the key-value pairs include a key value and one or more columns associated with the key value, each column being associated with a column identifier, a corresponding column value, and a timestamp value that represents a particular time at which the column was added to the data store. 10. The system of claim 9 , wherein the reduce operation comprises performing a compaction operation configured to remove duplicate rows included in the one or more rows of the key-value pairs from the one or more processing tasks and generate a single row associated with each unique key value included in the one or more processing tasks, wherein the single row includes one or more columns associated with the unique key value, each column in the single row being associated with a unique column identifier for that single row selected from the column in a processing task included in the one or more processing tasks that is associated with the unique key value and the unique column identifier as well as the most recent timestamp value. 11. The system of claim 8 , wherein the plurality of slave processors is implemented via an application framework configured to provide at least one of a distributed file system and a framework for processing large data sets on the plurality of slave processors. 12. The system of claim 8 , wherein the data store is implemented via a database configured to automatically generate the inconsistent snapshots at periodic intervals, and wherein each inconsistent snapshot comprises a sorted string table (SSTable). 13. The system of claim 8 , wherein, when executing the first software application, the master processor is further configured to convert the point-in-time snapshot to a JavaScript Object Notation (JSON) format. 14. The system of claim 8 , wherein, when executing the first software application, the master processor is further configured to: receive one or more additional inconsistent snapshots generated by the data store subsequent to generating the point-in-time snapshot; generate an updated sorted table that includes each of the one or more rows of the key-value pairs from the one or more additional inconsistent snapshots and the point-in-time snapshot; and generate an updated point-in-time snapshot by resolving the one or more rows of the key-value pairs in the updated sorted table to remove any inconsistent values. 15. A non-transitory computer-readable storage medium including in
Timestamp · CPC title
Physics · mapped topic
Database-specific techniques · CPC title
the resynchronized component or unit being a persistent storage device (re-synchronization of failed mirror storage G06F11/2082; rebuild or reconstruction of parity RAID storage G06F11/1008) · CPC title
Redundant storage or storage space (G06F11/2056 takes precedence) · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.