System and method for building a point-in-time snapshot of an eventually-consistent data store

US9613104B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9613104-B2
Application numberUS-201213399467-A
CountryUS
Kind codeB2
Filing dateFeb 17, 2012
Priority dateFeb 17, 2012
Publication dateApr 4, 2017
Grant dateApr 4, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and system for building a point-in-time snapshot of an eventually-consistent data store. The data store includes key-value pairs stored on a plurality of storage nodes. In one embodiment, the data store is implemented as an Apache® Cassandra database running in the “cloud.” The data store includes a journaling mechanism that stores journals (i.e., inconsistent snapshots) of the data store on each node at various intervals. In Cassandra, these snapshots are sorted string tables that may be copied to a back-up storage location. A cluster of processing nodes may retrieve and resolve the inconsistent snapshots to generate a point-in-time snapshot of the data store corresponding to a lagging consistency point. In addition, the point-in-time snapshot may be updated as any new inconsistent snapshots are generated by the data store such that the lagging consistency point associated with the updated point-in-time snapshot is more recent.

First claim

Opening claim text (preview).

We claim: 1. A computer-implemented method for building a point-in-time snapshot of an eventually-consistent data store distributed among a plurality of nodes connected by a network, the method comprising: receiving a plurality of inconsistent snapshots, wherein each inconsistent snapshot includes one or more rows of key-value pairs associated with the data store and reflects contents of at least a portion of the data store stored on a particular node of the plurality of nodes; and generating the point-in-time snapshot by resolving the one or more rows of the key-value pairs to remove any inconsistent values, wherein the point-in-time snapshot includes a subset of the key-value pairs included in the plurality of inconsistent snapshots, wherein generating the point-in-time snapshot comprises: dividing the one or more rows of the key-value pairs from the plurality of inconsistent snapshots into one or more processing tasks, wherein each processing task includes a different portion of the key-value pairs; distributing each processing task to one of a plurality of processing nodes configured to perform a reduce operation; receiving a number of results from the plurality of processing nodes corresponding to a number of distributed processing tasks; and combining the number of results to generate the point-in-time snapshot. 2. The computer-implemented method of claim 1 , wherein the one or more rows of the key-value pairs include a key value and one or more columns associated with the key value, each column being associated with a column identifier, a corresponding column value, and a timestamp value that represents a particular time at which the column was added to the data store. 3. The computer-implemented method of claim 2 , wherein the reduce operation comprises performing a compaction operation configured to remove duplicate rows included in the one or more rows of the key-value pairs from the one or more processing tasks and generate a single row associated with each unique key value included in the one or more processing tasks, wherein the single row includes one or more columns associated with the unique key value, each column in the single row being associated with a unique column identifier for that single row selected from the column in a processing task included in the one or more processing tasks that is associated with the unique key value and the unique column identifier as well as the most recent timestamp value. 4. The computer-implemented method of claim 1 , wherein the plurality of processing nodes is implemented via an application framework configured to provide at least one of a distributed file system and a framework for processing large data sets on the plurality of processing nodes. 5. The computer-implemented method of claim 1 , wherein the data store is implemented via a database configured to automatically generate the inconsistent snapshots at periodic intervals, and wherein each inconsistent snapshot comprises a sorted string table (SSTable). 6. The computer-implemented method of claim 1 , further comprising converting the point-in-time snapshot to a JavaScript Object Notation (JSON) format. 7. The computer-implemented method of claim 1 , further comprising: receiving one or more additional inconsistent snapshots generated by the data store subsequent to generating the point-in-time snapshot; and generating an updated point-in-time snapshot by resolving the one or more rows of the key-value pairs in the one or more additional inconsistent snapshots as well as the point-in-time snapshot to remove any inconsistent values. 8. A system for building a point-in-time snapshot of an eventually-consistent data store, comprising: a plurality of slave processors connected by a network and storing the data store; and a master processor connected to the data store via the network, and, when executing a first software application stored in a memory, the master processor is configured to: receive a plurality of inconsistent snapshots, wherein each inconsistent snapshot includes one or more rows of key-value pairs associated with the data store and reflects contents of at least a portion of the data store stored on a first slave processor included in the plurality of slave processors, and generate the point-in-time snapshot by resolving the one or more rows of the key-value pairs to remove any inconsistent values, wherein the point-in-time snapshot includes a subset of the key-value pairs included in the plurality of inconsistent snapshots, wherein generating the point-in-time snapshot comprises: dividing the one or more rows of the key-value pairs from the plurality of inconsistent snapshots into one or more processing tasks, wherein each processing task includes a different portion of the key-value pairs; distributing each processing task to a slave processor included in the plurality of slave processors configured to perform a reduce operation; receiving a number of results from the plurality of slave processors corresponding to a number of distributed processing tasks; and combining the number of results to generate the point-in-time snapshot. 9. The system of claim 8 , wherein the one or more rows of the key-value pairs include a key value and one or more columns associated with the key value, each column being associated with a column identifier, a corresponding column value, and a timestamp value that represents a particular time at which the column was added to the data store. 10. The system of claim 9 , wherein the reduce operation comprises performing a compaction operation configured to remove duplicate rows included in the one or more rows of the key-value pairs from the one or more processing tasks and generate a single row associated with each unique key value included in the one or more processing tasks, wherein the single row includes one or more columns associated with the unique key value, each column in the single row being associated with a unique column identifier for that single row selected from the column in a processing task included in the one or more processing tasks that is associated with the unique key value and the unique column identifier as well as the most recent timestamp value. 11. The system of claim 8 , wherein the plurality of slave processors is implemented via an application framework configured to provide at least one of a distributed file system and a framework for processing large data sets on the plurality of slave processors. 12. The system of claim 8 , wherein the data store is implemented via a database configured to automatically generate the inconsistent snapshots at periodic intervals, and wherein each inconsistent snapshot comprises a sorted string table (SSTable). 13. The system of claim 8 , wherein, when executing the first software application, the master processor is further configured to convert the point-in-time snapshot to a JavaScript Object Notation (JSON) format. 14. The system of claim 8 , wherein, when executing the first software application, the master processor is further configured to: receive one or more additional inconsistent snapshots generated by the data store subsequent to generating the point-in-time snapshot; generate an updated sorted table that includes each of the one or more rows of the key-value pairs from the one or more additional inconsistent snapshots and the point-in-time snapshot; and generate an updated point-in-time snapshot by resolving the one or more rows of the key-value pairs in the updated sorted table to remove any inconsistent values. 15. A non-transitory computer-readable storage medium including in

Assignees

Inventors

Classifications

  • Timestamp · CPC title

  • Physics · mapped topic

  • Database-specific techniques · CPC title

  • the resynchronized component or unit being a persistent storage device (re-synchronization of failed mirror storage G06F11/2082; rebuild or reconstruction of parity RAID storage G06F11/1008) · CPC title

  • Redundant storage or storage space (G06F11/2056 takes precedence) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9613104B2 cover?
A method and system for building a point-in-time snapshot of an eventually-consistent data store. The data store includes key-value pairs stored on a plurality of storage nodes. In one embodiment, the data store is implemented as an Apache® Cassandra database running in the “cloud.” The data store includes a journaling mechanism that stores journals (i.e., inconsistent snapshots) of the data st…
Who is the assignee on this patent?
Smith Charles, Magnusson Jeffrey, Anand Siddharth, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F17/30548. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Apr 04 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).