Archival data storage system

US9767098B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9767098-B2
Application number: US-201213570088-A
Country: US
Kind code: B2
Filing date: Aug 8, 2012
Priority date: Aug 8, 2012
Publication date: Sep 19, 2017
Grant date: Sep 19, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A cost-effective, durable and scalable archival data storage system is provided herein that allows customers to store, retrieve and delete archival data objects, among other operations. For data storage, in an embodiment, the system stores data in a transient data store and provides a data object identifier that may be used by subsequent requests. For data retrieval, in an embodiment, the system creates a job corresponding to the data retrieval and provides a job identifier associated with the created job. Once the job is executed, the retrieved data is provided in a transient data store to enable customer download. In various embodiments, jobs associated with storage, retrieval and deletion are scheduled and executed using various optimization techniques such as load balancing, batch processing and partitioning. Data is redundantly encoded and stored in self-describing storage entities, increasing reliability while reducing storage costs. Data integrity is ensured by integrity checks along data paths.
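
The retrieval flow in the abstract (create a job, hand back a job identifier, process pending jobs in batches, then expose the result in a transient store for download) can be illustrated with a minimal sketch. This is not the patented implementation: the class `ArchiveSketch`, its method names, and the in-memory dictionaries standing in for archival devices and the transient data store are all hypothetical, and storage is simplified to a direct write.

```python
import uuid
from collections import deque


class ArchiveSketch:
    """Toy model of the asynchronous job flow; every name here is hypothetical."""

    def __init__(self):
        self.archive = {}       # stands in for the archival storage devices
        self.pending = deque()  # collection of pending retrieval jobs
        self.transient = {}     # transient data store holding retrieved data
        self.status = {}        # job_id -> "pending" or "done"

    def store(self, data: bytes) -> str:
        """Store an object and return a data object identifier."""
        object_id = uuid.uuid4().hex
        self.archive[object_id] = data
        return object_id

    def request_retrieval(self, object_id: str) -> str:
        """Queue a retrieval job and return its job identifier immediately."""
        job_id = uuid.uuid4().hex
        self.pending.append((job_id, object_id))
        self.status[job_id] = "pending"
        return job_id

    def process_batch(self, max_jobs: int = 10) -> int:
        """Execute up to max_jobs pending jobs; return how many were run."""
        done = 0
        while self.pending and done < max_jobs:
            job_id, object_id = self.pending.popleft()
            self.transient[job_id] = self.archive[object_id]
            self.status[job_id] = "done"
            done += 1
        return done

    def download(self, job_id: str) -> bytes:
        """Return retrieved data once the job has finished."""
        if self.status.get(job_id) != "done":
            raise LookupError("job not finished")
        return self.transient[job_id]
```

In this sketch a caller would invoke `request_retrieval`, poll `status` with the returned job identifier, and call `download` only after a later `process_batch` run has completed the job. Jobs for different data objects share the same pending queue, which mirrors the batching described in the abstract.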

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: under the control of one or more computer systems of an archival data storage system that are configured with executable instructions, receiving, over a network from a requestor system, a storage request to store a data object into the archival data storage system; causing storage of the data object in the archival data storage system by at least: encoding the data object with one or more encoding schemes to obtain a plurality of encoded data components, the one or more encoding schemes including at least redundancy coding; and causing storage of the plurality of encoded data components in at least one archival data storage device associated with the archival data storage system; providing a data object identifier associated with data object, the data object identifier including storage location information that at least describes the at least one archival data storage device storing the plurality of encoded data components; receiving, in connection with a retrieval request to retrieve the data object, the data object identifier; creating a retrieval job corresponding to the retrieval request; adding the retrieval job to a collection of pending jobs, at least one pending job of the collection of pending jobs being associated with a different data object from the data object; processing, in one or more batches, the collection of pending jobs; and providing the retrieved data object.

2. The computer-implemented method of claim 1, further comprising creating a storage job corresponding to the storage request and adding the storage job to the collection of pending jobs.

3. The computer-implemented method of claim 2, wherein providing the retrieved the data object comprises retrieving the encoded data components from the at least one archival data storage device.

4. The computer-implemented method of claim 1, further comprising providing a retrieval job identifier associated with the retrieval job and wherein providing the retrieved data object includes transmitting the retrieved data object in one or more parts to a requestor system that specified the retrieval job identifier in a request for the data object.

5. The computer-implemented method of claim 1, further comprising providing a notification of completion of the retrieval job after the retrieval job is successfully completed.

6. The computer-implemented method of claim 1, further comprising validating integrity of the data object based at least in part on a digest of at least a portion of the data object.

7. A computer-implemented method comprising: under the control of one or more computer systems configured with executable instructions, receiving a data retrieval request to retrieve a data object, the data retrieval request specifying a data object identifier, the data object at least partially represented by a plurality of encoded data components generated from the data object using one or more encoding schemes, the one or more encoding schemes including at least redundancy coding, the data object identifier including storage location information that at least describes at least one location associated with the plurality of encoded data components; creating a data retrieval job corresponding to the data retrieval request; adding the data retrieval job to a batch including least one other data retrieval job corresponding to a different data object than the data object; providing a job identifier associated with the data retrieval job that is usable for obtaining information about the data retrieval job; and after providing the job identifier, processing the batch so as to execute the data retrieval job using at least in part the data object identifier to provide access to the data object.

8. The computer-implemented method of claim 7, wherein the data object identifier is provided in response to a previous storage request to store the data object.

9. The computer-implemented method of claim 7, wherein processing the data retrieval job comprises: selecting the data retrieval job for execution; determining, based at least in part on the data object identifier, one or more storage entities on which the one or more encoded data components are stored; causing retrieval of at least some of the one or more encoded data components from the determined one or more storage entities; and decoding the retrieved encoded data components to obtain the retrieved data object.

10. The computer-implemented method of claim 9, wherein selecting the data retrieval job is based at least in part on a batch processing schedule.

11. The computer-implemented method of claim 10, wherein the batch processing schedule is used to gain efficiency.

12. The computer-implemented method of claim 7, further comprising providing a status of the data retrieval job in response to a job status request that specifies the job identifier.

13. A system for providing archival data storage services, comprising: one or more archival data storage devices; a transient data store; one or more processors; and memory, including executable instructions that, when executed by the one or more processors, cause the one or more processors to collectively at least: receive a data storage request to store a data object; cause storage of the data object in the transient store by at least: obtaining the data object from the transient store; encoding the data object with one or more encoding schemes to obtain a plurality of encoded data components, the one or more encoding schemes including at least redundancy coding; and causing storage of the plurality of encoded data components in at least some of the one or more archival data storage devices; adding the data storage request to a batch including at least one other data storage request corresponding to a different data object than the data object; provide a data object identifier associated with the data, the data object identifier encoding at least storage location information sufficient to locate the plurality of encoded data components associated with the data object; and after providing the data object identifier, cause processing of the batch so as to cause storage of the plurality of encoded data components in accordance with the storage location information.

14. The system of claim 13, wherein the executable instructions, when executed by the one or more processors, further cause the one or more processors to collectively create a data storage job corresponding to the data storage request and wherein causing storage of the data object in location specified by the storage location information comprises processing the data storage job based at least in part on the storage location information.

15. The system of claim 14, wherein the processing the data storage job comprises scheduling the job for execution based at least in part on a batch processing schedule.

16. The system of claim 13, wherein the data object identifier encodes at least data validation information usable to validate integrity of the data object.

17. The system of claim 16, wherein the executable instructions, when executed by the one or more processors, further cause the one or more processors to collectively validate integrity of the data object based at least in part on the data validation information.

18. One or more non-transitory computer-readable storage media having collectively stored thereon executable instructions that, when executed by one or more processors of an archival data storage system, cause the syst
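
Two ideas recur in the claims above: redundancy coding of a data object into several encoded components, and a data object identifier that carries storage location information plus data validation information. The sketch below illustrates both under loud assumptions. The claims do not name a specific code, so single XOR parity is swapped in here as the simplest redundancy scheme. The identifier layout (base64-encoded JSON with `loc`, `len`, and `sha256` fields) and all function names are hypothetical.

```python
import base64
import hashlib
import json


def encode_with_parity(data: bytes, k: int = 3) -> list:
    """Split data into k equal-size shards plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = bytearray(shard_len)
    for shard in shards:
        for i, byte in enumerate(shard):
            parity[i] ^= byte
    return shards + [bytes(parity)]


def decode_with_parity(shards: list, length: int) -> bytes:
    """Rebuild the object; at most one entry of shards may be None (lost)."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can recover only one lost shard")
    if missing:
        shard_len = len(next(s for s in shards if s is not None))
        rebuilt = bytearray(shard_len)
        for s in shards:
            if s is not None:
                for i, byte in enumerate(s):
                    rebuilt[i] ^= byte
        shards = list(shards)
        shards[missing[0]] = bytes(rebuilt)
    # Drop the parity shard, then strip the zero padding.
    return b"".join(shards[:-1])[:length]


def make_object_id(data: bytes, locations: list) -> str:
    """Hypothetical identifier: location info plus a SHA-256 digest."""
    payload = {
        "loc": locations,
        "len": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    return base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()


def validate(object_id: str, data: bytes) -> bool:
    """Check retrieved data against the digest carried in the identifier."""
    payload = json.loads(base64.urlsafe_b64decode(object_id))
    return hashlib.sha256(data).hexdigest() == payload["sha256"]
```

Because the identifier in this sketch is self-contained, a retrieval path could locate the shards and verify the decoded object without consulting a separate index. A production system would more plausibly use an erasure code that tolerates several lost components rather than single parity.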

Assignees

Inventors

Classifications

  • Physics · mapped topic


  • Concurrency control (transaction processing G06F9/466) · CPC title

  • G06F16/113 · Primary

    Details of archiving (lifecycle management in storage systems G06F3/0649; point-in-time backing up or restoration of persistent data G06F11/1446) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9767098B2 cover?
A cost-effective, durable and scalable archival data storage system is provided herein that allows customers to store, retrieve and delete archival data objects, among other operations. For data storage, in an embodiment, the system stores data in a transient data store and provides a data object identifier that may be used by subsequent requests. For data retrieval, in an embodiment, the system crea…
Who is the assignee on this patent?
Patiejunas Kestutis, Hamilton James R, Lazier Colin L, and 4 more
What technology area does this patent fall under?
Primary CPC classification G06F16/113 (Details of archiving). Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 19, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).