Load balancing backup jobs in a virtualized storage system having a plurality of physical nodes
US-9372854-B2 · Jun 21, 2016 · US
US9767098B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9767098-B2 |
| Application number | US-201213570088-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 8, 2012 |
| Priority date | Aug 8, 2012 |
| Publication date | Sep 19, 2017 |
| Grant date | Sep 19, 2017 |
Official abstract text for this publication.
A cost-effective, durable and scalable archival data storage system is provided herein that allows customers to store, retrieve and delete archival data objects, among other operations. For data storage, in an embodiment, the system stores data in a transient data store and provides a data object identifier that may be used by subsequent requests. For data retrieval, in an embodiment, the system creates a job corresponding to the data retrieval and provides a job identifier associated with the created job. Once the job is executed, the retrieved data is provided in a transient data store to enable customer download. In various embodiments, jobs associated with storage, retrieval and deletion are scheduled and executed using various optimization techniques such as load balancing, batch processing and partitioning. Data is redundantly encoded and stored in self-describing storage entities, increasing reliability while reducing storage costs. Data integrity is ensured by integrity checks along data paths.
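The job-based retrieval flow in the abstract can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the class `ArchiveSketch`, its method names, the use of random hex strings as identifiers, and the `batch_size` parameter are all assumptions made for illustration, and the storage path is simplified to a direct write rather than a staged, job-driven one.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ArchiveSketch:
    """Toy model: store returns an object id, retrieval returns a job id."""
    archive: dict = field(default_factory=dict)    # object_id -> bytes (stands in for archival devices)
    pending: list = field(default_factory=list)    # (job_id, object_id) pairs awaiting a batch run
    transient: dict = field(default_factory=dict)  # job_id -> bytes staged for download
    status: dict = field(default_factory=dict)     # job_id -> "pending" or "complete"

    def store(self, data: bytes) -> str:
        # Hypothetical identifier scheme: a random hex string.
        object_id = uuid.uuid4().hex
        self.archive[object_id] = data
        return object_id

    def request_retrieval(self, object_id: str) -> str:
        # Retrieval is asynchronous: the caller gets a job id, not the data.
        job_id = uuid.uuid4().hex
        self.pending.append((job_id, object_id))
        self.status[job_id] = "pending"
        return job_id

    def process_batch(self, batch_size: int = 10) -> int:
        # Jobs for different data objects are executed together in one batch.
        batch = self.pending[:batch_size]
        self.pending = self.pending[batch_size:]
        for job_id, object_id in batch:
            self.transient[job_id] = self.archive[object_id]
            self.status[job_id] = "complete"
        return len(batch)

    def download(self, job_id: str) -> bytes:
        # Data is served from the transient store once the job has run.
        if self.status.get(job_id) != "complete":
            raise LookupError("job not complete")
        return self.transient[job_id]
```

A caller would store an object, request retrieval, poll `status` with the job identifier, and call `download` only after a batch run has marked the job complete.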
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: under the control of one or more computer systems of an archival data storage system that are configured with executable instructions, receiving, over a network from a requestor system, a storage request to store a data object into the archival data storage system; causing storage of the data object in the archival data storage system by at least: encoding the data object with one or more encoding schemes to obtain a plurality of encoded data components, the one or more encoding schemes including at least redundancy coding; and causing storage of the plurality of encoded data components in at least one archival data storage device associated with the archival data storage system; providing a data object identifier associated with data object, the data object identifier including storage location information that at least describes the at least one archival data storage device storing the plurality of encoded data components; receiving, in connection with a retrieval request to retrieve the data object, the data object identifier; creating a retrieval job corresponding to the retrieval request; adding the retrieval job to a collection of pending jobs, at least one pending job of the collection of pending jobs being associated with a different data object from the data object; processing, in one or more batches, the collection of pending jobs; and providing the retrieved data object.
2. The computer-implemented method of claim 1, further comprising creating a storage job corresponding to the storage request and adding the storage job to the collection of pending jobs.
3. The computer-implemented method of claim 2, wherein providing the retrieved the data object comprises retrieving the encoded data components from the at least one archival data storage device.
4. The computer-implemented method of claim 1, further comprising providing a retrieval job identifier associated with the retrieval job and wherein providing the retrieved data object includes transmitting the retrieved data object in one or more parts to a requestor system that specified the retrieval job identifier in a request for the data object.
5. The computer-implemented method of claim 1, further comprising providing a notification of completion of the retrieval job after the retrieval job is successfully completed.
6. The computer-implemented method of claim 1, further comprising validating integrity of the data object based at least in part on a digest of at least a portion of the data object.
7. A computer-implemented method comprising: under the control of one or more computer systems configured with executable instructions, receiving a data retrieval request to retrieve a data object, the data retrieval request specifying a data object identifier, the data object at least partially represented by a plurality of encoded data components generated from the data object using one or more encoding schemes, the one or more encoding schemes including at least redundancy coding, the data object identifier including storage location information that at least describes at least one location associated with the plurality of encoded data components; creating a data retrieval job corresponding to the data retrieval request; adding the data retrieval job to a batch including least one other data retrieval job corresponding to a different data object than the data object; providing a job identifier associated with the data retrieval job that is usable for obtaining information about the data retrieval job; and after providing the job identifier, processing the batch so as to execute the data retrieval job using at least in part the data object identifier to provide access to the data object.
8. The computer-implemented method of claim 7, wherein the data object identifier is provided in response to a previous storage request to store the data object.
9. The computer-implemented method of claim 7, wherein processing the data retrieval job comprises: selecting the data retrieval job for execution; determining, based at least in part on the data object identifier, one or more storage entities on which the one or more encoded data components are stored; causing retrieval of at least some of the one or more encoded data components from the determined one or more storage entities; and decoding the retrieved encoded data components to obtain the retrieved data object.
10. The computer-implemented method of claim 9, wherein selecting the data retrieval job is based at least in part on a batch processing schedule.
11. The computer-implemented method of claim 10, wherein the batch processing schedule is used to gain efficiency.
12. The computer-implemented method of claim 7, further comprising providing a status of the data retrieval job in response to a job status request that specifies the job identifier.
13. A system for providing archival data storage services, comprising: one or more archival data storage devices; a transient data store; one or more processors; and memory, including executable instructions that, when executed by the one or more processors, cause the one or more processors to collectively at least: receive a data storage request to store a data object; cause storage of the data object in the transient store by at least: obtaining the data object from the transient store; encoding the data object with one or more encoding schemes to obtain a plurality of encoded data components, the one or more encoding schemes including at least redundancy coding; and causing storage of the plurality of encoded data components in at least some of the one or more archival data storage devices; adding the data storage request to a batch including at least one other data storage request corresponding to a different data object than the data object; provide a data object identifier associated with the data, the data object identifier encoding at least storage location information sufficient to locate the plurality of encoded data components associated with the data object; and after providing the data object identifier, cause processing of the batch so as to cause storage of the plurality of encoded data components in accordance with the storage location information.
14. The system of claim 13, wherein the executable instructions, when executed by the one or more processors, further cause the one or more processors to collectively create a data storage job corresponding to the data storage request and wherein causing storage of the data object in location specified by the storage location information comprises processing the data storage job based at least in part on the storage location information.
15. The system of claim 14, wherein the processing the data storage job comprises scheduling the job for execution based at least in part on a batch processing schedule.
16. The system of claim 13, wherein the data object identifier encodes at least data validation information usable to validate integrity of the data object.
17. The system of claim 16, wherein the executable instructions, when executed by the one or more processors, further cause the one or more processors to collectively validate integrity of the data object based at least in part on the data validation information.
18. One or more non-transitory computer-readable storage media having collectively stored thereon executable instructions that, when executed by one or more processors of an archival data storage system, cause the syst
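The claims combine three ideas: redundancy coding of a data object into components, a data object identifier that carries storage location information, and validation information (a digest) for integrity checks. The sketch below illustrates how these could fit together. It is an assumption-laden toy, not the claimed method: single XOR parity stands in for whatever redundancy coding the patent contemplates, and the function names, the JSON-in-base64 identifier layout, and the `dev-N` location labels are all hypothetical.

```python
import base64
import hashlib
import json
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int = 3) -> list:
    """Split data into k zero-padded shards and append one XOR parity shard."""
    shard_len = max(1, -(-len(data) // k))  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def decode(shards: list, length: int) -> bytes:
    """Rebuild the object; at most one shard may be None (lost)."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost shard")
    shards = list(shards)
    if missing:
        # XOR of all surviving shards equals the lost shard.
        shards[missing[0]] = reduce(xor_bytes, [s for s in shards if s is not None])
    return b"".join(shards[:-1])[:length]


def make_object_id(locations: list, data: bytes) -> str:
    """Hypothetical self-describing identifier: locations, length, and digest."""
    payload = {
        "loc": locations,
        "len": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    return base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()


def parse_object_id(object_id: str) -> dict:
    """Recover the fields packed into the identifier."""
    return json.loads(base64.urlsafe_b64decode(object_id))
```

On retrieval, a caller would parse the identifier to find the shard locations, decode the surviving shards, and compare the SHA-256 digest of the result with the one carried in the identifier before returning the object.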
Physics · mapped topic
Concurrency control (transaction processing G06F9/466) · CPC title
Details of archiving (lifecycle management in storage systems G06F3/0649; point-in-time backing up or restoration of persistent data G06F11/1446) · CPC title