Distributed batch processing of non-uniform data objects

US11010103B2 · US · B2

Patent metadata
Publication number: US-11010103-B2
Application number: US-201916447443-A
Country: US
Kind code: B2
Filing date: Jun 20, 2019
Priority date: Jun 20, 2019
Publication date: May 18, 2021
Grant date: May 18, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The described methods, systems, and other aspects can advantageously provide balanced multi-stage processing of non-uniform object data. An example method may receive a list of buckets. Each of the buckets in the list of buckets can store one or more restorable objects. The method further comprises distributing the list of buckets to the two or more second nodes; determining a number of the one or more restorable objects in each bucket; determining a size of the one or more restorable objects in each bucket; generating batches of to-be-restored data objects based on the determined number of the one or more restorable objects in each bucket and the determined size of the one or more restorable objects in each bucket; and distributing the batches among the two or more second nodes for storage-related task processing.
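The batching step summarized in the abstract can be illustrated with a minimal sketch. It assumes that object counts and sizes have already been collected per bucket, and it uses a simple greedy fill until a volume threshold is reached. The names `Batch`, `generate_batches`, and `batch_threshold`, the byte-based threshold, and the greedy strategy are all illustrative assumptions, not the patented implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Batch:
    """Hypothetical container: (bucket, object key, size) entries restored together."""
    objects: List[Tuple[str, str, int]] = field(default_factory=list)
    total_volume: int = 0


def generate_batches(
    buckets: Dict[str, List[Tuple[str, int]]],
    batch_threshold: int,
) -> List[Batch]:
    """Greedily pack objects into batches until each reaches batch_threshold.

    `buckets` maps a bucket name to (object_key, size_in_bytes) pairs.
    Assumption: "satisfies the threshold" means total volume >= batch_threshold.
    """
    batches: List[Batch] = []
    current = Batch()
    for bucket_name, objects in buckets.items():
        for key, size in objects:
            # Objects from several buckets may end up in the same batch.
            current.objects.append((bucket_name, key, size))
            current.total_volume += size
            if current.total_volume >= batch_threshold:
                batches.append(current)
                current = Batch()
    if current.objects:
        # Leftover objects form a final batch that may fall below the threshold.
        batches.append(current)
    return batches


if __name__ == "__main__":
    demo = {
        "bucket-a": [("a1", 40), ("a2", 70)],
        "bucket-b": [("b1", 10), ("b2", 200), ("b3", 5)],
    }
    for batch in generate_batches(demo, batch_threshold=100):
        print(batch.total_volume, [key for _, key, _ in batch.objects])
```

Because the buckets hold objects of very different sizes, batching by total volume rather than by bucket is what keeps the work per batch roughly comparable in this sketch.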

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a cluster of nodes that includes a first node and two or more second nodes; and a distribution component in the first node that is configured to: receive a list of a plurality of buckets, wherein each bucket in the plurality of buckets stores one or more restorable data objects in a first set of storage nodes; determine a number of the one or more restorable data objects in each bucket; determine a size of the one or more restorable data objects in each bucket; generate batches of to-be-restored data objects from the plurality of buckets, wherein the distribution component is configured to generate each batch by: selecting a number of restorable data objects for a batch from the plurality of buckets; determining a total volume of the number of restorable data objects; and determining that the total volume satisfies a batch threshold; and distribute the batches among the two or more second nodes for restoring the batches of to-be-restored data objects from the first set of storage nodes.

2. The system of claim 1, wherein the batches are uniform in size.

3. The system of claim 1, wherein generating the batches includes: selecting an initial number of one or more data objects from a first bucket in the plurality of buckets; determining the total volume of the initial number of one or more data objects; determining whether the total volume satisfies the batch threshold; responsive to the total volume dissatisfying the batch threshold, adding additional data objects from the plurality of buckets to the initial number of one or more data objects until the total volume is increased to satisfy the batch threshold; and responsive to the total volume satisfying the batch threshold, generating a first batch for the one or more data objects.

4. The system of claim 3, wherein the distribution component is further configured to determine, based on available bandwidths of a processing component of each of the two or more second nodes, the batch threshold.

5. The system of claim 3, wherein generating the batches further includes: determining that the total volume dissatisfies a second threshold; and responsive to the total volume dissatisfying the second threshold, removing data objects from the initial number of one or more data objects from the first bucket until the total volume is decreased to satisfy the second threshold.

6. The system of claim 1, wherein: the list of the plurality of buckets includes a plurality of tuples associating local buckets and remote buckets; the remote buckets are configured to store one or more restorable data objects in the first set of storage nodes; the local buckets are configured to receive one or more restored data objects in a second set of storage nodes; and the two or more second nodes are configured to restore the batches of to-be-restored data objects from the remote buckets to the local buckets.

7. The system of claim 1, wherein the distribution component is further configured to store the generated batches in one of a temporary data store and a persistent data store.

8. The system of claim 1, wherein distributing the batches includes: determining a number of processing components associated with the two or more second nodes and a processing factor; and selecting, from the batches, a number of batches for distribution among the processing components of the two or more second nodes, wherein the number of batches is determined based on the number of the processing components and the processing factor.

9. The system of claim 8, wherein the processing factor is greater than or equal to 1.

10. The system of claim 1, wherein the distribution component is further configured to: determine partitions, wherein each partition includes at least one bucket from the plurality of buckets; distribute the partitions among the two or more second nodes for acquiring: the number of the one or more restorable data objects in each bucket; and the size of the one or more restorable data objects in each bucket; and receive, from the two or more second nodes: the number of the one or more restorable data objects in each bucket; and the size of the one or more restorable data objects in each bucket.

11. A computer-implemented method, comprising: receiving a list of a plurality of buckets, wherein each bucket in the plurality of buckets stores one or more restorable data objects in a first set of storage nodes; determining a number of the one or more restorable data objects in each bucket; determining a size of the one or more restorable data objects in each bucket; generating batches of to-be-restored data objects from the plurality of buckets by: selecting a number of restorable data objects for a batch from the plurality of buckets; determining a total volume of the number of restorable data objects; and determining that the total volume satisfies a batch threshold; and distributing the batches among two or more nodes for restoring the batches of to-be-restored data objects from the first set of storage nodes.

12. The computer-implemented method of claim 11, wherein the batches are uniform in size.

13. The computer-implemented method of claim 11, wherein generating the batches includes: selecting an initial number of one or more data objects from a first bucket in the plurality of buckets; determining the total volume of the initial number of one or more data objects; determining whether the total volume satisfies the batch threshold; responsive to the total volume dissatisfying the batch threshold, adding additional data objects from the plurality of buckets to the initial number of one or more data objects until the total volume is increased to satisfy the batch threshold; and responsive to the total volume satisfying the batch threshold, generating a batch for the one or more data objects.

14. The computer-implemented method of claim 13, further comprising: determining, based on available bandwidths of a processing component of each of the two or more nodes, the batch threshold.

15. The computer-implemented method of claim 13, wherein generating the batches further includes: determining that the total volume dissatisfies a second threshold; and responsive to the total volume dissatisfying the second threshold, removing data objects from the initial number of one or more data objects from the first bucket until the total volume is decreased to satisfy the second threshold.

16. The computer-implemented method of claim 11, further comprising: storing the generated batches in one of a temporary data store and a persistent data store.

17. The computer-implemented method of claim 11, wherein distributing the batches includes: determining a number of processing components associated with the two or more nodes and a processing factor; and selecting, from the batches, a number of batches for distribution among the processing components of the two or more nodes, wherein the number of batches is determined based on the number of the processing components and the processing factor.

18. The computer-implemented method of claim 17, wherein the processing factor is greater than or equal to 1.

19. The computer-implemented method of claim 11, further comprising: determining partitions, wherein each partition includes at least one bucket from the plurality of buckets; distributing the partitions among the two or more nodes for acquiring: the number of the one or more restorable data objects in e
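The distribution step in claims 8 and 17 (a number of batches chosen from the count of processing components and a processing factor) might look roughly like the sketch below. The claims only say the number is "based on" those two values, so the product used here is an assumption, as are the round-robin assignment and the names `select_and_assign`, `components`, and `processing_factor`.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def select_and_assign(
    batches: Sequence[T],
    components: Sequence[str],
    processing_factor: float = 1.0,
) -> Dict[str, List[T]]:
    """Pick a round of batches and spread them over processing components.

    Assumption: batches per round = number of components * processing_factor,
    with the factor constrained to >= 1 as in claims 9 and 18.
    """
    if not components:
        raise ValueError("at least one processing component is required")
    if processing_factor < 1:
        raise ValueError("processing_factor must be >= 1")

    # Never select more batches than are actually available.
    count = min(len(batches), int(len(components) * processing_factor))

    assignment: Dict[str, List[T]] = {name: [] for name in components}
    for index, batch in enumerate(batches[:count]):
        # Round-robin is an illustrative choice, not taken from the patent.
        assignment[components[index % len(components)]].append(batch)
    return assignment


if __name__ == "__main__":
    pending = [f"batch-{i}" for i in range(7)]
    workers = ["node1/w1", "node1/w2", "node2/w1"]
    print(select_and_assign(pending, workers, processing_factor=2))
```

With a factor above 1, each processing component receives more than one batch per round in this sketch, which is one plausible way to keep components busy while the next round is prepared.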

Assignees

Western Digital Tech Inc

Inventors

Classifications

  • using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements · CPC title

  • Error detection or correction of the data by redundancy in hardware · CPC title

  • Redundant storage or storage space (G06F11/2056 takes precedence) · CPC title

  • Management of space entities, e.g. partitions, extents, pools · CPC title

  • G06F3/067 · Primary

    Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11010103B2 cover?
The described methods, systems, and other aspects can advantageously provide balanced multi-stage processing of non-uniform object data. An example method may receive a list of buckets. Each of the buckets in the list of buckets can store one or more restorable objects. The method further comprises distributing the list of buckets to the two or more second nodes; determining a number of the one…
Who is the assignee on this patent?
Western Digital Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F11/2094. Mapped technology areas include Physics.
When was this patent published?
Publication date May 18, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).