Equitable resource allocation for storage object deletion
US-9417917-B1 · Aug 16, 2016 · US
US2016011901A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016011901-A1 |
| Application number | US-201414327338-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jul 9, 2014 |
| Priority date | Jul 9, 2014 |
| Publication date | Jan 14, 2016 |
| Grant date | — |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
The present disclosure relates to dynamically adjusting shard allocation during parallel processing operations. One example method includes determining a target completion time for a batch data processing job of an input data set performed by a plurality of tasks, each of the plurality of tasks processing a different input shard including a different portion of the input data set; identifying a first task having an estimated completion time greater than the target completion time of the batch data processing job; and splitting the first input shard into a first split input shard and a second split input shard different from the first split input shard, the first split input shard including a first portion of the first input shard, and the second split input shard including a second portion of the first input shard different from the first portion.
Opening claim text (preview).
What is claimed is: 1 . A computer-implemented method executed by one or more processors, the method comprising: determining a target completion time for a batch data processing job of an input data set, the batch data processing job performed by a plurality of tasks, each of the plurality of tasks processing a different input shard that includes a different portion of the input data set; identifying, from the plurality of tasks, a first task having an estimated completion time greater than the target completion time of the batch data processing job, the estimated completion time representing an estimated time at which the first task will complete processing the portion of the input data set at a first input shard being processed by the first task; and splitting the first input shard into a first split input shard and a second split input shard different from the first split input shard, the first split input shard including a first portion of the first input shard, and the second split input shard including a second portion of the first input shard different from the first portion. 2 . The method of claim 1 , further comprising: assigning the first split input shard to be processed by a second task; and inserting the second split input shard into a pool of unassigned shard to await processing by an available task. 3 . The method of claim 2 , wherein the second task is the first task and the third task is an idle task associated with the batch data processing job. 4 . The method of claim 1 , further comprising: identifying that the batch data processing job has entered an endgame state where all shards associated with the input data set are being processed by a task and there is at least one idle task associated with the batch processing job, wherein identifying the first task and splitting the first input shard are performed after identifying that the batch data processing job has entered the endgame state. 5 . The method of claim 1 , wherein each of the plurality of tasks includes an estimated completion time representing an estimated time at which the task will complete processing its input shard. 6 . The method of claim 5 , wherein the target completion time is determined based at least in part on the estimated completion times for the plurality of tasks for the batch data processing job. 7 . The method of claim 5 , further comprising determining the estimated completion time for each of the plurality of tasks based at least in part on at least one of: a size of a portion of the input shard remaining to be processed for each of the plurality of tasks, a processing rate associated with each of the plurality of tasks for a portion of the input shard already processed for each of the plurality of tasks, or a recent processing rate associated with each of the plurality of tasks for a portion of the input shard processed within a time window for each of the plurality of tasks. 8 . The method of claim 1 , wherein the target completion time is determined based on at least one of: historical information associated with the batch data processing job, or historical information associated with other batch data processing jobs. 9 . The method of claim 1 , wherein the target completion time is a deadline specified by an administrator of the batch data processing job for finishing the batch data processing job. 10 . The method of claim 1 , wherein identifying the first task and splitting the first input shard are performed by the first task itself. 11 . The method of claim 10 , wherein identifying the first task includes: receiving, by the first task from a supervisor process associated with the batch data processing job, the target completion time; and comparing, by the first task, the target completion time to the estimated completion time to determine that the first task will not complete by the target completion time. 12 . The method of claim 11 , wherein splitting the first task includes: sending, by the first task, an indication of a portion of the first input shard that the first task will not be able to process by the target completion time to the supervisor process; and assigning, by the supervisor process, the portion of the first input shard to an idle task for processing. 13 . The method of claim 1 , wherein determining a target completion time for a batch data processing job includes computing an average of the estimated completion times of the plurality of tasks and of estimated completion times for one or more idle tasks associated with the batch data processing job, wherein the estimated completion times for the one or more idle tasks have a value of 0. 14 . A non-transitory, computer-readable medium storing instructions operable when executed to cause at least one processor to perform operations comprising: determining a target completion time for a batch data processing job of an input data set, the batch data processing job performed by a plurality of tasks, each of the plurality of tasks processing a different input shard that includes a different portion of the input data set; identifying, from the plurality of tasks, a first task having an estimated completion time greater than the target completion time of the batch data processing job, the estimated completion time representing an estimated time at which the first task will complete processing the portion of the input data set at a first input shard being processed by the first task; and splitting the first input shard into a first split input shard and a second split input shard different from the first split input shard, the first split input shard including a first portion of the first input shard, and the second split input shard including a second portion of the first input shard different from the first portion. 15 . The computer-readable medium of claim 14 , further comprising: assigning the first split input shard to be processed by a second task; and inserting the second split input shard into a pool of unassigned shard to await processing by an available task. 16 . The computer-readable medium of claim 15 , wherein the second task is the first task and the third task is an idle task associated with the batch data processing job. 17 . The computer-readable medium of claim 14 , the operations further comprising: identifying that the batch data processing job has entered an endgame state where all shards associated with the input data set are being processed by a task and there is at least one idle task associated with the batch processing job, wherein identifying the first task and splitting the first input shard are performed after identifying that the batch data processing job has entered the endgame state. 18 . The computer-readable medium of claim 14 , wherein each of the plurality of tasks includes an estimated completion time representing an estimated time at which the task will complete processing its input shard. 19 . The computer-readable medium of claim 18 , wherein the target completion time is determined based at least in part on the estimated completion times for the plurality of tasks for the batch data processing job. 20 . A system comprising: memory for storing data; and one or more processors operable to perform operations comprising: determining a target completion time for a batch data processing job of an input data set, the batch data processing job performed by a plurality of tasks, each of the plurality of tasks processing a different input shard that includes a differ
by program, e.g. task dispatcher, supervisor, operating system · CPC title
Multiprogramming arrangements · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.