US9672073B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9672073-B2 |
| Application number | US-201213491063-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 7, 2012 |
| Priority date | Jun 7, 2012 |
| Publication date | Jun 6, 2017 |
| Grant date | Jun 6, 2017 |
Official abstract text for this publication.
Distributing work in a distributed computing environment that includes multiple nodes. An individual node can receive a work assignment, which can then be divided into a plurality of work units. A first work unit can then be distributed to a first worker node. At least a portion of the first work unit can be re-distributed to a second worker node in response to determining that the first worker node has experienced a failure condition with respect to the first work unit.
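The flow in the abstract can be illustrated with a minimal, single-process sketch. Everything named here (`divide`, `run_leader`, `WorkerFailure`, the two sample workers, round-robin assignment, and squaring as the "work") is a hypothetical stand-in chosen for illustration and is not taken from the patent. The sketch assumes a failing worker reports both its partial results and the unprocessed remainder, which the leader then hands to another worker.

```python
from collections import deque
from typing import Callable, List, Sequence


class WorkerFailure(Exception):
    """Hypothetical signal: a worker could not finish its work unit."""

    def __init__(self, partial_results: List[int], unprocessed: List[int]):
        super().__init__("worker failed before finishing the work unit")
        self.partial_results = partial_results  # results produced before failing
        self.unprocessed = unprocessed          # items still to be processed


def divide(assignment: Sequence[int], unit_size: int) -> List[List[int]]:
    """Split a work assignment into work units of at most unit_size items."""
    return [list(assignment[i:i + unit_size])
            for i in range(0, len(assignment), unit_size)]


def reliable_worker(unit: List[int]) -> List[int]:
    """Sample worker that always finishes: squares every item."""
    return [x * x for x in unit]


def flaky_worker(unit: List[int]) -> List[int]:
    """Sample worker that processes one item, then fails on larger units."""
    first = unit[0] * unit[0]
    if len(unit) > 1:
        raise WorkerFailure([first], unit[1:])
    return [first]


def run_leader(assignment: Sequence[int],
               workers: List[Callable[[List[int]], List[int]]],
               unit_size: int = 3) -> List[int]:
    """Distribute units round-robin; re-distribute the remainder on failure."""
    pending = deque(divide(assignment, unit_size))
    results: List[int] = []
    turn = 0
    while pending:
        unit = pending.popleft()
        worker = workers[turn % len(workers)]
        turn += 1
        try:
            results.extend(worker(unit))
        except WorkerFailure as failure:
            # Keep what was finished and queue only the unprocessed portion,
            # which the next worker in the rotation will pick up.
            results.extend(failure.partial_results)
            pending.appendleft(failure.unprocessed)
    return results


squares = run_leader(list(range(1, 8)), [flaky_worker, reliable_worker])
```

In this run the flaky worker fails partway through two units, and the reliable worker finishes each remainder, so no item is processed twice. A real deployment would involve network messages and concurrent workers, which this sketch deliberately omits.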
Opening claim text (preview).
What is claimed is:

1. A method of distributing work in a distributed computing environment that comprises a leader node, a leader aggregator at the leader node, a first worker node, a worker aggregator at the first worker node, a second worker node, and a worker aggregator at the second worker node, the method comprising: receiving a work assignment at the leader node; dividing, by the leader node, the work assignment into a plurality of work units; receiving a message from the first worker node indicating that the first worker node is capable of accepting a work unit; distributing, by the leader node, a first work unit of the plurality of work units to the first worker node in response to the received message; receiving, by the worker aggregator at the first worker node, first results upon completely processing the first work unit; requesting, by the first worker node, a second work unit of the plurality of work units, without sending the first results to the leader aggregator; receiving, by the worker aggregator at the first worker node, second results representing a processed portion of the second work unit, as the first worker node processes the second work unit; when the second work unit is completely processed, receiving, by the leader aggregator, the first and second results; when the second work unit is not completely processed, receiving a second message, by the leader node, from the first worker node indicating that the first worker node has experienced a re-distribution condition with respect to the second work unit, wherein the re-distribution condition comprises a detected failure to completely process the second work unit, wherein the second message from the first worker node contains the second results from the second work unit available when the first worker node experienced the re-distribution condition; re-distributing, by the leader node, an unprocessed portion of the second work unit to the second worker node for processing in response to receiving the second message; receiving, by the worker aggregator at the second worker node, third results upon completely processing the unfinished portion of the second work unit; and receiving, by the leader aggregator, the first, second, and third results.

2. The method of claim 1, further comprising: distributing, by the leader node, a second work unit of the plurality of work units to the second worker node; and receiving, by the leader node, a result of the second work unit from the second worker node.

3. The method of claim 2, wherein the re-distributing the at least a portion of the first work unit to the second worker node occurs following the receiving the result of the second work unit from the second worker node.

4. The method of claim 1, wherein the re-distribution condition further comprises an elapsed time from when the first work unit was distributed to the first worker node that exceeds a pre-determined threshold.

5. The method of claim 1, wherein the step of distributing the first work unit to the first worker node comprises sending multiple work units to the first worker node.

6. The method of claim 1, wherein the work assignment is a database query.

7. The method of claim 1, wherein each work unit has associated with it work unit information comprising: a work unit identifier, an identifier for the last worker node to which it was assigned, a timestamp indicating the time at which the work unit was last assigned, and a link to one or more other pending work units assigned to the same worker node that have not been completed.

8. The method of claim 7, wherein the step of distributing the first work unit includes updating the work unit information associated with that work unit.

9. A computer system configured to function in a distributed computing environment that comprises the computer system, a leader node, a leader aggregator at the leader node, a first worker node, a worker aggregator at the first worker node, a second worker node, and a worker aggregator at the second worker node, the computer system comprising: an interface configured to communicate with the first worker node and the second worker node; and one or more processors communicatively coupled to the interface and configured to: receive a work assignment; receive a message from the first worker node indicating that the first worker node is capable of accepting a work unit; divide the work assignment into a plurality of work units and to distribute a first work unit of the plurality of work units to the first worker node, in response to the received message; receive, by the worker aggregator at the first worker node, first results upon completely processing the first work unit; request, by the first worker node, a second work unit of the plurality of work units, without sending the first results to the leader aggregator; receive, by the worker aggregator at the first worker node, second results representing a processed portion of the second work unit, as the first worker node processes the second work unit; when the second work unit is completely processed, receiving, by the leader aggregator, the first and second results; when the second work unit is not completely processed, receive a second message, by the leader node, from the first worker node indicating that the first worker node has experienced a re-distribution condition with respect to the second work unit, wherein the re-distribution condition comprises a detected failure to completely process the second work unit, wherein the second message from the first worker node contains the second results from the second work unit available when the first worker node experienced the re-distribution condition; re-distribute, by the leader node, an unprocessed portion of the second work unit to the second worker node for processing in response to receiving the second message; receive, by the worker aggregator at the second worker node, third results upon completely processing the unfinished portion of the second work unit; and receive, by the leader aggregator, the first, second, and third results.

10. The computer system of claim 9, wherein the one or more processors are further configured to distribute a second work unit of the plurality of work units to the second worker node, and wherein the computer system is configured to receive a result of the second work unit from the second worker node.

11. The computer system of claim 10, wherein the one or more processors are further configured to re-distribute at least a portion of the first work unit to the second worker node in response to receiving the result of the second work unit from the second worker node.

12. The computer system of claim 9, wherein the re-distribution condition further comprises an elapsed time from when the first work unit was distributed to the first worker node that exceeds a pre-determined threshold.

13. The computer system of claim 9, wherein the one or more processors are further configured to distribute the first work unit to the first worker node by sending multiple work units to the first worker node.

14. The computer system of claim 9, wherein the work assignment is a database query.

15. The computer system of claim 9, wherein each work unit has associated with it work unit information comprising: a work unit identifier, an identifier for the last worker node to which it was assigned, a timestamp indicating the time at which the work unit was last assigned, and a link to one or more other pending work units assigned to the same worker node that have not been completed.

16. The computer s
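The work unit information recited in claims 7 and 15, together with the elapsed-time condition of claims 4 and 12, might be tracked along the following lines. This is a hedged sketch only: `WorkUnitInfo`, `Ledger`, `assign`, `complete`, `overdue`, and the field names are hypothetical, the "link" to other pending units is modelled as a plain list of identifiers, and timestamps are passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WorkUnitInfo:
    """Hypothetical record holding the four items listed in claims 7 and 15."""
    unit_id: str                               # work unit identifier
    last_worker: Optional[str] = None          # last worker node assigned
    last_assigned_at: Optional[float] = None   # time of the last assignment
    pending_siblings: List[str] = field(default_factory=list)  # same-worker units


class Ledger:
    """Leader-side bookkeeping for pending (not yet completed) work units."""

    def __init__(self, timeout_seconds: float):
        self.timeout_seconds = timeout_seconds
        self.units: Dict[str, WorkUnitInfo] = {}

    def _relink(self) -> None:
        # Recompute every unit's links to other pending units on its worker.
        for info in self.units.values():
            info.pending_siblings = sorted(
                other.unit_id for other in self.units.values()
                if other.unit_id != info.unit_id
                and other.last_worker == info.last_worker
            )

    def assign(self, unit_id: str, worker_id: str, now: float) -> WorkUnitInfo:
        """Record a (re-)distribution, updating the unit's information."""
        info = self.units.setdefault(unit_id, WorkUnitInfo(unit_id))
        info.last_worker = worker_id
        info.last_assigned_at = now
        self._relink()
        return info

    def complete(self, unit_id: str) -> None:
        """Drop a finished unit so it no longer counts as pending."""
        del self.units[unit_id]
        self._relink()

    def overdue(self, now: float) -> List[str]:
        """Units whose elapsed time since assignment exceeds the threshold."""
        return sorted(
            info.unit_id for info in self.units.values()
            if info.last_assigned_at is not None
            and now - info.last_assigned_at > self.timeout_seconds
        )


ledger = Ledger(timeout_seconds=30.0)
ledger.assign("u1", "w1", now=0.0)
ledger.assign("u2", "w1", now=10.0)
late = ledger.overdue(now=35.0)        # only "u1" has waited more than 30 s
ledger.assign("u1", "w2", now=35.0)    # re-distribute the overdue unit
```

Recomputing all links on every change is quadratic in the number of pending units. It is used here for clarity; an implementation at scale would more likely maintain per-worker lists incrementally.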
Task decomposition · CPC title
Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mapping at compile time, see G06F8/451) · CPC title