Who is the assignee on this patent?

Deschler Kurt Wilhelm, Mittal Kaushal, Johnson Curtis Grant, and 3 more

What technology area does this patent fall under?

Primary CPC classification G06F9/5066. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Jun 06 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Non-periodic check-pointing for fine granular retry of work in a distributed computing environment

US9672073B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9672073-B2
Application number	US-201213491063-A
Country	US
Kind code	B2
Filing date	Jun 7, 2012
Priority date	Jun 7, 2012
Publication date	Jun 6, 2017
Grant date	Jun 6, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Distributing work in a distributed computing environment that includes multiple nodes. An individual node can receive a work assignment, which can then be divided into a plurality of work units. A first work unit can then be distributed to a first worker node. At least a portion of the first work unit can be re-distributed to a second worker node in response to determining that the first worker node has experienced a failure condition with respect to the first work unit.

First claim

Opening claim text (preview).

What is claimed is: 1. A method of distributing work in a distributed computing environment that comprises a leader node, a leader aggregator at the leader node, a first worker node, a worker aggregator at the first worker node, a second worker node, and a worker aggregator at the second worker node, the method comprising: receiving a work assignment at the leader node; dividing, by the leader node, the work assignment into a plurality of work units; receiving a message from the first worker node indicating that the first worker node is capable of accepting a work unit; distributing, by the leader node, a first work unit of the plurality of work units to the first worker node in response to the received message; receiving, by the worker aggregator at the first worker node, first results upon completely processing the first work unit; requesting, by the first worker node, a second work unit of the plurality of work units, without sending the first results to the leader aggregator; receiving, by the worker aggregator at the first worker node, second results representing a processed portion of the second work unit, as the first worker node processes the second work unit; when the second work unit is completely processed, receiving, by the leader aggregator, the first and second results; when the second work unit is not completely processed, receiving a second message, by the leader node, from the first worker node indicating that the first worker node has experienced a re-distribution condition with respect to the second work unit, wherein the re-distribution condition comprises a detected failure to completely process the second work unit, wherein the second message from the first worker node contains the second results from the second work unit available when the first worker node experienced the re-distribution condition; re-distributing, by the leader node, an unprocessed portion of the second work unit to the second worker node for processing in response to receiving the second message; receiving, by the worker aggregator at the second worker node, third results upon completely processing the unfinished portion of the second work unit; and receiving, by the leader aggregator, the first, second, and third results. 2. The method of claim 1 , further comprising: distributing, by the leader node, a second work unit of the plurality of work units to the second worker node; and receiving, by the leader node, a result of the second work unit from the second worker node. 3. The method of claim 2 , wherein the re-distributing the at least a portion of the first work unit to the second worker node occurs following the receiving the result of the second work unit from the second worker node. 4. The method of claim 1 , wherein the re-distribution condition further comprises an elapsed tune from when the first work unit was distributed to the first worker node that exceeds a pre-determined threshold. 5. The method of claim 1 , wherein the step of distributing the first work unit to the first worker node comprises sending multiple work units to the first worker node. 6. The method of claim 1 , wherein the work assignment is a database query. 7. The method of claim 1 , wherein each work unit has associated with it work unit information comprising: a work unit identifier, an identifier for the last worker node to which it was assigned, a timestamp indicating the time at which the work unit was last assigned, and a link to one or more other pending work units assigned to the same worker node that have not been completed. 8. The method of claim 7 , wherein the step of distributing the first work unit includes updating the work unit information associated with that work unit. 9. A computer system configured to function in a distributed computing environment that comprises the computer system, a leader node, a leader aggregator at the leader node, a first worker node, a worker aggregator at the first worker node, a second worker node, and a worker aggregator at the second worker node, the computer system comprising: an interface configured to communicate with the first worker node and the second worker node; and one or more processors communicatively coupled to the interface and configured to: receive a work assignment; receive a message from the first worker node indicating that the first worker node is capable of accepting a work unit; divide the work assignment into a plurality of work units and to distribute a first work unit of the plurality of work units to the first worker node, in response to the received message; receive, by the worker aggregator at the first worker node, first results upon completely processing the first work unit; request, by the first worker node, a second work unit of the plurality of work units, without sending the first results to the leader aggregator; receive, by the worker aggregator at the first worker node, second results representing a processed portion of the second work unit, as the first worker node processes the second work unit; when the second work unit is completely processed, receiving, by the leader aggregator, the first and second results; when the second work unit is not completely processed, receive a second message, by the leader node, from the first worker node indicating that the first worker node has experienced a re-distribution condition with respect to the second work unit, wherein the re-distribution condition comprises a detected failure to completely process the second work unit, wherein the second message from the first worker node contains the second results from the second work unit available when the first worker node experienced the re-distribution condition; re-distribute, by the leader node, an unprocessed portion of the second work unit to the second worker node for processing in response to receiving the second message; receive, by the worker aggregator at the second worker node, third results upon completely processing the unfinished portion of the second work unit; and receive, by the leader aggregator, the first, second, and third results. 10. The computer system of claim 9 , wherein the one or more processors are further configured to distribute a second work unit of the plurality of work units to the second worker node, and wherein the computer system is configured to receive a result of the second work unit from the second worker node. 11. The computer system of claim 10 , wherein the one or more processors are further configured to re-distribute at least a portion of the first work unit to the second worker node in response to receiving the result of the second work unit from the second worker node. 12. The computer system of claim 9 , wherein the re-distribution condition further comprises an elapsed time from when the first work unit was distributed to the first worker node that exceeds a pre-determined threshold. 13. The computer system of claim 9 , wherein the one or more processors are further configured to distribute the first work unit to the first worker node by sending multiple work units to the first worker node. 14. The computer system of claim 9 , wherein the work assignment is a database query. 15. The computer system of claim 9 , wherein each work unit has associated with it work unit information comprising: a work unit identifier, an identifier for the last worker node to which it was assigned, a timestamp indicating the time at which the work unit was last assigned, and a link to one or more other pending work units assigned to the same worker node that have not been completed. 16. The computer s

Assignees

Inventors

Classifications

G06F2209/5017
Task decomposition · CPC title
G06F9/5066Primary
Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mappping at compile time, see G06F8/451) · CPC title

Patent family

Related publications grouped by family.

View patent family 49716355

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9672073B2 cover?: Distributing work in a distributed computing environment that includes multiple nodes. An individual node can receive a work assignment, which can then be divided into a plurality of work units. A first work unit can then be distributed to a first worker node. At least a portion of the first work unit can be re-distributed to a second worker node in response to determining that the first worker…
Who is the assignee on this patent?: Deschler Kurt Wilhelm, Mittal Kaushal, Johnson Curtis Grant, and 3 more
What technology area does this patent fall under?: Primary CPC classification G06F9/5066. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Jun 06 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).