Dynamically modifying a cluster of computing nodes used for distributed execution of a program

US9329909B1 · US · B1

Patent metadata
FieldValue
Publication numberUS-9329909-B1
Application numberUS-201213620805-A
CountryUS
Kind codeB1
Filing dateSep 15, 2012
Priority dateMar 31, 2009
Publication dateMay 3, 2016
Grant dateMay 3, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are described for managing distributed execution of programs. In some situations, the techniques include dynamically modifying the distributed program execution in various manners, such as based on monitored status information. The dynamic modifying of the distributed program execution may include adding and/or removing computing nodes from a cluster that is executing the program, modifying the amount of computing resources that are available for the distributed program execution, terminating or temporarily suspending execution of the program (e.g., if an insufficient quantity of computing nodes of the cluster are available to perform execution), etc.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method comprising: receiving, by one or more configured computing systems of a program execution service, configuration information that indicates a program to be executed in a distributed manner; selecting, by the one or more configured computing systems, multiple computing nodes provided by the program execution service to use as a cluster for executing the indicated program; initiating, by the one or more configured computing systems, the executing of the indicated program in a distributed manner on the selected multiple computing nodes of the cluster such that one or more jobs of the indicated program are attempted to be executed on each of the selected multiple computing nodes; determining, by the one or more configured computing systems and while at least one of the selected multiple computing nodes is still executing at least one of the jobs of the indicated program, that an actual amount of computing resources being used by the multiple computing nodes to execute the indicated program differs from an expected amount of computing resources; and initiating, based at least in part on the determining, a change in a quantity of the multiple computing nodes of the cluster for use in continuing executing the indicated program. 2. The computer-implemented method of claim 1 wherein the initiating of the change in the quantity of the multiple computing nodes is further based on determining that a subset of the multiple computing nodes of the cluster have each completed one or more of the jobs of the indicated program, and wherein the initiating of the change in the quantity of the multiple computing nodes includes removing the computing nodes of the subset from the cluster. 3. The computer-implemented method of claim 1 wherein the received configuration information further includes an indication of a time by which the executing of the indicated program is expected to be completed, wherein the initiating of the change in the quantity of the multiple computing nodes is further based on determining that the selected multiple computing nodes of the cluster will be unable to complete the executing of the indicated program by the indicated time, and wherein the initiating of the change in the quantity of the multiple computing nodes includes adding one or more additional computing nodes to the cluster. 4. The computer-implemented method of claim 1 wherein the initiating of the change in the quantity of the multiple computing nodes includes initiating termination of the executing of the indicated program on the multiple computing nodes of the cluster without completing the executing of the indicated program. 5. The computer-implemented method of claim 4 further comprising scheduling an attempt to execute the terminated indicated program at a later time. 6. The computer-implemented method of claim 1 wherein the indicated program is a first program having a lower priority level than a priority level of a second program that is being executed concurrently with the first program, and wherein the initiating of the change in the quantity of the multiple computing nodes includes removing one or more of the multiple computing nodes from the cluster that is executing the first program in order to make the removed computing nodes available for use by a second cluster of computing nodes that is executing the second program. 7. The computer-implemented method of claim 1 wherein the actual amount of computing resources being used by the multiple computing nodes to execute the indicated program is lower than the expected amount of computing resources, and wherein the initiating of the change in the quantity of the multiple computing nodes includes adding one or more additional computing nodes to the cluster. 8. The computer-implemented method of claim 1 wherein the actual amount of computing resources being used by the multiple computing nodes to execute the indicated program is higher than the expected amount of computing resources, and wherein the initiating of the change in the quantity of the multiple computing nodes includes removing one or more of the multiple computing nodes from the cluster. 9. The computer-implemented method of claim 1 further comprising completing the executing of the jobs of the indicated program, at least some of the multiple jobs each using a subset of input data indicated by a user who supplied the received configuration information, and providing final results from the executing to the user. 10. The computer-implemented method of claim 1 wherein the indicated program is configured to perform one or more map functions on each of multiple input data subsets and to perform one or more reduce functions on results of the one or more map functions, and wherein the method further comprises generating multiple jobs on the multiple computing nodes to each implement at least one of the map functions or at least one of the reduce functions. 11. The computer-implemented method of claim 1 wherein the multiple computing nodes include multiple virtual machines hosted by one or more physical computing systems that are each able to execute at least one portion of a program. 12. A non-transitory computer-readable medium having stored contents that configure a computing device to: receive configuration information that indicates a program to be executed in a distributed manner; initiate executing of the indicated program in a distributed manner on a cluster of multiple computing nodes such that one or more jobs of the indicated program are attempted to be executed on each of the multiple computing nodes; determine, while at least one of the multiple computing nodes is still executing at least one of the jobs of the indicated program, that at least one of one or more defined criteria have been satisfied, wherein determining that the at least one defined criteria have been satisfied includes determining that an actual amount of computing resources being used by the multiple computing nodes for the executing of the indicated program differs from an expected amount of computing resources; and in response to the determining, initiate a change in a quantity of the multiple computing nodes of the cluster for executing the indicated program, wherein initiating the change in the quantity of the multiple computing nodes includes selecting one or more particular computing nodes to add to or remove from the cluster based at least in part on factors associated with the one or more particular computing nodes. 13. The non-transitory computer-readable medium of claim 12 wherein the received configuration information specifies a quantity of computing nodes to use for the executing of the indicated program. 14. The non-transitory computer-readable medium of claim 13 wherein the configured computing system is part of an execution service, and wherein the stored contents further configure the computing device to select the specified quantity of multiple computing nodes to be used as the cluster from a plurality of computing nodes provided by the execution service. 15. The non-transitory computer-readable medium of claim 13 wherein the selected multiple computing nodes of the cluster are of a quantity that is less than the specified quantity at a time when the initiating of the executing of the indicated program is performed, and wherein the stored contents further configure the computing device to add one or more additional computing nodes to the cluster during the executing of the indicated program such that at a later time after the initiating of the executing, the multiple computing nodes of the

Assignees

Inventors

Classifications

  • G06F9/5072Primary

    Grid computing · CPC title

  • G06F9/5083Primary

    Techniques for rebalancing the load in a distributed system · CPC title

  • using data related to the state of servers by a load balancer · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9329909B1 cover?
Techniques are described for managing distributed execution of programs. In some situations, the techniques include dynamically modifying the distributed program execution in various manners, such as based on monitored status information. The dynamic modifying of the distributed program execution may include adding and/or removing computing nodes from a cluster that is executing the program, mo…
Who is the assignee on this patent?
Khanna Richendra, Sirota Peter, Nowland Ian P, and 5 more
What technology area does this patent fall under?
Primary CPC classification G06F9/5072. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue May 03 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).