High availability architecture

US9329937B1 · US · B1

Patent metadata
Field | Value
Publication number | US-9329937-B1
Application number | US-201314145177-A
Country | US
Kind code | B1
Filing date | Dec 31, 2013
Priority date | Dec 31, 2013
Publication date | May 3, 2016
Grant date | May 3, 2016

Abstract

Official abstract text for this publication.

A system for executing tasks in a computing resource environment is disclosed. Variations of a system may include two or more scheduler partitions associated with respective schedulers, scheduler state information, and respective plurality of computing resources. Variations of a system may include a task distributor that distributes tasks to the scheduler partitions. In some variations, one scheduler is configured such that, responsive to the scheduler partition receiving a task from the distributor, that scheduler allocates a computing resource for execution of that task and updates its scheduler state information accordingly. In some variations, the task distributor is configured such that, if one scheduler is in a failed or corrupted state, the task distributor stops distributing tasks to that scheduler partition and prevents that scheduler state information from propagation to, or access by, other scheduler partitions.
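The arrangement in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names (`SchedulerPartition`, `TaskDistributor`), the `Health` flag, the round-robin selection, and the `quarantined` dictionary are all assumptions made for illustration. The sketch only shows the two behaviors the abstract names: a scheduler allocates a resource and updates its own state, and the distributor stops routing to a failed or corrupted scheduler while holding that scheduler's state away from the other partitions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Health(Enum):
    OK = "ok"
    FAILED = "failed"
    CORRUPTED = "corrupted"


@dataclass
class SchedulerPartition:
    """One scheduler with its own state and its own pool of resources (hypothetical model)."""
    name: str
    resources: list                      # free computing resources in this partition
    health: Health = Health.OK
    allocations: dict = field(default_factory=dict)  # scheduler state: task -> resource

    def schedule(self, task):
        """Allocate a resource for the task and update the scheduler state."""
        if not self.resources:
            raise RuntimeError(f"partition {self.name} has no free resources")
        resource = self.resources.pop(0)
        self.allocations[task] = resource
        return resource


class TaskDistributor:
    """Routes tasks to healthy partitions; round-robin is an assumption of this sketch."""

    def __init__(self, partitions):
        self.active = list(partitions)
        self.quarantined = {}   # partition name -> sequestered copy of its state
        self._next = 0

    def isolate_unhealthy(self):
        """Stop routing to failed/corrupted schedulers and sequester their state."""
        for partition in list(self.active):
            if partition.health is not Health.OK:
                self.active.remove(partition)
                # The state is kept by the distributor only; it is never
                # handed to another partition.
                self.quarantined[partition.name] = dict(partition.allocations)

    def distribute(self, task):
        self.isolate_unhealthy()
        if not self.active:
            raise RuntimeError("no healthy scheduler partition available")
        partition = self.active[self._next % len(self.active)]
        self._next += 1
        return partition.name, partition.schedule(task)


if __name__ == "__main__":
    a = SchedulerPartition("A", ["a1", "a2"])
    b = SchedulerPartition("B", ["b1", "b2"])
    distributor = TaskDistributor([a, b])
    print(distributor.distribute("t1"))   # routed to partition A
    a.health = Health.CORRUPTED           # simulate a corrupted scheduler
    print(distributor.distribute("t2"))   # A is isolated, so B receives the task
```

In this sketch the healthy partition never reads the quarantined state, which is the property the abstract describes as preventing propagation to, or access by, other scheduler partitions.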

First claim

Opening claim text (preview).

The invention claimed is:

1. A system for executing tasks in a computing resource environment, the system comprising: a first scheduler partition, the first scheduler partition being associated with a first scheduler running on one or more processors of the system, first scheduler state information, and a first plurality of computing resources; a second scheduler partition, the second scheduler partition being associated with a second scheduler running on the one or more processors, second scheduler state information, and a second plurality of computing resources; a task distributor running on the one or more processors and configured to distribute tasks to the first and second scheduler partitions; the first scheduler being configured such that, responsive to the first scheduler partition receiving a task from the task distributor, the first scheduler allocates a computing resource for execution of the received task and updates the first scheduler state information accordingly; and the task distributor being configured such that, responsive to a determination of the first scheduler state information indicating that the first scheduler is in a failed or corrupted state, the task distributor stops distributing tasks to the first scheduler partition and prevents the first scheduler state information from propagation to, or access by, the second scheduler partition when the second scheduler is not in a failed or corrupted state by isolating the first scheduler and sequestering its state information from the second scheduler, the first scheduler state information subsequently processed by the task distributor.

2. The system of claim 1, the task distributor being configured to: identify a first task as a fault-tolerant task meant to tolerate scheduler failures; distribute the identified first task to the first scheduler partition; and distribute a first replicated task to the second scheduler partition, the first replicated task being a replica of the identified first task.

3. The system of claim 1, the system further comprising a source of task routing information; the distributor being configured to distribute tasks to the scheduler partitions based on the task routing information.

4. The system of claim 3, the distributor being configured to update the task routing information based on the scheduler state information associated with the respective scheduler partitions; and in response to determining that the first scheduler state information indicates that the first scheduler is in a failed or corrupted state, the distributor updates task routing information to indicate that the first scheduler partition may not receive distributed tasks from the distributor.

5. The system of claim 3, the task routing information including a distribution key associated with each task to be distributed by the distributor; and the distributor being configured to distribute tasks based on a distribution sequence indicated by the distribution key of each task.

6. The system of claim 1, where the first plurality of computing resources includes a plurality of machine sets, each machine set representing computing resources associated with or based on at least one hardware component of a computing device.

7. The system of claim 1, the distributor being configured to distribute tasks to the first and second scheduler partitions based on a respective resource utilization level in the first and second scheduler partitions, the resource utilization level of a scheduler partition being indicated based on the scheduler state information associated with that scheduler partition.

8. The system of claim 1, where the first scheduler state information includes information about resource outages occurring in the first plurality of computing resources.

9. The system of claim 8, the distributor being configured to distribute tasks to the scheduler partitions based on a respective computing capacity of each scheduler partition, the computing capacity of a scheduler partition being determined based on the scheduler state information.

10. The system of claim 9, the scheduler state information including information about a total number of tasks being executed within the scheduler partition and a total computing resource usage level associated with the tasks being executed from among the first plurality of computing resources.

11. The system of claim 1, the system further comprising: a third scheduler partition, the third scheduler partition being associated with a third scheduler, third scheduler state information, and a third plurality of computing resources; and in response to determining that the first scheduler state information indicates that the first scheduler is in a failed or corrupted state, the distributor stops distributing tasks to the first scheduler partition and prevents the first scheduler state information from being propagated to or accessed by the second or third scheduler partitions.

12. The system of claim 11, where each scheduler partition is configured to be internally fault tolerant such that an outage of a computing resource allocated for execution of a task within a scheduler partition causes the scheduler to allocate a new computing resource for execution of that task; and where each scheduler partition is configured to be isolated from the other scheduler partitions such that scheduler state information, tasks, and resources associated with a scheduler in a corrupted or failed state are prevented from being propagated to the other scheduler partitions; and where remaining instances tasks replicated across two or more scheduler partitions are updated in response to a scheduler of one of said two or more scheduler partitions being in a failed or corrupted state.

13. The system of claim 1, the first scheduler partition being uniquely associated with the first scheduler; and the second scheduler partition being uniquely associated with the second scheduler.

14. A method of preventing cascading scheduler failures in a system for executing tasks in a computing resource environment, the method comprising: distributing a first task from a task distributor to one of at least a first scheduler partition and a second scheduler partition in the computing resource environment the second scheduler partition being associated with a second scheduler and a second plurality of computing resources; in response to receiving the first task from the task distributor, allocating, with a first scheduler of the first scheduler partition, a computing resource for execution of the received first task from among a first plurality of computing resources associated with the first scheduler partition; in response to said allocating, updating, with the first scheduler, first scheduler state information, the updated first scheduler state information being indicative of a state of the first scheduler, the received first task, and the allocated computing resource after said allocating; and in response to determining that the first scheduler state information indicates that the first scheduler is in a failed or corrupted state, stopping distribution of tasks by the distributor to the first scheduler partition, and preventing the first scheduler state information from being propagated to or accessed by the second scheduler partition when the second scheduler is not in a failed or corrupted state by isolating the first scheduler and sequestering its state information from the second scheduler, the first scheduler state information subsequently processed by the task distributor.

15. The method of claim 14, distributing a first task fro
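Claims 2 and 7 describe two distribution behaviors: replicating a fault-tolerant task onto a second scheduler partition, and choosing partitions by resource utilization level. A rough Python sketch of how the two could combine is given here. The function name `choose_partitions`, the utilization dictionary, and the rule of sending the replica to the next least-utilized healthy partition are assumptions of this sketch; the claims do not prescribe a specific selection rule.

```python
def choose_partitions(utilization, failed, fault_tolerant=False):
    """Pick target partitions for one task.

    utilization: dict mapping partition name -> utilization level in [0, 1]
                 (assumed to be derived from each partition's scheduler state).
    failed:      set of partition names whose scheduler is failed or corrupted.
    fault_tolerant: if True, return two partitions so a replica can be placed.
    """
    # Consider only healthy partitions, least utilized first.
    healthy = sorted(
        (level, name) for name, level in utilization.items() if name not in failed
    )
    copies = 2 if fault_tolerant else 1
    if len(healthy) < copies:
        raise RuntimeError("not enough healthy scheduler partitions")
    return [name for _, name in healthy[:copies]]


if __name__ == "__main__":
    levels = {"A": 0.7, "B": 0.2, "C": 0.5}
    print(choose_partitions(levels, failed=set()))                       # ['B']
    print(choose_partitions(levels, failed=set(), fault_tolerant=True))  # ['B', 'C']
    print(choose_partitions(levels, failed={"B"}, fault_tolerant=True))  # ['C', 'A']
```

Placing the original and the replica in different partitions means a single scheduler failure leaves one copy of the task running, which is the stated purpose of a fault-tolerant task in claim 2.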

Assignees

Google Inc

Inventors

Classifications

  • G06F9/4881 · Primary

    Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

  • to service a request · CPC title

  • Task transfer initiation or dispatching · CPC title

  • Partitioning or combining of resources · CPC title

  • G06F11/14 · Primary

    Error detection or correction of the data by redundancy in operations (error detection or correction of the data by redundancy in hardware G06F11/16) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9329937B1 cover?
A system for executing tasks in a computing resource environment is disclosed. Variations of a system may include two or more scheduler partitions associated with respective schedulers, scheduler state information, and respective plurality of computing resources. Variations of a system may include a task distributor that distributes tasks to the scheduler partitions. In some variations, one sch…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/4881. Mapped technology areas include Physics.
When was this patent published?
Publication date May 3, 2016 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).