Dynamic application instance discovery and state management within a distributed system

US9838240B1 · US · B1

Patent metadata
Field · Value
Publication number · US-9838240-B1
Application number · US-201314082728-A
Country · US
Kind code · B1
Filing date · Nov 18, 2013
Priority date · Dec 29, 2005
Publication date · Dec 5, 2017
Grant date · Dec 5, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Dynamic application instance discovery and state management within a distributed system. A distributed system may implement application instances configured to perform one or more application functions within the distributed system, and discovery and failure detection daemon (DFDD) instances, each configured to store an indication of a respective operational state of each member of a respective group of the number of application instances. Each of the DFDD instances may repeatedly execute a gossip-based synchronization protocol with another one of the DFDD instances, where execution of the protocol between DFDD instances includes reconciling differences among membership of the respective groups of application instances. A new application instance may be configured to notify a particular DFDD instance of its availability to perform an application function. The particular DFDD instance may be configured to propagate the new instance's availability to other DFDD instances via execution of the synchronization protocol, without intervention on the part of the new application instance.
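The propagation step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the `Dfdd` and `Entry` classes, the `version` counter used to decide which entry is fresher, and the rule that each daemon gossips once per round. The point it shows is that a new instance registers with one DFDD only, and repeated pairwise reconciliation with pseudorandomly chosen peers spreads that entry to the rest.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    """One DFDD's view of an application instance (hypothetical layout)."""
    state: str    # illustrative label such as "NEW" or "OK"
    version: int  # assumed counter used to pick the fresher entry


@dataclass
class Dfdd:
    name: str
    entries: dict = field(default_factory=dict)  # instance id -> Entry

    def register(self, instance_id):
        """A new application instance announces itself to this DFDD only."""
        self.entries[instance_id] = Entry(state="NEW", version=1)

    def gossip_with(self, peer):
        """Reconcile both views so each side holds the fresher entry."""
        for instance_id in set(self.entries) | set(peer.entries):
            mine = self.entries.get(instance_id)
            theirs = peer.entries.get(instance_id)
            if mine is None or (theirs is not None and theirs.version > mine.version):
                newest = theirs
            else:
                newest = mine
            self.entries[instance_id] = newest
            peer.entries[instance_id] = newest


def run_round(dfdds, rng):
    """Every DFDD gossips once with a pseudorandomly chosen other DFDD."""
    for d in dfdds:
        peer = rng.choice([p for p in dfdds if p is not d])
        d.gossip_with(peer)


def rounds_until_known(dfdds, instance_id, rng, max_rounds=1000):
    """Count gossip rounds until every DFDD has an entry for instance_id."""
    rounds = 0
    while not all(instance_id in d.entries for d in dfdds) and rounds < max_rounds:
        run_round(dfdds, rng)
        rounds += 1
    return rounds
```

In this sketch the new instance calls `register` once and takes no further part; calling `rounds_until_known` afterwards reports how many rounds the daemons needed to converge on their own.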

First claim

Opening claim text (preview).

What is claimed is:

1. A distributed system, comprising: a plurality of computing devices configured to implement: a plurality of application instances configured to perform functions of the distributed system, wherein the plurality of application instances includes two or more different types of application instances, each type of application instance configured to perform one or more different functions of the distributed system; and a plurality of discovery and failure detection daemon (DFDD) instances, wherein the plurality of DFDD instances are configured to store operational state information for the plurality of application instances, wherein the state information includes global state information common to all types of application instances and specific state information specific to at least one type of application instance and wherein at least one of the DFDD instances is configured to update the global state information according to a global state machine defining transitions between a plurality of global states including a state indicating the respective application instance is newly online, a state indicating the respective application instance is operating normally, a state indicating the respective application instance has lost communication with a respective DFDD instance, a state indicating the respective application instance has failed, and a state indicating the respective application instance is subject to a network split, according to one or more status reports received from the respective application instance; wherein at least one of the plurality of DFDD instances is configured to repeatedly execute a peer-to-peer, gossip-based synchronization protocol with a peer instance of the DFDD instances, wherein the peer instance is randomly or pseudorandomly selected from among the plurality of DFDD instances, and wherein to execute the protocol, the peer DFDD instances are configured to exchange state information for at least one of the plurality of application instances including both the global state information and the specific state information.

2. The distributed system as recited in claim 1, wherein the state information for a given application instance includes information indicating a physical location of the application instance in the distributed system.

3. The distributed system as recited in claim 1, wherein at least a respective application instance of the plurality of application instances is configured to report its status to a DFDD instance at regular or irregular intervals, wherein the DFDD instance is configured to update global state information according to status reports of the application instance.

4. The distributed system as recited in claim 1, wherein each DFDD instance is one of a daemon process configured to operate within an operating system environment or an autonomous hardware or software agent configured to operate independently from an operating system environment.

5. A method, comprising: storing, by a plurality of discovery and failure detection daemon (DFDD) instances implemented on a plurality of computing devices, state information for a plurality of application instances configured to perform functions of a distributed system, wherein the plurality of application instances includes two or more different types of application instances, each type of application instance configured to perform one or more different functions of the distributed system, wherein the state information includes global state information common to all types of application instances and specific state information specific to at least one type of application and wherein the state information includes global state information according to a global state machine defining transition between a plurality of global states including a state indicating a respective application instance is newly online, a state indicating the respective application instance is operating normally, a state indicating the respective application instance has lost communication with a respective DFDD instance, a state indicating the respective application instance has failed, and a state indicating the respective application instance is subject to a network split, according to status reports of the application instance; randomly or pseudorandomly selecting, by at least one of the plurality of DFDD instances, a peer instance of the plurality of DFDD instances; and communicating, by the at least one of the plurality of DFDD instances, state information for one or more of the plurality of application instances to the other DFDD instance according to a peer-to-peer synchronization protocol, the state information including both the global state information and the specific state information.

6. The method as recited in claim 5, further comprising iteratively performing the selecting and the communicating.

7. The method as recited in claim 5, wherein the synchronization protocol is a gossip-based synchronization protocol.

8. The method as recited in claim 5, wherein, in the communicating, the at least one of the plurality of DFDD instances exchanges state information for the one or more of the plurality of application instances with the other DFDD instance according to the synchronization protocol.

9. The method as recited in claim 5, wherein the state information for a given application instance includes information for accessing the application instance by clients, the information including an Internet Protocol (IP) address and port number through which a client can establish a connection with the application instance of the distributed system.

10. The method as recited in claim 5, wherein the state information for a given application instance includes information indicating a physical location of the application instance in the distributed system.

11. The method as recited in claim 5, further comprising one or more of the application instances each periodically or aperiodically reporting its status to at least one of the plurality of DFDD instances, wherein the global state information for the respective application instance is updated according to the reported status.

12. The method as recited in claim 5, wherein the selecting and the communicating are performed among two or more of the plurality of DFDD instances that are configured to store state information for a respective group of the plurality of application instances.

13. A non-transitory computer-accessible storage medium storing instructions that when executed by a computer implement a discovery and failure detection daemon (DFDD) configured to: store operational state information for at least one of a plurality of application instances configured to perform functions of a distributed system, wherein the plurality of application instances includes two or more different types of application instances, each type of application instance configured to perform one or more different functions of the distributed system, wherein the state information includes global state information common to all types of application instances and specific state information specific to at least one type of application and wherein the state information includes global state information according to a global state machine defining transition between a plurality of global states including a state indicating a respective application instance is newly online, a state indicating the respective application instance is operating normally, a state indicating the respective application instance has lost communication with a respective DFDD instance, a state indicating the respective application instance has failed, and a state indicating the resp…
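Claims 1, 5, and 13 each recite a global state machine with five states driven by status reports. A minimal sketch of such a machine follows. The five states come from the claim text; the state names, the two timeout values, the `split_suspected` flag, and the specific transition rules are assumptions chosen for illustration, since the claim preview does not specify which transitions exist or when they fire.

```python
from enum import Enum


class GlobalState(Enum):
    NEW = "newly online"
    OK = "operating normally"
    INCOMMUNICADO = "lost communication with its DFDD"
    FAIL = "failed"
    NETWORK_SPLIT = "subject to a network split"


# Assumed thresholds in seconds; the claims give no timing values.
T_INCOMMUNICADO = 10.0
T_FAIL = 60.0


def next_state(current, seconds_since_report, split_suspected=False):
    """Return the next global state for one application instance.

    seconds_since_report: time since the DFDD last received a status report.
    split_suspected: hypothetical flag set by whatever detects a network split.
    """
    if current is GlobalState.FAIL:
        # Assumption: FAIL is terminal in this sketch.
        return GlobalState.FAIL
    if seconds_since_report < T_INCOMMUNICADO:
        # A recent status report promotes NEW and recovers other states to OK.
        return GlobalState.OK
    if split_suspected:
        return GlobalState.NETWORK_SPLIT
    if seconds_since_report < T_FAIL:
        return GlobalState.INCOMMUNICADO
    return GlobalState.FAIL
```

A DFDD could call `next_state` for each tracked instance on a timer, then carry the result as the global part of the state it exchanges with peers; type-specific state would travel alongside it unchanged.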

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Electricity · mapped topic

  • G06F3/0667 · Primary

    at data level, e.g. file, record or object virtualisation · CPC title

  • G06F16/184 · Primary

    implemented as replicated file system · CPC title

  • in which an application is distributed across nodes in the network (software deployment G06F8/60; multiprogramming arrangements G06F9/46) · CPC title

  • Distributed indices · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9838240B1 cover?
Dynamic application instance discovery and state management within a distributed system. A distributed system may implement application instances configured to perform one or more application functions within the distributed system, and discovery and failure detection daemon (DFDD) instances, each configured to store an indication of a respective operational state of each member of a respective…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification H04L29/08135. Mapped technology areas include Electricity.
When was this patent published?
Publication date Dec 5, 2017 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).