Dynamic application instance discovery and state management within a distributed system

US10652076B2 · US · B2

Patent metadata
Publication number: US-10652076-B2
Application number: US-201715831115-A
Country: US
Kind code: B2
Filing date: Dec 4, 2017
Priority date: Dec 29, 2005
Publication date: May 12, 2020
Grant date: May 12, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Dynamic application instance discovery and state management within a distributed system. A distributed system may implement application instances configured to perform one or more application functions within the distributed system, and discovery and failure detection daemon (DFDD) instances, each configured to store an indication of a respective operational state of each member of a respective group of the number of application instances. Each of the DFDD instances may repeatedly execute a gossip-based synchronization protocol with another one of the DFDD instances, where execution of the protocol between DFDD instances includes reconciling differences among membership of the respective groups of application instances. A new application instance may be configured to notify a particular DFDD instance of its availability to perform an application function. The particular DFDD instance may be configured to propagate the new instance's availability to other DFDD instances via execution of the synchronization protocol, without intervention on the part of the new application instance.
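
The propagation described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class names `Dfdd` and `Entry`, the state label `"NEW"`, the instance id, and the per-entry `version` counter used to reconcile differences are all assumptions made for illustration. A new instance registers with a single daemon, and repeated pairwise exchanges spread that entry to every other daemon without further action by the instance.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    state: str    # hypothetical state label, e.g. "NEW"
    version: int  # assumed counter; the higher version wins on merge


@dataclass
class Dfdd:
    name: str
    view: dict = field(default_factory=dict)  # instance id -> Entry

    def register(self, instance_id: str) -> None:
        """A new application instance announces itself to this daemon only."""
        self.view[instance_id] = Entry(state="NEW", version=1)

    def sync_with(self, peer: "Dfdd") -> None:
        """Push-pull exchange: both sides keep the newer entry per instance."""
        for instance_id in set(self.view) | set(peer.view):
            mine = self.view.get(instance_id)
            theirs = peer.view.get(instance_id)
            newest = max(
                (e for e in (mine, theirs) if e is not None),
                key=lambda e: e.version,
            )
            self.view[instance_id] = newest
            peer.view[instance_id] = newest


def gossip_round(daemons: list, rng: random.Random) -> None:
    """Each daemon synchronizes with one randomly chosen other daemon."""
    for daemon in daemons:
        peer = rng.choice([d for d in daemons if d is not daemon])
        daemon.sync_with(peer)


def converged(daemons: list) -> bool:
    """True when every daemon holds the same view."""
    return all(d.view == daemons[0].view for d in daemons)


if __name__ == "__main__":
    rng = random.Random(0)
    daemons = [Dfdd(f"dfdd-{i}") for i in range(4)]
    daemons[0].register("storage-node-7")  # only one daemon is notified
    rounds = 0
    while not converged(daemons) and rounds < 1000:
        gossip_round(daemons, rng)
        rounds += 1
    print("all daemons know the instance:", converged(daemons))
```

Random peer selection keeps each daemon's work per round constant regardless of system size, which is the usual reason for choosing a gossip scheme over broadcasting every update to every peer.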

First claim

Opening claim text (preview).

What is claimed is:

1. A distributed system, comprising: a plurality of computing devices configured to implement: a plurality of application instances configured to perform one or more functions; and a plurality of discovery and failure detection daemon (DFDD) instances, wherein the plurality of DFDD instances are configured to store state information for the plurality of application instances, wherein at least one of the DFDD instances is configured to update, based at least in part on information received from a respective application instance, the state information according to a state machine defining transitions between a plurality of states including a state indicating the respective application instance is newly online, a state indicating the respective application instance is operating normally, a state indicating the respective application instance has lost communication with a respective DFDD instance, a state indicating the respective application instance has failed, and a state indicating the respective application instance is subject to a network split; wherein at least one of the plurality of DFDD instances is configured to execute a peer-to-peer, gossip-based synchronization protocol with a peer instance selected from among the plurality of DFDD instances, and to execute the protocol, the peer DFDD instances are configured to exchange state information for at least one of the plurality of application instances.

2. The distributed system as recited in claim 1, wherein the plurality of application instances includes two or more different types of application instances, each type of application instance configured to perform one or more different functions of the distributed system, and wherein the state information includes global state information common to all types of application instances and specific state information for at least one type of application instance.

3. The distributed system as recited in claim 1, wherein the state information for a given application instance includes information indicating a physical or logical location of the application instance in the distributed system.

4. The distributed system as recited in claim 1, wherein at least one of the plurality of application instances is configured to report its status to a DFDD instance at regular or irregular intervals, wherein the DFDD instance is configured to update global state information indicating whether the respective application instance is operating normally or is in an abnormal state according to the status reports of the application instance.

5. The distributed system as recited in claim 1, wherein each DFDD instance is one of a daemon process configured to operate within an operating system environment or an autonomous hardware or software agent configured to operate independently from an operating system environment.

6. A method, comprising: storing, by a plurality of discovery and failure detection daemon (DFDD) instances implemented on a plurality of computing devices, state information for a plurality of application instances configured to perform one or more functions in a distributed system; updating, by one or more of the plurality of DFDD instances, and based at least in part on information received from a respective application instance, the state information according to a state machine defining transitions between a plurality of states including a state indicating the respective application instance is newly online, a state indicating the respective application instance is operating normally, a state indicating the respective application instance has lost communication with a respective DFDD instance, a state indicating the respective application instance has failed, and a state indicating the respective application instance is subject to a network split; selecting, by at least one of the plurality of DFDD instances, a peer DFDD instance of the plurality of DFDD instances; and communicating, by the at least one of the plurality of DFDD instances, state information for one or more of the plurality of application instances to the peer DFDD instance according to a peer-to-peer synchronization protocol.

7. The method as recited in claim 6, further comprising iteratively performing the selecting and the communicating.

8. The method as recited in claim 6, wherein the synchronization protocol is a gossip-based synchronization protocol.

9. The method as recited in claim 6, wherein, in the communicating, the at least one of the plurality of DFDD instances exchanges state information for the one or more of the plurality of application instances with the other DFDD instance according to the synchronization protocol.

10. The method as recited in claim 6, wherein the plurality of application instances includes two or more different types of application instances, each type of application instance configured to perform one or more different functions of the distributed system.

11. The method as recited in claim 10, wherein the state information includes global state information common to all types of application instances.

12. The method as recited in claim 10, wherein the state information includes specific state information for at least one type of application instance.

13. The method as recited in claim 6, wherein the state information for a given application instance includes information for accessing the application instance by clients of the distributed system.

14. The method as recited in claim 6, wherein the state information for a given application instance includes information indicating a physical or logical location of the application instance in the distributed system.

15. The method as recited in claim 6, wherein the state information includes global state information for at least one application instance that indicates whether the respective application instance is operating normally or is in an abnormal state.

16. The method as recited in claim 15, further comprising one or more of the application instances each periodically or aperiodically reporting its status to at least one of the plurality of DFDD instances, wherein the global state information for the respective application instance is updated according to the reported status.

17. The method as recited in claim 6, wherein the state information for at least one application instance includes information indicating that the application instance is a new application instance to be added to the distributed system.

18. The method as recited in claim 6, wherein the selecting and the communicating are performed among two or more of the plurality of DFDD instances that are configured to store state information for a respective group of the plurality of application instances.

19. A non-transitory computer-accessible storage medium storing instructions that when executed by a computer implement a discovery and failure detection daemon (DFDD) configured to: store state information for at least one of a plurality of application instances configured to perform one or more functions in a distributed system; update, based at least in part on information received from a respective application instance, the state information according to a state machine defining transitions between a plurality of states including a state indicating the respective application instance is newly online, a state indicating the respective application instance is operating normally, a state indicating the respective application instance has lost communication with a respective DFDD instance, a state i
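
The five states named in the independent claims can be modeled as a small state machine. The sketch below is illustrative only: the enum member names are hypothetical labels, and the transition table `ALLOWED` is an assumption, since the claim text shown here lists the states but not which transitions connect them.

```python
from enum import Enum


class State(Enum):
    # Hypothetical labels for the five states listed in the claims.
    NEW = "newly online"
    OK = "operating normally"
    UNREACHABLE = "lost communication with its DFDD instance"
    FAILED = "failed"
    NETWORK_SPLIT = "subject to a network split"


# Assumed transition table; the preview text does not specify the edges.
ALLOWED = {
    State.NEW: {State.OK, State.UNREACHABLE},
    State.OK: {State.UNREACHABLE},
    State.UNREACHABLE: {State.OK, State.FAILED, State.NETWORK_SPLIT},
    State.NETWORK_SPLIT: {State.OK, State.FAILED},
    State.FAILED: set(),
}


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the table forbids the move."""
    if target is current:
        return current  # repeated reports of the same state are harmless
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


if __name__ == "__main__":
    state = State.NEW
    for nxt in (State.OK, State.UNREACHABLE, State.NETWORK_SPLIT, State.OK):
        state = transition(state, nxt)
        print(state.name)
```

Keeping the allowed moves in one table makes it easy to audit or change the policy without touching the code that applies status reports.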

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Distributed indices · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • De-duplication techniques · CPC title

  • G06F16/184 · Primary

    implemented as replicated file system · CPC title

  • in which an application is distributed across nodes in the network (software deployment G06F8/60; multiprogramming arrangements G06F9/46) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10652076B2 cover?
Dynamic application instance discovery and state management within a distributed system. A distributed system may implement application instances configured to perform one or more application functions within the distributed system, and discovery and failure detection daemon (DFDD) instances, each configured to store an indication of a respective operational state of each member of a respective…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/184. Mapped technology areas include Physics.
When was this patent published?
Publication date May 12, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).