Multi-controller declarative fault management and coordination for microservices

US11687399B2 · US · B2

Patent metadata
Publication number: US-11687399-B2
Application number: US-202117376419-A
Country: US
Kind code: B2
Filing date: Jul 15, 2021
Priority date: Jul 15, 2021
Publication date: Jun 27, 2023
Grant date: Jun 27, 2023

Abstract

Official abstract text for this publication.

Methods, systems, and computer program products for multi-controller declarative fault management and coordination for microservices are provided herein. A computer-implemented method includes processing information pertaining to at least one fault impacting multiple resources within a given system, wherein respective portions of the multiple resources are managed by multiple independent controllers; determining, by each of at least a portion of the multiple independent controllers and based at least in part on the processing of the information, one or more desired resource states and one or more remediation actions; generating, based at least in part on one or more of the determined desired resource states and the determined remediation actions, a sequential ordering of the determined remediation actions to be carried out by the at least a portion of the multiple controllers; and automatically initiating execution of the determined remediation actions in accordance with the generated sequential ordering.
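
The ordering step described in the abstract can be illustrated with a small sketch. It is not the patented implementation: it assumes that the topology is available as a "depends on" map between resources, and it uses a plain topological sort to decide which remediation actions run first. The controller names, resource names, and the `RemediationAction` and `order_actions` helpers are hypothetical.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class RemediationAction:
    controller: str  # hypothetical name of the controller proposing the action
    resource: str    # resource the action targets
    action: str      # free-form description, e.g. "restart"


def order_actions(actions, depends_on):
    """Group actions into sequential batches that respect the topology.

    depends_on maps a resource to the resources it depends on (assumed input).
    An action on a resource is scheduled only after every action on the
    resources it depends on. Actions in the same batch have no ordering
    constraint between them and could run concurrently.
    """
    graph = {resource: set(deps) for resource, deps in depends_on.items()}
    for a in actions:
        graph.setdefault(a.resource, set())  # resources with no known dependencies

    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the topology has a cycle

    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batch = [a for r in ready for a in actions if a.resource == r]
        if batch:  # skip levels that contain only unaffected resources
            batches.append(batch)
        sorter.done(*ready)
    return batches


# Hypothetical topology: frontend -> api -> {database, cache}
topology = {"frontend": {"api"}, "api": {"database", "cache"}}
proposed = [
    RemediationAction("app-operator", "frontend", "restart"),
    RemediationAction("app-operator", "api", "restart"),
    RemediationAction("db-operator", "database", "failover"),
    RemediationAction("cache-operator", "cache", "flush"),
]
batches = order_actions(proposed, topology)
for step, batch in enumerate(batches, start=1):
    print(step, [(a.controller, a.resource, a.action) for a in batch])
```

With this hypothetical input, the cache and database actions fall in the first batch, the api action in the second, and the frontend action in the third. The sketch uses only topology; the claims also name stack information and system configuration information as inputs to the ordering, which are not modeled here.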

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising:
    implementing, using at least a portion of multiple independent controllers, at least one registry comprising one or more policy rules associated with one or more actions to be carried out in connection with occurrence of one or more specified events;
    processing information pertaining to at least one fault impacting multiple resources within a given system, wherein respective portions of the multiple resources are managed by the multiple independent controllers, wherein processing information pertaining to at least one fault comprises invoking, upon detecting occurrence of at least one of the one or more specified events, one or more callback functions of at least a portion of the multiple independent controllers, and wherein invoking the one or more callback functions comprise invoking at least one representational state transfer application programming interface endpoint associated with the one or more callback functions over at least one hypertext transfer protocol connection;
    determining, by each of at least a portion of the multiple independent controllers and based at least in part on the processing of the information, one or more desired resource states and multiple remediation actions;
    generating, based at least in part on one or more of the determined desired resource states and the multiple determined remediation actions, a sequential ordering of the multiple determined remediation actions to be carried out by the at least a portion of the multiple controllers, wherein generating the sequential ordering of the multiple determined remediation actions is based at least in part on system topology information associated with the given system, stack information associated with the given system, and system configuration information associated with the given system, wherein system topology information comprises information pertaining to connections between microservices associated with the given system; and
    automatically initiating execution of the multiple determined remediation actions in accordance with the generated sequential ordering;
    wherein the method is carried out by at least one computing device.

2. The computer-implemented method of claim 1, wherein the multiple resources comprise one or more of infrastructure resources, component microservices of at least one application, and one or more composite applications.

3. The computer-implemented method of claim 1, further comprising: continuously monitoring the multiple resources and recording changes to state information due to one or more faults.

4. The computer-implemented method of claim 1, wherein the multiple independent controllers comprise multiple operators in a Kubernetes system.

5. The computer-implemented method of claim 1, wherein processing information pertaining to at least one fault comprises matching one or more system policies to at least a portion of the information pertaining to the at least one fault.

6. The computer-implemented method of claim 1, wherein determining the one or more desired resource states comprises recording the one or more determined desired resource states using at least one fault localization module.

7. The computer-implemented method of claim 1, wherein the sequential ordering of the multiple determined remediation actions comprises two or more concurrent remediation actions.

8. The computer-implemented method of claim 1, wherein software implementing the method is provided as a service in a cloud environment.

9. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computing device to cause the computing device to:
    implement, using at least a portion of multiple independent controllers, at least one registry comprising one or more policy rules associated with one or more actions to be carried out in connection with occurrence of one or more specified events;
    process information pertaining to at least one fault impacting multiple resources within a given system, wherein respective portions of the multiple resources are managed by the multiple independent controllers, wherein processing information pertaining to at least one fault comprises invoking, upon detecting occurrence of at least one of the one or more specified events, one or more callback functions of at least a portion of the multiple independent controllers, and wherein invoking the one or more callback functions comprise invoking at least one representational state transfer application programming interface endpoint associated with the one or more callback functions over at least one hypertext transfer protocol connection;
    determine, by each of at least a portion of the multiple independent controllers and based at least in part on the processing of the information, one or more desired resource states and multiple remediation actions;
    generate, based at least in part on one or more of the determined desired resource states and the multiple determined remediation actions, a sequential ordering of the multiple determined remediation actions to be carried out by the at least a portion of the multiple controllers, wherein generating the sequential ordering of the multiple determined remediation actions is based at least in part on system topology information associated with the given system, stack information associated with the given system, and system configuration information associated with the given system, wherein system topology information comprises information pertaining to connections between microservices associated with the given system; and
    automatically initiate execution of the multiple determined remediation actions in accordance with the generated sequential ordering.

10. The computer program product of claim 9, wherein the multiple resources comprise one or more of infrastructure resources, component microservices of at least one application, and one or more composite applications.

11. The computer program product of claim 9, wherein the program instructions executable by a computing device further cause the computing device to: continuously monitor the multiple resources and recording changes to state information due to one or more faults.

12. The computer program product of claim 9, wherein the multiple independent controllers comprise multiple operators in a Kubernetes system.

13. The computer program product of claim 9, wherein processing information pertaining to at least one fault comprises matching one or more system policies to at least a portion of the information pertaining to the at least one fault.

14. The computer program product of claim 9, wherein determining the one or more desired resource states comprises recording the one or more determined desired resource states using at least one fault localization module.

15. The computer program product of claim 9, wherein the sequential ordering of the multiple determined remediation actions comprises two or more concurrent remediation actions.

16. A system comprising: a memory configured to store program instructions; and a processor operatively coupled to the memory to execute the program instructions to:
    implement, using at least a portion of multiple independent controllers, at least one registry comprising one or more policy rules associated with one or more actions to be carried out in connection with occurrence of one or more specified events;
    process information pertaining to at least one fault impacting multiple resources within a given system, wherein respective port
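
The registry and callback elements recited in the independent claims can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the claims recite callbacks invoked as REST API endpoints over HTTP, whereas this sketch substitutes plain in-process Python callables so that it runs without a network. The `PolicyRegistry` class, the event type `PodCrashLoop`, and the controller names are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

FaultEvent = Dict[str, str]                    # e.g. {"type": ..., "resource": ...}
Callback = Callable[[FaultEvent], List[str]]   # returns proposed remediation actions


class PolicyRegistry:
    """Maps specified event types to the controller callbacks to invoke."""

    def __init__(self) -> None:
        self._rules = defaultdict(list)  # event type -> [(controller, callback)]

    def register(self, event_type: str, controller: str, callback: Callback) -> None:
        # A policy rule: "when event_type occurs, call this controller".
        self._rules[event_type].append((controller, callback))

    def dispatch(self, event: FaultEvent) -> Dict[str, List[str]]:
        # Invoke every callback registered for the event type and collect
        # the remediation actions each controller proposes. In the claims
        # this call would be an HTTP request to a REST endpoint.
        proposals = {}
        for controller, callback in self._rules.get(event["type"], []):
            proposals[controller] = callback(event)
        return proposals


registry = PolicyRegistry()
registry.register("PodCrashLoop", "db-operator",
                  lambda e: [f"restart {e['resource']}"])
registry.register("PodCrashLoop", "app-operator",
                  lambda e: [f"reconnect clients of {e['resource']}"])

proposals = registry.dispatch({"type": "PodCrashLoop", "resource": "database"})
print(proposals)
```

The proposals collected from each controller are the input that a later ordering step would arrange into a sequence before execution.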

Assignees

Inventors

Classifications

  • Error or fault detection not based on redundancy (power supply failures G06F1/30; network fault management H04L41/06)

  • the processing taking place on a specific hardware platform or in a specific software environment

  • Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs (verification or detection of system hardware configuration G06F11/2247)

  • Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40)

  • where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems (multiprogramming arrangements G06F9/46; allocation of resources G06F9/50)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11687399B2 cover?
Methods, systems, and computer program products for multi-controller declarative fault management and coordination for microservices are provided herein. A computer-implemented method includes processing information pertaining to at least one fault impacting multiple resources within a given system, wherein respective portions of the multiple resources are managed by multiple independent contro…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F11/0793. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 27, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).