Managed remediation of non-compliant resources
US-11196627-B1 · Dec 7, 2021 · US
US11687399B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11687399-B2 |
| Application number | US-202117376419-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 15, 2021 |
| Priority date | Jul 15, 2021 |
| Publication date | Jun 27, 2023 |
| Grant date | Jun 27, 2023 |
Official abstract text for this publication.
Methods, systems, and computer program products for multi-controller declarative fault management and coordination for microservices are provided herein. A computer-implemented method includes processing information pertaining to at least one fault impacting multiple resources within a given system, wherein respective portions of the multiple resources are managed by multiple independent controllers; determining, by each of at least a portion of the multiple independent controllers and based at least in part on the processing of the information, one or more desired resource states and one or more remediation actions; generating, based at least in part on one or more of the determined desired resource states and the determined remediation actions, a sequential ordering of the determined remediation actions to be carried out by the at least a portion of the multiple controllers; and automatically initiating execution of the determined remediation actions in accordance with the generated sequential ordering.
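The ordering step in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that each proposed remediation action lists the actions it must wait for, and it derives an order with Python's standard `graphlib.TopologicalSorter`. The patent does not name a sorting algorithm; the `RemediationAction` type, the controller names, and the sample actions are all hypothetical.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class RemediationAction:
    """Hypothetical record of one action proposed by one controller."""
    name: str
    controller: str
    depends_on: tuple[str, ...] = ()


def order_actions(actions):
    """Return batches of action names; a batch may start once all earlier batches finish."""
    # Map each action to the set of actions that must complete before it.
    graph = {a.name: set(a.depends_on) for a in actions}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output deterministic
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Assumed sample input: four actions proposed by four independent controllers.
actions = [
    RemediationAction("restart-node", "infra-controller"),
    RemediationAction("reschedule-db", "db-operator", ("restart-node",)),
    RemediationAction("reschedule-cache", "cache-operator", ("restart-node",)),
    RemediationAction("restart-frontend", "app-operator",
                      ("reschedule-db", "reschedule-cache")),
]

batches = order_actions(actions)
for step, batch in enumerate(batches, start=1):
    print(step, batch)
```

Actions that land in the same batch have no dependency on one another, which is one way to picture a sequential ordering that still contains concurrent remediation actions.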
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: implementing, using at least a portion of multiple independent controllers, at least one registry comprising one or more policy rules associated with one or more actions to be carried out in connection with occurrence of one or more specified events; processing information pertaining to at least one fault impacting multiple resources within a given system, wherein respective portions of the multiple resources are managed by the multiple independent controllers, wherein processing information pertaining to at least one fault comprises invoking, upon detecting occurrence of at least one of the one or more specified events, one or more callback functions of at least a portion of the multiple independent controllers, and wherein invoking the one or more callback functions comprise invoking at least one representational state transfer application programming interface endpoint associated with the one or more callback functions over at least one hypertext transfer protocol connection; determining, by each of at least a portion of the multiple independent controllers and based at least in part on the processing of the information, one or more desired resource states and multiple remediation actions; generating, based at least in part on one or more of the determined desired resource states and the multiple determined remediation actions, a sequential ordering of the multiple determined remediation actions to be carried out by the at least a portion of the multiple controllers, wherein generating the sequential ordering of the multiple determined remediation actions is based at least in part on system topology information associated with the given system, stack information associated with the given system, and system configuration information associated with the given system, wherein system topology information comprises information pertaining to connections between microservices associated with the given system; and automatically initiating execution of the multiple determined remediation actions in accordance with the generated sequential ordering; wherein the method is carried out by at least one computing device.
2. The computer-implemented method of claim 1, wherein the multiple resources comprise one or more of infrastructure resources, component microservices of at least one application, and one or more composite applications.
3. The computer-implemented method of claim 1, further comprising: continuously monitoring the multiple resources and recording changes to state information due to one or more faults.
4. The computer-implemented method of claim 1, wherein the multiple independent controllers comprise multiple operators in a Kubernetes system.
5. The computer-implemented method of claim 1, wherein processing information pertaining to at least one fault comprises matching one or more system policies to at least a portion of the information pertaining to the at least one fault.
6. The computer-implemented method of claim 1, wherein determining the one or more desired resource states comprises recording the one or more determined desired resource states using at least one fault localization module.
7. The computer-implemented method of claim 1, wherein the sequential ordering of the multiple determined remediation actions comprises two or more concurrent remediation actions.
8. The computer-implemented method of claim 1, wherein software implementing the method is provided as a service in a cloud environment.
9. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computing device to cause the computing device to: implement, using at least a portion of multiple independent controllers, at least one registry comprising one or more policy rules associated with one or more actions to be carried out in connection with occurrence of one or more specified events; process information pertaining to at least one fault impacting multiple resources within a given system, wherein respective portions of the multiple resources are managed by the multiple independent controllers, wherein processing information pertaining to at least one fault comprises invoking, upon detecting occurrence of at least one of the one or more specified events, one or more callback functions of at least a portion of the multiple independent controllers, and wherein invoking the one or more callback functions comprise invoking at least one representational state transfer application programming interface endpoint associated with the one or more callback functions over at least one hypertext transfer protocol connection; determine, by each of at least a portion of the multiple independent controllers and based at least in part on the processing of the information, one or more desired resource states and multiple remediation actions; generate, based at least in part on one or more of the determined desired resource states and the multiple determined remediation actions, a sequential ordering of the multiple determined remediation actions to be carried out by the at least a portion of the multiple controllers, wherein generating the sequential ordering of the multiple determined remediation actions is based at least in part on system topology information associated with the given system, stack information associated with the given system, and system configuration information associated with the given system, wherein system topology information comprises information pertaining to connections between microservices associated with the given system; and automatically initiate execution of the multiple determined remediation actions in accordance with the generated sequential ordering.
10. The computer program product of claim 9, wherein the multiple resources comprise one or more of infrastructure resources, component microservices of at least one application, and one or more composite applications.
11. The computer program product of claim 9, wherein the program instructions executable by a computing device further cause the computing device to: continuously monitor the multiple resources and recording changes to state information due to one or more faults.
12. The computer program product of claim 9, wherein the multiple independent controllers comprise multiple operators in a Kubernetes system.
13. The computer program product of claim 9, wherein processing information pertaining to at least one fault comprises matching one or more system policies to at least a portion of the information pertaining to the at least one fault.
14. The computer program product of claim 9, wherein determining the one or more desired resource states comprises recording the one or more determined desired resource states using at least one fault localization module.
15. The computer program product of claim 9, wherein the sequential ordering of the multiple determined remediation actions comprises two or more concurrent remediation actions.
16. A system comprising: a memory configured to store program instructions; and a processor operatively coupled to the memory to execute the program instructions to: implement, using at least a portion of multiple independent controllers, at least one registry comprising one or more policy rules associated with one or more actions to be carried out in connection with occurrence of one or more specified events; process information pertaining to at least one fault impacting multiple resources within a given system, wherein respective port
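The registry and callback elements recited in the independent claims can be pictured with a minimal sketch. It assumes a registry that maps an event type to the callback endpoints controllers have registered, and it replaces the HTTP call with an injected stand-in function so the example runs without a network. The `PolicyRegistry` class, the event name, the URLs, and the reply format are hypothetical and do not come from the patent.

```python
import json
from collections import defaultdict
from typing import Callable

# Hypothetical transport signature: (endpoint URL, JSON request body) -> JSON reply text.
Transport = Callable[[str, str], str]


class PolicyRegistry:
    """Maps event types to the callback endpoints that controllers registered for them."""

    def __init__(self, transport: Transport):
        self._rules = defaultdict(list)  # event type -> list of endpoint URLs
        self._transport = transport

    def register(self, event_type: str, callback_url: str) -> None:
        self._rules[event_type].append(callback_url)

    def dispatch(self, event_type: str, fault: dict) -> list[dict]:
        """Invoke every callback registered for the event and collect the replies."""
        body = json.dumps(fault)
        urls = self._rules.get(event_type, [])
        return [json.loads(self._transport(url, body)) for url in urls]


def fake_transport(url: str, body: str) -> str:
    """Stand-in for an HTTP POST; returns a canned proposal instead of calling a server."""
    fault = json.loads(body)
    return json.dumps({
        "endpoint": url,
        "desired_state": "Running",
        "actions": [f"restart {fault['resource']}"],
    })


registry = PolicyRegistry(fake_transport)
registry.register("PodCrashLoop", "http://db-operator.example/callbacks/fault")
registry.register("PodCrashLoop", "http://app-operator.example/callbacks/fault")

# Each reply carries one controller's desired state and proposed remediation actions.
proposals = registry.dispatch("PodCrashLoop", {"resource": "orders-db-0"})
for proposal in proposals:
    print(proposal["endpoint"], proposal["actions"])
```

In a real deployment the stand-in would be swapped for an HTTP client call to each controller's REST endpoint; injecting the transport keeps that choice outside the registry logic.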
Error or fault detection not based on redundancy (power supply failures G06F1/30; network fault management H04L41/06) · CPC title
the processing taking place on a specific hardware platform or in a specific software environment · CPC title
Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs (verification or detection of system hardware configuration G06F11/2247) · CPC title
Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40) · CPC title
where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems (multiprogramming arrangements G06F9/46; allocation of resources G06F9/50) · CPC title