Coordinating fault recovery in a distributed system
US-9218246-B2 · Dec 22, 2015 · US
US9436544B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9436544-B2 |
| Application number | US-201414462728-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 19, 2014 |
| Priority date | Jun 26, 2014 |
| Publication date | Sep 6, 2016 |
| Grant date | Sep 6, 2016 |
Abstract
A method, device, and non-transitory computer readable medium that implement error detection and recovery include receiving, from one or more agents that monitor one or more subsystem processes of a business process operating in a cloud based architecture, an identification of an error condition in at least one of the subsystem processes. Any associated information or data necessary to execute the subsystem process or processes with the identified error condition is also received. An application management computing device executes an error recovery process for the subsystem process or processes with the identified error condition, and then reinitiates the recovered subsystem process or processes using the received information or data that corresponds to them.
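The flow in the abstract (agent report, recovery, reinitiation) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: `ErrorReport`, `ApplicationManager`, the `recover` callback, and the `subsystems` entry-point map are hypothetical names, and `restart_connection` is a placeholder for whatever recovery process a real deployment would run.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ErrorReport:
    """Hypothetical agent message: (i) the error condition, (ii) data needed to rerun the process."""
    process_id: str
    error_condition: str
    context: Dict[str, Any] = field(default_factory=dict)


class ApplicationManager:
    """Hypothetical application management component: recover, then reinitiate."""

    def __init__(self,
                 recover: Callable[[ErrorReport], bool],
                 subsystems: Dict[str, Callable[[Dict[str, Any]], Any]]):
        self.recover = recover          # pluggable error recovery routine (assumption)
        self.subsystems = subsystems    # process_id -> entry point that accepts the saved context
        self.received: List[ErrorReport] = []

    def receive(self, report: ErrorReport) -> Any:
        # Keep every report from the monitoring agents.
        self.received.append(report)
        # Run the error recovery process for the failed subsystem process.
        if not self.recover(report):
            raise RuntimeError(f"recovery failed for {report.process_id}")
        # Reinitiate the recovered process with the context the agent supplied.
        return self.subsystems[report.process_id](report.context)


def restart_connection(report: ErrorReport) -> bool:
    # Placeholder recovery: only a lost connection is treated as recoverable.
    return report.error_condition == "connection_lost"


manager = ApplicationManager(
    recover=restart_connection,
    subsystems={"billing": lambda ctx: f"billing resumed at record {ctx['offset']}"},
)
result = manager.receive(ErrorReport("billing", "connection_lost", {"offset": 42}))
print(result)  # billing resumed at record 42
```

Passing the agent-supplied context straight into the subsystem entry point is what lets the process resume where it failed rather than restart from scratch; a report whose recovery fails raises instead of reinitiating.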
Claims
What is claimed is:

1. A method for implementing error detection and recovery, the method comprising: receiving, by an application management computing device, from one or more agents monitoring one or more subsystem processes of a business process operating in a cloud based architecture, i) an identification of an error condition in at least one of the one or more subsystem processes, and ii) any associated information or data necessary to execute the at least one of the one or more subsystem processes with the identified error condition; executing, by the application management computing device, an error recovery process for the at least one of the one or more subsystem processes with the identified error condition; logging, by the application management computing device, details on performance of the error recovery process in a memory; automatically modifying, by the application management computing device, a number of error recovery processes for the at least one of the one or more subsystem processes with the identified error condition based on performance of the error recovery process and details associated with the identified error condition; and reinitiating, by the application management computing device, the recovered at least one of the one or more subsystem processes with the identified error condition using the received information or data corresponding to the at least one of the subsystem processes with the identified error condition.

2. The method of claim 1 further comprising receiving, by the application management computing device, from the one or more agents, one or more metrics of the at least one of the one or more subsystem processes.

3. The method of claim 2 wherein the one or more metrics further comprise at least one of a status of one or more processing activities, a measurement of utilization of one or more resources, or a measurement of performance of one or more aspects.

4. The method of claim 2 further comprising storing, by the application management computing device, the one or more metrics.

5. The method of claim 2 further comprising generating, by the application management computing device, at least one report based on at least one of the identified error condition or the one or more metrics.

6. The method of claim 1 further comprising allowing, by the application management computing device, one of the one or more subsystem processes to continue execution during the executing a corresponding error recovery process and the reinitiating the recovered at least one of the one or more subsystem processes with the identified error condition.

7. An application management computing device, comprising: a processor coupled to a memory and configured to be capable of executing programmed instructions stored in the memory, comprising: receiving from one or more agents monitoring one or more subsystem processes of a business process operating in a cloud based architecture, i) an identification of an error condition in at least one of the one or more subsystem processes, and ii) any associated information or data necessary to execute the at least one of the one or more subsystem processes with the identified error condition; executing an error recovery process for the at least one of the one or more subsystem processes with the identified error condition; logging details on performance of the error recovery process in a memory; automatically modifying a number of error recovery processes for the at least one of the one or more subsystem processes with the identified error condition based on performance of the error recovery process and details associated with the identified error condition; and reinitiating the recovered at least one of the one or more subsystem processes with the identified error condition using the received information or data corresponding to the at least one of the subsystem processes with the identified error condition.

8. The device of claim 7 further comprising receiving from the one or more agents, one or more metrics of the at least one of the one or more subsystem processes.

9. The device of claim 8 wherein the one or more metrics further comprise at least one of a status of one or more processing activities, a measurement of utilization of one or more resources, or a measurement of performance of one or more aspects.

10. The device of claim 8 further comprising storing the one or more metrics.

11. The device of claim 8 further comprising generating at least one report based on at least one of the identified error condition or the one or more metrics.

12. The device of claim 7 further comprising allowing one of the one or more subsystem processes to continue execution during the executing a corresponding error recovery process and the reinitiating the recovered at least one of the one or more subsystem processes with the identified error condition.

13. A non-transitory computer readable medium having stored thereon instructions for implementing error detection and recovery comprising machine executable code which when executed by a processor, causes the processor to perform steps comprising: receiving from one or more agents monitoring one or more subsystem processes of a business process operating in a cloud based architecture, i) an identification of an error condition in at least one of the one or more subsystem processes, and ii) any associated information or data necessary to execute the at least one of the one or more subsystem processes with the identified error condition; executing an error recovery process for the at least one of the one or more subsystem processes with the identified error condition; logging details on performance of the error recovery process in a memory; automatically modifying a number of error recovery processes for the at least one of the one or more subsystem processes with the identified error condition based on performance of the error recovery process and details associated with the identified error condition; and reinitiating the recovered at least one of the one or more subsystem processes with the identified error condition using the received information or data corresponding to the at least one of the subsystem processes with the identified error condition.

14. The medium of claim 13 further comprising receiving from the one or more agents, one or more metrics of the at least one of the one or more subsystem processes.

15. The medium of claim 14 wherein the one or more metrics further comprise at least one of a status of one or more processing activities, a measurement of utilization of one or more resources, or a measurement of performance of one or more aspects.

16. The medium of claim 14 further comprising storing the one or more metrics.

17. The medium of claim 14 further comprising generating at least one report based on at least one of the identified error condition or the one or more metrics.

18. The medium of claim 13 further comprising allowing one of the one or more processes to continue execution during the executing a corresponding error recovery process and the reinitiating the recovered at least one of the one or more subsystem processes with the identified error condition.
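Claim 1 adds two steps beyond the abstract: logging details of each recovery run, and automatically modifying the number of error recovery processes based on that performance and on details of the error condition. The claims do not say how that number is chosen, so the Python sketch below assumes a per-(process, error condition) budget of recovery attempts that grows after a failed recovery and shrinks after a first-try success. It also stores agent metrics and builds a plain-text report in the spirit of claims 2 to 5. `AdaptiveRecovery`, `RecoveryLogEntry`, the budget bounds, and the report format are all hypothetical; claim 6 (letting other subsystem processes keep running during recovery) is not modeled.

```python
import time
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class RecoveryLogEntry:
    """Hypothetical log record describing one recovery run."""
    process_id: str
    error_condition: str
    attempts_used: int
    succeeded: bool
    seconds: float


class AdaptiveRecovery:
    """Hypothetical policy: an adaptive attempt budget per (process, error condition)."""

    def __init__(self, initial_budget: int = 2, min_budget: int = 1, max_budget: int = 5):
        self.min_budget, self.max_budget = min_budget, max_budget
        self.budget: Dict[Tuple[str, str], int] = defaultdict(lambda: initial_budget)
        self.log: List[RecoveryLogEntry] = []
        self.metrics: Dict[str, Dict[str, float]] = defaultdict(dict)

    def record_metrics(self, process_id: str, **values: float) -> None:
        # Store agent-reported metrics (status, utilization, performance) per process.
        self.metrics[process_id].update(values)

    def run(self, process_id: str, error_condition: str, attempt: Callable[[], bool]) -> bool:
        key = (process_id, error_condition)
        start = time.perf_counter()
        used, ok = 0, False
        # Try recovery until it succeeds or the current budget is exhausted.
        while used < self.budget[key] and not ok:
            used += 1
            ok = attempt()
        self.log.append(RecoveryLogEntry(process_id, error_condition, used, ok,
                                         time.perf_counter() - start))
        # Adapt the budget: raise it after a failure, lower it after a first-try success.
        if not ok:
            self.budget[key] = min(self.budget[key] + 1, self.max_budget)
        elif used == 1:
            self.budget[key] = max(self.budget[key] - 1, self.min_budget)
        return ok

    def report(self) -> List[str]:
        # Build a simple text report from the recovery log and the stored metrics.
        lines = []
        for e in self.log:
            outcome = "recovered" if e.succeeded else "failed"
            lines.append(f"{e.process_id}/{e.error_condition}: "
                         f"{outcome} after {e.attempts_used} attempt(s)")
        for pid, vals in self.metrics.items():
            lines.append(f"{pid} metrics: " + ", ".join(f"{k}={v}" for k, v in sorted(vals.items())))
        return lines


outcomes = iter([False, False, False, True])
policy = AdaptiveRecovery(initial_budget=2)
policy.record_metrics("billing", cpu_util=0.91)
first = policy.run("billing", "timeout", lambda: next(outcomes))   # two failures, budget rises to 3
second = policy.run("billing", "timeout", lambda: next(outcomes))  # fails once, then succeeds
for line in policy.report():
    print(line)
```

Keying the budget on both the process and the error condition mirrors the claim language that the modification depends on recovery performance and on details of the error condition; a real policy might also weigh the logged durations.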
CPC classification titles:
- Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40)
- in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
- Administration; Management
- Performance analysis of employees; Performance analysis of enterprise or organisation operations
- Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling