Multi-phase cloud service node error prediction
US-2021208983-A1 · Jul 8, 2021 · US
US11599435B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11599435-B2 |
| Application number | US-201916540080-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 14, 2019 |
| Priority date | Jun 26, 2019 |
| Publication date | Mar 7, 2023 |
| Grant date | Mar 7, 2023 |
Official abstract text for this publication.
A failure analysis system identifies a root cause of a failure (or other health issue) in a virtualized computing environment and provides a recommendation for remediation. The failure analysis system uses a model-based reasoning (MBR) approach that involves building a model describing the relationships/dependencies of elements in the various layers of the virtualized computing environment, and the model is used by an inference engine to generate facts and rules for reasoning to identify an element in the virtualized computing environment that is causing the failure. Then, the failure analysis system uses a decision tree analysis (DTA) approach to perform a deep diagnosis of the element, by traversing a decision tree that was generated by combining the rules for reasoning provided by the MBR approach, in conjunction with examining data collected by health monitors. The result of the DTA approach is then used to generate the recommendation for remediation.
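As a rough illustration of the model-based reasoning phase described in the abstract, the sketch below narrows a health issue down to candidate source elements using a dependency model. The element names, the `MODEL` dictionary, and the single elimination rule are all assumptions made for illustration; the patent does not disclose its facts and rules in this form.

```python
from collections import deque

# Hypothetical model: each element maps to the elements it directly depends on.
MODEL = {
    "vm-1": ["host-1"],
    "vm-2": ["host-1"],
    "vm-3": ["host-2"],
    "host-1": ["datastore-A", "vswitch-1"],
    "host-2": ["datastore-A", "vswitch-2"],
    "datastore-A": [],
    "vswitch-1": [],
    "vswitch-2": [],
}


def dependencies(model, element):
    """Return the element plus everything it transitively depends on."""
    seen = {element}
    queue = deque([element])
    while queue:
        for dep in model[queue.popleft()]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def suspect_elements(model, unhealthy, healthy):
    """Elements that could explain every unhealthy observation.

    Assumed rule (not taken from the patent): a suspect lies in the
    dependency closure of every unhealthy element and in the closure of
    no healthy element.
    """
    candidates = set(model)
    for element in unhealthy:
        candidates &= dependencies(model, element)
    for element in healthy:
        candidates -= dependencies(model, element)
    return sorted(candidates)


suspects = suspect_elements(MODEL, unhealthy=["vm-1", "vm-2"], healthy=["vm-3"])
print(suspects)  # ['host-1', 'vswitch-1']
```

Here the shared datastore is ruled out because a healthy VM also depends on it, which leaves two suspects. A second, deeper diagnosis step is still needed to pin down the root cause at a suspect element.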
Opening claim text (preview).
We claim:

1. A method to address health issues indicative of operating conditions in a virtualized computing environment that includes at least one host, the method comprising: monitoring, by a health check agent installed at the at least one host, health of the virtualized computing environment; detecting, based on health check information provided by the health check agent from the monitoring, a health issue that has manifested in the virtualized computing environment; generating, by an automated tool, a model that represents elements at multiple layers of the virtualized computing environment, and connections and relationships between the elements; using model-based reasoning to identify an element, amongst the elements, in the virtualized computing environment that is a source of the health issue, wherein the model-based reasoning uses the model, representing the connections and the relationships between the elements in the virtualized computing environment, to determine facts and rules for identification of the element that is the source of the health issue; using a decision tree analysis to identify a root cause of the health issue at the identified element, wherein a decision tree for use in the decision tree analysis is generated by injecting a fault into the virtualized computing environment to determine types, locations, and number of failures that are generated in the virtualized computing environment due to the injected fault; based on a result of the decision tree analysis that identifies the root cause of the health issue, generating a recommendation for remediation of the health issue; and performing the recommended remediation of the health issue.

2. The method of claim 1, wherein the automated tool comprises a diagnostics tool, and wherein generating the model includes applying the diagnostics tool to the virtualized computing environment to discover and collect information about the elements in the virtualized computing environment.

3. The method of claim 1, wherein the decision tree, for use in the decision tree analysis, is further generated by one or more of: using results of the injected fault as starting points for a machine-learning technique to evolve the decision tree; analyzing internal program logic of the elements in the virtualized computing environment; or analyzing processes that were historically used to troubleshoot health issues that were reported in the virtualized computing environment.

4. The method of claim 1, wherein the elements in the virtualized computing environment include elements of a distributed storage system that are arranged in storage clusters, and wherein at least one of the health issues includes a cluster partition issue or other storage-operation-related issue in the distributed storage system.

5. The method of claim 1, wherein using the decision tree analysis to identify the root cause of the health issue includes evaluating the health check information and configuration information while traversing a branch of the decision tree.

6. The method of claim 1, wherein: the facts determined from the model are used to generate the rules, and the rules are combined to form the decision tree for the decision tree analysis.

7. The method of claim 1, further comprising updating either or both the model and the decision tree for the decision tree analysis, in response to identifying a new root cause associated with a particular health issue, so that the updated model or the updated decision tree are usable to analyze other health issues that are similar to the particular health issue.

8. A non-transitory computer-readable medium having instructions stored thereon, which in response to execution by one or more processors, cause the one or more processors to perform or control performance of operations to address health issues indicative of operating conditions in a virtualized computing environment that includes at least one host, the operations comprising: monitoring, by a health check agent installed at the at least one host, health of the virtualized computing environment; detecting, based on health check information provided by the health check agent from the monitoring, a health issue that has manifested in the virtualized computing environment; generating, by an automated tool, a model that represents elements at multiple layers of the virtualized computing environment, and connections and relationships between the elements; using model-based reasoning to identify an element, amongst the elements, in the virtualized computing environment that is a source of the health issue, wherein the model-based reasoning uses the model, representing the connections and the relationships between the elements in the virtualized computing environment, to determine facts and rules for identification of the element that is the source of the health issue; using a decision tree analysis to identify a root cause of the health issue at the identified element, wherein a decision tree for use in the decision tree analysis is generated by injecting a fault into the virtualized computing environment to determine types, locations, and number of failures that are generated in the virtualized computing environment due to the injected fault; based on a result of the decision tree analysis that identifies the root cause of the health issue, generating a recommendation for remediation of the health issue; and performing the recommended remediation of the health issue.

9. The non-transitory computer-readable medium of claim 8, wherein the automated tool comprises a diagnostics tool, and wherein generating the model includes applying the diagnostics tool to the virtualized computing environment to discover and collect information about the elements in the virtualized computing environment.

10. The non-transitory computer-readable medium of claim 8, wherein the decision tree, for use in the decision tree analysis, is further generated by one or more of: using results of the injected fault as starting points for a machine-learning technique to evolve the decision tree; analyzing internal program logic of the elements in the virtualized computing environment; or analyzing processes that were historically used to troubleshoot health issues that were reported in the virtualized computing environment.

11. The non-transitory computer-readable medium of claim 8, wherein the elements in the virtualized computing environment include elements of a distributed storage system that are arranged in storage clusters, and wherein at least one of the health issues includes a cluster partition issue or other storage-operation-related issue in the distributed storage system.

12. The non-transitory computer-readable medium of claim 8, wherein using the decision tree analysis to identify the root cause of the health issue includes evaluating the health check information and configuration information while traversing a branch of the decision tree.

13. The non-transitory computer-readable medium of claim 8, wherein: the facts determined from the model are used to generate the rules, and the rules are combined to form the decision tree for the decision tree analysis.

14. The non-transitory computer-readable medium of claim 8, wherein the operations further comprise: updating either or both the model and the decision tree for the decision tree analysis, in response to identifying a new root cause associated with a particular health issue, so that the updated model or the updated decision tree are usable to analyze other health issues that are similar to the particular health issue.

15.
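The decision-tree steps recited in the independent claims and in claims 5 and 12 (a tree derived from fault-injection results, then traversed while evaluating health check and configuration information) might be sketched as below. This is a minimal sketch under stated assumptions: the fault names, check names, `INJECTION_RESULTS` table, and `REMEDIATION` text are hypothetical, and the greedy split is a stand-in for whatever tree-construction technique the patent actually uses.

```python
# Hypothetical record of which checks passed after each fault was injected
# into a test environment. "no_fault" is the uninjected baseline.
INJECTION_RESULTS = {
    "no_fault":     {"ping_ok": True,  "mtu_match": True,  "disk_ok": True},
    "nic_down":     {"ping_ok": False, "mtu_match": True,  "disk_ok": True},
    "mtu_mismatch": {"ping_ok": True,  "mtu_match": False, "disk_ok": True},
    "disk_failure": {"ping_ok": True,  "mtu_match": True,  "disk_ok": False},
}

# Hypothetical remediation text keyed by root cause.
REMEDIATION = {
    "no_fault": "No action needed.",
    "nic_down": "Re-enable or replace the host network adapter.",
    "mtu_mismatch": "Align the MTU setting across hosts and switches.",
    "disk_failure": "Replace the failed disk and resynchronize data.",
}


def build_tree(samples, checks):
    """Greedily split fault samples on checks until one fault remains.

    Internal nodes are (check, subtree_if_pass, subtree_if_fail) tuples;
    leaves are root-cause names.
    """
    faults = list(samples)
    if len(faults) == 1:
        return faults[0]
    for check in checks:
        passed = {f: s for f, s in samples.items() if s[check]}
        failed = {f: s for f, s in samples.items() if not s[check]}
        if passed and failed:  # this check separates at least one fault
            rest = [c for c in checks if c != check]
            return (check, build_tree(passed, rest), build_tree(failed, rest))
    # No remaining check distinguishes these faults.
    return "undetermined: " + ", ".join(sorted(faults))


def diagnose(tree, observed):
    """Traverse one branch, evaluating the observed value of each check."""
    while isinstance(tree, tuple):
        check, if_pass, if_fail = tree
        tree = if_pass if observed[check] else if_fail
    return tree


TREE = build_tree(INJECTION_RESULTS, ["ping_ok", "mtu_match", "disk_ok"])
observed = {"ping_ok": True, "mtu_match": False, "disk_ok": True}
root_cause = diagnose(TREE, observed)
print(root_cause, "->", REMEDIATION[root_cause])
```

Including the uninjected baseline among the samples keeps the traversal from reporting a fault when every check passes. A production system would presumably need many more checks and a way to handle faults that produce identical symptoms, which this sketch only reports as undetermined.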
Trees, e.g. B+trees · CPC title
Knowledge engineering; Knowledge acquisition · CPC title
Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40) · CPC title
using expert systems · CPC title
Hypervisor-specific management and integration aspects · CPC title