Automatic anomaly detection and resolution system

US10042697B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10042697-B2
Application number: US-201615165298-A
Country: US
Kind code: B2
Filing date: May 26, 2016
Priority date: May 28, 2015
Publication date: Aug 7, 2018
Grant date: Aug 7, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An anomaly detection and resolution system (ADRS) is disclosed for automatically detecting and resolving anomalies in computing environments. The ADRS may be implemented using an anomaly classification system defining different types of anomalies (e.g., a defined anomaly and an undefined anomaly). A defined anomaly may be based on bounds (fixed or seasonal) on any metric to be monitored. An anomaly detection and resolution component (ADRC) may be implemented in each component defining a service in a computing system. An ADRC may be configured to detect and attempt to resolve an anomaly locally. If the anomaly event for an anomaly cannot be resolved in the component, the ADRC may communicate the anomaly event to an ADRC of a parent component, if one exists. Each ADRC in a component may be configured to locally handle specific types of anomalies to reduce communication time and resource usage for resolving anomalies.
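
The local-first handling described in the abstract can be illustrated with a minimal sketch. It assumes each ADRC holds a table of anomaly types it can resolve locally and passes anything else to its parent's ADRC. The class name `ADRC`, the `handlers` table, the anomaly type strings, and the component names are all hypothetical and are not taken from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ADRC:
    """Hypothetical per-component anomaly detection and resolution component."""
    name: str
    parent: Optional[ADRC] = None
    # Maps an anomaly type to a local corrective action (illustrative only).
    handlers: dict[str, Callable[[], str]] = field(default_factory=dict)

    def handle(self, anomaly_type: str) -> str:
        """Resolve locally if possible; otherwise escalate to the parent ADRC."""
        handler = self.handlers.get(anomaly_type)
        if handler is not None:
            return f"{self.name}: {handler()}"
        if self.parent is not None:
            return self.parent.handle(anomaly_type)
        return f"{self.name}: unresolved {anomaly_type}"


# Assumed two-level hierarchy: the container is a child of the VM.
vm = ADRC("vm", handlers={"memory_pressure": lambda: "added memory from parent pool"})
container = ADRC("container", parent=vm,
                 handlers={"thread_stall": lambda: "restarted worker"})

print(container.handle("thread_stall"))     # handled locally by the child
print(container.handle("memory_pressure"))  # escalated to the parent
print(container.handle("disk_failure"))     # no ADRC in the chain can resolve it
```

Keeping the handler table per component is one way to model the stated goal of reducing communication: only anomaly types without a local handler travel up the hierarchy.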

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: providing, by a computer system, a service using a plurality of components, the plurality of components including a first component and a second component, wherein the first component is a child of the second component, and wherein a first resource pool is provided to the first component from a second resource pool that is available to the second component; providing, by the computer system, a first anomaly detection and resolution component (ADRC) for detecting one or more anomaly events for the first component; providing, by the computer system, a second ADRC for detecting one or more anomaly events for the second component; monitoring, by the first ADRC, a first metric related to the provided service; based on monitoring the first metric, detecting, by the first ADRC, a first anomaly event based on determining that a first set of one or more measures has been satisfied, the first set of one or more measures defining the first anomaly event, the first set of one or more measures including a first limit and a second limit, and wherein determining that the first set of one or more measures has been satisfied comprises determining that a value of the first metric is outside a range defined by the first limit and the second limit; and responsive to detecting the first anomaly event: identifying, by the first ADRC, a first policy for handling the first anomaly event, the first policy comprising a first set of one or more rules; identifying, by the first ADRC, that a first rule from the first set of one or more rules in the first policy is satisfied by the first anomaly event; determining, by the first ADRC, a first corrective action based on the first rule; and initiating, by the first ADRC, the first corrective action.

2. The method of claim 1, wherein the first set of one or more measures further comprises a third limit and a polling interval value, wherein the third limit is higher than the second limit, wherein the second limit is higher than the first limit.

3. The method of claim 1, wherein detecting the first anomaly event comprises determining a number of occurrences of when the value of the first metric is outside the range.

4. The method of claim 3, wherein detecting the first anomaly event further comprises determining that the number of occurrences is greater than a minimum threshold.

5. The method of claim 1, wherein detecting the first anomaly event comprises determining a number of occurrences within a time period of when the value of the first metric is outside the range.

6. The method of claim 5, wherein detecting the first anomaly event further comprises determining that the number of occurrences is greater than a minimum threshold.

7. The method of claim 1, wherein detecting the first anomaly event comprises determining that a time associated with the first anomaly event is within a range defined by a seasonal start time and a seasonal end time, and wherein the seasonal start time and the seasonal end time define a seasonal time period for when the first set of one or more measures are valid.

8. The method of claim 1, wherein the first set of one or more measures is determined based on analyzing time series data of log files associated with providing the service.

9. The method of claim 1, further comprising: monitoring, by the first ADRC, a second metric related to the provided service; based on monitoring the second metric, detecting, by the first ADRC, a second anomaly event based on determining that a second set of one or more measures has been satisfied by a value of the second metric, the second set of one or more measures defining the second anomaly event; and responsive to detecting the second anomaly event: determining, by the first ADRC, that the first ADRC cannot identify a policy for resolving the second anomaly event; notifying the second component, by the first ADRC, that the second anomaly event cannot be resolved by the first component; identifying, by the second ADRC, a second policy for handling the second anomaly event, the second policy comprising a second set of one or more rules; identifying, by the second ADRC, that a second rule from the second set of one or more rules in the second policy is satisfied by the second anomaly event; determining, by the second ADRC, a second corrective action based on the second rule; and initiating, by the second ADRC, the second corrective action.

10. The method of claim 1, wherein the first metric related to the provided service is one of a plurality of metrics monitored for quality of service (QoS) for providing the service.

11. A system comprising: a processor; and a memory accessible by the processor, the memory storing instructions which, upon execution by the processor, cause the processor to perform processing comprising: providing a service using a plurality of components, the plurality of components including a first component and a second component, wherein the first component is a child of the second component, and wherein a first resource pool is provided to the first component from a second resource pool that is available to the second component; providing a first anomaly detection and resolution component (ADRC) for detecting one or more anomaly events for the first component; providing a second ADRC for detecting one or more anomaly events for the second component; monitoring, by the first ADRC, a metric related to the provided service; based on the monitoring, detecting, by the first ADRC, a first anomaly event based on determining that a first set of one or more measures has been satisfied, the first set of one or more measures defining the first anomaly event, the first set of one or more measures including a first limit and a second limit, and wherein determining that the first set of one or more measures has been satisfied comprises determining that a value of the metric is outside a range defined by the first limit and the second limit; and responsive to detecting the first anomaly event: identifying, by the first ADRC, a first policy for handling the first anomaly event, the first policy comprising a first set of one or more rules; identifying, by the first ADRC, that a first rule from the first set of one or more rules in the first policy is satisfied by the first anomaly event; determining, by the first ADRC, a first corrective action based on the first rule; and initiating, by the first ADRC, the first corrective action.

12. The system of claim 11, wherein the first set of one or more measures further comprises a third limit and a polling interval value, wherein the third limit is higher than the second limit, wherein the second limit is higher than the first limit.

13. The system of claim 11, wherein detecting the first anomaly event comprises determining a number of occurrences of when the value of the metric is outside the range.

14. The system of claim 13, wherein detecting the first anomaly event comprises determining that the number of occurrences exceeds a minimum threshold.

15. The system of claim 11, wherein detecting the first anomaly event comprises determining a number of occurrences within a time period of when the value of the first metric is outside the range.

16. The system of claim 11, wherein detecting the first anomaly event comprises determining that a time associated with the first anomaly event is within a range defined by a seasonal start time and a seasonal end time, wherein the seasonal start time and the seasonal end time define a seasonal time period for when the first set of one o
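
The detection and policy steps recited in the claims can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the names `Measures`, `Rule`, and `Detector` are hypothetical, the seasonal period is simplified to hours of the day, the time period is modeled as a sliding window in seconds, and a rule is modeled as a predicate over the breaching metric value.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Measures:
    """Bounds defining an anomaly event (field names are illustrative)."""
    lower: float              # first limit
    upper: float              # second limit
    min_occurrences: int      # breach count must exceed this threshold
    window_s: float           # time period in which breaches are counted
    season_start_h: int = 0   # assumed seasonal period, as hours of the day
    season_end_h: int = 24


@dataclass
class Rule:
    """One policy rule: a predicate over the metric value plus an action name."""
    matches: Callable[[float], bool]
    action: str


class Detector:
    def __init__(self, measures: Measures, policy: list[Rule]):
        self.m = measures
        self.policy = policy
        self.breaches: list[float] = []  # timestamps of out-of-range samples

    def observe(self, t_s: float, value: float) -> Optional[str]:
        """Return a corrective action name if an anomaly event is detected."""
        hour = int(t_s // 3600) % 24
        if not (self.m.season_start_h <= hour < self.m.season_end_h):
            return None  # measures are not valid outside the seasonal period
        if self.m.lower <= value <= self.m.upper:
            return None  # value is inside the range, nothing to record
        self.breaches.append(t_s)
        # Keep only breaches that fall inside the sliding time window.
        self.breaches = [b for b in self.breaches if t_s - b <= self.m.window_s]
        if len(self.breaches) <= self.m.min_occurrences:
            return None
        # The first rule satisfied by the event selects the corrective action.
        for rule in self.policy:
            if rule.matches(value):
                return rule.action
        return None  # no matching local rule; a real system might escalate here


policy = [
    Rule(lambda v: v > 90.0, "scale_up"),
    Rule(lambda v: v < 10.0, "scale_down"),
]
detector = Detector(
    Measures(lower=10.0, upper=90.0, min_occurrences=2, window_s=60.0), policy
)
samples = [(0, 50.0), (10, 95.0), (20, 96.0), (30, 97.0)]
actions = [detector.observe(t, v) for t, v in samples]
print(actions)  # [None, None, None, 'scale_up']
```

In this sketch the action fires only on the third out-of-range sample, because the breach count must be greater than `min_occurrences` within the window.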

Assignees

Oracle Int Corp

Inventors

Classifications

  • in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems · CPC title

  • Remedial or corrective actions (recovery from an exception in an instruction pipeline G06F9/3861; by retry G06F11/1402; for recovering from a failure of a protocol instance or entity H04L69/40) · CPC title

  • Performance evaluation by statistical analysis · CPC title

  • by assessing time · CPC title

  • Error or fault detection not based on redundancy (power supply failures G06F1/30; network fault management H04L41/06) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10042697B2 cover?
An anomaly detection and resolution system (ADRS) is disclosed for automatically detecting and resolving anomalies in computing environments. The ADRS may be implemented using an anomaly classification system defining different types of anomalies (e.g., a defined anomaly and an undefined anomaly). A defined anomaly may be based on bounds (fixed or seasonal) on any metric to be monitored. An ano…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F11/0793. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 7, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).