Unsupervised behavior learning system and method for predicting performance anomalies in distributed computing infrastructures

US10311356B2 · US · B2

Patent metadata
Publication number: US-10311356-B2
Application number: US-201414480270-A
Country: US
Kind code: B2
Filing date: Sep 8, 2014
Priority date: Sep 9, 2013
Publication date: Jun 4, 2019
Grant date: Jun 4, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An unsupervised behavior learning system and method for predicting anomalies in a distributed computing infrastructure. The distributed computing infrastructure includes a plurality of computer machines. The system includes a first computer machine and a second computer machine. The second computer machine is configured to generate a model of normal and anomalous behavior of the first computer machine, where the model is based on unlabeled training data. The second computer machine is also configured to acquire real-time data of system level metrics of the first machine; determine whether the real-time data is normal or anomalous based on a comparison of the real-time data to the model; and predict a future failure of the first computer machine based on multiple consecutive comparisons of the real-time data to the model. Upon predicting a future failure of the first computer machine, generate a ranked set of system-level metrics which are contributors to the predicted failure of the first computer machine, and generate an alarm that includes the ranked set of system-level metrics. The model of normal and anomalous behavior may include a self-organizing map.
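The last two steps in the abstract (predicting a failure from multiple consecutive comparisons, then ranking the contributing metrics) can be illustrated with a minimal sketch. It is not the patented implementation: the function names `predict_failure` and `rank_metrics`, the streak length `k`, the metric names, and the use of a plain absolute difference for ranking are all assumptions made for illustration.

```python
from typing import Iterable, List, Sequence, Tuple


def predict_failure(flags: Iterable[bool], k: int = 3) -> bool:
    """Return True once k consecutive samples were classified as anomalous.

    `flags` is the per-sample outcome of comparing real-time data to the
    model (True = anomalous). The value of k is a hypothetical tuning knob.
    """
    streak = 0
    for is_anomalous in flags:
        streak = streak + 1 if is_anomalous else 0
        if streak >= k:
            return True
    return False


def rank_metrics(
    names: Sequence[str],
    normal: Sequence[float],
    anomalous: Sequence[float],
) -> List[Tuple[str, float]]:
    """Rank metrics by how far the anomalous value is from the normal one.

    Assumption: the "difference" is the absolute difference of normalized
    metric values; the largest difference is ranked first.
    """
    diffs = [(name, abs(a - n)) for name, n, a in zip(names, normal, anomalous)]
    return sorted(diffs, key=lambda item: item[1], reverse=True)


# Hypothetical stream of classifications and hypothetical metric vectors.
flags = [False, True, False, True, True, True]
alarm = predict_failure(flags, k=3)
ranking = rank_metrics(["cpu", "mem", "net"], [0.2, 0.3, 0.1], [0.3, 0.9, 0.1])

if alarm:
    print("predicted failure; top contributors:", [name for name, _ in ranking])
```

Requiring a run of consecutive anomalous samples rather than a single one is a simple way to keep transient spikes from raising an alarm.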

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of predicting performance anomalies in distributed computing infrastructure with a computer machine using unsupervised multi-dimensional behavior learning, the method comprising:
    aggregating and normalizing, with the computer machine, real-time data of a plurality of metrics for the distributed computing infrastructure;
    generating, with the computer machine, an artificial neural network model of normal and anomalous behavior for the distributed computing infrastructure based on unlabeled training data from a plurality of nodes associated with the distributed computing infrastructure, wherein the model includes a plurality of neurons;
    determining, with the computer machine, which neuron of the plurality of neurons an input real-time data should be mapped to and whether the input real-time data is normal or abnormal based on whether the mapped neuron represents a normal or abnormal system state;
    predicting, with the computer machine, a future system failure of the distributed computing infrastructure based on an evolution path in the artificial neural network model consisting of multiple consecutive comparisons of the real-time data to the artificial neural network model;
    upon predicting the future system failure, the computer machine generating a ranked list of metrics which are top contributors to the predicted future system failure of the distributed computing infrastructure, wherein the ranked list is generated based on differences between normal metric values and anomalous metric values;
    selecting at least one preventative action based on the ranked list of metrics; and
    initiating the at least one preventative action.

2. The method of claim 1, wherein generating the artificial neural network model of normal and anomalous behavior includes generating a self-organizing map (“SOM”).

3. The method of claim 2, wherein generating the SOM includes calculating a neighborhood area size for a first neuron in the SOM.

4. The method of claim 3, wherein calculating the neighborhood area size for a neuron in the SOM includes calculating a distance between the first neuron and a second neuron in the SOM.

5. The method of claim 3, wherein generating the SOM further includes setting a threshold value based at least in part on the neighborhood area size.

6. The method of claim 2, wherein determining whether the real-time data is normal or anomalous includes receiving a real-time input measurement vector, mapping the real-time input measurement vector to the neuron with the smallest distance with the input measurement vector in the SOM, comparing the neighborhood area size value of the mapped neuron to a threshold value, classifying the real-time input measurement vector as normal if the neighborhood area size value of the mapped neuron is less than the threshold value, and classifying the real-time input measurement vector as anomalous if the neighborhood area size value of the mapped neuron is greater than or equal to the threshold value.

7. The method of claim 2, wherein predicting the future system failure of the distributed computing infrastructure includes determining if a number of consecutive real-time input measurement vectors are mapped to anomalous neurons.

8. The method of claim 2, wherein generating the ranked list of metrics includes determining a set of normal neurons that are nearby to an anomalous neuron which a last real-time input measurement vector was mapped to.

9. The method of claim 1, wherein generating the artificial neural network model of normal and anomalous behavior includes updating a self-organizing map (“SOM”).

10. The method of claim 9, wherein updating the SOM includes receiving an input measurement vector, calculating a distance of the input measurement vector to a weight vector of each neuron in the SOM, selecting a neuron in the SOM with the smallest distance to the input measurement vector, updating the weight vector of the selected neuron, and updating weight vectors of neurons in a neighborhood of the selected neuron.

11. The method of claim 1, wherein the at least one preventative action includes at least one selected from a group consisting of rebooting a virtual machine, restarting a virtual machine, migrating a virtual machine to another host, cloning a virtual machine, scaling a resource of a virtual machine, and disk cleanup.

12. An unsupervised behavior learning system for predicting anomalies in a distributed computing infrastructure, the distributed computing infrastructure including a plurality of computer machines, the system comprising: a computer machine configured to
    aggregate and normalize real-time data of a plurality of metrics for the distributed computing infrastructure from a plurality of nodes associated with the distributed computing infrastructure;
    generate an artificial neural network model of normal and anomalous behavior for the distributed computing infrastructure based on unlabeled training data, wherein the model includes a plurality of neurons;
    determine which neuron of the plurality of neurons an input real-time data should be mapped to and whether the input real-time data is normal or abnormal based on whether the mapped neuron represents a normal or abnormal system state;
    predict a future system failure of the distributed computing infrastructure based on an evolution path in the artificial neural network model consisting of multiple consecutive comparisons of the real-time data to the artificial neural network model;
    upon predicting the future system failure of distributed computing infrastructure, generate a ranked list of metrics which are top contributors to the predicted future system failure of the distributed computing infrastructure, wherein the ranked list is generated based on differences between normal metric values and anomalous metric values;
    select at least one preventive action based on the ranked list of metrics; and
    initiate the at least one preventative action.

13. The system of claim 12, wherein the artificial neural network model of normal and anomalous behavior includes a self-organizing map (“SOM”).

14. The system of claim 13, wherein the computer machine generates the SOM includes calculating a neighborhood area size for a first neuron in the SOM.

15. The system of claim 14, wherein calculating the neighborhood area size for a neuron includes calculating a distance between the first neuron and a second neuron in the SOM.

16. The system of claim 14, wherein the computer machine generates the SOM further includes by setting a threshold value based at least in part on the neighborhood area size.

17. The system of claim 13, wherein the computer machine determines whether the real-time data is normal or anomalous includes receiving a real-time input measurement vector, mapping the real-time input measurement vector to a first neuron in the SOM, comparing a neighborhood area size value of the first neuron to a threshold value, classifying the real-time input measurement vector as normal if the neighborhood area size value of the first neuron is less than the threshold value, and classifying the real-time input measurement vector as anomalous if the neighborhood area size value of the first neuron is greater than or equal to the threshold value.

18. The system of claim 13, wherein the computer machine predicts the future system failure of the distributed computing infrastructure includes determining if a number of consecutive real-time input measurement vectors are mapped to anomalous neurons.
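The SOM update in claim 10 and the neighborhood-area-size test in claims 3 to 6 can be sketched as below. This is an illustrative reading of the claim language, not the patentee's code. The assumptions are: a rectangular grid, Euclidean distance, a neighborhood made of the four adjacent grid cells, fixed learning rates, and all helper names (`best_match`, `update`, `neighborhood_area_size`, `classify`).

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[int, int]   # (row, column) position of a neuron on the grid
Vector = List[float]      # weight vector or input measurement vector


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance (an assumption; the claims only say 'a distance')."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(som: Dict[Coord, Vector], x: Vector) -> Coord:
    """Select the neuron whose weight vector is closest to the input."""
    return min(som, key=lambda c: distance(som[c], x))


def grid_neighbors(som: Dict[Coord, Vector], c: Coord) -> List[Coord]:
    """Adjacent neurons (up, down, left, right) that exist on the grid."""
    r, q = c
    candidates = [(r - 1, q), (r + 1, q), (r, q - 1), (r, q + 1)]
    return [n for n in candidates if n in som]


def update(som: Dict[Coord, Vector], x: Vector,
           rate: float = 0.5, neighbor_rate: float = 0.25) -> Coord:
    """One training step: move the winner and its neighbors toward x."""
    winner = best_match(som, x)
    steps = [(winner, rate)] + [(n, neighbor_rate) for n in grid_neighbors(som, winner)]
    for coord, lr in steps:
        som[coord] = [w + lr * (xi - w) for w, xi in zip(som[coord], x)]
    return winner


def neighborhood_area_size(som: Dict[Coord, Vector], c: Coord) -> float:
    """Sum of weight-vector distances from a neuron to its grid neighbors."""
    return sum(distance(som[c], som[n]) for n in grid_neighbors(som, c))


def classify(som: Dict[Coord, Vector], x: Vector, threshold: float) -> str:
    """Normal if the mapped neuron's area size is below the threshold."""
    mapped = best_match(som, x)
    if neighborhood_area_size(som, mapped) >= threshold:
        return "anomalous"
    return "normal"


# Hand-built 1x3 map with hypothetical weights: two neurons sit close
# together, one sits far away from its only neighbor.
som = {(0, 0): [0.0, 0.0], (0, 1): [0.1, 0.0], (0, 2): [1.0, 1.0]}
print(classify(som, [0.0, 0.05], threshold=1.0))  # maps to (0, 0)
print(classify(som, [0.9, 0.9], threshold=1.0))   # maps to (0, 2)
```

In this reading, neurons that are trained often end up close to their neighbors (small area size), while rarely visited neurons stay far from them, which is why a large area size is treated as a sign of an anomalous state.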

Assignees

Univ North Carolina State

Inventors

Classifications

  • G06N3/088 · Primary

    Non-supervised learning, e.g. competitive learning · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

  • modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

  • Feedforward networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10311356B2 cover?
An unsupervised behavior learning system and method for predicting anomalies in a distributed computing infrastructure. The distributed computing infrastructure includes a plurality of computer machines. The system includes a first computer machine and a second computer machine. The second computer machine is configured to generate a model of normal and anomalous behavior of the first computer …
Who is the assignee on this patent?
Univ North Carolina State
What technology area does this patent fall under?
Primary CPC classification G06N3/088. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 4, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).