Statistics-based anomaly detection

US9628499B1 · US · B1

Patent metadata
FieldValue
Publication numberUS-9628499-B1
Application numberUS-201213569688-A
CountryUS
Kind codeB1
Filing dateAug 8, 2012
Priority dateAug 8, 2012
Publication dateApr 18, 2017
Grant dateApr 18, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are described herein for detecting an anomaly in a discrete signal, where samples in the signal correspond to amounts of data flow in a network within a time interval. The discrete signal is received, and a sequence of likelihoods corresponding to the sample values in the signal is generated. The likelihoods are based at least in part on a historical probability distribution of previously received sample values, and a likelihood is a probability of occurrence of a corresponding sample value in the signal. Likelihood change points are identified in the likelihood sequence, and the discrete signal is segmented into a plurality of segments at samples corresponding to the identified change points. A segment is identified as an anomaly based on a comparison between a statistic of the segment and a statistic of the historical probability distribution.

First claim

Opening claim text (preview).

The invention claimed is: 1. A method for identifying an anomaly in a signal, comprising: receiving a discrete signal, having sample values corresponding to amounts of data flow in a network within a time interval; generating a sequence of likelihoods corresponding to sample values in the signal and based at least in part on a historical probability distribution of previously received sample values corresponding to amounts of data flow in the network, wherein a likelihood is a probability of occurrence of a corresponding sample value in the signal; identifying likelihood change points in the likelihood sequence by: selecting a parameter L corresponding to a minimum number of samples in a segment; appending L consecutive likelihoods to a buffer; computing a sequence of first sum values of the likelihoods in the buffer; obtaining a sequence of second sum values; determining the presence of a change point in the buffer based at least in part on a comparison between the first and second sum values, wherein a plurality of likelihoods preceding the change point have a first statistic value and a plurality of likelihoods following the change point have a second statistic value different from the first statistic value; and identifying a likelihood in the buffer as a change point based at least in part on the comparison; segmenting the discrete signal into a plurality of segments at samples corresponding to the identified change points such that a respective one of the samples corresponding to the identified change points is at one of a beginning or an end of each of the plurality of segments; identifying a segment as an anomaly based on a comparison between a statistic of the segment and a statistic of the historical probability distribution; and reconfiguring, responsive to identifying the segment as the anomaly, a component of the network. 2. The method of claim 1 , wherein an anomaly is indicative of a deviation in the data flow from standard network operation. 3. The method of claim 1 , wherein the historical probability distribution represents amounts of data flow during standard network operation. 4. The method of claim 1 , wherein: the first and second statistic values are mean values of the corresponding likelihoods; the sequence of first sum values is based on a cumulative sum sequence of the likelihoods in the buffer; and the sequence of second sum values is based on a cumulative sum sequence of randomly reordered likelihoods in the buffer. 5. The method of claim 4 , wherein: a cumulative sum sequence is computed based on a sequence of differences between the likelihoods and a mean of the likelihoods in the buffer; a change point is determined to be in the buffer when a maximal absolute first sum value exceeds a maximal absolute second sum value; and the change point corresponds to the maximal absolute first sum value. 6. The method of claim 1 , wherein: the first and second statistic values are median values of the corresponding likelihoods; the sequence of first sum values is based on a rank sum sequence of the likelihoods in the buffer; the sequence of second sum values is based on a rank sum sequence of a linear function; and the change point is identified as the likelihood corresponding to a first sum value substantially equal to a corresponding second sum value. 7. The method of claim 1 , comprising: removing samples preceding the identified change point from the buffer; and appending another L likelihoods to the buffer. 8. The method of claim 1 , wherein the statistic of the segment and the statistic of the historical probability distribution are medians of the corresponding sample values. 9. The method of claim 1 , wherein the component is a network buffer. 10. An apparatus for identifying an anomaly in a signal, comprising a processor and a memory unit storing computer executable instructions that when executed by the processor cause the processor to: receive a discrete signal, having sample values corresponding to amounts of data flow in a network within a time interval; generate a sequence of likelihoods corresponding to sample values in the signal and based at least in part on a historical probability distribution of previously received sample values corresponding to amounts of data flow in the network, wherein a likelihood is a probability of occurrence of a corresponding sample value in the signal; identify likelihood change points in the likelihood sequence, by: selecting a parameter L corresponding to a minimum number of samples in a segment; appending L consecutive likelihoods to a buffer; computing a sequence of first sum values of the likelihoods in the buffer; obtaining a sequence of second sum values; determining the presence of a change point in the buffer based at least in part on a comparison between the first and second sum values, wherein a plurality of likelihoods preceding the change point have a first statistic value and a plurality of likelihoods following the change point have a second statistic value different from the first statistic value; and identifying a likelihood in the buffer as a change point based at least in part on the comparison; segment the discrete signal into a plurality of segments at samples corresponding to the identified change points such that a respective one of the samples corresponding to the identified change points is at one of a beginning or an end of each of the plurality of segments; identify a segment as an anomaly based on a comparison between a statistic of the segment and a statistic of the historical probability distribution; and reconfigure, responsive to identifying the segment as the anomaly, a component of the network. 11. The apparatus of claim 10 , wherein an anomaly is indicative of a deviation in the data flow from standard network operation. 12. The apparatus of claim 10 , wherein the historical probability distribution represents amounts of data flow during standard network operation. 13. The apparatus of claim 10 , wherein: the first and second statistic values are mean values of the corresponding likelihoods; the sequence of first sum values is based on a cumulative sum sequence of the likelihoods in the buffer; and the sequence of second sum values is based on a cumulative sum sequence of randomly reordered likelihoods in the buffer. 14. The apparatus of claim 13 , wherein: a cumulative sum sequence is computed based on a sequence of differences between the likelihoods and a mean of the likelihoods in the buffer; a change point is determined to be in the buffer when a maximal absolute first sum value exceeds a maximal absolute second sum value; and the change point corresponds to the maximal absolute first sum value. 15. The apparatus of claim 10 , wherein: the first and second statistic values are median values of the corresponding likelihoods; the sequence of first sum values is based on a rank sum sequence of the likelihoods in the buffer; the sequence of second sum values is based on a rank sum sequence of a linear function; and the change point is identified as the likelihood corresponding to a first sum value substantially equal to a corresponding second sum value. 16. The apparatus of claim 10 , the instructions causing the processor to: remove samples preceding the identified change point from the buffer; and append another L likelihoods to the buffer. 17. The apparatus of claim 10 , wherein the statistic of the segment and the statistic of the historical probability distribution are medians of the corresponding samp

Assignees

Inventors

Classifications

  • Traffic logging, e.g. anomaly detection · CPC title

  • by observing the pattern of computer usage, e.g. typical user behaviour · CPC title

  • Event detection, e.g. attack signature detection · CPC title

  • involving long-term monitoring or reporting · CPC title

  • Localisation of faults · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9628499B1 cover?
Systems and methods are described herein for detecting an anomaly in a discrete signal, where samples in the signal correspond to amounts of data flow in a network within a time interval. The discrete signal is received, and a sequence of likelihoods corresponding to the sample values in the signal is generated. The likelihoods are based at least in part on a historical probability distribution…
Who is the assignee on this patent?
Yu Kevin, Zhang Xinyi, Google Inc
What technology area does this patent fall under?
Primary CPC classification H04L63/1425. Mapped technology areas include Electricity.
When was this patent published?
Publication date Tue Apr 18 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).