Anomaly detection in time-series data using state inference and machine learning

US11361197B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11361197-B2
Application number: US-201816023110-A
Country: US
Kind code: B2
Filing date: Jun 29, 2018
Priority date: Jun 29, 2018
Publication date: Jun 14, 2022
Grant date: Jun 14, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are provided for anomaly detection in time-series data using state inference and machine learning. An exemplary method comprises: obtaining detected states of a plurality of data samples in temporal data, wherein each data sample in the temporal data has a corresponding detected state; obtaining a likelihood that each of the data samples belongs to the corresponding detected state; obtaining a distribution of likelihoods of the data samples indicating a number of observations of each of a plurality of likelihood values; training, using a supervised learning technique, an anomaly detection model that, given the distribution of likelihoods and one or more anomaly thresholds, generates a quality score for each of the anomaly thresholds; and selecting at least one anomaly threshold based on the quality score, wherein the trained anomaly detection model is applied to detect anomalies in new temporal data samples using the selected at least one anomaly threshold.
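
The threshold-selection idea in the abstract can be sketched as follows. This is a minimal illustration, not the patented method: it assumes labeled training samples, uses F1 as the quality score (the abstract does not name a metric), and computes each score directly from labels instead of predicting it with a trained model. All function names, the bin width, and the sample values are hypothetical.

```python
import math
from collections import Counter


def likelihood_histogram(log_likelihoods, bin_width=1.0):
    """Count observations per likelihood bin (the 'distribution of likelihoods')."""
    return Counter(math.floor(ll / bin_width) * bin_width for ll in log_likelihoods)


def f1_quality(log_likelihoods, is_anomaly, threshold):
    """Quality score for one threshold (assumed metric: F1).

    A sample is flagged when its log-likelihood falls below the threshold.
    """
    tp = fp = fn = 0
    for ll, truth in zip(log_likelihoods, is_anomaly):
        flagged = ll < threshold
        if flagged and truth:
            tp += 1
        elif flagged and not truth:
            fp += 1
        elif not flagged and truth:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_threshold(log_likelihoods, is_anomaly, candidates):
    """Score every candidate threshold and return the best one with all scores."""
    scores = {t: f1_quality(log_likelihoods, is_anomaly, t) for t in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical labeled samples: two clearly unlikely observations are anomalies.
log_likelihoods = [-1.2, -0.8, -1.5, -9.7, -1.1, -7.3, -0.9, -2.4]
is_anomaly = [False, False, False, True, False, True, False, False]

histogram = likelihood_histogram(log_likelihoods)
best, scores = select_threshold(log_likelihoods, is_anomaly, [-8.0, -5.0, -2.0])
print(histogram)
print(best, scores)
```

In the abstract, pairs of (distribution, threshold) with their quality scores are what the supervised model is trained on; the scores computed here would play the role of those training targets.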

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: obtaining one or more detected states of a plurality of data samples in temporal data, wherein each of the data samples in said temporal data has a corresponding detected state; obtaining a likelihood that each of said data samples belongs to said corresponding detected state; obtaining a distribution of the likelihoods of the data samples indicating a number of observations of each of a plurality of likelihood values; training, by at least one processing device, using a supervised learning technique, and using a training dataset comprising the distribution of the likelihoods, a plurality of anomaly thresholds, and a corresponding quality score for each anomaly threshold of the plurality of anomaly thresholds indicating a performance of detecting anomalies in the plurality of data samples using the corresponding anomaly threshold, a first machine learning model, wherein training the first machine learning model comprises adjusting one or more parameters of the first machine learning model to predict the corresponding quality score for each combination of: (i) the distribution of the likelihoods and (ii) each of the anomaly thresholds; selecting, by the at least one processing device and using the first machine learning model, at least one anomaly threshold to apply to new temporal data samples; and configuring a second machine learning model for anomaly detection using at least the selected at least one anomaly threshold, wherein the second machine learning model for anomaly detection is applied to detect anomalies in the new temporal data samples.

2. The method of claim 1, wherein the distribution of the likelihoods is an aggregation of the likelihoods that each data sample belongs to said corresponding detected state.

3. The method of claim 1, further comprising the step of clustering the data samples from the temporal data into a plurality of clusters using temporal information, wherein each of the plurality of clusters corresponds to one detected state.

4. The method of claim 3, wherein the likelihood that each of said data samples belongs to said corresponding detected state is obtained from a probability distribution provided by the clustering step.

5. The method of claim 1, wherein the likelihood comprises a log likelihood.

6. The method of claim 1, further comprising a testing phase that evaluates a performance of the second machine learning model for anomaly detection on new labeled temporal data samples.

7. The method of claim 1, wherein the second machine learning model for anomaly detection detects anomalies in the new temporal data samples by comparing one or more of the new temporal data samples to the selected at least one anomaly threshold.

8. A computer program product, comprising a non-transitory processor-readable storage medium having encoded therein executable code of one or more software programs, wherein the one or more software programs when executed by at least one processing device perform the following steps: obtaining one or more detected states of a plurality of data samples in temporal data, wherein each of the data samples in said temporal data has a corresponding detected state; obtaining a likelihood that each of said data samples belongs to said corresponding detected state; obtaining a distribution of the likelihoods of the data samples indicating a number of observations of each of a plurality of likelihood values; training, by at least one processing device, using a supervised learning technique, and using a training dataset comprising the distribution of the likelihoods, a plurality of anomaly thresholds, and a corresponding quality score for each anomaly threshold of the plurality of anomaly thresholds indicating a performance of detecting anomalies in the plurality of data samples using the corresponding anomaly threshold, a first machine learning model, wherein training the first machine learning model comprises adjusting one or more parameters of the first machine learning model to predict the corresponding quality score for each combination of: (i) the distribution of the likelihoods and (ii) each of the anomaly thresholds; selecting, by the at least one processing device and using the first machine learning model, at least one anomaly threshold to apply to new temporal data samples; and configuring a second machine learning model for anomaly detection using at least the selected at least one anomaly threshold, wherein the second machine learning model for anomaly detection is applied to detect anomalies in the new temporal data samples.

9. The computer program product of claim 8, wherein the distribution of the likelihoods is an aggregation of the likelihoods that each data sample belongs to said corresponding detected state.

10. The computer program product of claim 8, further comprising the step of clustering the data samples from the temporal data into a plurality of clusters using temporal information, wherein each of the plurality of clusters corresponds to one detected state.

11. The computer program product of claim 8, wherein the likelihood comprises a log likelihood.

12. The computer program product of claim 8, further comprising a testing phase that evaluates a performance of the second machine learning model for anomaly detection on new labeled temporal data samples.

13. The computer program product of claim 8, wherein the second machine learning model for anomaly detection detects anomalies in the new temporal data samples by comparing one or more of the new temporal data samples to the selected at least one anomaly threshold.

14. An apparatus, comprising: a memory; and at least one processing device, coupled to the memory, operative to implement the following steps: obtaining one or more detected states of a plurality of data samples in temporal data, wherein each of the data samples in said temporal data has a corresponding detected state; obtaining a likelihood that each of said data samples belongs to said corresponding detected state; obtaining a distribution of the likelihoods of the data samples indicating a number of observations of each of a plurality of likelihood values; training, by at least one processing device, using a supervised learning technique, and using a training dataset comprising the distribution of the likelihoods, a plurality of anomaly thresholds, and a corresponding quality score for each anomaly threshold of the plurality of anomaly thresholds indicating a performance of detecting anomalies in the plurality of data samples using the corresponding anomaly threshold, a first machine learning model, wherein training the first machine learning model comprises adjusting one or more parameters of the first machine learning model to predict the corresponding quality score for each combination of: (i) the distribution of the likelihoods and (ii) each of the anomaly thresholds; selecting, by the at least one processing device and using the first machine learning model, at least one anomaly threshold to apply to new temporal data samples; and configuring a second machine learning model for anomaly detection using at least the selected at least one anomaly threshold, wherein the second machine learning model for anomaly detection is applied to detect anomalies in the new temporal data samples.

15. The apparatus of claim 14, wherein the distribution of the likelihoods is an aggregation of the likelihoods that each data sample belongs to said corresponding detected state.

16. The apparatus of claim 14, further comprising the step of clustering the data sampl
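
The state-likelihood and threshold-comparison steps recited in claims 4, 5, and 7 might look roughly like the sketch below. It is an illustration under stated assumptions only: each detected state is modeled as a one-dimensional Gaussian, the state names and parameters are made up and stand in for what a clustering step could provide (the clustering itself is not shown), and the threshold value is arbitrary.

```python
import math

# Hypothetical per-state Gaussian parameters (mean, std), standing in for the
# probability distribution a clustering step might provide. Not from the patent.
STATES = {"idle": (10.0, 1.0), "busy": (50.0, 5.0)}


def gaussian_log_pdf(x, mean, std):
    """Log of the normal density at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def infer_state(x, states=STATES):
    """Return (detected state, log-likelihood of x under that state).

    The detected state is the one under which x is most likely.
    """
    scored = {name: gaussian_log_pdf(x, m, s) for name, (m, s) in states.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


def detect_anomalies(samples, threshold, states=STATES):
    """Flag each sample whose log-likelihood under its detected state is below threshold."""
    return [infer_state(x, states)[1] < threshold for x in samples]


# A reading of 25.0 fits neither state well, so its log-likelihood is very low.
readings = [10.2, 49.0, 25.0]
print([infer_state(x) for x in readings])
print(detect_anomalies(readings, threshold=-10.0))
```

Working in log-likelihoods, as in claim 5, keeps very small probabilities numerically manageable and turns the anomaly test into a single comparison against the selected threshold.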

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

  • G06N20/10 · Primary

    using kernel methods, e.g. support vector machines [SVM] · CPC title

  • Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection · CPC title

  • based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Clustering techniques · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11361197B2 cover?
Techniques are provided for anomaly detection in time-series data using state inference and machine learning. An exemplary method comprises: obtaining detected states of a plurality of data samples in temporal data, wherein each data sample in the temporal data has a corresponding detected state; obtaining a likelihood that each of the data samples belongs to the corresponding detected state; o…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06N20/10. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 14, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).