Combination of techniques to detect anomalies in multi-dimensional time series
US-2019236177-A1 · Aug 1, 2019 · US
US11449748B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11449748-B2 |
| Application number | US-201816172724-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 26, 2018 |
| Priority date | Oct 26, 2018 |
| Publication date | Sep 20, 2022 |
| Grant date | Sep 20, 2022 |
Official abstract text for this publication.
Techniques for adaptive thresholding are provided. First and second data points are received. A plurality of data points are identified, where the plurality of data points corresponds to timestamps associated with the first and second data points. At least one cluster is generated for the plurality of data points based on a predefined cluster radius. Upon determining that the first data point is outside of the cluster, the first data point is labeled as anomalous. A predicted value is generated for the second data point, based on processing data points in the cluster using a machine learning model, and a deviation between the predicted value and an actual value for the second data point is computed. Upon determining that the deviation exceeds a threshold, the second data point is labeled as anomalous. Finally, computing resources are reallocated, based on at least one of the anomalous data points.
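The two checks in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the cluster is taken to be all historical values within the radius of their median, and the mean of the cluster members stands in for the machine learning model, which the abstract does not specify. The names `build_cluster`, `percent_deviation`, and `detect` are hypothetical. The deviation follows the percentage form of claim 7 (difference divided by the actual value); taking the absolute value is an added assumption.

```python
from statistics import mean, median


def build_cluster(values, radius):
    """Return (center, members) for one radius-based cluster.

    Assumption: the cluster center is the median of the historical values,
    and members are the values within `radius` of that center.
    """
    center = median(values)
    members = [v for v in values if abs(v - center) <= radius]
    if not members:
        raise ValueError("no historical value lies within the cluster radius")
    return center, members


def percent_deviation(predicted, actual):
    """Difference between predicted and actual, divided by the actual value."""
    if actual == 0:
        raise ValueError("actual value must be non-zero for a percentage")
    return abs(predicted - actual) / abs(actual)


def detect(history, first, second, radius, threshold):
    """Apply both checks and return their results.

    Returns (first_anomalous, second_anomalous, predicted, deviation).
    """
    center, members = build_cluster(history, radius)

    # Check 1: the first data point is anomalous if it falls outside the cluster.
    first_anomalous = abs(first - center) > radius

    # Check 2: predict the second data point from the cluster members.
    # Stand-in for the machine learning model: a plain mean (assumption).
    predicted = mean(members)
    deviation = percent_deviation(predicted, second)
    second_anomalous = deviation > threshold

    return first_anomalous, second_anomalous, predicted, deviation


if __name__ == "__main__":
    history = [100, 102, 98, 101, 99, 250]
    print(detect(history, first=250, second=130, radius=10, threshold=0.2))
```

Keeping the two checks separate mirrors the abstract: either label on its own is enough to trigger the reallocation step, which is left out here because the source does not describe how it is carried out.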
Opening claim text (preview).
We claim:
1. A method comprising: receiving a first data point and a second data point in a data stream; identifying a first plurality of data points from the data stream, wherein the first plurality of data points corresponds to a timestamp associated with the first data point and a timestamp associated with the second data point; generating at least a first cluster for the first plurality of data points based on a predefined cluster radius; upon determining that the first data point is outside of the first cluster, labeling the first data point as anomalous; generating a predicted value for the second data point, based on processing data points in the first cluster using a first machine learning model; computing a deviation between the predicted value for the second data point and an actual value for the second data point; upon determining that the deviation exceeds a first predefined threshold, labeling the second data point as anomalous; and facilitating reallocation of computing resources, based on at least one of (i) labeling the first data point as anomalous, or (ii) labeling the second data point as anomalous.
2. The method of claim 1, the method further comprising cleansing the data stream prior to generating the first cluster by: analyzing a timestamp associated with each of the first plurality of data points to identify a missing data point; generating a predicted value for the missing data point using one or more machine learning models; and inserting a new data point, with the predicted value, into to the first plurality of data points, wherein the new data point is associated with a timestamp corresponding to the missing data point.
3. The method of claim 2, the method further comprising smoothing the data stream prior to generating the first cluster, wherein smoothing the first plurality of data points comprises, for at least one respective data point in the first plurality of data points: determining a respective rate of change between a value of the respective data point and a value of a data point that immediately precedes the respective data point in time; and upon determining that the respective rate of change exceeds a second predefined threshold, removing the respective data point from the first plurality of data points.
4. The method of claim 1, wherein facilitating reallocation of computing resources comprises: upon labeling the first data point as anomalous, providing an indication of the first data point to an administrator; and upon labeling the second data point as anomalous, providing an indication of the second data point to an administrator.
5. The method of claim 1, wherein identifying the first plurality of data points from the data stream comprises identifying one or more data points associated with a time of day matching a time of day associated with the first data point, wherein at least two of the first plurality of data points were collected on different days.
6. The method of claim 5, wherein the different days are defined based on a predefined interval between days.
7. The method of claim 1, wherein the deviation is a percentage, and wherein computing the deviation between the predicted value for the second data point and the actual value for the second data point comprises: determining a first value corresponding to a difference between the predicted value and the actual value; and computing the deviation by dividing the first value by the actual value.
8. The method of claim 1, wherein the first plurality of data points were collected from a plurality of network devices, and wherein each respective data point of the first plurality of data points specifies a respective value for a first key performance indicator (KPI) for a respective network device of the plurality of network devices.
9. A computer program product comprising: a computer-readable storage medium having computer-readable program code embodied therewith, the computer-readable program code executable by one or more computer processors to perform an operation comprising: receiving a first data point and a second data point in a data stream; identifying a first plurality of data points from the data stream, wherein the first plurality of data points corresponds to a timestamp associated with the first data point and a timestamp associated with the second data point; generating at least a first cluster for the first plurality of data points based on a predefined cluster radius; upon determining that the first data point is outside of the first cluster, labeling the first data point as anomalous; generating a predicted value for the second data point, based on processing data points in the first cluster using a first machine learning model; computing a deviation between the predicted value for the second data point and an actual value for the second data point; upon determining that the deviation exceeds a first predefined threshold, labeling the second data point as anomalous; and facilitating reallocation of computing resources, based on at least one of (i) labeling the first data point as anomalous, or (ii) labeling the second data point as anomalous.
10. The computer program product of claim 9, the operation further comprising cleansing the data stream prior to generating the first cluster by: analyzing a timestamp associated with each of the first plurality of data points to identify a missing data point; generating a predicted value for the missing data point using one or more machine learning models; and inserting a new data point, with the predicted value, into to the first plurality of data points, wherein the new data point is associated with a timestamp corresponding to the missing data point.
11. The computer program product of claim 10, the operation further comprising smoothing the data stream prior to generating the first cluster, wherein smoothing the first plurality of data points comprises, for at least one respective data point in the first plurality of data points: determining a respective rate of change between a value of the respective data point and a value of a data point that immediately precedes the respective data point in time; and upon determining that the respective rate of change exceeds a second predefined threshold, removing the respective data point from the first plurality of data points.
12. The computer program product of claim 10, the operation further comprising smoothing the data stream prior to generating the first cluster, wherein smoothing the first plurality of data points comprises, for at least one respective data point in the first plurality of data points: determining a respective rate of change between a value of the respective data point and a value of a data point that immediately precedes the respective data point in time; and upon determining that the respective rate of change exceeds a second predefined threshold, removing the respective data point from the first plurality of data points.
13. The computer program product of claim 9, wherein facilitating reallocation of computing resources comprises: upon labeling the first data point as anomalous, providing an indication of the first data point to an administrator; and upon labeling the second data point as anomalous, providing an indication of the second data point to an administrator.
14. The computer program product of claim 9, wherein the deviation is a percentage, and wherein computing the deviation between the predicted value for the second data point and the actual value for the second data point comprises: determining a first value corresponding to a difference bet
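The cleansing and smoothing steps of claims 2 and 3 could look something like the sketch below. Several choices here are assumptions rather than anything the claims state: data points are `(timestamp, value)` tuples with strictly increasing numeric timestamps, the expected sampling `interval` is known, linear interpolation stands in for the "one or more machine learning models" that predict a missing value, and the rate of change is the absolute value change per unit time measured against the most recently kept point. The function names `fill_missing` and `drop_spikes` are hypothetical.

```python
def fill_missing(points, interval):
    """Insert a predicted point at every timestamp missing from the series.

    Assumption: linear interpolation between the neighbouring points is used
    in place of a machine learning model.
    """
    if not points:
        return []
    filled = [points[0]]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        t = t0 + interval
        while t < t1:
            # Predicted value for the missing timestamp t.
            frac = (t - t0) / (t1 - t0)
            filled.append((t, v0 + frac * (v1 - v0)))
            t += interval
        filled.append((t1, v1))
    return filled


def drop_spikes(points, max_rate):
    """Remove points whose rate of change from the preceding point is too high.

    Assumption: "preceding" means the most recently kept point, so a single
    spike does not also cause the point that follows it to be removed.
    Timestamps must be strictly increasing to avoid division by zero.
    """
    kept = list(points[:1])
    for t, v in points[1:]:
        prev_t, prev_v = kept[-1]
        rate = abs(v - prev_v) / (t - prev_t)
        if rate <= max_rate:
            kept.append((t, v))
    return kept


if __name__ == "__main__":
    raw = [(0, 10.0), (60, 12.0), (180, 18.0)]
    print(fill_missing(raw, interval=60))

    spiky = [(0, 10.0), (60, 12.0), (120, 90.0), (180, 14.0)]
    print(drop_spikes(spiky, max_rate=0.5))
```

One consequence of comparing against the last kept point is that the first point is always retained; if the series can start on a spike, a different baseline would be needed.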
Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
Supervised learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title