Anomaly detection based on relationships between multiple time series

US10375098B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10375098-B2
Application number: US-201715420737-A
Country: US
Kind code: B2
Filing date: Jan 31, 2017
Priority date: Jan 31, 2017
Publication date: Aug 6, 2019
Grant date: Aug 6, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In some implementations, sequences of time series values determined from machine data are obtained. Each sequence corresponds to a respective time series. A plurality of predictive models is generated for a first time series from the sequences of time series values. Each predictive model is to generate predicted values associated with the first time series using values of a second time series. For each of the plurality of predictive models, an error is determined between the corresponding predicted values and values associated with the first time series. A predictive model is selected for anomaly detection based on the determined error of the predictive model. Transmission is caused of an indication of an anomaly detected using the selected predictive model.
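
The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes each candidate model is a linear least-squares fit, that the error is a root-mean-square error on a held-out portion of the data, and that each candidate uses a different second time series. The series names (`cpu`, `requests`, `disk_io`) and the helper `select_best` are hypothetical.

```python
import numpy as np

def fit_linear(second, first):
    """Fit first ~ slope * second + intercept (assumed model form)."""
    slope, intercept = np.polyfit(second, first, 1)
    return slope, intercept

def predict(model, second):
    slope, intercept = model
    return slope * second + intercept

def select_best(first, candidates, train_len):
    """Build one model per candidate second series and keep the one with the lowest error.

    first      -- values of the first time series (the one being predicted)
    candidates -- dict mapping a name to the values of a candidate second series
    train_len  -- number of leading points used for fitting; the rest measure error
    """
    models, errors = {}, {}
    for name, second in candidates.items():
        model = fit_linear(second[:train_len], first[:train_len])
        predicted = predict(model, second[train_len:])
        # RMSE between predicted values and the actual first-series values (assumed metric)
        errors[name] = float(np.sqrt(np.mean((predicted - first[train_len:]) ** 2)))
        models[name] = model
    best = min(errors, key=errors.get)
    return best, models[best], errors

# Synthetic demo data: cpu depends on requests, not on disk_io.
rng = np.random.default_rng(0)
requests = rng.uniform(100, 200, size=200)
disk_io = rng.uniform(0, 50, size=200)
cpu = 0.3 * requests + 5 + rng.normal(0, 0.5, size=200)

best_name, best_model, errors = select_best(
    cpu, {"requests": requests, "disk_io": disk_io}, train_len=150
)
print(best_name, errors)
```

In this synthetic setup the model built on `requests` has the much smaller error and is selected, because `cpu` was constructed from `requests`.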

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method comprising: obtaining sequences of time series values determined from raw machine data, each sequence corresponding to a respective time series, wherein the raw machine data is produced by one or more components within an information technology or security environment and reflects activity within the information technology or security environment; generating a plurality of predictive models for a first time series from the sequences of time series values, each predictive model to generate predicted values associated with the first time series using time series values corresponding to a second time series; evaluating one or more characteristics of the plurality of predictive models; and automatically selecting a predictive model from the plurality of predictive models for anomaly detection based on the evaluating of the one or more characteristics.
2. The method of claim 1, further comprising applying the selected predictive model to subsequently received time series values to detect an anomaly.
3. The method of claim 1, further comprising in response to detection of an anomaly using the selected predictive mode, causing transmission of an indication of the anomaly.
4. The method of claim 1, wherein the evaluating comprises, for each of the plurality of predictive models, determining an error between the corresponding predicted values and values associated with the first time series, wherein the selecting of the predictive model is based on the error of the predictive model.
5. The method of claim 1, wherein the sequences of time series values correspond to at least one of performance metrics or security-related metrics.
6. The method of claim 1, wherein a first of the plurality of predictive models corresponds to a different set of time series than a second of the plurality of predictive models.
7. The method of claim 1, wherein a first of the plurality of predictive models is a polynomial model, a second of the predictive models is a neural network, and a third predictive model is a decision tree model.
8. The method of claim 1, wherein the obtaining of the sequences of time series values is from one or more streams of time series data.
9. The method of claim 1, wherein the one or more characteristics correspond to residuals between the predicted values of one or more of the plurality of predictive models and time series values associated with the first time series.
10. The method of claim 1, wherein the one or more characteristics are based on a first explanatory value associated with a model type of a first of the plurality of predictive models and a second explanatory value associated with a different model type of a second of the plurality of predictive models.
11. The method of claim 1 comprising: training a subset of the plurality of predictive models based on the one or more characteristics; and evaluating the subset of the plurality of predictive models based on the training, wherein the predictive model is selected for the anomaly detection from the subset based on the evaluating.
12. The method of claim 1 comprising: training the plurality of predictive models using first portions of the sequences of time series values corresponding to a training period; and evaluating second portions of the sequences of time series values corresponding to a prediction period, wherein the one or more characteristics are based on the evaluated second portions.
13. The method of claim 1, further comprising clustering the sequences of time series values into a plurality of clusters, wherein the first time series corresponds to a representative time series of a first of the plurality of clusters and the second time series corresponds to at least one time series in a second cluster of the plurality of clusters.
14. The method of claim 1, wherein the obtaining of the sequences of time series values is responsive to a user interaction associated with the sequences of the series values.
15. The method of claim 1, wherein the values of the second time series correspond to events, each event comprising a time stamp and a portion of raw data.
16. The method of claim 1, wherein each data point of the second time series is associated with a respective time stamp of a respective event.
17. The method of claim 1, generating the time series values from event data using a late-binding schema.
18. The method of claim 1, wherein the predicted values are associated with later times than the values of the second time series used to generate the predictive model.
19. The method of claim 1, further comprising causing an explanatory message to be presented based on the anomaly detection, the explanatory message indicating a predicted relationship corresponding to the predictive model and an observed relationship associated with an anomaly.
20. The method of claim 1, wherein the selecting is of multiple models from the plurality of predictive models based on the evaluating of the one or more characteristics, and the anomaly is detected using the multiple models.
21. One or more non-transitory computer-readable storage media having instructions stored thereon, wherein the instructions, when executed by one or more processors, cause the one or more processors to perform a computer-implemented method comprising: obtaining sequences of time series values determined from raw machine data, each sequence corresponding to a respective time series, wherein the raw machine data is produced by one or more components within an information technology or security environment and reflects activity within the information technology or security environment; generating a plurality of predictive models for a first time series from the sequences of time series values, each predictive model to generate predicted values associated with the first time series using time series values corresponding to a second time series; evaluating one or more characteristics of the plurality of predictive models; and automatically selecting a predictive model from the plurality of predictive models for anomaly detection based on the evaluating of the one or more characteristics.
22. The non-transitory one or more computer-readable storage media of claim 21, wherein the evaluating comprises, for each of the plurality of predictive models, determining an error between the corresponding predicted values and values associated with the first time series, wherein the selecting of the predictive model is based on the error of the predictive model.
23. The non-transitory one or more computer-readable storage media of claim 21, wherein the sequences of time series values correspond to at least one of performance metrics, security-related metrics, industrial metrics, behavioral data, or transactional metrics.
24. The non-transitory one or more computer-readable storage media of claim 21, wherein a first of the plurality of predictive models corresponds to a different set of time series than a second of the plurality of predictive models.
25. The non-transitory one or more computer-readable storage media of claim 21, wherein a first of the plurality of predictive models is a polynomial model, a second of the predictive models is a neural network, and a third predictive model is a decision tree model.
26. A computer-implemented system comprising: one or more processors; one or more non-transitory
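
As a rough illustration of how the dependent claims on applying a selected model (claim 2), residuals (claim 9), and explanatory messages (claim 19) might fit together, the sketch below flags new points whose residual exceeds a threshold and reports the predicted versus observed values. The threshold rule (k standard deviations of the training residuals), the linear model form, the message wording, and all function names are assumptions made for illustration; the claims do not specify them.

```python
import numpy as np

def residual_threshold(train_predicted, train_actual, k=3.0):
    """Assumed rule: threshold = k standard deviations of the training residuals."""
    residuals = train_actual - train_predicted
    return k * float(np.std(residuals))

def detect(slope, intercept, second_new, first_new, threshold):
    """Apply an already-selected linear model to newly received values.

    Returns a list of (index, message) pairs, one per point whose absolute
    residual exceeds the threshold.
    """
    predicted = slope * second_new + intercept
    residuals = first_new - predicted
    findings = []
    for i in np.flatnonzero(np.abs(residuals) > threshold):
        message = (
            f"expected about {predicted[i]:.1f} from second={second_new[i]:.1f}, "
            f"observed {first_new[i]:.1f}"
        )
        findings.append((int(i), message))
    return findings

# Hypothetical learned relationship: first = 2 * second + 1, with small training noise.
second_train = np.arange(10, dtype=float)
first_train = 2 * second_train + 1 + np.array([0.1, -0.1] * 5)
threshold = residual_threshold(2 * second_train + 1, first_train)

# Subsequently received values; the middle point breaks the relationship.
second_new = np.array([3.0, 4.0, 5.0])
first_new = np.array([7.0, 20.0, 11.1])
findings = detect(2.0, 1.0, second_new, first_new, threshold)
for index, message in findings:
    print(index, message)
```

Only the middle point is flagged here: the model expects about 9.0 for a second-series value of 4.0, while 20.0 was observed.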

Assignees

Splunk Inc

Inventors

Classifications

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

  • Learning methods

  • Traffic logging, e.g. anomaly detection

  • Architecture, e.g. interconnection topology

  • Supervised learning

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10375098B2 cover?
In some implementations, sequences of time series values determined from machine data are obtained. Each sequence corresponds to a respective time series. A plurality of predictive models is generated for a first time series from the sequences of time series values. Each predictive model is to generate predicted values associated with the first time series using values of a second time series. …
Who is the assignee on this patent?
Splunk Inc
What technology area does this patent fall under?
Primary CPC classification H04L63/1425. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Aug 6, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).