Selecting forecasting models for time series using state space representations

US10318874B1 · US · B1

Patent metadata
Publication number: US-10318874-B1
Application number: US-201514662021-A
Country: US
Kind code: B1
Filing date: Mar 18, 2015
Priority date: Mar 18, 2015
Publication date: Jun 11, 2019
Grant date: Jun 11, 2019

Abstract

Official abstract text for this publication.

Corresponding to each forecasting model of a family of related models for a time series sequence, a respective state space representation is generated. One or more cross-validation iterations are then executed for each model of the family. In a given iteration, a training variant of the time series sequence is generated, with a subset of the time series sequence entries replaced by representations of missing values. Predictions for the missing values are obtained using the state space representation and the training variant, and a model quality metric is obtained based on prediction errors. The optimal model of the family is selected using the model quality metrics obtained from the cross validation iterations.
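
The procedure in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the model family is assumed to be a local-level (random walk plus noise) state space model whose members differ only in the state noise variance `q`; folds are assumed to be interleaved index sets; predictions for missing entries come from a Kalman filter that skips the measurement update whenever an entry is missing; and the quality metric is assumed to be mean squared error averaged over folds. All function names are hypothetical.

```python
import math

def predict_local_level(variant, q, r=1.0):
    """Filter a series with a local-level model; NaN entries are missing.

    Returns, for every position, the level predicted from earlier
    observed entries only. q is the state noise variance (assumed to be
    the parameter that distinguishes models), r the observation noise.
    """
    observed = [v for v in variant if not math.isnan(v)]
    # Simplification: start from the first observed value.
    level, var = observed[0], r
    preds = []
    for v in variant:
        var += q                 # time update: level follows a random walk
        preds.append(level)      # prediction made before looking at v
        if not math.isnan(v):    # measurement update only if v is observed
            gain = var / (var + r)
            level += gain * (v - level)
            var *= 1.0 - gain
    return preds

def cross_validate(series, q, n_folds=5):
    """Average test-set mean squared error over interleaved folds.

    Assumes 2 <= n_folds <= len(series).
    """
    n = len(series)
    fold_scores = []
    for k in range(n_folds):
        test_idx = range(k, n, n_folds)      # test index vector
        variant = list(series)               # training variant
        for i in test_idx:
            variant[i] = math.nan            # test entries become missing
        preds = predict_local_level(variant, q)
        errors = [(preds[i] - series[i]) ** 2 for i in test_idx]
        fold_scores.append(sum(errors) / len(errors))
    return sum(fold_scores) / n_folds

def select_model(series, candidate_qs, n_folds=5):
    """Score every candidate and return the one with the lowest error."""
    scores = {q: cross_validate(series, q, n_folds) for q in candidate_qs}
    best = min(scores, key=scores.get)
    return best, scores

if __name__ == "__main__":
    # Synthetic, deterministic example series (illustrative only).
    series = [math.sin(i / 5.0) + 0.1 * ((i * 7) % 5) for i in range(60)]
    best, scores = select_model(series, [0.01, 0.1, 1.0, 10.0])
    print("selected q:", best)
    print("scores:", scores)
```

Because a missing entry only skips the measurement update, each fold needs a single filtering pass over the training variant, and no per-fold refitting is required in this sketch.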

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a machine learning service comprising a plurality of computing devices configured to:
  generate a model family comprising a plurality of forecasting models that implement a particular modeling methodology for a time series sequence comprising a set of observations, wherein individual ones of the forecasting models of the model family have a different set of model parameters than other forecasting models of the model family;
  generate respective state space representations corresponding to individual ones of the forecasting models of the model family;
  implement, using individual ones of the state space representations, respective sets of cross-validation iterations, wherein a particular set of cross-validation iterations corresponds to a respective forecasting model of the model family, and wherein the particular set of cross-validation iterations includes operations to:
    determine a particular training variant of the time series sequence, wherein the particular training variant includes (a) in positions indexed by a particular training index vector associated with the particular set of cross-validation iterations, copies of the corresponding observations of the time series sequence and (b) in positions indexed by a test index vector corresponding to the particular training index vector, representations of missing values;
    utilize the particular training variant as input for a particular state space representation corresponding to the respective forecasting model to obtain a prediction set corresponding to the test index vector; and
    compute a model quality metric for the respective forecasting model, based at least in part on differences between the predictions of the prediction set and the observations of the time series sequence indexed by the test index vector;
  automatically select, based at least in part on a comparison of model quality metrics determined for individual forecasting models of the model family, a particular forecasting model as an optimal forecasting model among the models of the model family;
  wherein said implement, using individual ones of the state space representations, of the respective sets of cross-validation iterations prior to said automatically select the particular forecasting model as the optimal forecasting model avoids training a model that is over fitted to the original data, thereby avoiding use of additional computational resources associated with training the model; and
  utilize the optimal forecasting model to generate one or more forecasts corresponding to the time series sequence comprising a real-time data stream of observations collected by one or more scientific instruments or sensors in a seismically sensitive zone, satellite, automobile, train or airplane.

2. The system as recited in claim 1, wherein the particular state space representation is expressed in innovations form.

3. The system as recited in claim 1, wherein the particular set of cross-validation iterations comprises a K-fold cross-validation procedure, in which K different training variants of TSS are generated, K initial measures of model quality are generated, and the model quality metric is obtained by aggregating the K initial quality metrics.

4. The system as recited in claim 1, wherein the particular modeling methodology comprises one or more of: (a) autoregressive modeling, (b) moving average modeling, (c) seasonal modeling, (d) periodic modeling, (e) regression modeling, or (f) exponential smoothing.

5. The system as recited in claim 1, wherein the model quality metric for the respective forecasting model comprises one or more of: (a) a likelihood metric, (b) a one-step ahead mean squared forecast error (1-MSFE) metric, (c) a k-step ahead mean squared forecast error (k-MSFE) metric, or (d) a one-step ahead mean absolute forecast percentage error (1-MAFPE).

6. A method, comprising: performing, by one or more computing devices:
  generating respective state space representations of a plurality of forecasting models of a model family, wherein individual ones of the plurality of forecasting models utilize a particular modeling methodology for a time series sequence comprising a plurality of observations;
  implementing, using individual ones of the state space representations, respective sets of cross-validation iterations, wherein a particular set of cross-validation iterations corresponds to a respective forecasting model of the model family and includes:
    identifying a test subset and a training subset of the plurality of observations of the time series sequence;
    obtaining, using a variant of the time series sequence as input to a particular state space representation corresponding to the respective forecasting model, predictions for the test subset, wherein within the variant, the test subset is replaced by missing values; and
    computing a model quality metric for the respective forecasting model based at least in part on differences between the predictions and the test subset; and
  automatically selecting, based at least in part on a comparison of model quality metrics determined for individual forecasting models of the model family, a particular forecasting model as an optimal forecasting model among the models of the model family;
  wherein said implementing, using individual ones of the state space representations, of the respective sets of cross-validation iterations prior to said automatically selecting the particular forecasting model as the optimal forecasting model avoids training a model that is over fitted to the original data, thereby avoiding use of additional computational resources associated with training the model.

7. The method as recited in claim 6, further comprising performing, by the one or more computing devices: utilizing the optimal forecasting model for one or more of: (a) generating one or more forecasts corresponding to the time series sequence, (b) determining relative impacts, on an output variable generated by the optimal forecasting model, of a first and a second parameter of the optimal forecasting model, (c) determining an impact, on an output variable generated by the optimal forecasting model, of varying a value of a particular parameter of the optimal forecasting model, or (d) determining relative impacts, on an output variable generated by the optimal forecasting model, of a first and a second input variable of the optimal forecasting model.

8. The method as recited in claim 6, wherein a first forecasting model of the model family differs from another forecasting model of the model family in one or more of: (a) an initial value parameter, (b) a regularization parameter, or (c) a number of model parameters.

9. The method as recited in claim 6, wherein the particular state space representation is expressed in innovations form.

10. The method as recited in claim 6, wherein the particular set of cross-validation iterations comprises a K-fold cross-validation procedure.

11. The method as recited in claim 6, wherein the particular set of cross-validation iterations comprises an exhaustive leave-p-out cross-validation procedure.

12. The method as recited in claim 6, wherein the particular modeling methodology comprises one or more of: (a) autoregressive modeling, (b) moving average modeling, (c) seasonal modeling, (d) periodic modeling, (e) regression modeling, or (f) exponential smoothing.

13. The method as recited in claim 6, wherein the model quality metric comprises one or more of: (a) a likelihood metric, (b) a one-step ahead mean squared forecast error (1-MSFE) metric, (c) a k-step ahead mean squared forecast error (k-MSFE) metric, or (d) a one-step ahea
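
Claims 2 and 9 refer to a state space representation "expressed in innovations form". For orientation, the block below gives a common textbook way of writing a linear innovations (single source of error) state space model, followed by simple exponential smoothing as a special case. The symbols (x_t, w, F, g, \ell_t, \alpha) are assumptions introduced here for illustration; the claim text on this page does not define a notation, and the patent's own formulation may differ.

```latex
% General linear innovations form: one error term \varepsilon_t drives
% both the observation and the state update.
\begin{aligned}
y_t &= w^{\top} x_{t-1} + \varepsilon_t, \\
x_t &= F\, x_{t-1} + g\, \varepsilon_t .
\end{aligned}

% Simple exponential smoothing as a special case, with
% x_t = \ell_t,\; w = 1,\; F = 1,\; g = \alpha:
\begin{aligned}
y_t    &= \ell_{t-1} + \varepsilon_t, \\
\ell_t &= \ell_{t-1} + \alpha\, \varepsilon_t
        = \alpha\, y_t + (1 - \alpha)\, \ell_{t-1}.
\end{aligned}
```

The last equality follows by substituting \varepsilon_t = y_t - \ell_{t-1} into the state update, which recovers the familiar exponential smoothing recursion.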

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • Combinations of networks · CPC title

  • Physics · mapped topic

  • G06N5/04 (Primary)

    Inference or reasoning models · CPC title

  • G06N20/00 (Primary)

    Machine learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06N5/04. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 11, 2019 (B1). Legal status and post-grant events are not shown on this page.