Systems and techniques for predictive data analytics

US2016335550A1 · US · A1

Patent metadata
Publication number: US-2016335550-A1
Application number: US-201615217626-A
Country: US
Kind code: A1
Filing date: Jul 22, 2016
Priority date: May 23, 2014
Publication date: Nov 17, 2016
Grant date: none listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and techniques for predictive data analytics are described. In a method for selecting a predictive model for a prediction problem, the suitabilities of predictive modeling procedures for the prediction problem may be determined based on characteristics of the prediction problem and/or on attributes of the respective modeling procedures. A subset of the predictive modeling procedures may be selected based on the determined suitabilities of the selected modeling procedures for the prediction problem. A resource allocation schedule allocating computational resources for execution of the selected modeling procedures may be generated, based on the determined suitabilities of the selected modeling procedures for the prediction problem. Results of the execution of the selected modeling procedures in accordance with the resource allocation schedule may be obtained. A predictive model for the prediction problem may be selected based on those results.
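The selection and scheduling steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the method disclosed in the patent. It assumes suitability is a precomputed non-negative score per procedure, that the selected subset is the top k procedures by score, and that the resource allocation schedule splits an integer budget of compute units in proportion to suitability. `Procedure`, `select_procedures`, and `allocate_budget` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Procedure:
    """Hypothetical candidate modeling procedure with a precomputed suitability.

    The abstract does not say how suitability is represented; a
    non-negative float is assumed here.
    """
    name: str
    suitability: float


def select_procedures(procedures, k):
    """Return the k procedures with the highest suitability scores."""
    return sorted(procedures, key=lambda p: p.suitability, reverse=True)[:k]


def allocate_budget(selected, total_units):
    """Split an integer budget of compute units in proportion to suitability.

    Largest-remainder rounding keeps the allocations summing to total_units.
    """
    total = sum(p.suitability for p in selected)
    if total <= 0:
        raise ValueError("selected procedures need a positive total suitability")
    exact = {p.name: total_units * p.suitability / total for p in selected}
    alloc = {name: int(share) for name, share in exact.items()}
    leftover = total_units - sum(alloc.values())
    # Give the remaining units to the procedures with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


if __name__ == "__main__":
    candidates = [
        Procedure("random_forest", 0.9),
        Procedure("logistic_regression", 0.6),
        Procedure("svm", 0.3),
        Procedure("knn", 0.1),
    ]
    chosen = select_procedures(candidates, k=3)
    print(allocate_budget(chosen, total_units=18))
```

With three selected procedures scored 0.9, 0.6, and 0.3 and a budget of 18 units, the schedule assigns 9, 6, and 3 units. Largest-remainder rounding keeps the total exact even when the proportional shares are not whole numbers.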

First claim

Opening claim text (preview).

What is claimed is:

1-30. (canceled)

31. A predictive modeling apparatus comprising: a memory configured to store a machine-executable module encoding a predictive modeling procedure, wherein the predictive modeling procedure includes a plurality of tasks, wherein the machine-executable module includes a directed graph representing dependencies between the tasks, and wherein the plurality of tasks includes at least one pre-processing task, at least one model-fitting task, and at least one post-processing task; and at least one processor configured to execute the machine-executable module, wherein executing the machine-executable module causes the apparatus to perform the predictive modeling procedure, including: manipulating input data, comprising performing the pre-processing task on the input data; performing the model-fitting task, comprising: generating, from the pre-processed input data, training data and testing data, fitting a predictive model to the training data, and testing the fitted model on the testing data; and performing the post-processing task.

32. The predictive modeling apparatus of claim 31, wherein the pre-processed input data comprise at least one data set, wherein generating the training data comprises obtaining a first subset of the data set, and wherein generating the testing data comprises obtaining a second subset of the data set.

33. The predictive modeling apparatus of claim 32, wherein performing the predictive modeling procedure further includes performing cross-validation of the predictive model.

34. The predictive modeling apparatus of claim 33, wherein the training data are first training data, wherein the testing data are first testing data, wherein the fitted model is a first fitted model, and wherein performing the cross-validation of the predictive model comprises: (a) generating, from the data set, second training data and second testing data, wherein generating the second training data comprises obtaining a third subset of the data set, and wherein generating the second testing data comprises obtaining a fourth subset of the data set; (b) fitting the predictive model to the second training data to obtain a second fitted model; and (c) testing the second fitted model on the second testing data.

35. The predictive modeling apparatus of claim 34, wherein performing the model-fitting task further comprises partitioning the data set into a plurality of partitions including at least a first partition and a second partition.

36. The predictive modeling apparatus of claim 35, wherein partitioning the data set into a plurality of partitions comprises randomly assigning each data item in the data set to a respective partition.

37. The predictive modeling apparatus of claim 35, wherein: the first training data comprise the first partition of the data set; the first testing data comprise all of the partitions of the data set except the first partition; the second training data comprise the second partition of the data set; and the second testing data comprise all of the partitions of the data set except the second partition.

38. The predictive modeling apparatus of claim 35, wherein: the first training data comprise a subset of the first partition of the data set; the first testing data comprise respective subsets of all of the partitions of the data set except the first partition; the second training data comprise a subset of the second partition of the data set; and the second testing data comprise respective subsets of all of the partitions of the data set except the second partition.

39. The predictive modeling apparatus of claim 34, wherein: the pre-processed input data comprise a first partition and a second partition, the data set comprises the first partition of the pre-processed input data, and performing the model-fitting task further comprises testing the first and second fitted models on holdout data comprising the second partition of the pre-processed input data.

40. The predictive modeling apparatus of claim 39, wherein no predictive model is fitted to the holdout data.

41. The predictive modeling apparatus of claim 31, wherein performing the predictive modeling procedure further includes performing nested cross-validation of the predictive model.

42. The predictive modeling apparatus of claim 41, wherein: the pre-processed input data comprise at least one data set; performing the nested cross-validation of the predictive model comprises: partitioning the data set into a first plurality of partitions of the data set including at least a first partition of the data set and a second partition of the data set, and partitioning the first partition of the data set into a plurality of partitions of the first partition of the data set including at least a first partition of the first partition of the data set and a second partition of the first partition of the data set; the training data comprise the first partition of the first partition of the data set; and the testing data comprise all of the partitions of the first partition of the data set except the first partition of the first partition of the data set.

43. The predictive modeling apparatus of claim 42, wherein the training data are first training data, the testing data are first testing data, the fitted model is a first fitted model, and performing the nested cross-validation of the predictive model further comprises: (a) generating, from the first partition of the data set, second training data and second testing data, wherein the second training data comprise the second partition of the first partition of the data set, and wherein the second testing data comprise a plurality of the partitions of the first partition of the data set other than the second partition of the first partition of the data set; (b) fitting the predictive model to the second training data to obtain a second fitted model; and (c) testing the second fitted model on the second testing data.

44. The predictive modeling apparatus of claim 43, wherein performing the nested cross-validation further includes: testing the first fitted model and the second fitted model on the second partition of the data set; and comparing the first fitted model to the second fitted model based on results of testing the first and second fitted models on the second partition of the data set.

45. The predictive modeling apparatus of claim 32, wherein the predictive model is a first type of predictive model, the fitted model is a first fitted model, the model-fitting task is a first model-fitting task, and performing the predictive modeling procedure further includes performing a second model-fitting task using a second type of predictive model.

46. The predictive modeling apparatus of claim 45, wherein the training data are first training data, wherein the testing data are first testing data, and wherein performing the second model-fitting task using the second type of predictive model comprises: (a) generating, from the data set, second training data and second testing data, wherein generating the second training data comprises obtaining a third subset of the data set, and wherein generating the second testing data comprises generating a fourth subset of the data set; (b) fitting the second type of predictive model to the second training data to obtain a second fitted model; and (c) testing the second fitted model on the second testing data.

47. The predictive modeling apparatus of claim 46, w
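Claim 31 recites a procedure whose tasks are linked by a directed graph of dependencies and which runs pre-processing, model fitting (train/test split, fit, test), and post-processing. The sketch below is a minimal illustration with an assumed toy task set: dropping incomplete rows stands in for pre-processing, an ordinary least-squares line stands in for the predictive model, and a text summary stands in for post-processing. The task functions, `DEPENDENCIES`, and `run_procedure` are hypothetical; Python's standard `graphlib.TopologicalSorter` (Python 3.9+) orders the tasks so each runs after the tasks it depends on.

```python
import random
from graphlib import TopologicalSorter


def preprocess(ctx):
    """Pre-processing task: drop rows with a missing x or y value."""
    ctx["clean"] = [(x, y) for x, y in ctx["raw"] if x is not None and y is not None]


def fit_model(ctx):
    """Model-fitting task: split into training/testing data, fit a line, test it."""
    rows = ctx["clean"][:]
    random.Random(ctx["seed"]).shuffle(rows)
    cut = int(len(rows) * 0.75)
    train, test = rows[:cut], rows[cut:]
    # Ordinary least squares for y = intercept + slope * x on the training data.
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    intercept = my - slope * mx
    ctx["model"] = (intercept, slope)
    # Test the fitted model on the held-back testing data (mean squared error).
    ctx["test_mse"] = sum((y - (intercept + slope * x)) ** 2 for x, y in test) / len(test)


def postprocess(ctx):
    """Post-processing task: summarize the fitted model and its test error."""
    intercept, slope = ctx["model"]
    ctx["report"] = f"y = {intercept:.2f} + {slope:.2f}x, test MSE = {ctx['test_mse']:.4f}"


# Directed graph: each task name maps to the set of tasks it depends on.
TASKS = {"preprocess": preprocess, "fit": fit_model, "postprocess": postprocess}
DEPENDENCIES = {"preprocess": set(), "fit": {"preprocess"}, "postprocess": {"fit"}}


def run_procedure(raw, seed=0):
    """Execute every task in an order that respects the dependency graph."""
    ctx = {"raw": raw, "seed": seed}
    for name in TopologicalSorter(DEPENDENCIES).static_order():
        TASKS[name](ctx)
    return ctx


if __name__ == "__main__":
    raw = [(float(x), 2.0 + 3.0 * x) for x in range(20)] + [(None, 1.0), (5.0, None)]
    print(run_procedure(raw)["report"])
```

Adding a task means adding one entry to each dictionary; `static_order()` raises `graphlib.CycleError` if the declared dependencies form a cycle.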
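Claims 35 to 37 and 39 to 40 describe partition-based cross-validation plus an untouched holdout set. The sketch below follows the claim wording literally and is only an illustration: a hypothetical toy model that predicts the training mean stands in for a real learner, and random assignment to partitions is done by shuffling and dealing round-robin (one of several valid choices, picked because it keeps partition sizes balanced). As in claim 37, each model is fitted on one partition and tested on all the others; conventional k-fold cross-validation swaps those roles. All function names are hypothetical.

```python
import random
from statistics import mean


def random_partitions(items, k, seed=0):
    """Randomly assign each item to one of k partitions (claim 36).

    Shuffling and dealing round-robin keeps partition sizes balanced.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def fit_mean(rows):
    """Hypothetical toy model: predict the mean target of the training rows."""
    return mean(y for _, y in rows)


def mse(model, rows):
    """Mean squared error of a constant prediction on (x, y) rows."""
    return mean((y - model) ** 2 for _, y in rows)


def holdout_split(rows, holdout_fraction=0.2, seed=0):
    """Split pre-processed rows into a data set and holdout data (claim 39)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def cross_validate(data_set, k, seed=0):
    """Claim 37 scheme: fit on partition i, test on all other partitions."""
    parts = random_partitions(data_set, k, seed)
    results = []
    for i, train in enumerate(parts):
        test = [row for j, part in enumerate(parts) if j != i for row in part]
        model = fit_mean(train)
        results.append((model, mse(model, test)))
    return results


if __name__ == "__main__":
    rows = [(x, float(x % 7)) for x in range(100)]
    data_set, holdout = holdout_split(rows)
    fold_results = cross_validate(data_set, k=4)
    # Every fitted model is scored on the holdout rows; none is fitted to them (claim 40).
    holdout_scores = [mse(model, holdout) for model, _ in fold_results]
    print(len(data_set), len(holdout), [round(s, 3) for s in holdout_scores])
```

Switching to conventional k-fold only requires exchanging `train` and `test` inside the loop.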
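The nested cross-validation of claims 42 to 44 can be sketched the same way, again with a hypothetical mean-predictor model standing in for a real learner. The data set is split into outer partitions, and the first outer partition is split into inner partitions. One model is fitted per inner partition and tested on the remaining inner partitions, and every fitted model is then tested on the second outer partition. The partition counts and the rule of keeping the model with the lowest outer error are assumptions; claim 44 only says the fitted models are compared based on those results.

```python
import random
from statistics import mean


def split_into(rows, k, rng):
    """Shuffle rows and deal them into k partitions of near-equal size."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def fit_mean(rows):
    """Hypothetical stand-in model: predict the mean target of the training rows."""
    return mean(y for _, y in rows)


def mse(model, rows):
    """Mean squared error of a constant prediction on (x, y) rows."""
    return mean((y - model) ** 2 for _, y in rows)


def nested_cv(data_set, outer_k=2, inner_k=3, seed=0):
    """Nested cross-validation in the shape of claims 42 to 44."""
    rng = random.Random(seed)
    outer = split_into(data_set, outer_k, rng)
    first, second = outer[0], outer[1]
    inner = split_into(first, inner_k, rng)
    candidates = []
    for j, train in enumerate(inner):
        # Test on every inner partition of the first outer partition except j.
        test = [row for i, part in enumerate(inner) if i != j for row in part]
        model = fit_mean(train)
        candidates.append({
            "inner_partition": j,
            "model": model,
            "inner_mse": mse(model, test),
            # Claim 44: each fitted model is also tested on the second outer partition.
            "outer_mse": mse(model, second),
        })
    # Compare the fitted models by their error on the second outer partition.
    best = min(candidates, key=lambda c: c["outer_mse"])
    return candidates, best


if __name__ == "__main__":
    data = [(x, float(x % 5)) for x in range(60)]
    candidates, best = nested_cv(data)
    for c in candidates:
        print(c["inner_partition"], round(c["inner_mse"], 3), round(c["outer_mse"], 3))
    print("selected inner partition:", best["inner_partition"])
```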

Assignees

Datarobot Inc

Inventors

Classifications

  • Databases characterised by their database models, e.g. relational or object models · CPC title

  • Physics · mapped topic

  • G06N5/04 · Primary

    Inference or reasoning models · CPC title

  • Machine learning · CPC title

  • G06Q10/04 · Primary

    Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem" (market predictions or forecasting for commercial activities G06Q30/0202) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016335550A1 cover?
Systems and techniques for predictive data analytics are described. In a method for selecting a predictive model for a prediction problem, the suitabilities of predictive modeling procedures for the prediction problem may be determined based on characteristics of the prediction problem and/or on attributes of the respective modeling procedures. A subset of the predictive modeling procedures may…
Who is the assignee on this patent?
Datarobot Inc
What technology area does this patent fall under?
Primary CPC classification G06N5/04. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 17, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).