Systems and techniques for determining the predictive value of a feature

US10366346B2 · US · B2

Patent metadata
Publication number: US-10366346-B2
Application number: US-201615331797-A
Country: US
Kind code: B2
Filing date: Oct 21, 2016
Priority date: May 23, 2014
Publication date: Jul 30, 2019
Grant date: Jul 30, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for determining the predictive value of a feature may include: (a) performing predictive modeling procedures associated with respective predictive models, wherein performing each modeling procedure includes fitting the associated model to an initial dataset representing an initial prediction problem; (b) determining a first accuracy score of each of the fitted models, representing an accuracy with which the fitted model predicts an outcome of the initial prediction problem; (c) shuffling values of a feature across observations included in the initial dataset, thereby generating a modified dataset representing a modified prediction problem; (d) determining a second accuracy score of each of the fitted models, representing an accuracy with which the fitted model predicts an outcome of the modified prediction problem; and (e) determining a model-specific predictive value of the feature for each of the fitted models based on the first and second accuracy scores of the fitted model.
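
Steps (a) through (e) of the abstract can be illustrated with a small sketch. This is not the patented implementation: the two classifiers (nearest centroid and k-nearest neighbour), the synthetic dataset, the function names, and the choice of a plain accuracy difference as the predictive value are all assumptions made for illustration. Only the Python standard library is used.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def fit_centroid(X, y):
    """Hypothetical model 1: predict the class whose mean vector is closest."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
    return lambda row: min(centroids, key=lambda c: sq_dist(row, centroids[c]))

def fit_knn(X, y, k=5):
    """Hypothetical model 2: majority vote among the k nearest training rows."""
    data = list(zip(X, y))
    def predict(row):
        nearest = sorted(data, key=lambda p: sq_dist(row, p[0]))[:k]
        votes = [label for _, label in nearest]
        return max(set(votes), key=votes.count)
    return predict

def accuracy(predict, X, y):
    """Fraction of observations whose outcome is predicted correctly."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def shuffle_feature(X, j, rng):
    """Step (c): permute the values of feature j across observations."""
    column = [row[j] for row in X]
    rng.shuffle(column)
    return [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]

def predictive_values(models, X, y, j, rng):
    """Steps (b), (d), (e): accuracy before minus accuracy after shuffling."""
    X_mod = shuffle_feature(X, j, rng)
    return {name: accuracy(m, X, y) - accuracy(m, X_mod, y)
            for name, m in models.items()}

# Synthetic problem (assumed): feature 0 tracks the label, feature 1 is noise.
rng = random.Random(0)
y = [i % 2 for i in range(200)]
X = [[4.0 * label + rng.gauss(0, 1), rng.gauss(0, 1)] for label in y]

# Step (a): fit each model to the initial dataset.
models = {"centroid": fit_centroid(X, y), "knn": fit_knn(X, y)}

values_informative = predictive_values(models, X, y, 0, rng)
values_noise = predictive_values(models, X, y, 1, rng)
print(values_informative)
print(values_noise)
```

In this sketch, shuffling the informative feature should cost both models a large share of their accuracy, while shuffling the noise feature should change little. The models are scored on the same data they were fitted to, which mirrors the wording of the abstract; scoring on held-out data is a common alternative.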

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for building a predictive model, comprising: determining a multi-model predictive value of a feature of an initial dataset representing a prediction problem, wherein the initial dataset includes a plurality of observations and each observation includes respective values for a plurality of features, including: (a) performing one or more predictive modeling procedures, wherein each of the predictive modeling procedures is associated with a different type of predictive model, wherein performing each modeling procedure comprises fitting the associated predictive model to the initial dataset; (b) reducing the multi-model predictive value of the feature by shuffling values of the feature across respective observations included in the initial dataset, thereby generating a modified dataset; (c) for each of the fitted predictive models: (c1) determining a first accuracy score representing an accuracy with which the fitted model generates predictions for data in the initial dataset; (c2) determining a second accuracy score representing an accuracy with which the fitted model generates predictions for data in the modified dataset in which the multi-model predictive value of the feature has been reduced; and (c3) determining a model-specific predictive value of the feature based on the first and second accuracy scores of the fitted model; and (d) determining, based on the model-specific predictive values of the feature, that the multi-model predictive value of the feature is low; performing feature engineering on the initial dataset based on the multi-model predictive value of the feature, including pruning the feature having the low multi-model predictive value from the initial dataset, thereby generating a pruned dataset; and building a predictive model for the prediction problem, including: performing a plurality of predictive modeling procedures on the pruned dataset, selecting a fitted predictive model generated by the plurality of predictive modeling procedures, and deploying the selected predictive model to predict outcomes of the prediction problem without using the pruned feature.

2. The method of claim 1, further comprising: prior to performing the one or more predictive modeling procedures, selecting the one or more predictive modeling procedures based on characteristics of the initial dataset, characteristics of the prediction problem, and/or characteristics of the feature.

3. The method of claim 1, wherein the one or more predictive modeling procedures comprise two or more modeling procedures selected from the group consisting of a random forest modeling procedure, a generalized additive modeling procedure, and a support vector machine modeling procedure.

4. The method of claim 1, wherein the one or more predictive modeling procedures comprise a first modeling procedure selected from a first family of modeling procedures and a second modeling procedure selected from a second family of modeling procedures.

5. The method of claim 1, further comprising: prior to determining the second accuracy scores of the one or more predictive models, refitting the one or more predictive models to the modified dataset.

6. The method of claim 1, wherein the determined model-specific predictive value of the feature for a particular fitted model increases as the difference between the first accuracy score and the second accuracy score of the particular fitted model increases.

7. The method of claim 1, wherein the determined model-specific predictive value of the feature for a particular fitted model comprises a percentage difference between the first accuracy score and the second accuracy score of the particular fitted model, relative to the first accuracy score of the particular fitted model.

8. The method of claim 1, further comprising determining a model-independent predictive value of the feature.

9. The method of claim 8, wherein determining the model-independent predictive value of the feature comprises calculating a statistical measure of a center and/or a spread of a plurality of model-specific predictive values of the feature.

10. The method of claim 9, wherein determining the model-independent predictive value of the feature comprises calculating the statistical measure of the center of the plurality of model-specific predictive values, and wherein the statistical measure of the center is selected from the group consisting of a mean, a median, and a mode of the plurality of model-specific predictive values.

11. The method of claim 9, wherein determining the model-independent predictive value of the feature comprises calculating the statistical measure of the spread of the plurality of model-specific predictive values, and wherein the statistical measure of the spread is selected from the group consisting of a range, a variance, and a standard deviation of the plurality of model-specific predictive values.

12. The method of claim 8, wherein determining the model-independent predictive value of the feature comprises calculating a combination of the plurality of model-specific predictive values of the feature.

13. The method of claim 12, wherein calculating a combination of the plurality of model-specific predictive values comprises calculating a weighted combination of the plurality of model-specific predictive values.

14. The method of claim 13, wherein calculating the weighted combination of the plurality of model-specific predictive values comprises assigning respective weights to the plurality of model-specific predictive values, wherein the weight assigned to a particular model-specific predictive value corresponding to a particular fitted predictive model increases as the first accuracy score of the fitted predictive model increases.

15. The method of claim 1, wherein the feature is a first feature, and wherein the method further comprises: determining a multi-model predictive value of a second feature of the initial dataset representing the prediction problem, including: (bb) reducing the multi-model predictive value of the second feature by shuffling values of the second feature across respective observations included in the initial dataset, thereby generating a second modified dataset; (cc) for each of the fitted predictive models: (cc2) determining a third accuracy score representing an accuracy with which the fitted model generates predictions for data in the second modified dataset in which the multi-model predictive value of the second feature has been reduced; and (cc3) determining a model-specific predictive value of the second feature based on the first and second accuracy scores of the fitted model; and (dd) determining, based on the model-specific predictive values of the second feature, a multi-model predictive value of the feature.

16. The method of claim 1, wherein the feature is a first feature, wherein the pruned dataset includes a plurality of second features, and wherein the method further comprises determining model-specific predictive values of the second features of the pruned dataset by performing steps (b), (c2), and (c3) for each of the second features.

17. The method of claim 16, further comprising displaying, via a graphical user interface, graphical content identifying the second features of the pruned dataset and the model-specific predictive values of the second features.

18. The method of claim 16, wherein the one or more predictive modeling procedures are first modeling procedures including a particular modeling procedure associated with a partic
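
The arithmetic described in claims 7 and 9 to 14, together with the pruning decision in claim 1, can be sketched as follows. The claims fix no formula beyond "percentage difference relative to the first accuracy score" and weights that increase with the first accuracy score, so using the first accuracy score itself as the weight, the `PRUNE_BELOW` threshold, the function names, and the example scores are all hypothetical.

```python
from statistics import mean, median, pstdev

def percent_drop(first_score, second_score):
    """Percentage difference relative to the first accuracy score (claim 7)."""
    if first_score == 0:
        return 0.0  # assumption: a model with zero accuracy carries no signal
    return 100.0 * (first_score - second_score) / first_score

def summarize(scores):
    """scores maps model name -> (first_accuracy, second_accuracy)."""
    values = {m: percent_drop(a, b) for m, (a, b) in scores.items()}
    # Assumed weighting: each model's weight is its first accuracy score,
    # so more accurate models count for more (one reading of claim 14).
    total_weight = sum(a for a, _ in scores.values())
    weighted = sum(scores[m][0] * v for m, v in values.items()) / total_weight
    return {
        "per_model": values,                 # model-specific values
        "mean": mean(values.values()),       # a measure of center (claim 10)
        "median": median(values.values()),   # another measure of center
        "spread": pstdev(values.values()),   # standard deviation (claim 11)
        "weighted": weighted,                # weighted combination (claim 13)
    }

PRUNE_BELOW = 1.0  # hypothetical cut-off, in percent

def features_to_prune(feature_scores, threshold=PRUNE_BELOW):
    """Return features whose weighted multi-model value falls below threshold."""
    return [name for name, scores in feature_scores.items()
            if summarize(scores)["weighted"] < threshold]

# Made-up accuracy scores for two features and two model types.
feature_scores = {
    "age":   {"random_forest": (0.90, 0.45), "svm": (0.80, 0.60)},
    "noise": {"random_forest": (0.90, 0.90), "svm": (0.80, 0.795)},
}
age_summary = summarize(feature_scores["age"])
pruned = features_to_prune(feature_scores)
print(age_summary)
print(pruned)
```

With these made-up scores, "age" loses 50% and 25% of accuracy under the two models, and the weighted combination leans slightly toward the more accurate model. "noise" falls under the assumed threshold and is the only feature flagged for pruning.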

Assignees

Datarobot Inc

Inventors

Classifications

  • Market modelling; Market analysis; Collecting market data

  • Enterprise or organisation modelling

  • Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem" (market predictions or forecasting for commercial activities G06Q30/0202)

  • Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling

  • G06N20/00 (Primary): Machine learning

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10366346B2 cover?
A method for determining the predictive value of a feature may include: (a) performing predictive modeling procedures associated with respective predictive models, wherein performing each modeling procedure includes fitting the associated model to an initial dataset representing an initial prediction problem; (b) determining a first accuracy score of each of the fitted models, representing an a…
Who is the assignee on this patent?
Datarobot Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 30, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).