Systems for second-order predictive data analytics, and related methods and apparatus
US-2018060744-A1 · Mar 1, 2018 · US
US10366346B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10366346-B2 |
| Application number | US-201615331797-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 21, 2016 |
| Priority date | May 23, 2014 |
| Publication date | Jul 30, 2019 |
| Grant date | Jul 30, 2019 |
Official abstract text for this publication.
A method for determining the predictive value of a feature may include: (a) performing predictive modeling procedures associated with respective predictive models, wherein performing each modeling procedure includes fitting the associated model to an initial dataset representing an initial prediction problem; (b) determining a first accuracy score of each of the fitted models, representing an accuracy with which the fitted model predicts an outcome of the initial prediction problem; (c) shuffling values of a feature across observations included in the initial dataset, thereby generating a modified dataset representing a modified prediction problem; (d) determining a second accuracy score of each of the fitted models, representing an accuracy with which the fitted model predicts an outcome of the modified prediction problem; and (e) determining a model-specific predictive value of the feature for each of the fitted models based on the first and second accuracy scores of the fitted model.
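Steps (a) through (e) of the abstract can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the patented implementation: the two toy classifiers (`fit_centroid`, `fit_stump`), the synthetic dataset, and all function names are hypothetical. The model-specific value is taken here as the relative drop in accuracy, which is one of several ways to combine the first and second accuracy scores.

```python
import random

def fit_centroid(X, y):
    """Toy model 1 (assumption): nearest-centroid classifier, one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    cents = {c: [s / counts[c] for s in sums[c]] for c in sums}
    def predict(row):
        return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, cents[c])))
    return predict

def fit_stump(X, y):
    """Toy model 2 (assumption): single-feature threshold rule with the best training accuracy."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for hi in (0, 1):
                hits = sum((hi if row[j] > t else 1 - hi) == label
                           for row, label in zip(X, y))
                if best is None or hits > best[0]:
                    best = (hits, j, t, hi)
    _, j, t, hi = best
    return lambda row: hi if row[j] > t else 1 - hi

def accuracy(model, X, y):
    """Fraction of observations the fitted model predicts correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def shuffle_feature(X, j, rng):
    """Step (c): permute column j across observations; other columns are untouched."""
    column = [row[j] for row in X]
    rng.shuffle(column)
    return [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]

def predictive_values(models, X, y, j, rng):
    """Steps (b), (d), (e): score each fitted model before and after shuffling feature j."""
    X_mod = shuffle_feature(X, j, rng)
    out = {}
    for name, model in models.items():
        first = accuracy(model, X, y)        # step (b)
        second = accuracy(model, X_mod, y)   # step (d), no refitting
        out[name] = (first - second) / first # step (e), relative drop (one choice)
    return out

# Synthetic data (assumption): feature 0 carries the signal, feature 1 is noise.
rng = random.Random(0)
y = [i % 2 for i in range(200)]
X = [[label + rng.gauss(0, 0.3), rng.random()] for label in y]

# Step (a): fit each model to the initial dataset.
models = {"centroid": fit_centroid(X, y), "stump": fit_stump(X, y)}

signal_values = predictive_values(models, X, y, 0, rng)
noise_values = predictive_values(models, X, y, 1, rng)
print("feature 0:", signal_values)
print("feature 1:", noise_values)
```

Shuffling the informative feature should cost both models a large share of their accuracy, while shuffling the noise feature should cost little or nothing. The models are scored on the data they were fitted to purely to keep the sketch short; a held-out set would be the more usual choice.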
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for building a predictive model, comprising: determining a multi-model predictive value of a feature of an initial dataset representing a prediction problem, wherein the initial dataset includes a plurality of observations and each observation includes respective values for a plurality of features, including: (a) performing one or more predictive modeling procedures, wherein each of the predictive modeling procedures is associated with a different type of predictive model, wherein performing each modeling procedure comprises fitting the associated predictive model to the initial dataset; (b) reducing the multi-model predictive value of the feature by shuffling values of the feature across respective observations included in the initial dataset, thereby generating a modified dataset; (c) for each of the fitted predictive models: (c1) determining a first accuracy score representing an accuracy with which the fitted model generates predictions for data in the initial dataset; (c2) determining a second accuracy score representing an accuracy with which the fitted model generates predictions for data in the modified dataset in which the multi-model predictive value of the feature has been reduced; and (c3) determining a model-specific predictive value of the feature based on the first and second accuracy scores of the fitted model; and (d) determining, based on the model-specific predictive values of the feature, that the multi-model predictive value of the feature is low; performing feature engineering on the initial dataset based on the multi-model predictive value of the feature, including pruning the feature having the low multi-model predictive value from the initial dataset, thereby generating a pruned dataset; and building a predictive model for the prediction problem, including: performing a plurality of predictive modeling procedures on the pruned dataset, selecting a fitted predictive model generated by the plurality of predictive modeling procedures, and deploying the selected predictive model to predict outcomes of the prediction problem without using the pruned feature.
2. The method of claim 1, further comprising: prior to performing the one or more predictive modeling procedures, selecting the one or more predictive modeling procedures based on characteristics of the initial dataset, characteristics of the prediction problem, and/or characteristics of the feature.
3. The method of claim 1, wherein the one or more predictive modeling procedures comprise two or more modeling procedures selected from the group consisting of a random forest modeling procedure, a generalized additive modeling procedure, and a support vector machine modeling procedure.
4. The method of claim 1, wherein the one or more predictive modeling procedures comprise a first modeling procedure selected from a first family of modeling procedures and a second modeling procedure selected from a second family of modeling procedures.
5. The method of claim 1, further comprising: prior to determining the second accuracy scores of the one or more predictive models, refitting the one or more predictive models to the modified dataset.
6. The method of claim 1, wherein the determined model-specific predictive value of the feature for a particular fitted model increases as the difference between the first accuracy score and the second accuracy score of the particular fitted model increases.
7. The method of claim 1, wherein the determined model-specific predictive value of the feature for a particular fitted model comprises a percentage difference between the first accuracy score and the second accuracy score of the particular fitted model, relative to the first accuracy score of the particular fitted model.
8. The method of claim 1, further comprising determining a model-independent predictive value of the feature.
9. The method of claim 8, wherein determining the model-independent predictive value of the feature comprises calculating a statistical measure of a center and/or a spread of a plurality of model-specific predictive values of the feature.
10. The method of claim 9, wherein determining the model-independent predictive value of the feature comprises calculating the statistical measure of the center of the plurality of model-specific predictive values, and wherein the statistical measure of the center is selected from the group consisting of a mean, a median, and a mode of the plurality of model-specific predictive values.
11. The method of claim 9, wherein determining the model-independent predictive value of the feature comprises calculating the statistical measure of the spread of the plurality of model-specific predictive values, and wherein the statistical measure of the spread is selected from the group consisting of a range, a variance, and a standard deviation of the plurality of model-specific predictive values.
12. The method of claim 8, wherein determining the model-independent predictive value of the feature comprises calculating a combination of the plurality of model-specific predictive values of the feature.
13. The method of claim 12, wherein calculating a combination of the plurality of model-specific predictive values comprises calculating a weighted combination of the plurality of model-specific predictive values.
14. The method of claim 13, wherein calculating the weighted combination of the plurality of model-specific predictive values comprises assigning respective weights to the plurality of model-specific predictive values, wherein the weight assigned to a particular model-specific predictive value corresponding to a particular fitted predictive model increases as the first accuracy score of the fitted predictive model increases.
15. The method of claim 1, wherein the feature is a first feature, and wherein the method further comprises: determining a multi-model predictive value of a second feature of the initial dataset representing the prediction problem, including: (bb) reducing the multi-model predictive value of the second feature by shuffling values of the second feature across respective observations included in the initial dataset, thereby generating a second modified dataset; (cc) for each of the fitted predictive models: (cc2) determining a third accuracy score representing an accuracy with which the fitted model generates predictions for data in the second modified dataset in which the multi-model predictive value of the second feature has been reduced; and (cc3) determining a model-specific predictive value of the second feature based on the first and second accuracy scores of the fitted model; and (dd) determining, based on the model-specific predictive values of the second feature, a multi-model predictive value of the feature.
16. The method of claim 1, wherein the feature is a first feature, wherein the pruned dataset includes a plurality of second features, and wherein the method further comprises determining model-specific predictive values of the second features of the pruned dataset by performing steps (b), (c2), and (c3) for each of the second features.
17. The method of claim 16, further comprising displaying, via a graphical user interface, graphical content identifying the second features of the pruned dataset and the model-specific predictive values of the second features.
18. The method of claim 16, wherein the one or more predictive modeling procedures are first modeling procedures including a particular modeling procedure associated with a partic
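The arithmetic recited in claims 7 and 9 through 14, together with the pruning step of claim 1, might look roughly like the following standard-library Python sketch. Everything concrete here is an assumption made for illustration: the accuracy scores are made up, the feature names `age` and `row_id` are hypothetical, and the cutoff `LOW_THRESHOLD` is arbitrary, since the claim text shown here does not say how "low" is decided.

```python
from statistics import mean, median, pstdev

def percent_drop(first, second):
    """Claim 7 style: accuracy difference relative to the first accuracy score."""
    return (first - second) / first

def center_and_spread(values):
    """Claims 9-11 style: measures of center and spread across models."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "stdev": pstdev(values),
    }

def accuracy_weighted(values, first_scores):
    """Claims 13-14 style: weights grow with each model's first accuracy score."""
    return sum(v * w for v, w in zip(values, first_scores)) / sum(first_scores)

# Made-up (first, second) accuracy scores per feature and per model type.
scores = {
    "age":    {"random_forest": (0.90, 0.60), "gam": (0.80, 0.60), "svm": (0.85, 0.68)},
    "row_id": {"random_forest": (0.90, 0.90), "gam": (0.80, 0.79), "svm": (0.85, 0.85)},
}

LOW_THRESHOLD = 0.05  # hypothetical cutoff for a "low" multi-model predictive value

multi_model = {}
summary = {}
for feature, per_model in scores.items():
    firsts = [first for first, _ in per_model.values()]
    values = [percent_drop(first, second) for first, second in per_model.values()]
    multi_model[feature] = accuracy_weighted(values, firsts)
    summary[feature] = center_and_spread(values)

# Claim 1 style pruning: drop features whose multi-model value is low.
pruned = [f for f, v in multi_model.items() if v < LOW_THRESHOLD]
kept = [f for f in scores if f not in pruned]
print("multi-model values:", multi_model)
print("pruned:", pruned, "kept:", kept)
```

Weighting by the first accuracy score is only one way to satisfy the "increases as the first accuracy score increases" condition of claim 14; any monotonically increasing weight function would fit the same wording.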
Market modelling; Market analysis; Collecting market data · CPC title
Enterprise or organisation modelling · CPC title
Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem" (market predictions or forecasting for commercial activities G06Q30/0202) · CPC title
Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling · CPC title
Machine learning · CPC title