Method for deriving variable importance on case level for predictive modeling techniques

US10867249B1 · US · B1

Patent metadata
Field               Value
Publication number  US-10867249-B1
Application number  US-201715474820-A
Country             US
Kind code           B1
Filing date         Mar 30, 2017
Priority date       Mar 30, 2017
Publication date    Dec 15, 2020
Grant date          Dec 15, 2020

Abstract

Official abstract text for this publication.

Techniques are disclosed herein for determining variable importance on a predictive model on a case level. Modeling data associated with a case is received. The modeling data provides input variables, each having a corresponding value for input to a predictive modeling technique associated with the case. A measure of impact for each of the variables is determined using an input shuffling method. Variables having a measure of impact that exceeds a specified threshold are identified. A summary that includes the identified variables is generated.
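
The resampling procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the model is available as a plain callable, that "standard deviation between" the two score sets means the standard deviation of the per-case score differences, and that bootstrap values are drawn from the observed column. The names `impact_by_resampling`, `toy_model`, `n_rounds`, and the threshold value are hypothetical.

```python
import numpy as np

def impact_by_resampling(predict, X, n_rounds=20, seed=0):
    """Return one impact value per column (variable) of X.

    predict: callable mapping an (n_cases, n_vars) array to n_cases scores.
    Assumption: impact = std of (resampled score - original score).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_cases, n_vars = X.shape
    base = predict(X)  # first set of scores, from the unmodified data
    impact = np.zeros(n_vars)
    for j in range(n_vars):
        diffs = []
        for _ in range(n_rounds):
            Xr = X.copy()
            # Bootstrap: draw replacement values, with replacement,
            # from the observed values of variable j only.
            Xr[:, j] = rng.choice(X[:, j], size=n_cases, replace=True)
            diffs.append(predict(Xr) - base)  # second set minus first set
        impact[j] = np.std(np.concatenate(diffs))
    return impact

def toy_model(X):
    # Hypothetical scoring function: column 2 is deliberately ignored.
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))  # synthetic cases, for illustration only

impact = impact_by_resampling(toy_model, X)
threshold = 0.1  # hypothetical "specified threshold"
selected = [j for j, value in enumerate(impact) if value > threshold]
print(impact, selected)
```

In this toy setup the ignored column gets an impact of exactly zero, because resampling it never changes a score, so it falls below any positive threshold.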

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for determining variable importance on a predictive model on a case level, the method comprising: obtaining, at a predictive modeling application, modeling data that includes: a plurality of variables, wherein each variable of the plurality of variables has a corresponding value in each case for input to a predictive model; training data for training the predictive model; and validating data for validating the predictive model; generating via the predictive modeling application the predictive model based on the training data and the validating data; generating a first set of scores based on the modeling data and the predictive model; invoking a variable analysis tool of the predictive modeling application to generate a measure of impact for each variable of the modeling data, wherein for each of the plurality of variables: randomly resampling via bootstrap sampling the corresponding value of the variable in each case, wherein the random resampling includes identifying: a variable type, or a range of numeric values corresponding to the variable, generating a second set of scores based on implementing the predictive model on the modeling data having the resampled corresponding value of the variable, and determining the measure of impact based on a standard deviation between the first set of scores and the second set of scores, wherein the measure of impact indicates a likelihood that the variable affects scores generated by the predictive modeling technique; identifying one or more of the plurality of variables having a measure of impact that exceeds a specified threshold; and generating a summary including at least the identified one or more of the plurality of variables that exceed the specified threshold by: creating a markup language file, and populating the markup language file with the identified one or more of the plurality of variables.

2. The method of claim 1, further comprising: receiving a specification of at least a first and a second of the plurality of variables.

3. The method of claim 2, further comprising: randomly resampling the corresponding values of the at least the first and second variables; performing the predictive model on the modeling data having the resampled corresponding values to obtain a third set of scores; and determining a measure of impact based on a standard deviation between the first set of scores and the third set of scores, wherein the measure of impact indicates a likelihood that the at least the first and second variables affect scores generated by the predictive model.

4. The method of claim 3, wherein the resampling is performed using a bootstrap method using a range of values identified from training data included in the modeling data.

5. The method of claim 1, further comprising: outputting the summary to a user interface for user access.

6. The method of claim 1, wherein the summary includes at least a visualization of the identified one of more plurality of variables that exceed the specified threshold.

7. The method of claim 1, wherein the predictive model is one of at least a regression modeling technique or a classification modeling technique.

8. A non-transitory computer-readable storage medium storing instructions, which, when executed by a processor, performs an operation for determining variable importance on a predictive model on a case level, the operation comprising: obtaining, at a predictive modeling application, modeling data that includes: a plurality of variables, wherein each variable of the plurality of variables has a corresponding value in each case for input to a predictive model; training data for training the predictive model; and validating data for validating the predictive model; generating via the predictive modeling application the predictive model based on the training data and the validating data; generating a first set of scores based on the modeling data and the predictive model; invoking a variable analysis tool of the predictive modeling application to generate a measure of impact for each variable of the modeling data, wherein for each of the plurality of variables: randomly resampling via bootstrap sampling the corresponding value of the variable in each case, wherein the random resampling includes identifying: a variable type, or a range of numeric values corresponding to the variable, generating a second set of scores based on implementing the predictive model on the modeling data having the resampled corresponding value of the variable, and determining the measure of impact based on a standard deviation between the first set of scores and the second set of scores, wherein the measure of impact indicates a likelihood that the variable affects scores generated by the predictive modeling technique; identifying one or more of the plurality of variables having a measure of impact that exceeds a specified threshold; and generating a summary including at least the identified one or more of the plurality of variables that exceed the specified threshold by: creating a markup language file, and populating the markup language file with the identified one or more of the plurality of variables.

9. The non-transitory computer-readable storage medium of claim 8, wherein the operation further comprises: receiving a specification of at least a first and a second of the plurality of variables.

10. The non-transitory computer-readable storage medium of claim 9, wherein the operation further comprises: randomly resampling the corresponding values of the at least the first and second variables; performing the predictive model on the modeling data having the resampled corresponding values to obtain a third set of scores; and determining a measure of impact based on a standard deviation between the first set of scores and the third set of scores, wherein the measure of impact indicates a likelihood that the at least the first and second variables affect scores generated by the predictive model.

11. The non-transitory computer-readable storage medium of claim 10, wherein the resampling is performed using a bootstrap method using a range of values identified from training data included in the modeling data.

12. The non-transitory computer-readable storage medium of claim 8, wherein the operation further comprises: outputting the summary to a user interface for user access.

13. The non-transitory computer-readable storage medium of claim 8, wherein the summary includes at least a visualization of the identified one of more plurality of variables that exceed the specified threshold.

14. The non-transitory computer-readable storage medium of claim 8, wherein the predictive model is one of at least a regression modeling technique or a classification modeling technique.

15. A system, comprising: one or more processors; and a memory storing program code, which, when executed by the one or more processors, perform an operation for determining variable importance on a predictive model on a case level, the operation comprising: obtaining, at a predictive modeling application, modeling data that includes: a plurality of variables, wherein each variable of the plurality of variables has a corresponding value in each case for input to generate a predictive model; training data for training the predictive model; and validating data for validating the predictive model; generating via the predictive modeling application the predictive model based on the training data and the validating data; generating a first set of scores bas
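
Two steps recited in the claims lend themselves to a short illustration: resampling a user-specified group of variables together to obtain a third set of scores (claims 2 and 3), and writing the variables that exceed the threshold into a markup language file (the final step of claim 1). The Python sketch below is one possible reading under stated assumptions, not the claimed implementation. XML is chosen arbitrarily as the markup language. The function names `joint_impact` and `summary_xml`, the element and attribute names, the toy model, and the threshold are all hypothetical.

```python
import numpy as np
import xml.etree.ElementTree as ET

def joint_impact(predict, X, columns, seed=0):
    """Impact of resampling several variables at once.

    Assumption: impact = std of (third-set score - first-set score).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    base = predict(X)  # first set of scores
    Xr = X.copy()
    for j in columns:
        # Bootstrap each specified column independently from its own values.
        Xr[:, j] = rng.choice(X[:, j], size=X.shape[0], replace=True)
    return float(np.std(predict(Xr) - base))

def summary_xml(impacts, threshold):
    """Build an XML summary listing only entries above the threshold."""
    root = ET.Element("summary", threshold=str(threshold))
    for name, value in impacts.items():
        if value > threshold:
            ET.SubElement(root, "variable", name=name, impact=f"{value:.4f}")
    return ET.tostring(root, encoding="unicode")

def toy_model(X):
    # Hypothetical model with an interaction term; column 2 is unused.
    return X[:, 0] * X[:, 1]

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))  # synthetic cases, for illustration only

impacts = {
    "x0+x1": joint_impact(toy_model, X, [0, 1]),
    "x2": joint_impact(toy_model, X, [2]),
}
xml_text = summary_xml(impacts, threshold=0.1)
print(xml_text)
```

Resampling variables jointly can surface interaction effects, such as the product term in the toy model, that single-variable resampling may attribute less clearly. To produce an actual file, the returned string could be written to disk.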

Assignees

Intuit Inc

Inventors

Classifications

  • G06N20/00 (Primary): Machine learning

  • G06N5/02 (Primary): Knowledge representation; Symbolic representation

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10867249B1 cover?
Techniques are disclosed herein for determining variable importance on a predictive model on a case level. Modeling data associated with a case is received. The modeling data provides input variables, each having a corresponding value for input to a predictive modeling technique associated with the case. A measure of impact for each of the variables is determined using an input shuffling method…
Who is the assignee on this patent?
Intuit Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 15, 2020 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).