Method for deriving variable importance on case level for predictive modeling techniques

US10867249B1 · US · B1

Patent metadata
Field	Value
Publication number	US-10867249-B1
Application number	US-201715474820-A
Country	US
Kind code	B1
Filing date	Mar 30, 2017
Priority date	Mar 30, 2017
Publication date	Dec 15, 2020
Grant date	Dec 15, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are disclosed herein for determining variable importance on a predictive model on a case level. Modeling data associated with a case is received. The modeling data provides input variables, each having a corresponding value for input to a predictive modeling technique associated with the case. A measure of impact for each of the variables is determined using an input shuffling method. Variables having a measure of impact that exceeds a specified threshold are identified. A summary that includes the identified variables is generated.
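
As an illustration of the general idea behind an input shuffling method, the sketch below shuffles one variable at a time across cases, rescores, and keeps the variables whose scores move by more than a threshold. It is a minimal sketch under stated assumptions, not the patented implementation: `shuffle_importance`, `important_variables`, `toy_score`, the toy data, and the use of mean absolute score change as the measure are all hypothetical choices made for the example.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence

Case = Dict[str, float]  # one case: variable name -> value


def shuffle_importance(
    score: Callable[[Case], float],
    cases: Sequence[Case],
    seed: int = 0,
) -> Dict[str, float]:
    """Mean absolute change in score when one variable is shuffled across cases."""
    rng = random.Random(seed)
    baseline = [score(c) for c in cases]
    impact: Dict[str, float] = {}
    for name in cases[0]:
        column = [c[name] for c in cases]
        rng.shuffle(column)  # break the link between this variable and each case
        shuffled = [score({**c, name: v}) for c, v in zip(cases, column)]
        impact[name] = statistics.fmean(
            [abs(a - b) for a, b in zip(baseline, shuffled)]
        )
    return impact


def important_variables(impact: Dict[str, float], threshold: float) -> List[str]:
    """Names whose impact exceeds the threshold, largest impact first."""
    kept = [n for n, v in impact.items() if v > threshold]
    return sorted(kept, key=lambda n: -impact[n])


# Hypothetical stand-in for a trained model: only "income" affects the score.
def toy_score(case: Case) -> float:
    return 3.0 * case["income"]


cases = [{"income": float(i), "noise": float((i * 7) % 5)} for i in range(20)]
impact = shuffle_importance(toy_score, cases)
selected = important_variables(impact, threshold=0.5)
print(impact, selected)
```

Because the toy model ignores `noise`, shuffling that column leaves every score unchanged and it is filtered out by the threshold.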

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method for determining variable importance on a predictive model on a case level, the method comprising: obtaining, at a predictive modeling application, modeling data that includes: a plurality of variables, wherein each variable of the plurality of variables has a corresponding value in each case for input to a predictive model; training data for training the predictive model; and validating data for validating the predictive model; generating via the predictive modeling application the predictive model based on the training data and the validating data; generating a first set of scores based on the modeling data and the predictive model; invoking a variable analysis tool of the predictive modeling application to generate a measure of impact for each variable of the modeling data, wherein for each of the plurality of variables: randomly resampling via bootstrap sampling the corresponding value of the variable in each case, wherein the random resampling includes identifying: a variable type, or a range of numeric values corresponding to the variable, generating a second set of scores based on implementing the predictive model on the modeling data having the resampled corresponding value of the variable, and determining the measure of impact based on a standard deviation between the first set of scores and the second set of scores, wherein the measure of impact indicates a likelihood that the variable affects scores generated by the predictive modeling technique; identifying one or more of the plurality of variables having a measure of impact that exceeds a specified threshold; and generating a summary including at least the identified one or more of the plurality of variables that exceed the specified threshold by: creating a markup language file, and populating the markup language file with the identified one or more of the plurality of variables.
2. The method of claim 1, further comprising: receiving a specification of at least a first and a second of the plurality of variables.
3. The method of claim 2, further comprising: randomly resampling the corresponding values of the at least the first and second variables; performing the predictive model on the modeling data having the resampled corresponding values to obtain a third set of scores; and determining a measure of impact based on a standard deviation between the first set of scores and the third set of scores, wherein the measure of impact indicates a likelihood that the at least the first and second variables affect scores generated by the predictive model.
4. The method of claim 3, wherein the resampling is performed using a bootstrap method using a range of values identified from training data included in the modeling data.
5. The method of claim 1, further comprising: outputting the summary to a user interface for user access.
6. The method of claim 1, wherein the summary includes at least a visualization of the identified one of more plurality of variables that exceed the specified threshold.
7. The method of claim 1, wherein the predictive model is one of at least a regression modeling technique or a classification modeling technique.
8. A non-transitory computer-readable storage medium storing instructions, which, when executed by a processor, performs an operation for determining variable importance on a predictive model on a case level, the operation comprising: obtaining, at a predictive modeling application, modeling data that includes: a plurality of variables, wherein each variable of the plurality of variables has a corresponding value in each case for input to a predictive model; training data for training the predictive model; and validating data for validating the predictive model; generating via the predictive modeling application the predictive model based on the training data and the validating data; generating a first set of scores based on the modeling data and the predictive model; invoking a variable analysis tool of the predictive modeling application to generate a measure of impact for each variable of the modeling data, wherein for each of the plurality of variables: randomly resampling via bootstrap sampling the corresponding value of the variable in each case, wherein the random resampling includes identifying: a variable type, or a range of numeric values corresponding to the variable, generating a second set of scores based on implementing the predictive model on the modeling data having the resampled corresponding value of the variable, and determining the measure of impact based on a standard deviation between the first set of scores and the second set of scores, wherein the measure of impact indicates a likelihood that the variable affects scores generated by the predictive modeling technique; identifying one or more of the plurality of variables having a measure of impact that exceeds a specified threshold; and generating a summary including at least the identified one or more of the plurality of variables that exceed the specified threshold by: creating a markup language file, and populating the markup language file with the identified one or more of the plurality of variables.
9. The non-transitory computer-readable storage medium of claim 8, wherein the operation further comprises: receiving a specification of at least a first and a second of the plurality of variables.
10. The non-transitory computer-readable storage medium of claim 9, wherein the operation further comprises: randomly resampling the corresponding values of the at least the first and second variables; performing the predictive model on the modeling data having the resampled corresponding values to obtain a third set of scores; and determining a measure of impact based on a standard deviation between the first set of scores and the third set of scores, wherein the measure of impact indicates a likelihood that the at least the first and second variables affect scores generated by the predictive model.
11. The non-transitory computer-readable storage medium of claim 10, wherein the resampling is performed using a bootstrap method using a range of values identified from training data included in the modeling data.
12. The non-transitory computer-readable storage medium of claim 8, wherein the operation further comprises: outputting the summary to a user interface for user access.
13. The non-transitory computer-readable storage medium of claim 8, wherein the summary includes at least a visualization of the identified one of more plurality of variables that exceed the specified threshold.
14. The non-transitory computer-readable storage medium of claim 8, wherein the predictive model is one of at least a regression modeling technique or a classification modeling technique.
15. A system, comprising: one or more processors; and a memory storing program code, which, when executed by the one or more processors, perform an operation for determining variable importance on a predictive model on a case level, the operation comprising: obtaining, at a predictive modeling application, modeling data that includes: a plurality of variables, wherein each variable of the plurality of variables has a corresponding value in each case for input to generate a predictive model; training data for training the predictive model; and validating data for validating the predictive model; generating via the predictive modeling application the predictive model based on the training data and the validating data; generating a first set of scores bas
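
The steps recited in the claims (bootstrap resampling of a variable, rescoring, a standard-deviation-based measure, a threshold, and a markup language summary) can be sketched roughly as follows. This is one possible reading for illustration only: the claims do not give a formula, so taking the standard deviation of per-case score differences is an assumption, and `bootstrap_impact`, `summary_xml`, `toy_score`, the XML element names, and the toy data are all hypothetical.

```python
import random
import statistics
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Sequence

Case = Dict[str, float]  # one case: variable name -> value


def bootstrap_impact(
    score: Callable[[Case], float],
    cases: Sequence[Case],
    names: Sequence[str],
    seed: int = 0,
) -> float:
    """Resample the named variables with replacement, rescore, and return the
    standard deviation of per-case score differences (an assumed reading of
    'a standard deviation between' the two sets of scores)."""
    rng = random.Random(seed)
    first = [score(c) for c in cases]
    resampled = [dict(c) for c in cases]
    for name in names:
        pool = [c[name] for c in cases]  # observed values of this variable
        draws = rng.choices(pool, k=len(cases))  # bootstrap: with replacement
        for row, value in zip(resampled, draws):
            row[name] = value
    second = [score(c) for c in resampled]
    return statistics.pstdev([a - b for a, b in zip(first, second)])


def summary_xml(impacts: Dict[str, float], threshold: float) -> str:
    """Build a small XML summary listing variables above the threshold."""
    root = ET.Element("summary", threshold=str(threshold))
    for name, value in sorted(impacts.items(), key=lambda kv: -kv[1]):
        if value > threshold:
            ET.SubElement(root, "variable", name=name, impact=f"{value:.3f}")
    return ET.tostring(root, encoding="unicode")


# Hypothetical stand-in for a trained model: "noise" has no effect on the score.
def toy_score(case: Case) -> float:
    return 2.0 * case["age"] + 0.5 * case["balance"]


cases = [
    {"age": float(20 + i), "balance": float((i * 37) % 100), "noise": float(i % 3)}
    for i in range(30)
]
impacts = {name: bootstrap_impact(toy_score, cases, [name]) for name in cases[0]}
joint = bootstrap_impact(toy_score, cases, ["age", "balance"])  # two variables together
xml_summary = summary_xml(impacts, threshold=1.0)
print(xml_summary)
```

Passing more than one name to `bootstrap_impact` resamples those variables together, which loosely mirrors the dependent claims about a specified first and second variable.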

Assignees

Intuit Inc

Inventors

Bosnjakovic Dusan

Classifications

G06N20/00 · Primary
Machine learning · CPC title
G06N5/02 · Primary
Knowledge representation; Symbolic representation · CPC title

Patent family

Related publications grouped by family.

View patent family 73746966

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10867249B1 cover?: Techniques are disclosed herein for determining variable importance on a predictive model on a case level. Modeling data associated with a case is received. The modeling data provides input variables, each having a corresponding value for input to a predictive modeling technique associated with the case. A measure of impact for each of the variables is determined using an input shuffling method…
Who is the assignee on this patent?: Intuit Inc
What technology area does this patent fall under?: Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?: Publication date Dec 15, 2020 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).