Systems and methods for determining relative importance of one or more variables in a nonparametric machine learning model

US11580426B2 · US · B2

Patent metadata
Publication number: US-11580426-B2
Application number: US-202017065794-A
Country: US
Kind code: B2
Filing date: Oct 8, 2020
Priority date: Dec 17, 2019
Publication date: Feb 14, 2023
Grant date: Feb 14, 2023

Abstract

Official abstract text for this publication.

Systems and methods for determining relative importance of one or more variables in a non-parametric model include: receiving, raw values of the variables corresponding to one or more entities; processing the raw values using a statistical model to obtain probability values for the variables and an overall prediction value for each entity; determining a plurality of cumulative distributions for the variables based on the raw values and the number of entities having a specific raw value; grouping the variables into a plurality of equally sized buckets based on the cumulative distributions; determining a mean probability value for each bucket; assigning a rank number for each bucket based on the mean probability values; compiling a table for the entities based on the raw values and the buckets corresponding to the raw values; and determining the relative importance of the variables for the entities based on the rank numbers.
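The bucketing and ranking steps in the abstract can be sketched in Python for a single variable. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `build_rank_table`, the dictionary keys, and the default of ten buckets are hypothetical, and ties in raw values are split by position rather than kept together. It assumes a higher mean probability gets a higher rank number, in line with claim 5.

```python
from statistics import mean


def build_rank_table(raw_values, probabilities, n_buckets=10):
    """Bucket one variable's entities into equal-sized groups and rank the buckets.

    raw_values[i] and probabilities[i] both belong to entity i.
    Returns one dict per non-empty bucket, in ascending raw-value order.
    """
    if len(raw_values) != len(probabilities):
        raise ValueError("raw_values and probabilities must have equal length")

    # Sorting entities by raw value gives their position in the empirical
    # cumulative distribution of this variable.
    order = sorted(range(len(raw_values)), key=lambda i: raw_values[i])
    n = len(order)

    buckets = []
    for b in range(n_buckets):
        # Integer arithmetic keeps bucket sizes within one entity of each other.
        members = order[b * n // n_buckets:(b + 1) * n // n_buckets]
        if not members:
            continue
        buckets.append({
            "bucket": b,
            "min_raw": raw_values[members[0]],
            "max_raw": raw_values[members[-1]],
            "mean_prob": mean([probabilities[i] for i in members]),
        })

    # Rank 1 goes to the bucket with the lowest mean probability; the highest
    # rank number goes to the bucket with the highest mean probability.
    by_mean = sorted(buckets, key=lambda row: row["mean_prob"])
    for rank, row in enumerate(by_mean, start=1):
        row["rank"] = rank
    return buckets
```

Calling `build_rank_table` once per variable and collecting the results would give a per-variable lookup table of raw-value ranges and rank numbers, which is one plausible reading of the table the abstract describes.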

First claim

Opening claim text (preview).

What is claimed is:

1. A system for determining a number of variables in a non-parametric model that influence an outcome of a loan application, the system comprising: at least one database for storing logit values for a plurality of variables for a plurality of entities; at least one non-transitory computer-readable medium configured to store instructions; and at least one processor configured to execute the instructions to perform operations comprising: receiving, from the at least one database, the logit values of the plurality of variables corresponding to the plurality of entities; processing the received logit values of the plurality of variables for the plurality of entities using the non-parametric model to obtain logit prediction values for the plurality of variables; determining a plurality of cumulative distributions for the plurality of variables based on the respective logit prediction values and a number of other entities of the plurality of entities having a specific logit prediction value; grouping the variables into a plurality of equally sized buckets based on the cumulative distributions; determining a logit prediction value for each bucket; assigning a rank number for each bucket based on their respective logit prediction values; compiling a rank of score averages (ROSA) table for the plurality of entities based on the logit prediction values and the buckets corresponding to the respective logit prediction values; and declining the loan application based a number of variables in the non-parametric model that influenced the outcome of the declined loan application based on the number of variables in the non-parametric model with the highest rank numbers and based on the ROSA table, wherein the non-parametric model is at least one of maximum likelihood (ML), k-nearest neighbors (kNN), or support vector machines (SVM).

2. The system of claim 1, wherein the at least one processor configured to execute the instructions to perform operations further comprising identifying a definition for each of the plurality of variables; and generating a report based on the identified definitions, wherein the report includes the number of variables with the highest rank numbers.

3. The system of claim 1, wherein the logit prediction values are at least one of an aggregate, mean, or median logit prediction value.

4. The system of claim 1, wherein the logit prediction values indicate a first degree of risk for a corresponding loan application based on a specific variable indicating a second degree of risk for the corresponding loan application based on the plurality of variables.

5. The system of claim 1, wherein a larger logit prediction value corresponds to a higher risk for the corresponding loan application and a higher rank number.

6. The system of claim 1, wherein there are more than one hundred different variables.

7. The system of claim 1, wherein the plurality of entities are simulated entities.

8. The system of claim 1, wherein each of the buckets corresponds to at least 10% of the plurality of entities.

9. The system of claim 1, wherein the at least one processor configured to execute the instructions to perform operations further comprising identifying one or more reasons associated with the plurality of variables.

10. The system of claim 9, wherein the at least one processor configured to execute the instructions to perform operations further comprising generating a report based on the reasons.

11. The system of claim 1, wherein the received logit values comprise historical data accrued over a predetermined window of time.

12. The system of claim 1, wherein the system further comprises more than one database, and wherein the operations further comprise: receiving the logit values from each database; and processing the received logit values from each database separately using the non-parametric model to determine the logit prediction values for the received logit values received from each database.

13. The system of claim 1, wherein assigning the rank numbers for the buckets includes: sorting the buckets in sequence based on the logit prediction values; assigning the rank numbers in sequence based on the sequence of the buckets.

14. A method for determining a predetermined number of variables in a non-parametric model that influence an outcome of one or more loan applications, the method comprising: receiving, from a database, logit values for a plurality of variables corresponding to a plurality of entities; processing the received logit values of the plurality of variables for the plurality of entities using the non-parametric model to obtain logit prediction values for each entity; determining a plurality of cumulative distributions for the variables based on the logit prediction values and the number of entities having a specific logit prediction value; binning the variables into a plurality of equally sized buckets based on the cumulative distributions; determining a logit prediction value for each respective bucket; assigning a rank number for each bucket based on their respective logit prediction values; compiling a rank of score averages (ROSA) table for the plurality of entities based on the logit prediction values and the buckets corresponding to their respective logit prediction values; and declining the loan application based a number of variables in the non-parametric model that influenced the outcome of the declined loan application based on the number of variables in the non-parametric model with the highest rank numbers and based on the ROSA table, wherein the non-parametric model is at least one of maximum likelihood (ML), k-nearest neighbors (kNN), or support vector machines (SVM).

15. The method of claim 14, further compiling identifying a definition for each of the plurality of variables; and generating a report based on the identified definitions, wherein the report includes the plurality of variables with the highest rank numbers.

16. The method of claim 14, wherein the received logit values comprise historical data accrued over a predetermined window of time.

17. The method of claim 14, wherein each of the buckets corresponds to at least 10% of the plurality of entities.

18. A system for identifying a number of variables that influence an outcome of one or more loan applications, the system comprising: at least one database for storing logit values for one or more variables corresponding to a plurality of individuals; at least one non-transitory computer-readable medium configured to store instructions; and at least one processor configured to execute the instructions to perform operations comprising: receiving, from the database, the logit values of the variables corresponding to one or more applicants among the plurality of individuals; processing the received logit values of the variables for the applicants using a non-parametric model to obtain logit prediction values, the logit prediction values indicating a first degree of risk for a corresponding loan application based on specific variable indicating a second degree of risk for the corresponding loan application based on the variables; binning the variables into a plurality of equally sized buckets; determining a rank number for each bucket based on the logit prediction values of the variables associated with each bucket; compiling a rank of score averages (ROSA) table for the plurality of individuals based on the logit prediction values and the buckets corresponding to their respective logit predicti
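The claims select the variables "with the highest rank numbers" for a given application. One way that lookup could work is sketched below in Python. Everything here is an assumption for illustration: the function name `top_ranked_variables`, the table layout (a list of `(upper_bound, rank)` pairs per variable, sorted by upper bound), the variable names in the usage note, and the alphabetical tie-break are not taken from the patent.

```python
def top_ranked_variables(entity_values, rank_tables, top_n=3):
    """Return the top_n (variable, rank) pairs with the highest bucket rank.

    entity_values: dict mapping variable name -> the entity's value.
    rank_tables:   dict mapping variable name -> list of (upper_bound, rank)
                   pairs sorted by upper_bound in ascending order.
    """
    ranks = {}
    for name, value in entity_values.items():
        table = rank_tables[name]
        # Default to the last bucket so values above every bound still map.
        rank = table[-1][1]
        for upper_bound, bucket_rank in table:
            if value <= upper_bound:
                rank = bucket_rank
                break
        ranks[name] = rank

    # Highest rank first; ties are broken alphabetically for determinism.
    ordered = sorted(ranks.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_n]
```

For example, with hypothetical tables for `utilization`, `inquiries`, and `tenure_months`, an applicant whose utilization and tenure both fall in rank-3 buckets would have those two variables returned for `top_n=2`, and a report such as the one in claims 2 and 10 could then attach a definition or reason to each.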

Assignees

Capital One Services, LLC

Inventors

Classifications

  • G06N20/10 (Primary): using kernel methods, e.g. support vector machines [SVM]

  • G06N5/04 (Primary): Inference or reasoning models

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11580426B2 cover?
Systems and methods for determining relative importance of one or more variables in a non-parametric model include: receiving, raw values of the variables corresponding to one or more entities; processing the raw values using a statistical model to obtain probability values for the variables and an overall prediction value for each entity; determining a plurality of cumulative distributions for…
Who is the assignee on this patent?
Capital One Services, LLC
What technology area does this patent fall under?
Primary CPC classification G06N20/10. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 14, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).