System and method for prediction using synthetic features and gradient boosted decision tree

US2017213280A1 · US · A1

Patent metadata
Publication number: US-2017213280-A1
Application number: US-201615007593-A
Country: US
Kind code: A1
Filing date: Jan 27, 2016
Priority date: Jan 27, 2016
Publication date: Jul 27, 2017
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A machine learning system and method are disclosed in which a plurality of synthetic features are created from input data, and a gradient boosted decision tree algorithm is then executed by the computer to process both the synthetic features and at least some of the input data to produce an output that is a probability.
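The two-stage flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the toy loan rows (amount, duration in months), the hypothetical `fit_rate_learner` base learners that stand in for the "plurality of machine learning algorithms", and a deliberately tiny gradient-boosted ensemble of depth-1 trees written in pure Python rather than a production GBDT library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_rate_learner(rows, labels, j):
    """Hypothetical base learner: smoothed default rate on each side of a
    cut placed at (roughly) the median of feature j."""
    values = sorted(r[j] for r in rows)
    cut = values[len(values) // 2]
    lo = [y for r, y in zip(rows, labels) if r[j] < cut]
    hi = [y for r, y in zip(rows, labels) if r[j] >= cut]
    p_lo = (sum(lo) + 1) / (len(lo) + 2)  # Laplace smoothing keeps 0 < p < 1
    p_hi = (sum(hi) + 1) / (len(hi) + 2)
    return lambda row: p_hi if row[j] >= cut else p_lo

def fit_stump(rows, residuals):
    """Depth-1 regression tree fitted to the residuals by least squares."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), 0, float("inf"), mean, mean)  # fallback: no split
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [e for r, e in zip(rows, residuals) if r[j] <= t]
            right = [e for r, e in zip(rows, residuals) if r[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((e - lm) ** 2 for e in left) + sum((e - rm) ** 2 for e in right)
            if sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]  # (feature index, threshold, left value, right value)

def fit_gbdt(rows, labels, n_trees=20, lr=0.5):
    """Gradient boosting on log-loss: each stump fits y - sigmoid(score)."""
    pos = sum(labels) / len(labels)      # assumes both classes are present
    base = math.log(pos / (1 - pos))
    scores = [base] * len(rows)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - sigmoid(s) for y, s in zip(labels, scores)]
        j, t, lv, rv = fit_stump(rows, residuals)
        stumps.append((j, t, lv, rv))
        scores = [s + lr * (lv if r[j] <= t else rv) for r, s in zip(rows, scores)]
    return base, lr, stumps

def predict_gbdt(model, row):
    base, lr, stumps = model
    score = base + sum(lr * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps)
    return sigmoid(score)

# Toy data (invented): (amount requested, duration in months) and default labels.
rows = [(1000, 12), (2000, 12), (1500, 24), (8000, 36),
        (9000, 48), (7000, 12), (3000, 60), (10000, 60)]
labels = [0, 0, 0, 1, 1, 0, 1, 1]

# Stage 1: each learner maps the input data to an initial default probability.
learners = [fit_rate_learner(rows, labels, 0), fit_rate_learner(rows, labels, 1)]

def augment(row):
    """Raw inputs followed by the synthetic features."""
    return tuple(row) + tuple(learner(row) for learner in learners)

# Stage 2: the boosted trees see both the raw data and the synthetic features.
model = fit_gbdt([augment(r) for r in rows], labels)
p_risky = predict_gbdt(model, augment((9500, 48)))
p_safe = predict_gbdt(model, augment((1200, 12)))
print(p_risky, p_safe)
```

In this sketch the synthetic features are computed on the same rows used to train the second stage, which leaks label information; a practical stacking setup would typically use held-out or out-of-fold predictions instead.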

First claim

Opening claim text (preview).

1. A computer-implemented method comprising: the computer obtaining a set of data relating to a loan application; the computer determining a plurality of synthetic features by at least: executing a plurality of machine learning algorithms, each of the machine learning algorithms, when executed, receiving as an input at least some of the data and producing as an output a respective synthetic feature representing an initial probability of whether a loan default will occur; the computer executing a gradient boosted decision tree (GBDT) algorithm to process both the synthetic features and at least some of the data to produce an output representing a final probability of whether the loan default will occur.

2. The computer-implemented method of claim 1, further comprising: the computer generating an indication of whether or not to approve the loan based on whether a particular value is above or below a stored threshold; wherein the particular value is based on the final probability.

3. The computer-implemented method of claim 1, wherein the plurality of machine learning algorithms are a first set of machine learning algorithms, and wherein determining the plurality of synthetic features further comprises: the computer executing at least one other machine learning algorithm different from the first set of machine learning algorithms, the at least one other machine learning algorithm, when executed, receiving an input based on at least some of the data and producing a plurality of outputs; wherein each one of the plurality of outputs is a synthetic feature representing a probability of an event occurring, the event being different from the loan default that is associated with the final probability.

4. The computer-implemented method of claim 3, wherein the event comprises late payment of the loan.

5. The computer-implemented method of claim 1, further comprising: the computer performing binary encoding of at least some of the data to produce binary encoded data; and the computer inputting the binary encoded data to at least one of the machine learning algorithms.

6. The computer-implemented method of claim 1, further comprising: the computer augmenting the data with at least one additional feature; and the computer inputting the at least one additional feature to at least one of the machine learning algorithms.

7. The computer-implemented method of claim 1, wherein the data includes an amount of loan requested and a loan duration, and wherein at least two of the machine learning algorithms are different from each other.

8. The computer-implemented method of claim 7, wherein the data further includes a transaction history of the loan applicant, and wherein one of the machine learning algorithms is a neural network that accepts the transaction history as an input.

9. The computer-implemented method of claim 1, further comprising: the computer training the machine learning algorithms, the training including using training data and test data to determine what inputs are to be used for each machine learning algorithm by: the computer trying different possible inputs and selecting a set of one or more inputs that best satisfy a metric.

10. A system comprising: a memory to store a set of data relating to a loan application; a predictor to receive the data and to produce an output representing a final probability of whether a loan default will occur; the predictor including a plurality of learners, each learner implementing a respective machine learning algorithm; the predictor configured to: determine a plurality of synthetic features by sending to each of the learners at least some of the data, and each of the learners outputting a respective synthetic feature representing an initial probability of whether the loan default will occur; and execute a gradient boosted decision tree (GBDT) algorithm to process both the synthetic features and at least some of the data to produce the output representing the final probability of whether the loan default will occur.

11. The system of claim 10, wherein the system is configured to generate an indication of whether or not to approve the loan based on whether a particular value is above or below a stored threshold, wherein the particular value is based on the final probability.

12. The system of claim 10, wherein the plurality of learners are a first set of learners, and wherein the predictor is configured to determine the plurality of synthetic features by also: sending an input to at least one other learner different from the first set of learners, the input based on at least some of the data, and the at least one other learner implementing a machine learning algorithm that, when executed, receives the input and produces a plurality of outputs; wherein each one of the plurality of outputs is a synthetic feature representing a probability of an event occurring, the event being different from the loan default that is associated with the final probability.

13. The system of claim 12, wherein the event comprises late payment of the loan.

14. The system of claim 10, wherein the predictor is further configured to: perform binary encoding of at least some of the data to produce binary encoded data; and send the binary encoded data to at least one of the learners.

15. The system of claim 10, wherein the predictor is further configured to: augment the data with at least one additional feature; and send the at least one additional feature to at least one of the learners.

16. The system of claim 10, wherein the data includes an amount of loan requested and a loan duration, and wherein at least two of the learners implement machine learning algorithms that are different from each other.

17. The system of claim 16, wherein the data further includes a transaction history of the loan applicant, and wherein one of the learners is a neural network that accepts the transaction history as an input.

18. The system of claim 10, wherein the system is configured to train the learners, the training comprising using training data and test data to determine what inputs are to be sent to each learner by: the system trying different possible inputs and selecting a set of one or more inputs that best satisfy a metric.

19. A system comprising: at least one processor; and memory having stored thereon processor-executable instructions that, when executed, cause the at least one processor to: determine a plurality of synthetic features by at least: executing a plurality of machine learning algorithms, each of the machine learning algorithms, when executed, receiving as an input at least some of the data and producing as an output a respective synthetic feature representing an initial probability of whether a loan default will occur, and execute a gradient boosted decision tree (GBDT) algorithm to process both the synthetic features and at least some of the data to produce an output representing a final probability of whether the loan default will occur.

20. The system of claim 19, wherein the processor-executable instructions, when executed, further cause the at least one processor to: generate an indication of whether or not to approve the loan based on whether a particular value is above or below a stored threshold; wherein the particular value is based on the final probability.
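Three of the dependent claims describe small, concrete steps: binary encoding of input data (claims 5 and 14), a threshold-based approval indication (claims 2, 11 and 20), and choosing each learner's inputs by trying candidates against a metric (claims 9 and 18). The sketch below shows one possible reading of each. All names (`binary_encode`, `approve`, `select_inputs`, `fit_lookup`, `accuracy`), the toy rows, the choice of accuracy as the metric, and the "approve when the probability is below the threshold" direction are assumptions for illustration; the claims do not specify them.

```python
from itertools import combinations

def binary_encode(value, categories):
    """One reading of 'binary encoding': the category index as fixed-width bits."""
    index = categories.index(value)
    width = max(1, (len(categories) - 1).bit_length())
    return [(index >> b) & 1 for b in reversed(range(width))]

def approve(final_probability, threshold):
    """Approval indication; here the 'particular value' is simply the final
    default probability, and a loan is approved when it is below the threshold."""
    return final_probability < threshold

def fit_lookup(train, subset):
    """Hypothetical learner: default rate per combination of the chosen inputs."""
    counts = {}
    for row in train:
        key = tuple(row[name] for name in subset)
        n, d = counts.get(key, (0, 0))
        counts[key] = (n + 1, d + row["default"])
    return {key: d / n for key, (n, d) in counts.items()}

def accuracy(model, test, subset):
    """Share of test rows where the 0.5-thresholded prediction matches the label."""
    hits = 0
    for row in test:
        p = model.get(tuple(row[name] for name in subset), 0.5)
        hits += int((p >= 0.5) == bool(row["default"]))
    return hits / len(test)

def select_inputs(candidates, train, test, fit, metric):
    """Try every non-empty subset of candidate inputs: fit on the training data,
    score on the test data, and keep the first subset with the best score."""
    best_subset, best_score = None, None
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            score = metric(fit(train, subset), test, subset)
            if best_score is None or score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score

# Toy rows (invented for illustration).
train = [
    {"purpose": "car", "term": "short", "default": 0},
    {"purpose": "car", "term": "long", "default": 1},
    {"purpose": "home", "term": "short", "default": 0},
    {"purpose": "home", "term": "long", "default": 1},
]
test = [
    {"purpose": "car", "term": "long", "default": 1},
    {"purpose": "home", "term": "short", "default": 0},
]

encoded = binary_encode("home", ["car", "home", "education"])  # [0, 1]
best_inputs, best_score = select_inputs(["purpose", "term"], train, test,
                                        fit_lookup, accuracy)
print(encoded, approve(0.12, 0.30), best_inputs, best_score)
```

Exhaustive subset search grows exponentially with the number of candidate inputs, so this form only suits a handful of candidates; a greedy or randomized search would be the usual substitute for larger sets.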

Assignees

Inventors

Classifications

  • G06Q40/03 · Primary

    Credit; Loans; Processing thereof · CPC title

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • G06N3/02 · Primary

    Neural networks · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • G06Q40/025 · Primary

    Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Kaznady Max S, Huawei Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06Q40/03. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 27, 2017 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).