Training tree-based machine-learning modeling algorithms for predicting outputs and generating explanatory data

US12299552B2 · US · B2

Patent metadata
Publication number: US-12299552-B2
Application number: US-202117182053-A
Country: US
Kind code: B2
Filing date: Feb 22, 2021
Priority date: Oct 30, 2017
Publication date: May 13, 2025
Grant date: May 13, 2025

Abstract

Official abstract text for this publication.

Certain aspects involve training tree-based machine-learning models for computing predicted responses and generating explanatory data for the models. For example, independent variables having relationships with a response variable are identified. Each independent variable corresponds to an action or observation for an entity. The response variable has outcome values associated with the entity. Splitting rules are used to generate the tree-based model, which includes decision trees for determining relationships between independent variables and a predicted response associated with the response variable. The tree-based model is iteratively adjusted to enforce monotonicity with respect to representative response values of the terminal nodes. For instance, one or more decision trees are adjusted such that one or more representative response values are modified and a monotonic relationship exists between each independent variable and the response variable. The adjusted model is used to output explanatory data indicating relationships between independent variable changes and response variable changes.
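One way to picture the monotonicity adjustment is a split search that rejects any threshold whose region values break the required ordering. The following is a minimal sketch under stated assumptions, not the patented procedure: it handles a single independent variable, uses the mean as the representative response value, assumes a non-decreasing constraint, and the function name `monotone_split` and its `lower`/`upper` neighbor bounds are hypothetical.

```python
from statistics import mean


def monotone_split(xs, ys, lower=float("-inf"), upper=float("inf")):
    """Choose a threshold on one variable so that the region means satisfy
    lower <= left_mean <= right_mean <= upper (non-decreasing constraint).

    `lower` and `upper` stand in for the representative response values of
    the closest lower and upper neighboring regions (assumed given).
    Returns (threshold, left_mean, right_mean), or None if every candidate
    threshold violates the constraint.
    """
    best = None
    candidates = sorted(set(xs))[1:]  # every distinct value except the smallest
    for t in candidates:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        left_mean, right_mean = mean(left), mean(right)
        if not (lower <= left_mean <= right_mean <= upper):
            continue  # this splitting rule would violate monotonicity
        # Among admissible splits, keep the one with the smallest squared error.
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return None if best is None else best[1:]


xs = [1, 2, 3, 4]
ys = [0.0, 1.0, 0.5, 2.0]

unconstrained = monotone_split(xs, ys)           # (4, 0.5, 2.0)
constrained = monotone_split(xs, ys, upper=1.5)  # threshold moves to 2
```

In this toy data the best unconstrained split is at 4, but its right-hand mean of 2.0 exceeds an upper neighbor value of 1.5, so the constrained search settles on a different threshold. Changing the threshold changes both the splitting rule and the region values, which is the kind of modification the abstract describes.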

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising:
  generating a tree-based machine-learning model that is a memory structure comprising interconnected parent nodes and terminal nodes, wherein each parent node includes a respective splitting variable that causes the parent node to be connected via links to a respective pair of child nodes, wherein the terminal nodes includes respective representative response values based on values of the splitting variables, wherein generating the tree-based machine-learning model comprises:
    determining a splitting rule for partitioning data samples in a decision tree;
    partitioning, based on the splitting rule, data samples into a first tree region and a second tree region;
    computing a first representative response value from the data samples in first tree region and a second representative response value from the data samples in second tree region, wherein a monotonicity constraint for the decision tree is violated by a set of representative response values including the first and second representative response values, a representative response value for a closest lower neighboring region of the first tree region, and a representative response value for a closest upper neighboring region of the second tree region;
    applying a modification to the decision tree by modifying the splitting rule and the set of representative response values to enforce the monotonicity constraint;
    computing, with the decision tree having the modification, a first modified representative response value from data samples partitioned into the first tree region and a second modified representative response value from data samples partitioned into the second tree region, wherein: the monotonicity constraint is satisfied by a modified set of representative response values, and the modified set of representative response values includes the first and second modified representative response values, the representative response value for the closest lower neighboring region of the first tree region, and the representative response value for the closest upper neighboring region of the second tree region; and
    outputting the decision tree having the modification; and
  computing, based on the decision tree as outputted, explanatory data indicating relationships between (i) changes in a response variable computed with the tree-based machine-learning model and (ii) changes in one or more independent variables represented by the tree-based machine-learning model, wherein the independent variables comprise a first independent variable and a second independent variable, and wherein generating the explanatory data for the first independent variable having an input value comprises:
    identifying a first optimizing value of the first independent variable and a second optimizing value of the second independent variable that cause the tree-based machine-learning model to output an optimal value for the response variable;
    computing an additional value for the response variable by applying the tree-based machine-learning model to the first independent variable having the input value and the second independent variable having the second optimizing value;
    computing a difference between the optimal value for the response variable and the additional value for the response variable; and
    outputting the explanatory data for the first independent variable based on the difference.

2. The method of claim 1, wherein the relationships comprise respective contributions of the one or more independent variables to an output value of the response variable.

3. The method of claim 2, wherein the output value of the response variable comprises a risk assessment.

4. The method of claim 1, wherein generating the explanatory data for the second independent variable having an additional input value comprises:
    computing a third value for the response variable by applying the tree-based machine-learning model to the first independent variable having the first optimizing value and the second independent variable having the additional input value;
    computing an additional difference between the optimal value for the response variable and the third value for the response variable; and
    excluding, based on the difference being larger than the additional difference, an explanation of the second independent variable from the explanatory data.

5. The method of claim 4, wherein output values of the response variable comprise risk assessments, respectively.

6. The method of claim 1, wherein the independent variables comprise a first independent variable having a first input value and a second independent variable having a second input value, wherein generating the explanatory data for the first independent variable having the first input value comprises:
    identifying a first optimizing value of the first independent variable and a second optimizing value of the second independent variable that cause the tree-based machine-learning model to output an optimal value for the response variable;
    computing a first output value for the response variable by applying the tree-based machine-learning model to the first independent variable having the first optimizing value and the second independent variable having the second input value;
    identifying a second output value for the response variable that is computed by applying the tree-based machine-learning model to the first independent variable having the first input value and the second independent variable having the second input value;
    computing a difference between the first output value and the second output value; and
    outputting the explanatory data for the first independent variable based on the difference.

7. The method of claim 6, wherein generating the explanatory data for the second independent variable having the second input value comprises:
    computing a third output value for the response variable by applying the tree-based machine-learning model to the first independent variable having the first input value and the second independent variable having the second optimizing value;
    computing an additional difference between the first output value and the third output value; and
    excluding, based on the difference being larger than the additional difference, an explanation of the second independent variable from the explanatory data.

8. The method of claim 7, wherein output values of the response variable comprise risk assessments, respectively.

9. A system comprising:
  a processor; and
  a non-transitory computer-readable medium comprising program code stored thereon, wherein the program code is executable by the processor to cause the processor to perform operations comprising:
  generating a tree-based machine-learning model that is a memory structure comprising interconnected parent nodes and terminal nodes, wherein each parent node includes a respective splitting variable that causes the parent node to be connected via links to a respective pair of child nodes, wherein the terminal nodes includes respective representative response values based on values of the splitting variables, wherein generating the tree-based machine-learning model comprises:
    determining a splitting rule for partitioning data samples in a decision tree;
    partitioning, based on the splitting rule, data samples into a first tree region and a second tree region;
    computing a first representative response value from the data samples in first tree region and a second representative response value from the data samples in second tree region, wherein a monotonicity constraint for the decision tree is violated by a set of representative response values including the first and second representative response values, a representative response value for
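The explanatory-data steps in claims 1 and 6 both reduce to comparing model outputs with one variable moved between its input value and its optimizing value. The sketch below illustrates that arithmetic under loudly stated assumptions: `toy_model` is a hypothetical two-variable stand-in for a monotonic tree, "optimal" is taken to mean the maximum output, the optimizing values are found by brute force over small hypothetical grids, and the names `explain`, `drop_from_optimum`, and `gain_from_input` are illustrative only.

```python
from itertools import product


def toy_model(x1, x2):
    """Hypothetical stand-in for a small monotonic decision tree."""
    if x1 < 3:
        return 500
    return 900 if x2 >= 10 else 800


def explain(model, grids, inputs):
    """Compute two per-variable differences for an entity's input values.

    grids:  candidate values per variable (assumed search space).
    inputs: the entity's actual value per variable.
    """
    names = list(grids)
    # Brute-force search for the optimizing values (assumes max is optimal).
    best = max(
        product(*(grids[n] for n in names)),
        key=lambda combo: model(**dict(zip(names, combo))),
    )
    optimum = dict(zip(names, best))
    optimal_value = model(**optimum)
    current_value = model(**inputs)

    report = {}
    for n in names:
        # Claim 1 pattern: this variable at its input value, the others at
        # their optimizing values, compared against the optimal output.
        drop = optimal_value - model(**{**optimum, n: inputs[n]})
        # Claim 6 pattern: this variable at its optimizing value, the others
        # at their input values, compared against the current output.
        gain = model(**{**inputs, n: optimum[n]}) - current_value
        report[n] = {"drop_from_optimum": drop, "gain_from_input": gain}
    return report


grids = {"x1": [0, 3], "x2": [0, 10]}
inputs = {"x1": 0, "x2": 0}
report = explain(toy_model, grids, inputs)

# Claims 4 and 7 pattern: keep the variable with the larger difference and
# leave the other out of the explanation.
top = max(report, key=lambda n: report[n]["drop_from_optimum"])
```

With these toy numbers, `x1` has the larger difference under both patterns, so an explanation limited to one variable would report `x1` and exclude `x2`. The two patterns give different magnitudes here because `x2` only affects the output once `x1` is at least 3.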

Assignees

Equifax Inc

Inventors

Classifications

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • Inference or reasoning models · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Explanation of inference; Explainable artificial intelligence [XAI]; Interpretable artificial intelligence · CPC title

  • G06N20/20 · Primary

    Ensemble learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12299552B2 cover?
Certain aspects involve training tree-based machine-learning models for computing predicted responses and generating explanatory data for the models. For example, independent variables having relationships with a response variable are identified. Each independent variable corresponds to an action or observation for an entity. The response variable has outcome values associated with the entity. …
Who is the assignee on this patent?
Equifax Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 13, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).