GBDT model feature interpretation method and apparatus

US11205129B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11205129-B2
Application numberUS-202016889695-A
CountryUS
Kind codeB2
Filing dateJun 1, 2020
Priority dateMay 21, 2018
Publication dateDec 21, 2021
Grant dateDec 21, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Implementations of the present specification disclose methods, devices, and apparatuses for determining a feature interpretation of a predicted label value of a user generated by a GBDT model. In one aspect, the method includes separately obtaining, from each of a predetermined quantity of decision trees ranked among top decision trees, a leaf node and a score of the leaf node; determining a respective prediction path of each leaf node; obtaining, for each parent node on each prediction path, a split feature and a score of the parent node; determining, for each child node on each prediction path, a feature corresponding to the child node and a local increment of the feature on the child node; obtaining a collection of features respectively corresponding to the child nodes; and obtaining a respective measure of relevance between the feature corresponding to the at least one child node and the predicted label value.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method comprising: generating a predicted label for a user using a gradient boosting decision tree GBDT model, the GBDT model comprising multiple decision trees arranged in a predetermined order; obtaining, from each of a predetermined quantity of decision trees ranked among top decision trees, (i) a leaf node corresponding to the predicted label generated for the user and (ii) a score of the leaf node that is determined by using the GBDT model; for each of the predetermined quantity of decision trees ranked among top decision trees: determining a prediction path of the leaf node obtained from the decision tree, wherein the prediction path includes the leaf node, a root node of the decision tree in which the leaf node is included, and any child node included therealong that is neither the leaf node nor the root node; computing, for the leaf node, a count of how many training samples for the GBDT model have the predicted label corresponding to the leaf node, wherein the count is dependent on a total quantity of training samples applied during a training process of the GBDT model; determining, based on the count of training samples, a weight for the score of the leaf node; determining an estimated score of a parent node of the leaf node based at least on multiplying the weight with the score of the leaf node; obtaining a split feature corresponding to the parent node of the leaf node; determining, for the leaf node and based on (i) the score of the leaf node, (ii) the estimated score of the parent node of the leaf node, and (iii) the split feature corresponding to the parent node of the leaf node, a feature corresponding to the leaf node and a local increment of the feature along the path from the parent node of the leaf node to the leaf node; whenever the parent node is not the root node of the decision tree in which the leaf node is included, using the parent node of the leaf node as a child node and determining, for the child node and based on (i) an estimated score of the child node, (ii) an estimated score of a parent node of the child node, and (iii) a split feature corresponding to the parent node of the child node, a feature corresponding to the child node and a local increment of the feature along the path from the parent node of the child node to the child node; and obtaining a collection of features respectively corresponding to the child nodes including the leaf node as the multiple features relevant to the predicted label value of the user; determining, from the collection of features that have been obtained for each of the predetermined quantity of decision trees, a set of child nodes that correspond to a same, first feature; and determining, by computing a sum of the local increments of the first feature, a measure of relevance between the first feature and the predicted label for the user. 2. The computer-implemented method of claim 1 , further comprising determining the estimated score of the parent node of the leaf node based on: determining an average value of respective scores of two child nodes of the parent node. 3. The computer-implemented method of claim 1 , wherein determining the local increment of the feature along the path from the parent node of the leaf node to the leaf node comprises: determining a difference between the score of the leaf node and the estimated score of the parent node of the leaf node; and using the difference as the value of the local increment of the feature corresponding to the parent node of the leaf node. 4. The computer-implemented method of claim 1 , wherein the GBDT model is a classification model or a regression model. 5. A non-transitory, computer-readable medium storing one or more instructions executable by a computer system to perform one or more operations comprising: generating a predicted label for a user using a gradient boosting decision tree GBDT model, the GBDT model comprising multiple decision trees arranged in a predetermined order; obtaining, from each of a predetermined quantity of decision trees ranked among top decision trees, (i) a leaf node corresponding to the predicted label generated for the user and (ii) a score of the leaf node that is determined by using the GBDT model; for each of the predetermined quantity of decision trees ranked among top decision trees: determining a prediction path of the leaf node obtained from the decision tree, wherein the prediction path includes the leaf node, a root node of the decision tree in which the leaf node is included, and any child node included therealong that is neither the leaf node nor the root node; computing, for the leaf node, a count of how many training samples for the GBDT model have the predicted label corresponding to the leaf node, wherein the count is dependent on a total quantity of training samples applied during a training process of the GBDT model; determining, based on the count of training samples, a weight for the score of the leaf node; determining an estimated score of a parent node of the leaf node based at least on multiplying the weight with the score of the leaf node; obtaining a split feature corresponding to the parent node of the leaf node; determining, for the leaf node and based on (i) the score of the leaf node, (ii) the estimated score of the parent node of the leaf node, and (iii) the split feature corresponding to the parent node of the leaf node, a feature corresponding to the leaf node and a local increment of the feature along the path from the parent node of the leaf node to the leaf node; whenever the parent node is not the root node of the decision tree in which the leaf node is included, using the parent node of the leaf node as a child node and determining, for the child node and based on (i) an estimated score of the child node, (ii) an estimated score of a parent node of the child node, and (iii) a split feature corresponding to the parent node of the child node, a feature corresponding to the child node and a local increment of the feature along the path from the parent node of the child node to the child node; and obtaining a collection of features respectively corresponding to the child nodes including the leaf node as the multiple features relevant to the predicted label value of the user; determining, from the collection of features that have been obtained for each of the predetermined quantity of decision trees, a set of child nodes that correspond to a same, first feature; and determining, by computing a sum of the local increments of the first feature, a measure of relevance between the first feature and the predicted label for the user. 6. The non-transitory, computer-readable medium of claim 5 , further comprising determining the estimated score of the parent node of the leaf node based on: determining an average value of respective scores of two child nodes of the parent node. 7. The non-transitory, computer-readable medium of claim 5 , wherein determining the local increment of the feature along the path from the parent node of the leaf node to the leaf node comprises: determining a difference between the score of the leaf node and the estimated score of the parent node of the leaf node; and using the difference as the value of the local increment of the feature corresponding to the parent node of the leaf node. 8. The non-transitory, computer-readable medium of claim 5 , wherein the GBDT model is a classification model or a regression model. 9. A computer-implemented system, comprising: one or more computers; and one or more computer memory devices interoperably coupled with the one or more computers and having tangible, non-transitory, machine-readable media storing one or mor

Assignees

Inventors

Classifications

  • involving fraud or risk level assessment in transaction processing · CPC title

  • characterised by the process organisation or structure, e.g. boosting cascade · CPC title

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • based on specific statistical tests · CPC title

  • Tree-organised classifiers · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11205129B2 cover?
Implementations of the present specification disclose methods, devices, and apparatuses for determining a feature interpretation of a predicted label value of a user generated by a GBDT model. In one aspect, the method includes separately obtaining, from each of a predetermined quantity of decision trees ranked among top decision trees, a leaf node and a score of the leaf node; determining a re…
Who is the assignee on this patent?
Advanced New Technologies Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06Q20/4016. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Dec 21 2021 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).