Who is the assignee on this patent?

Advanced New Technologies Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06Q20/4016. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Dec 21 2021 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

GBDT model feature interpretation method and apparatus

US11205129B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11205129-B2
Application number	US-202016889695-A
Country	US
Kind code	B2
Filing date	Jun 1, 2020
Priority date	May 21, 2018
Publication date	Dec 21, 2021
Grant date	Dec 21, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Implementations of the present specification disclose methods, devices, and apparatuses for determining a feature interpretation of a predicted label value of a user generated by a GBDT model. In one aspect, the method includes separately obtaining, from each of a predetermined quantity of decision trees ranked among top decision trees, a leaf node and a score of the leaf node; determining a respective prediction path of each leaf node; obtaining, for each parent node on each prediction path, a split feature and a score of the parent node; determining, for each child node on each prediction path, a feature corresponding to the child node and a local increment of the feature on the child node; obtaining a collection of features respectively corresponding to the child nodes; and obtaining a respective measure of relevance between the feature corresponding to the at least one child node and the predicted label value.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method comprising: generating a predicted label for a user using a gradient boosting decision tree GBDT model, the GBDT model comprising multiple decision trees arranged in a predetermined order; obtaining, from each of a predetermined quantity of decision trees ranked among top decision trees, (i) a leaf node corresponding to the predicted label generated for the user and (ii) a score of the leaf node that is determined by using the GBDT model; for each of the predetermined quantity of decision trees ranked among top decision trees: determining a prediction path of the leaf node obtained from the decision tree, wherein the prediction path includes the leaf node, a root node of the decision tree in which the leaf node is included, and any child node included therealong that is neither the leaf node nor the root node; computing, for the leaf node, a count of how many training samples for the GBDT model have the predicted label corresponding to the leaf node, wherein the count is dependent on a total quantity of training samples applied during a training process of the GBDT model; determining, based on the count of training samples, a weight for the score of the leaf node; determining an estimated score of a parent node of the leaf node based at least on multiplying the weight with the score of the leaf node; obtaining a split feature corresponding to the parent node of the leaf node; determining, for the leaf node and based on (i) the score of the leaf node, (ii) the estimated score of the parent node of the leaf node, and (iii) the split feature corresponding to the parent node of the leaf node, a feature corresponding to the leaf node and a local increment of the feature along the path from the parent node of the leaf node to the leaf node; whenever the parent node is not the root node of the decision tree in which the leaf node is included, using the parent node of the leaf node as a child node and determining, for the child node and based on (i) an estimated score of the child node, (ii) an estimated score of a parent node of the child node, and (iii) a split feature corresponding to the parent node of the child node, a feature corresponding to the child node and a local increment of the feature along the path from the parent node of the child node to the child node; and obtaining a collection of features respectively corresponding to the child nodes including the leaf node as the multiple features relevant to the predicted label value of the user; determining, from the collection of features that have been obtained for each of the predetermined quantity of decision trees, a set of child nodes that correspond to a same, first feature; and determining, by computing a sum of the local increments of the first feature, a measure of relevance between the first feature and the predicted label for the user. 2. The computer-implemented method of claim 1 , further comprising determining the estimated score of the parent node of the leaf node based on: determining an average value of respective scores of two child nodes of the parent node. 3. The computer-implemented method of claim 1 , wherein determining the local increment of the feature along the path from the parent node of the leaf node to the leaf node comprises: determining a difference between the score of the leaf node and the estimated score of the parent node of the leaf node; and using the difference as the value of the local increment of the feature corresponding to the parent node of the leaf node. 4. The computer-implemented method of claim 1 , wherein the GBDT model is a classification model or a regression model. 5. A non-transitory, computer-readable medium storing one or more instructions executable by a computer system to perform one or more operations comprising: generating a predicted label for a user using a gradient boosting decision tree GBDT model, the GBDT model comprising multiple decision trees arranged in a predetermined order; obtaining, from each of a predetermined quantity of decision trees ranked among top decision trees, (i) a leaf node corresponding to the predicted label generated for the user and (ii) a score of the leaf node that is determined by using the GBDT model; for each of the predetermined quantity of decision trees ranked among top decision trees: determining a prediction path of the leaf node obtained from the decision tree, wherein the prediction path includes the leaf node, a root node of the decision tree in which the leaf node is included, and any child node included therealong that is neither the leaf node nor the root node; computing, for the leaf node, a count of how many training samples for the GBDT model have the predicted label corresponding to the leaf node, wherein the count is dependent on a total quantity of training samples applied during a training process of the GBDT model; determining, based on the count of training samples, a weight for the score of the leaf node; determining an estimated score of a parent node of the leaf node based at least on multiplying the weight with the score of the leaf node; obtaining a split feature corresponding to the parent node of the leaf node; determining, for the leaf node and based on (i) the score of the leaf node, (ii) the estimated score of the parent node of the leaf node, and (iii) the split feature corresponding to the parent node of the leaf node, a feature corresponding to the leaf node and a local increment of the feature along the path from the parent node of the leaf node to the leaf node; whenever the parent node is not the root node of the decision tree in which the leaf node is included, using the parent node of the leaf node as a child node and determining, for the child node and based on (i) an estimated score of the child node, (ii) an estimated score of a parent node of the child node, and (iii) a split feature corresponding to the parent node of the child node, a feature corresponding to the child node and a local increment of the feature along the path from the parent node of the child node to the child node; and obtaining a collection of features respectively corresponding to the child nodes including the leaf node as the multiple features relevant to the predicted label value of the user; determining, from the collection of features that have been obtained for each of the predetermined quantity of decision trees, a set of child nodes that correspond to a same, first feature; and determining, by computing a sum of the local increments of the first feature, a measure of relevance between the first feature and the predicted label for the user. 6. The non-transitory, computer-readable medium of claim 5 , further comprising determining the estimated score of the parent node of the leaf node based on: determining an average value of respective scores of two child nodes of the parent node. 7. The non-transitory, computer-readable medium of claim 5 , wherein determining the local increment of the feature along the path from the parent node of the leaf node to the leaf node comprises: determining a difference between the score of the leaf node and the estimated score of the parent node of the leaf node; and using the difference as the value of the local increment of the feature corresponding to the parent node of the leaf node. 8. The non-transitory, computer-readable medium of claim 5 , wherein the GBDT model is a classification model or a regression model. 9. A computer-implemented system, comprising: one or more computers; and one or more computer memory devices interoperably coupled with the one or more computers and having tangible, non-transitory, machine-readable media storing one or mor

Assignees

Advanced New Technologies Co Ltd

Inventors

Classifications

G06Q20/4016Primary
involving fraud or risk level assessment in transaction processing · CPC title
G06F18/2148
characterised by the process organisation or structure, e.g. boosting cascade · CPC title
G06N5/01
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title
G06F18/2193
based on specific statistical tests · CPC title
G06F18/24323Primary
Tree-organised classifiers · CPC title

Patent family

Related publications grouped by family.

View patent family 63806940

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11205129B2 cover?: Implementations of the present specification disclose methods, devices, and apparatuses for determining a feature interpretation of a predicted label value of a user generated by a GBDT model. In one aspect, the method includes separately obtaining, from each of a predetermined quantity of decision trees ranked among top decision trees, a leaf node and a score of the leaf node; determining a re…
Who is the assignee on this patent?: Advanced New Technologies Co Ltd
What technology area does this patent fall under?: Primary CPC classification G06Q20/4016. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Dec 21 2021 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).