What technology area does this patent fall under?

Primary CPC classification G06N5/01. Mapped technology areas include Physics.

When was this patent published?

Publication date Thu Jan 20 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Methods and systems for training a decision-tree based machine learning algorithm (mla)

US2022019902A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2022019902-A1
Application number	US-202117207403-A
Country	US
Kind code	A1
Filing date	Mar 19, 2021
Priority date	Jul 20, 2020
Publication date	Jan 20, 2022
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and servers for of training a decision-tree based Machine Learning Algorithm (MLA) are disclosed. During a given training iteration, the method includes generating prediction values using current generated trees, generating estimated gradient values by applying a loss function, generating a first plurality of noisy estimated gradient values based on the estimated gradient values, generating a plurality of noisy candidate trees using the first plurality of noisy estimated gradient values, applying a selection metric to select a target tree amongst the plurality of noisy candidate trees, generating a second plurality of noisy estimated gradient values based on the plurality of estimated gradient values, generating an iteration-specific tree based on the target tree and the second plurality of noisy estimated gradient values, and storing, the iteration-specific tree to be used in combination with the current generated trees.

First claim

Opening claim text (preview).

1 . A method of training a decision-tree based Machine Learning Algorithm (MLA), the decision-tree based MLA being based on a plurality of generated trees, a given one from the plurality of generated trees having a respective plurality of leaf nodes having respective leaf values, the method executable by a server having access to a training dataset, the training dataset comprising a plurality of training objects and a plurality of target values, a given one of the plurality of target values being indicative of a ground-truth associated with a given one of the plurality of training objects, the method comprising: during a given training iteration of the decision-tree based MLA: generating, by the server, a plurality of prediction values using a current plurality of generated trees, a given one from the plurality of prediction values being indicative of an output generated by the current plurality of generated trees for the target value of the respective one of the plurality of training objects; generating, by the server, a plurality of estimated gradient values by applying a loss function, a given one from the plurality of estimated gradient values being indicative of a difference between the target value and the respective one of the plurality of prediction values; generating, by the server, a first plurality of noisy estimated gradient values by applying a first noise-injecting function onto the plurality of estimated gradient values; generating, by the server, a plurality of noisy candidate trees using the first plurality of noisy estimated gradient values, a given one of the plurality of noisy candidate trees having a respective plurality of leaf nodes with respective leaf values, the respective leaf values being based on the first plurality of noisy estimated gradient values, applying, by the server, a selection metric to the leaf values of the plurality of noisy candidate trees to select a target tree amongst the plurality of noisy candidate trees, the target tree being a given noisy candidate tree having leaf values that are closest to respective noisy estimated gradient values from the first plurality of noisy estimated gradient values than leaf values of any other noisy candidate tree; generating, by the server, a second plurality of noisy estimated gradient values by applying a second noise-injecting function onto the plurality of estimated gradient values; generating, by the server, an iteration-specific tree by determining new leaf values for respective leaf nodes of the target tree, the new leaf values being determined based on the second plurality of noisy estimated gradient values; and storing, by the server, the iteration-specific tree to be used, by the server, in combination with the current plurality of generated trees. 2 . The method of claim 1 , wherein the decision-tree based MLA is being trained for performing one of a regression task and a classification task during an in-use phase of the decision-tree based MLA. 3 . The method of claim 1 , wherein the loss function if one of a convex loss function and a non-convex loss function. 4 . The method of claim 3 , wherein the non-convex loss function includes a 0-1 loss function. 5 . The method of claim 1 , wherein the first noise-inducing function is: ζ i = (0,2ϵβ −1 ) wherein: ζ i is a given noise value to be added to i th estimated gradient value from the plurality of estimated gradient values to generate the i th noisy estimated gradient value from the first plurality of noisy estimated gradient values, (“mean”, “variance”) is a normal distribution function, ϵ is a learning rate parameter, β is an inverse Langevin diffusion temperature parameter. 6 . The method of claim 1 , wherein the generating the plurality of noisy candidate trees comprises: generating, by the server, the given one of the plurality of noisy candidate trees by: determining, by the server, a tree structure for the given one of the plurality of noisy candidate trees based on the first plurality of noisy estimated gradient values, the tree structure having the respective plurality of leaf nodes each of which corresponds to at least one of the first plurality of noisy estimated gradient values; for a given leaf node corresponding to more than one of the first plurality of noisy estimated gradient values, aggregating, by the server, the more than one of the first plurality of noisy estimated gradient values, thereby determining an aggregated value; using the aggregated value as the respective leaf value of the given leaf node; and for an other given leaf node of the respective plurality of leaf nodes corresponding to only one of the first plurality of noisy estimated gradient values, using the only one of the first plurality of noisy estimated gradient values as the respective leaf value. 7 . The method of claim 1 , wherein the selection metric is a sum of squared differences between the leaf values of a given noisy candidate tree and the corresponding estimated gradient values from the first plurality of estimated gradient values. 8 . The method of claim 1 , wherein the second noise-inducing function is: ζ i ′= (0,2ϵβ −1 ) wherein: ζ i ′ is a given noise value to be added to i th estimated gradient value from the plurality of estimated gradient values to generate the i th noisy estimated gradient value from the second plurality of noisy estimated gradient values, (“mean”, “variance”) is a normal distribution function, ϵ is a learning rate parameter, β is an inverse Langevin diffusion temperature parameter. 9 . The method of claim 1 , wherein the first noise-inducing function is independent from the second noise-inducing function. 10 . The method of claim 1 , wherein the first plurality of noisy estimated gradient values are generated independently from the second plurality of noisy estimated gradient values. 11 . The method of claim 1 , wherein the generating the iteration-specific tree comprises: for a given leaf node of the target tree corresponding to more than one of the second plurality of noisy estimated gradient values, aggregating, by the server, the more than one of the second plurality of noisy estimated gradient values, thereby determining an aggregated value; using the aggregated value as the new leaf value of the given leaf node; and for an other given leaf node of the target tree corresponding to only one of the second plurality of noisy estimated gradient values, using the only one of the second plurality of noisy estimated gradient values to be used as the new leaf value of the other given leaf node. 12 . The method of claim 1 , wherein during the given training iteration, the method further comprises: executing, by the server, a shrinkage procedure on the plurality of prediction values by using a shrinkage parameter for generating a plurality of shrunk prediction values, a given shrunk prediction value being smaller than the respective prediction value; and the given one from the plurality of estimated gradient values being indicative of a difference between the target value and a respective one of the plurality of shrunk prediction values. 13 . The method of claim 1 , wherein the using the iteration-specific tree in combination with the current plurality of generated trees comprises: executing, by the server, a regularization procedure on the new leaf values of the iteration-specific tree by using a learning rate parameter for generating a plurality of regularized leaf values, a given regularized leaf value being smaller than the respective leaf value; and using, by the server, the iteration-specific tree with the pl

Assignees

Yandex Europe Ag

Inventors

Ustimenko Aleksey Ivanovich

Classifications

G06N5/01Primary
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title
G06F18/2148
characterised by the process organisation or structure, e.g. boosting cascade · CPC title
G06F18/24323
Tree-organised classifiers · CPC title
G06N20/20Primary
Ensemble learning · CPC title
G06N20/00
Machine learning · CPC title

Patent family

Related publications grouped by family.

View patent family 79292651

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022019902A1 cover?: Methods and servers for of training a decision-tree based Machine Learning Algorithm (MLA) are disclosed. During a given training iteration, the method includes generating prediction values using current generated trees, generating estimated gradient values by applying a loss function, generating a first plurality of noisy estimated gradient values based on the estimated gradient values, genera…
Who is the assignee on this patent?: Yandex Europe Ag
What technology area does this patent fall under?: Primary CPC classification G06N5/01. Mapped technology areas include Physics.
When was this patent published?: Publication date Thu Jan 20 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

How to read this patent

Abstract

First claim

Assignees

Inventors

Classifications

Patent family

External sources

Related patents

Systems and methods for noise-robust contrastive learning

Stochastic gradient boosting for deep neural networks

Method and system for model integration in ensemble learning

Candidate geographic coordinate ranking

Frequently asked questions