Methods and systems for training a decision-tree based machine learning algorithm (mla)

US2022019902A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2022019902-A1
Application numberUS-202117207403-A
CountryUS
Kind codeA1
Filing dateMar 19, 2021
Priority dateJul 20, 2020
Publication dateJan 20, 2022
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and servers for of training a decision-tree based Machine Learning Algorithm (MLA) are disclosed. During a given training iteration, the method includes generating prediction values using current generated trees, generating estimated gradient values by applying a loss function, generating a first plurality of noisy estimated gradient values based on the estimated gradient values, generating a plurality of noisy candidate trees using the first plurality of noisy estimated gradient values, applying a selection metric to select a target tree amongst the plurality of noisy candidate trees, generating a second plurality of noisy estimated gradient values based on the plurality of estimated gradient values, generating an iteration-specific tree based on the target tree and the second plurality of noisy estimated gradient values, and storing, the iteration-specific tree to be used in combination with the current generated trees.

First claim

Opening claim text (preview).

1 . A method of training a decision-tree based Machine Learning Algorithm (MLA), the decision-tree based MLA being based on a plurality of generated trees, a given one from the plurality of generated trees having a respective plurality of leaf nodes having respective leaf values, the method executable by a server having access to a training dataset, the training dataset comprising a plurality of training objects and a plurality of target values, a given one of the plurality of target values being indicative of a ground-truth associated with a given one of the plurality of training objects, the method comprising: during a given training iteration of the decision-tree based MLA: generating, by the server, a plurality of prediction values using a current plurality of generated trees, a given one from the plurality of prediction values being indicative of an output generated by the current plurality of generated trees for the target value of the respective one of the plurality of training objects; generating, by the server, a plurality of estimated gradient values by applying a loss function, a given one from the plurality of estimated gradient values being indicative of a difference between the target value and the respective one of the plurality of prediction values; generating, by the server, a first plurality of noisy estimated gradient values by applying a first noise-injecting function onto the plurality of estimated gradient values; generating, by the server, a plurality of noisy candidate trees using the first plurality of noisy estimated gradient values, a given one of the plurality of noisy candidate trees having a respective plurality of leaf nodes with respective leaf values, the respective leaf values being based on the first plurality of noisy estimated gradient values, applying, by the server, a selection metric to the leaf values of the plurality of noisy candidate trees to select a target tree amongst the plurality of noisy candidate trees, the target tree being a given noisy candidate tree having leaf values that are closest to respective noisy estimated gradient values from the first plurality of noisy estimated gradient values than leaf values of any other noisy candidate tree; generating, by the server, a second plurality of noisy estimated gradient values by applying a second noise-injecting function onto the plurality of estimated gradient values; generating, by the server, an iteration-specific tree by determining new leaf values for respective leaf nodes of the target tree, the new leaf values being determined based on the second plurality of noisy estimated gradient values; and storing, by the server, the iteration-specific tree to be used, by the server, in combination with the current plurality of generated trees. 2 . The method of claim 1 , wherein the decision-tree based MLA is being trained for performing one of a regression task and a classification task during an in-use phase of the decision-tree based MLA. 3 . The method of claim 1 , wherein the loss function if one of a convex loss function and a non-convex loss function. 4 . The method of claim 3 , wherein the non-convex loss function includes a 0-1 loss function. 5 . The method of claim 1 , wherein the first noise-inducing function is: ζ i = (0,2ϵβ −1 ) wherein: ζ i is a given noise value to be added to i th estimated gradient value from the plurality of estimated gradient values to generate the i th noisy estimated gradient value from the first plurality of noisy estimated gradient values, (“mean”, “variance”) is a normal distribution function, ϵ is a learning rate parameter, β is an inverse Langevin diffusion temperature parameter. 6 . The method of claim 1 , wherein the generating the plurality of noisy candidate trees comprises: generating, by the server, the given one of the plurality of noisy candidate trees by: determining, by the server, a tree structure for the given one of the plurality of noisy candidate trees based on the first plurality of noisy estimated gradient values, the tree structure having the respective plurality of leaf nodes each of which corresponds to at least one of the first plurality of noisy estimated gradient values; for a given leaf node corresponding to more than one of the first plurality of noisy estimated gradient values, aggregating, by the server, the more than one of the first plurality of noisy estimated gradient values, thereby determining an aggregated value; using the aggregated value as the respective leaf value of the given leaf node; and for an other given leaf node of the respective plurality of leaf nodes corresponding to only one of the first plurality of noisy estimated gradient values, using the only one of the first plurality of noisy estimated gradient values as the respective leaf value. 7 . The method of claim 1 , wherein the selection metric is a sum of squared differences between the leaf values of a given noisy candidate tree and the corresponding estimated gradient values from the first plurality of estimated gradient values. 8 . The method of claim 1 , wherein the second noise-inducing function is: ζ i ′= (0,2ϵβ −1 ) wherein: ζ i ′ is a given noise value to be added to i th estimated gradient value from the plurality of estimated gradient values to generate the i th noisy estimated gradient value from the second plurality of noisy estimated gradient values, (“mean”, “variance”) is a normal distribution function, ϵ is a learning rate parameter, β is an inverse Langevin diffusion temperature parameter. 9 . The method of claim 1 , wherein the first noise-inducing function is independent from the second noise-inducing function. 10 . The method of claim 1 , wherein the first plurality of noisy estimated gradient values are generated independently from the second plurality of noisy estimated gradient values. 11 . The method of claim 1 , wherein the generating the iteration-specific tree comprises: for a given leaf node of the target tree corresponding to more than one of the second plurality of noisy estimated gradient values, aggregating, by the server, the more than one of the second plurality of noisy estimated gradient values, thereby determining an aggregated value; using the aggregated value as the new leaf value of the given leaf node; and for an other given leaf node of the target tree corresponding to only one of the second plurality of noisy estimated gradient values, using the only one of the second plurality of noisy estimated gradient values to be used as the new leaf value of the other given leaf node. 12 . The method of claim 1 , wherein during the given training iteration, the method further comprises: executing, by the server, a shrinkage procedure on the plurality of prediction values by using a shrinkage parameter for generating a plurality of shrunk prediction values, a given shrunk prediction value being smaller than the respective prediction value; and the given one from the plurality of estimated gradient values being indicative of a difference between the target value and a respective one of the plurality of shrunk prediction values. 13 . The method of claim 1 , wherein the using the iteration-specific tree in combination with the current plurality of generated trees comprises: executing, by the server, a regularization procedure on the new leaf values of the iteration-specific tree by using a learning rate parameter for generating a plurality of regularized leaf values, a given regularized leaf value being smaller than the respective leaf value; and using, by the server, the iteration-specific tree with the pl

Assignees

Inventors

Classifications

  • G06N5/01Primary

    Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • characterised by the process organisation or structure, e.g. boosting cascade · CPC title

  • Tree-organised classifiers · CPC title

  • G06N20/20Primary

    Ensemble learning · CPC title

  • Machine learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022019902A1 cover?
Methods and servers for of training a decision-tree based Machine Learning Algorithm (MLA) are disclosed. During a given training iteration, the method includes generating prediction values using current generated trees, generating estimated gradient values by applying a loss function, generating a first plurality of noisy estimated gradient values based on the estimated gradient values, genera…
Who is the assignee on this patent?
Yandex Europe Ag
What technology area does this patent fall under?
Primary CPC classification G06N5/01. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Jan 20 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).