What technology area does this patent fall under?

Primary CPC classification G06N3/08. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jun 25, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Learning with moment estimation using different time constants

US12020129B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12020129-B2
Application number	US-202318300007-A
Country	US
Kind code	B2
Filing date	Apr 13, 2023
Priority date	Feb 11, 2020
Publication date	Jun 25, 2024
Grant date	Jun 25, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A technique for training a model includes obtaining a training example for a model having model parameters stored on one or more computer readable storage mediums operably coupled to the hardware processor. The training example includes an outcome and features to explain the outcome. A gradient is calculated with respect to the model parameters of the model using the training example. Two estimates of a moment of the gradient with two different time constants are computed for the same type of the moment using the gradient. Using a hardware processor, the model parameters of the model are updated using the two estimates of the moment with the two different time constants to reduce errors while calculating the at least two estimates of the moment of the gradient.
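The update described in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the linear model, the squared-error loss, the decay rates `beta_short` and `beta_long`, the learning rate, and the choice to combine the two estimates by their mean are all assumptions made for illustration. It keeps two exponential moving averages of the same moment type (the first-order moment of the gradient) with different time constants and uses both to update the parameters.

```python
import numpy as np

def ema(estimate, gradient, beta):
    """Exponential moving average; a decay rate beta closer to 1 gives a
    longer time constant (roughly 1 / (1 - beta) steps)."""
    return beta * estimate + (1.0 - beta) * gradient

def train_step(params, grad, state, lr=0.05, beta_short=0.9, beta_long=0.99):
    """One update using two first-moment estimates (hypothetical settings)."""
    # Two estimates of the same moment type with different time constants.
    state["m_short"] = ema(state["m_short"], grad, beta_short)
    state["m_long"] = ema(state["m_long"], grad, beta_long)
    # Assumed combination rule: the mean of the two estimates.
    combined = 0.5 * (state["m_short"] + state["m_long"])
    return params - lr * combined

def run_demo(steps=2000, seed=0):
    """Fit a linear model on synthetic data, one random example per step."""
    rng = np.random.default_rng(seed)
    w_true = np.array([1.0, -2.0, 0.5])
    features = rng.standard_normal((200, 3))   # features that explain the outcome
    outcomes = features @ w_true               # noise-free outcomes
    w = np.zeros(3)
    state = {"m_short": np.zeros(3), "m_long": np.zeros(3)}
    for _ in range(steps):
        i = rng.integers(len(features))        # randomly selected training example
        x, y = features[i], outcomes[i]
        grad = (w @ x - y) * x                 # gradient of 0.5 * (w.x - y)^2
        w = train_step(w, grad, state)
    return w, w_true

if __name__ == "__main__":
    w, w_true = run_demo()
    print("estimated:", w, "true:", w_true)
```

With these assumed decay rates the short estimate averages over roughly 10 steps and the long one over roughly 100, so the short estimate reacts quickly to recent gradients while the long one smooths out noise.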

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for training a model, comprising: obtaining a training example for a model having model parameters, the training example being randomly selected and including an outcome and features to explain the outcome; calculating a gradient with respect to the model parameters of the model using the training example; computing estimates of a moment of the gradient with different time constants for a same type of the moment using the gradient; and updating, using a hardware processor, the model parameters of the model using the estimates of the moment with the different time constants.

2. The method of claim 1, wherein each of the model parameters is updated with an amount determined individually by respective components of the estimates of the moment in a conservative manner.

3. The method of claim 2, wherein a first model parameter of the model is updated by zero or a small amount in response to the estimates of the moment being inconsistent in a component corresponding to the first model parameter.

4. The method of claim 3, wherein, in response to the estimates of the moment being consistent in the component corresponding to the first model parameter, the first model parameter is updated according to a value generated by combining respective components of the estimates of the moment corresponding to the first model parameter.

5. The method of claim 2, wherein a first model parameter of the model is updated according to a maximum or a mean of components of the estimates of the moment corresponding to the first model parameter.

6. The method of claim 1, wherein the moment includes a first order moment and a second order moment as different types, wherein the first order moment represents average of the gradient and the second order moment scales individual learning rates for the model parameters of the model.

7. The method of claim 1, wherein the moment includes a first order moment and a second order moment as different types and a first model parameter of the model is updated in a manner depending on inconsistency between estimates of the first order moment in a component corresponding to the first model parameter and magnitude relationship between estimates of the second order moment in the component.

8. The method of claim 1, wherein the time constants change exponential decay rates for moment estimation and the time constants include a first time constant and a second time constant that is larger or smaller than the first time constant.

9. The method of claim 1, wherein the training example is provided in a streaming manner, wherein the model to be trained is updated each time a new training example arrives and the model is used to predict a value of the outcome based on input features.

10. The method of claim 9, wherein the input features include a plurality of elements representing past value fluctuations of the outcome observed over a predetermined period.

11. The method of claim 1, wherein the gradient is a stochastic gradient of an objective function at an iteration step, wherein the objective function evaluates a loss between the outcome in the training example and a prediction done by the model with current values of the model parameters from the features in the training example and the training example includes a single training example or a group of training examples.

12. A computer system for training a model by executing program instructions, the computer system comprising: one or more computer readable storage mediums for storing the program instructions and a training example for a model having model parameters; and processing circuitry in communication with the computer readable storage mediums for executing the program instructions, wherein the processing circuitry is configured to: obtain a training example for a model having model parameters, the training example being randomly selected and including an outcome and features to explain the outcome; calculate a gradient with respect to the model parameters of the model using the training example; compute estimates of a moment of the gradient with different time constants for a same type of the moment using the gradient; and update the model parameters of the model using the estimates of the moment with the different time constants.

13. The computer system of claim 12, wherein the processing circuitry is configured to update each of the model parameters with an amount determined individually by respective components of the estimates of the moment in a conservative manner.

14. The computer system of claim 13, wherein the processing circuitry is configured to update a first model parameter of the model by zero or a small amount in response to the estimates of the moment being inconsistent in a component corresponding to the first model parameter and in response to the estimates of the moment being consistent in the component corresponding to the first model parameter, wherein the first model parameter is updated according to a value generated by combining respective components of the estimates of the moment corresponding to the first model parameter.

15. The computer system of claim 13, wherein the moment includes a first order moment of the gradient and a second order moment of the gradient as different types, wherein the first order moment represents average of the gradient and the second order moment scales individual learning rates for the model parameters of the model.

16. The computer system of claim 14, wherein the moment includes a first order moment and a second order moment as different types and a first model parameter of the model is updated in a manner depending on inconsistency between estimates of the first order moment in a component corresponding to the first model parameter and a magnitude relationship between estimates of the second order moment in the component.

17. A computer program product for training a model, comprising: a computer readable storage medium having program instructions and training examples for models having model parameters embodied therewith, the program instructions executable by a computer to cause the computer to perform a computer-implemented method comprising: obtaining a training example for a model having model parameters, the training example being randomly selected and including an outcome and features to explain the outcome; calculating a gradient with respect to the model parameters of the model using the training example; computing estimates of a moment of the gradient with different time constants for a same type of the moment using the gradient; and updating, using a hardware processor, the model parameters of the model using the estimates of the moment with the different time constants.

18. The computer program product of claim 17, wherein the computer is configured to update each of the model parameters with an amount determined individually by respective components of the estimates of the moment in a conservative manner.

19. The computer program product of claim 18, wherein the computer is configured to update a first model parameter of the model according to a maximum or a mean of components of the estimates of the moment corresponding to the first model parameter.

20. A computer-implemented method for training a model, comprising: obtaining a training example for a model having model parameters, the training example being randomly selected and including an outcome and features to explain the outcome; iteratively calculating a
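The "conservative" per-parameter update in claims 2 through 6 can be read in more than one way. The sketch below shows one possible reading and is not taken from the patent: treating a sign disagreement between the two first-moment estimates as "inconsistent", using the mean as the combining rule, taking the element-wise maximum of the two second-moment estimates, and the Adam-style division by the square root of the second moment are all assumptions, as are the function names.

```python
import numpy as np

def conservative_first_moment(m_short, m_long):
    """Per component: zero where the two estimates disagree in sign
    (assumed meaning of 'inconsistent'), otherwise their mean."""
    consistent = (m_short * m_long) > 0
    return np.where(consistent, 0.5 * (m_short + m_long), 0.0)

def conservative_second_moment(v_short, v_long):
    """Per component maximum: the larger estimate gives the smaller step."""
    return np.maximum(v_short, v_long)

def conservative_update(params, m_short, m_long, v_short, v_long,
                        lr=0.001, eps=1e-8):
    """Adam-style step (an assumption) built from the combined estimates."""
    m = conservative_first_moment(m_short, m_long)
    v = conservative_second_moment(v_short, v_long)
    # The second moment scales the learning rate of each parameter individually.
    return params - lr * m / (np.sqrt(v) + eps)

if __name__ == "__main__":
    params = np.ones(3)
    m_short = np.array([0.2, -0.1, 0.3])
    m_long = np.array([0.4, 0.1, 0.1])
    v_short = np.array([0.04, 0.01, 0.09])
    v_long = np.array([0.16, 0.04, 0.01])
    print(conservative_update(params, m_short, m_long, v_short, v_long, lr=0.1))
```

In this example the second parameter is left unchanged because its two first-moment components have opposite signs, while the other two parameters move by an amount scaled by the larger of their second-moment estimates.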

Assignees

IBM

Inventors

Morimura Tetsuro

Classifications

G06N3/0464: Convolutional networks [CNN, ConvNet]
G06N3/09: Supervised learning
G06N3/0442: characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
G06N7/01: Probabilistic graphical models, e.g. probabilistic networks
G06N3/08 (Primary): Learning methods

Patent family

Related publications grouped by family.

Patent family 77177596

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12020129B2 cover?: A technique for training a model includes obtaining a training example for a model having model parameters stored on one or more computer readable storage mediums operably coupled to the hardware processor. The training example includes an outcome and features to explain the outcome. A gradient is calculated with respect to the model parameters of the model using the training example. Two estim…
Who is the assignee on this patent?: IBM

Related patents

Technologies for optimized machine learning training

Training machine learning models on multiple machine learning tasks

Training distilled machine learning models

Training machine learning models
