Who is the assignee on this patent?

Microsoft Technology Licensing LLC

What technology area does this patent fall under?

Primary CPC classification G06N3/084. Mapped technology areas include Physics.

When was this patent published?

Published Jan 7, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Asynchronous training of machine learning model

US12190232B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12190232-B2
Application number	US-201716327679-A
Country	US
Kind code	B2
Filing date	Aug 17, 2017
Priority date	Aug 25, 2016
Publication date	Jan 7, 2025
Grant date	Jan 7, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various implementations relate to asynchronous training of a machine learning model. A server receives feedback data generated by training the machine learning model from a worker. The feedback data are obtained by the worker with its own training data and are associated with previous values of a set of parameters of the machine learning model at the worker. The server determines differences between the previous values and current values of the set of parameters at the server. The current value may have been updated for once or more due to operation of other workers. Then, the server can update the current values of the set of parameters based on the feedback data and the differences between values of the set of parameters. Thus, the updating does not only take the training result of each worker into consideration but also makes proper compensation for delay between different workers.
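The server-side flow described in the abstract can be sketched as a toy parameter server. This is an illustration under stated assumptions, not the patent's implementation: the class name `ParameterServer`, the methods `pull` and `push`, and the compensation strength `comp` are all hypothetical, and the compensation formula (the squared gradient times the parameter difference) is one plausible reading of the claims rather than something this page spells out.

```python
from typing import Dict, List


class ParameterServer:
    """Toy parameter server that compensates for stale gradients (illustrative only)."""

    def __init__(self, initial: List[float], learning_rate: float, comp: float) -> None:
        self.current = list(initial)               # current values held by the server
        self.learning_rate = learning_rate
        self.comp = comp                           # hypothetical compensation strength
        self.backup: Dict[str, List[float]] = {}   # values each worker last pulled

    def pull(self, worker_id: str) -> List[float]:
        """Send the current values to a worker and remember what it was given."""
        self.backup[worker_id] = list(self.current)
        return list(self.current)

    def push(self, worker_id: str, gradient: List[float]) -> None:
        """Apply a worker's (possibly stale) gradient with delay compensation."""
        previous = self.backup[worker_id]
        for i, g in enumerate(gradient):
            # Drift of parameter i caused by other workers since this worker pulled.
            diff = self.current[i] - previous[i]
            # Assumed form: delayed gradient plus a correction scaled by g squared.
            compensated = g + self.comp * g * g * diff
            self.current[i] -= self.learning_rate * compensated
```

If two workers pull the same values and then push in turn, the first push reduces to a plain gradient step because the difference is zero; only the second push, whose gradient is stale by one update, receives a correction.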

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method comprising: receiving, by a computing device from a worker implemented by a computer processing unit, feedback data generated by training a machine learning model, the feedback data being associated with previous values of a set of parameters of the machine learning model at the worker; determining, by the computing device, differences between the previous values and current values of the set of parameters; calculating a zero-order term and a first-order term of a series expansion based on the feedback data and the differences; and updating the current values based on the zero-order term and the first-order term to obtain updated values of the set of the parameters, wherein updating the current values based on the zero-order term and the first-order term: comprises applying update amounts to the current values, the update amounts including a term that is a product of a delayed gradient and a learning rate; and provides compensation for delay between a plurality of workers implemented by one or more computer processing units that each provide respective feedback data generated by training the machine learning model, the compensation for delay reducing mismatch between the plurality of workers and enabling efficient asynchronous training of the machine learning model.

2. The method of claim 1, wherein the feedback data indicates trends of change of an optimization objective of the machine learning model with respect to the previous values of the set of parameters.

3. The method of claim 2, wherein updating the current values comprises: determining coefficients of a transformation based on the trends of change; and determining differential amounts between the current values and the updated values by applying the transformation on the differences.

4. The method of claim 3, wherein the transformation is a linear transformation, the coefficients are linear rates of change, and the trends of change are represented by a gradient of the optimization objective with respect to the previous values of the set of parameters.

5. The method of claim 4, wherein determining the coefficients of the transformation comprises: computing a tensor product of the gradient as unbiased estimates of the linear rates of change.

6. The method of claim 4, wherein determining the coefficients of the transformation comprises: determining, based on the gradient, magnitudes of rates of change of the optimization objective with respect to respective parameters in the set of parameters; and determining the linear rates of change based on the magnitudes of the rates of change.

7. The method of claim 6, wherein determining the linear rates of change based on the magnitudes of the rates of change comprises: computing squares of the magnitudes of the rates of change; and determining the linear rates of change based on the squares of the magnitudes of the rates of change.

8. The method of claim 1, further comprising: receiving a request for the set of parameters from the worker; and in response to the request, transmitting the updated values of the set of parameters to the worker.

9. The method of claim 2, wherein the machine learning model includes a neural network model and the optimization objective is represented by a cross entropy loss function.

10. An electronic device comprising: a first processing unit; a memory coupled to the first processing unit and storing instructions that, when executed by the first processing unit, cause the electronic device to perform acts comprising: receiving, from a worker implemented by a second processing unit, feedback data generated by training a machine learning model, the feedback data being associated with previous values of a set of parameters of the machine learning model at the worker; determining differences between the previous values and current values of the set of parameters; calculating a zero-order term and a first-order term of a series expansion based on the feedback data and the differences; and updating the current values based on the zero-order term and the first-order term to obtain updated values of the set of the parameters, wherein updating the current values provides compensation for delay between a plurality of workers providing respective feedback data generated by training the machine learning model, the compensation for delay reducing mismatch between the plurality of workers and enabling efficient asynchronous training of the machine learning model, and wherein update amounts applied to the current values as part of the updating include a term that is a product of a delayed gradient and a learning rate.

11. The device of claim 10, wherein the feedback data indicate trends of change of an optimization objective of the machine learning model with respect to the previous values of the set of parameters.

12. The device of claim 11, wherein updating the current values comprises: determining coefficients of a transformation based on the trends of change; and determining differential amounts between the current values and the updated values by applying the transformation on the differences.

13. The device of claim 12, wherein the transformation is a linear transformation, the coefficients are linear rates of change, and the trends of change are represented by the delayed gradient of the optimization objective with respect to the previous values of the set of parameters.

14. The device of claim 13, wherein determining the coefficients of the transformation comprises: computing a tensor product of the delayed gradient as unbiased estimates of the linear rates of change.

15. A system comprising: a processor; and a memory coupled to the processor and storing instructions that, when executed by the processor, cause a computing device to: receive, from a worker implemented by a processing unit, feedback data generated by training a machine learning model, the machine learning model being a neural network comprising multiple layers, the feedback data being associated with previous values of a set of parameters of the machine learning model at the worker; determine differences between the previous values and current values of the set of parameters; calculate a zero-order term and a first-order term of a series expansion based on the feedback data and the differences; update the current values based on the zero-order term and the first-order term; and compensate for delay between a plurality of workers providing respective feedback data generated during training of the machine learning model by updating the current values based on the zero-order term and the first-order term, wherein compensating for the delay reduces mismatch between the plurality of workers and enables efficient asynchronous training of the machine learning model.

16. The system of claim 15, wherein the machine learning model is trained using training data that is randomly sampled from a complete set of training data.

17. The system of claim 15, wherein series expansion corresponds to Taylor expansion and other order terms of the series expansion are not used to update the current values.

18. The system of claim 15, wherein the first-order term reflects a rate of change of a gradient of an optimization objective.

19. The system of claim 15, wherein the first-order term corresponds to a second-order derivative of a cross entropy loss function.

20. The system of claim 15, wherein the instructions, when executed, further
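To make the zero-order and first-order terms more concrete, here is a minimal sketch of how the two variants of the first-order term named in the claims might look: one built from the tensor (outer) product of the gradient, one from the squares of its components. The function names, the `scale` factor, and the gradient-descent sign convention are assumptions made for illustration; the claim text shown on this page does not fix these details.

```python
from typing import List


def first_order_outer(grad: List[float], diff: List[float], scale: float) -> List[float]:
    """First-order term using the outer product g g^T applied to the differences."""
    dot = sum(g * d for g, d in zip(grad, diff))   # g . (current - previous)
    return [scale * g * dot for g in grad]          # (g g^T) diff equals g * (g . diff)


def first_order_diagonal(grad: List[float], diff: List[float], scale: float) -> List[float]:
    """First-order term keeping only the squared gradient components (diagonal of g g^T)."""
    return [scale * g * g * d for g, d in zip(grad, diff)]


def compensated_update(
    current: List[float],
    previous: List[float],
    grad: List[float],
    learning_rate: float,
    scale: float,
    diagonal: bool = True,
) -> List[float]:
    """Return updated values: current - learning_rate * (zero-order + first-order term)."""
    diff = [c - p for c, p in zip(current, previous)]
    term = first_order_diagonal if diagonal else first_order_outer
    first = term(grad, diff, scale)
    # The zero-order term is the delayed gradient itself; learning_rate * g is the
    # "product of a delayed gradient and a learning rate" mentioned in the claims.
    return [c - learning_rate * (g + f) for c, g, f in zip(current, grad, first)]
```

When `current` equals `previous`, both variants return a plain gradient step, which matches the intuition that no compensation is needed when the worker's gradient is not stale. The diagonal variant stores one value per parameter, whereas the outer-product variant couples all parameters through a single dot product.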

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

G06N3/084 · Primary
Backpropagation, e.g. using gradient descent · CPC title
G06N3/098
Distributed learning, e.g. federated learning · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/045
Combinations of networks · CPC title

Patent family

Related publications grouped by family.

View patent family 59738469

External sources

Related patents

Distributed training of models using stochastic gradient descent

Using specialized workers to improve performance in machine learning

Training a model using parameter server shards

Asynchronous optimization for sequence training of neural networks
