Asynchronous training of machine learning model

US12190232B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12190232-B2
Application number: US-201716327679-A
Country: US
Kind code: B2
Filing date: Aug 17, 2017
Priority date: Aug 25, 2016
Publication date: Jan 7, 2025
Grant date: Jan 7, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various implementations relate to asynchronous training of a machine learning model. A server receives feedback data generated by training the machine learning model from a worker. The feedback data are obtained by the worker with its own training data and are associated with previous values of a set of parameters of the machine learning model at the worker. The server determines differences between the previous values and current values of the set of parameters at the server. The current value may have been updated for once or more due to operation of other workers. Then, the server can update the current values of the set of parameters based on the feedback data and the differences between values of the set of parameters. Thus, the updating does not only take the training result of each worker into consideration but also makes proper compensation for delay between different workers.
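The server-side flow summarized in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `ParameterServer`, the `pull`/`push` method names, the scaling factor `lam`, and the choice of an element-wise squared gradient as the compensation coefficient are all assumptions made for the example. The toy objective 0.5 * ||w||^2 (whose gradient is simply w) is likewise hypothetical.

```python
import numpy as np


class ParameterServer:
    """Toy parameter server (all names here are illustrative assumptions)."""

    def __init__(self, init_params, learning_rate=0.1, lam=1.0):
        self.params = np.asarray(init_params, dtype=float).copy()
        self.learning_rate = learning_rate
        self.lam = lam          # assumed scaling factor for the compensation term
        self.snapshots = {}     # worker id -> parameter values last sent to it

    def pull(self, worker_id):
        """A worker requests parameters; remember the values it received."""
        self.snapshots[worker_id] = self.params.copy()
        return self.params.copy()

    def push(self, worker_id, delayed_grad):
        """Apply a worker's feedback, compensating for updates made meanwhile."""
        delayed_grad = np.asarray(delayed_grad, dtype=float)
        # Difference between current values and the worker's previous values.
        diff = self.params - self.snapshots[worker_id]
        # Delayed gradient as received, plus a correction proportional to diff.
        correction = self.lam * delayed_grad * delayed_grad * diff
        self.params = self.params - self.learning_rate * (delayed_grad + correction)
        return self.params.copy()


def toy_gradient(w):
    """Gradient of the hypothetical objective 0.5 * ||w||^2."""
    return w


server = ParameterServer([1.0, -2.0], learning_rate=0.1, lam=1.0)
w_a = server.pull("a")
w_b = server.pull("b")               # both workers start from the same values
server.push("a", toy_gradient(w_a))  # no delay: nothing changed since the pull
server.push("b", toy_gradient(w_b))  # delayed: worker "a" updated in between
```

With `lam=0.0` the correction vanishes and the sketch reduces to plain asynchronous gradient descent, which applies the stale gradient without regard to the updates made by other workers.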

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method comprising: receiving, by a computing device from a worker implemented by a computer processing unit, feedback data generated by training a machine learning model, the feedback data being associated with previous values of a set of parameters of the machine learning model at the worker; determining, by the computing device, differences between the previous values and current values of the set of parameters; calculating a zero-order term and a first-order term of a series expansion based on the feedback data and the differences; and updating the current values based on the zero-order term and the first-order term to obtain updated values of the set of the parameters, wherein updating the current values based on the zero-order term and the first-order term: comprises applying update amounts to the current values, the update amounts including a term that is a product of a delayed gradient and a learning rate; and provides compensation for delay between a plurality of workers implemented by one or more computer processing units that each provide respective feedback data generated by training the machine learning model, the compensation for delay reducing mismatch between the plurality of workers and enabling efficient asynchronous training of the machine learning model.

2. The method of claim 1, wherein the feedback data indicates trends of change of an optimization objective of the machine learning model with respect to the previous values of the set of parameters.

3. The method of claim 2, wherein updating the current values comprises: determining coefficients of a transformation based on the trends of change; and determining differential amounts between the current values and the updated values by applying the transformation on the differences.

4. The method of claim 3, wherein the transformation is a linear transformation, the coefficients are linear rates of change, and the trends of change are represented by a gradient of the optimization objective with respect to the previous values of the set of parameters.

5. The method of claim 4, wherein determining the coefficients of the transformation comprises: computing a tensor product of the gradient as unbiased estimates of the linear rates of change.

6. The method of claim 4, wherein determining the coefficients of the transformation comprises: determining, based on the gradient, magnitudes of rates of change of the optimization objective with respect to respective parameters in the set of parameters; and determining the linear rates of change based on the magnitudes of the rates of change.

7. The method of claim 6, wherein determining the linear rates of change based on the magnitudes of the rates of change comprises: computing squares of the magnitudes of the rates of change; and determining the linear rates of change based on the squares of the magnitudes of the rates of change.

8. The method of claim 1, further comprising: receiving a request for the set of parameters from the worker; and in response to the request, transmitting the updated values of the set of parameters to the worker.

9. The method of claim 2, wherein the machine learning model includes a neural network model and the optimization objective is represented by a cross entropy loss function.

10. An electronic device comprising: a first processing unit; a memory coupled to the first processing unit and storing instructions that, when executed by the first processing unit, cause the electronic device to perform acts comprising: receiving, from a worker implemented by a second processing unit, feedback data generated by training a machine learning model, the feedback data being associated with previous values of a set of parameters of the machine learning model at the worker; determining differences between the previous values and current values of the set of parameters; calculating a zero-order term and a first-order term of a series expansion based on the feedback data and the differences; and updating the current values based on the zero-order term and the first-order term to obtain updated values of the set of the parameters, wherein updating the current values provides compensation for delay between a plurality of workers providing respective feedback data generated by training the machine learning model, the compensation for delay reducing mismatch between the plurality of workers and enabling efficient asynchronous training of the machine learning model, and wherein update amounts applied to the current values as part of the updating include a term that is a product of a delayed gradient and a learning rate.

11. The device of claim 10, wherein the feedback data indicate trends of change of an optimization objective of the machine learning model with respect to the previous values of the set of parameters.

12. The device of claim 11, wherein updating the current values comprises: determining coefficients of a transformation based on the trends of change; and determining differential amounts between the current values and the updated values by applying the transformation on the differences.

13. The device of claim 12, wherein the transformation is a linear transformation, the coefficients are linear rates of change, and the trends of change are represented by the delayed gradient of the optimization objective with respect to the previous values of the set of parameters.

14. The device of claim 13, wherein determining the coefficients of the transformation comprises: computing a tensor product of the delayed gradient as unbiased estimates of the linear rates of change.

15. A system comprising: a processor; and a memory coupled to the processor and storing instructions that, when executed by the processor, cause a computing device to: receive, from a worker implemented by a processing unit, feedback data generated by training a machine learning model, the machine learning model being a neural network comprising multiple layers, the feedback data being associated with previous values of a set of parameters of the machine learning model at the worker; determine differences between the previous values and current values of the set of parameters; calculate a zero-order term and a first-order term of a series expansion based on the feedback data and the differences; update the current values based on the zero-order term and the first-order term; and compensate for delay between a plurality of workers providing respective feedback data generated during training of the machine learning model by updating the current values based on the zero-order term and the first-order term, wherein compensating for the delay reduces mismatch between the plurality of workers and enables efficient asynchronous training of the machine learning model.

16. The system of claim 15, wherein the machine learning model is trained using training data that is randomly sampled from a complete set of training data.

17. The system of claim 15, wherein series expansion corresponds to Taylor expansion and other order terms of the series expansion are not used to update the current values.

18. The system of claim 15, wherein the first-order term reflects a rate of change of a gradient of an optimization objective.

19. The system of claim 15, wherein the first-order term corresponds to a second-order derivative of a cross entropy loss function.

20. The system of claim 15, wherein the instructions, when executed, further
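As a rough illustration of the zero-order and first-order terms named in the claims, a first-order Taylor expansion of the gradient around the worker's previous values reads g(w_cur) ≈ g(w_prev) + H(w_prev)(w_cur - w_prev), where H stands for the second derivative of the objective. The sketch below contrasts two ways of approximating H from the delayed gradient alone: a tensor (outer) product, in the spirit of claims 5 and 14, and per-parameter squared magnitudes, in the spirit of claims 6 and 7. The function names and the reading of the squared magnitudes as a diagonal approximation are assumptions made for this example, not language taken from the patent.

```python
import numpy as np


def compensate_outer(delayed_grad, diff):
    """Zero-order term plus a first-order term using the outer product g g^T.

    Hypothetical helper: g g^T stands in for the matrix of linear rates of change.
    """
    coeffs = np.outer(delayed_grad, delayed_grad)
    return delayed_grad + coeffs @ diff


def compensate_diagonal(delayed_grad, diff):
    """Zero-order term plus a first-order term using squared magnitudes only.

    Hypothetical helper: keeps one coefficient per parameter, which avoids
    storing a full matrix when the parameter set is large.
    """
    coeffs = np.abs(delayed_grad) ** 2
    return delayed_grad + coeffs * diff


g = np.array([1.0, -2.0])        # delayed gradient reported by a worker
diff = np.array([-0.1, 0.2])     # current values minus the worker's previous values

full = compensate_outer(g, diff)
diag = compensate_diagonal(g, diff)

# The squared magnitudes are exactly the diagonal of the outer product.
same_diagonal = np.allclose(np.diag(np.outer(g, g)), np.abs(g) ** 2)
```

The diagonal variant needs memory proportional to the number of parameters rather than its square, which is one plausible motivation for the squared-magnitude wording, although the claim preview shown here does not state a reason.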

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • G06N3/084 · Primary

    Backpropagation, e.g. using gradient descent · CPC title

  • Distributed learning, e.g. federated learning · CPC title

  • Supervised learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Combinations of networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12190232B2 cover?
Various implementations relate to asynchronous training of a machine learning model. A server receives feedback data generated by training the machine learning model from a worker. The feedback data are obtained by the worker with its own training data and are associated with previous values of a set of parameters of the machine learning model at the worker. The server determines differences be…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 7, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).