Dynamic gradient aggregation for training neural networks

US2022036178A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2022036178-A1
Application number  US-202016945715-A
Country             US
Kind code           A1
Filing date         Jul 31, 2020
Priority date       Jul 31, 2020
Publication date    Feb 3, 2022
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosure herein describes training a global model based on a plurality of data sets. The global model is applied to each data set of the plurality of data sets and a plurality of gradients is generated based on that application. At least one gradient quality metric is determined for each gradient of the plurality of gradients. Based on the determined gradient quality metrics of the plurality of gradients, a plurality of weight factors is calculated. The plurality of gradients is transformed into a plurality of weighted gradients based on the calculated plurality of weight factors and a global gradient is generated based on the plurality of weighted gradients. The global model is updated based on the global gradient, wherein the updated global model, when applied to a data set, performs a task based on the data set and provides model output based on performing the task.
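
The steps in the abstract can be sketched as one aggregation round. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the abstract does not say how weight factors are derived from quality metrics, so the sketch borrows the softmax named in claim 3, uses training loss as the single quality metric, and assumes that a lower loss marks a more useful gradient. The function names, the learning rate `lr`, and the plain gradient-descent update are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_round(params, gradients, losses, lr=0.1):
    """Run one aggregation round: weight each gradient, sum, update.

    params    -- current global model parameters (list of floats)
    gradients -- one gradient per data set, each as long as params
    losses    -- one training loss per data set (the quality metric here)
    lr        -- learning rate (assumed; not given in the abstract)
    """
    # Assumption: lower training loss means a more useful gradient,
    # so the negated loss is used as the softmax score.
    weights = softmax([-loss for loss in losses])
    # Transform each gradient into a weighted gradient.
    weighted = [[w * g for g in grad] for w, grad in zip(weights, gradients)]
    # The global gradient is the element-wise sum of the weighted gradients.
    global_grad = [sum(column) for column in zip(*weighted)]
    # Assumption: a plain gradient-descent step updates the global model.
    new_params = [p - lr * g for p, g in zip(params, global_grad)]
    return new_params, weights, global_grad


# Three data sets; the third reports a much higher loss than the others.
params = [1.0, -2.0]
gradients = [[0.5, 0.5], [1.0, -1.0], [4.0, 4.0]]
losses = [0.2, 0.3, 5.0]
new_params, weights, global_grad = aggregate_round(params, gradients, losses)
```

Under these assumptions the high-loss gradient receives a small weight, so it contributes little to the global gradient, whereas a uniform average would let it dominate the update.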

First claim

Opening claim text (preview).

What is claimed is:

1. A system for training a global model based on a plurality of data sets, the system comprising: at least one processor; and at least one memory comprising computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the at least one processor to: apply the global model to each data set of the plurality of data sets; generate a plurality of gradients based on applying the global model to each data set of the plurality of data sets, wherein an individual gradient of the plurality of gradients is generated based on an individual data set of the plurality of data sets; determine a plurality of gradient quality metrics, at least one gradient quality metric for each individual gradient of the plurality of gradients, the at least one gradient quality metric indicating a degree to which the individual gradient can be used to improve the global model; calculate a plurality of weight factors using the determined plurality of gradient quality metrics, the plurality of weight factors including an individual weight factor for each individual gradient of the plurality of gradients; transform the plurality of gradients into a plurality of weighted gradients based on the calculated plurality of weight factors; generate a global gradient based on the plurality of weighted gradients; and update the global model based on the global gradient, wherein the updated global model, when applied to the individual data set, performs a task based on the individual data set and provides model output based on performing the task.

2. The system of claim 1, wherein the plurality of gradient quality metrics include at least one of gradient mean values, gradient variance values, or training loss values associated with applying the global model to each data set of the plurality of data sets.

3. The system of claim 1, wherein calculating the plurality of weight factors includes applying a softmax function to the at least one gradient quality metric for each individual gradient of the plurality of gradients, wherein the individual weight factor of the plurality of weight factors for each individual gradient of the plurality of gradients is based on a result of applying the softmax function to the at least one gradient quality metric of the individual gradient.

4. The system of claim 3, wherein calculating the plurality of weight factors further includes applying a weight factor neural network model to the at least one gradient quality metric for each individual gradient of the plurality of gradients, wherein the individual weight factor of the plurality of weight factors for each individual gradient of the plurality of gradients is based on a result of applying the weight factor neural network model to the at least one gradient quality metric of the individual gradient.

5. The system of claim 4, wherein the plurality of weight factors includes a first set of weight factors based on applying the softmax function to the at least one gradient quality metric for each individual gradient of the plurality of gradients and a second set of weight factors based on applying the weight factor neural network model to the at least one gradient quality metric for each individual gradient of the plurality of gradients; wherein the at least one memory and the computer program code configured to, with the at least one processor, further cause the at least one processor to: test the first set of weight factors and the second set of weight factors based on a test data set; and select a set of weight factors from the first set of weight factors and the second set of weight factors based on the testing; and wherein transforming the plurality of gradients into a plurality of weighted gradients based on the calculated plurality of weight factors includes transforming the plurality of gradients into a plurality of weighted gradients based on the selected set of weight factors.

6. The system of claim 5, wherein the at least one memory and the computer program code configured to, with the at least one processor, further cause the at least one processor to train the weight factor neural network model based on the testing of the first set of weight factors and the second set of weight factors based on the test data set.

7. The system of claim 1, wherein the at least one memory and the computer program code configured to, with the at least one processor, further cause the at least one processor to, based on updating the global model based on the global gradient, train the updated global model using a held-out data set associated with the task, wherein divergence away from the task caused by updating the global model is reduced.

8. A computerized method for training a global model based on a plurality of clients, the computerized method comprising: providing, by a processor, the global model to a plurality of clients; receiving, by the processor, a plurality of gradients from the plurality of clients; determining, by the processor, a plurality of gradient quality metrics, at least one gradient quality metric for each individual gradient of the plurality of gradients, the at least one gradient quality metric indicating a degree to which the individual gradient can be used to improve the global model; calculating, by the processor, a plurality of weight factors using the determined plurality of gradient quality metrics, the plurality of weight factors including an individual weight factor for each individual gradient of the plurality of gradients; transforming, by the processor, the plurality of gradients into a plurality of weighted gradients based on the calculated plurality of weight factors; generating, by the processor, a global gradient based on the plurality of weighted gradients; and updating, by the processor, the global model based on the global gradient, wherein the updated global model, when applied to an individual data set, performs a task based on the individual data set and provides model output based on performing the task.

9. The computerized method of claim 8, wherein the plurality of gradient quality metrics includes at least one of gradient mean values, gradient variance values, or training loss values associated with applying the global model to each data set of a plurality of data sets.

10. The computerized method of claim 8, wherein calculating the plurality of weight factors includes applying, by the processor, a softmax function to the at least one gradient quality metric for each individual gradient of the plurality of gradients, wherein the individual weight factor of the plurality of weight factors for each individual gradient of the plurality of gradients is based on a result of applying the softmax function to the at least one gradient quality metric of the individual gradient.

11. The computerized method of claim 10, wherein calculating the plurality of weight factors further includes applying, by the processor, a weight factor neural network model to the at least one gradient quality metric for each individual gradient of the plurality of gradients, wherein the individual weight factor of the plurality of weight factors for each individual gradient of the plurality of gradients is based on a result of applying the weight factor neural network model to the at least one gradient quality metric of the individual gradient.

12. The computerized method of claim 11, wherein the plurality of weight factors includes a first set of weight factors based on applying the softmax function to the at least one gradient quality metric for each individual gradient of the plurality of

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Probabilistic or stochastic networks · CPC title

  • Combinations of networks · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

  • Reinforcement learning · CPC title

  • Distributed learning, e.g. federated learning · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022036178A1 cover?
The disclosure herein describes training a global model based on a plurality of data sets. The global model is applied to each data set of the plurality of data sets and a plurality of gradients is generated based on that application. At least one gradient quality metric is determined for each gradient of the plurality of gradients. Based on the determined gradient quality metrics of the plural…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 3, 2022 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).