Gradient pruning for efficient training of machine learning models
US-2022261648-A1 · Aug 18, 2022 · US
US2023196081A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2023196081-A1 |
| Application number | US-202117557096-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 21, 2021 |
| Priority date | Dec 21, 2021 |
| Publication date | Jun 22, 2023 |
| Grant date | — |
Official abstract text for this publication.
An approach to federated learning of a machine learning model may be provided. The approach may include broadcasting hyperparameters of a machine learning model to one or more client computing devices from a primary device associated with an outer loop or an inner loop. A gradient for the loss function may be calculated at the client device if previous gradients have been sufficiently large. If the gradients exceed a threshold, the client can send the mini-batch of gradients or the difference of the mini-batch of gradients back to the primary device. A search direction may be calculated based on the full gradient of the loss function for an outer loop or the mini-batch of gradient differences for an inner loop. A learning rate step may be calculated from the search direction. The hyperparameter may be updated for the inner loop based on the learning rate.
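The client-side thresholding described in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the function names (`client_message`, `average_messages`), the use of the L2 norm as the measure of gradient size, and the sample values are all assumptions made for illustration. The zero-vector fallback follows the wording of claim 4.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a plain Python list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def client_message(gradient, threshold):
    """Hypothetical client step: send the gradient only if it is large enough.

    Assumption: "above a threshold" is read as an L2-norm test.
    Below the threshold the client sends a zero vector instead.
    """
    if l2_norm(gradient) > threshold:
        return list(gradient)
    return [0.0] * len(gradient)


def average_messages(messages):
    """Hypothetical server step: average the client messages coordinate-wise."""
    count = len(messages)
    return [sum(column) / count for column in zip(*messages)]


# Three illustrative client gradients; the second one is tiny and gets pruned.
client_gradients = [[3.0, 4.0], [0.01, 0.02], [-1.0, 1.0]]
messages = [client_message(g, threshold=0.5) for g in client_gradients]
direction = average_messages(messages)
print(messages)
print(direction)
```

In this sketch a pruned client still counts toward the average, because its zero vector is included in the sum. Whether the described system averages that way is not stated in the abstract.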
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for training machine learning models through federated, the method comprising: initializing, by a processor, one or more machine learning model parameters for an outer loop update; broadcasting, by the processor, the one or more machine learning model parameters to a plurality of clients; determining, by the processor, a gradient for the one or more machine learning model parameters from the loss function at the plurality of clients; responsive to the gradient being above a threshold, returning, by the processor, the gradient to the server; determining, by the processor, the condition for invoking an inner loop based on the magnitude of the search directions in the inner loop and outer loop; determining, by the processor, the search direction based on the full gradient for an outer loop and the mini-batch of gradient differences for an inner loop; averaging, by the processor, the search directions for all clients for an outer loop update; responsive to the squared norm of the updated search direction being greater or equal than the squared norm of an initial search direction multiplied by a predetermined factor between 0 and 1 and the number of inner iterations is less than or equal a predetermined maximum inner loop size; determining, by the processor, a learning rate of the inner loop for the machine learning model, based, at least in part, on the updated search direction; and updating, by the processor, the one or more machine learning model parameters of the inner loop based, at least in part, on the search direction and the learning rate.
2. The computer-implemented method of claim 1, further comprising: broadcasting, by the processor, the updated machine learning model parameter to a client; determining, by the processor, a gradient of the loss function for the updated machine learning model parameter; determining, by the processor, a previous gradient for the previous machine learning model parameter, responsive to the current gradient and the previous gradient being above a threshold, determining a difference between the current gradient and the previous gradient in an inner loop; and determining, by the processor, a next search direction based on the mini-batch of determined differences.
3. The computer-implemented method of claim 2, further comprising: updating, by the processor, the updated machine learning model parameter of an outer loop based at least in part on the second determined search direction.
4. The computer-implemented method of claim 1, further comprising: responsive to the gradient being below the threshold, returning, by the processor, a zero vector to the server; and stopping, by the processor, the update of the machine learning model parameter of the inner loop.
5. The computer-implemented method of claim 2, further comprising: responsive to the current gradient and the previous gradient being below the threshold, returning, by the processor, a zero vector to the server.
6. The computer-implemented method of claim 1, wherein updating the learning rate of the machine learning model, further comprises: determining, by the processor, a ratio of the determined search direction over an initial search direction; and multiplying, by the processor, the ratio against the reciprocal of a Lipschitz gradient constant of one or more gradients.
7. The computer-implemented method of claim 1, wherein the machine learning model parameter is a weight associated with nodes within a deep learning neural network.
8. A computer system for training machine learning models through federated, the system comprising: a memory; and a processor in communication with the memory, the processor being configured to perform operations comprising: initialize one or more machine learning model parameters for an outer loop update; broadcast the one or more machine learning model parameters to a plurality of clients; determine a gradient for the one or more machine learning model parameters from the loss function at the plurality of clients; responsive to the gradient being above a threshold, return the gradient to the server; determine the condition for invoking an inner loop based on the magnitude of the search directions in the inner loop and outer loop; determine the search direction based on the full gradient for an outer loop and the mini-batch of gradient differences for an inner loop; average the search directions for all clients for an outer loop update; responsive to the squared norm of the updated search direction of all the clients being greater or equal than the squared norm of an initial search direction multiplied by a predetermined factor between 0 and 1 and the number of inner iterations is less than or equal a predetermined maximum inner loop size, determine a learning rate of the inner loop for the machine learning model, based, at least in part, on the updated search direction; and update the one or more machine learning model parameters of the inner loop based, at least in part, on the search direction and the learning rate.
9. The computer system of claim 8, further comprising operations to: broadcast the updated machine learning model parameter to a client; determine a gradient of the loss function for the updated machine learning model parameter; determine a previous gradient for the previous machine learning model parameter; responsive to the current gradient and the previous gradient being above a threshold, determine a difference between the current gradient and the previous gradient in an inner loop; and determine a next search direction based on the mini-batch of determined differences.
10. The computer system of claim 9, further comprising operations to: update the updated machine learning model parameter of an outer loop based at least in part on the second determined search direction.
11. The computer system of claim 8, further comprising operations to: responsive to the gradient being below the threshold, return a zero vector to the server; and stop, the update of the machine learning model parameter of the inner loop.
12. The computer system of claim 9, further comprising operations to: responsive to the current gradient and the previous gradient being below the threshold, return a zero vector to the server.
13. The computer system of claim 8, wherein updating the learning rate of the machine learning model, further comprises: determine a ratio of the determined search direction over an initial search direction; and multiply the ratio against the reciprocal of a Lipschitz gradient constant of one or more gradients.
14. The computer system of claim 8: wherein the machine learning model parameter is a weight associated with nodes within a deep learning neural network.
15. A computer program product for training machine learning models through federated, the computer program product comprising one or more computer readable storage devices and program instructions sorted on the one or more computer readable storage device, the program instructions executable by a processor to cause the processors to perform a function, the function comprising: initialize one or more machine learning model parameters for an outer loop update; broadcast the one or more machine learning model parameters to a plurality of clients; determine a gradient for the one or more machine learning model parameters from the loss function at the plurality of clients; responsive to the gradient being above a threshold, return the gradient to the server; det
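The inner-loop test in claim 1 and the learning-rate rule in claim 6 can be sketched on a single machine. This is an illustration under stated assumptions, not the claimed method. The toy quadratic loss, the names `inner_loop`, `gamma`, and `max_inner`, and the reading of "ratio of the search direction over an initial search direction" as a ratio of L2 norms are all assumptions. The recursive direction update built from gradient differences resembles a SARAH-style estimator, which is my interpretation of "based on the mini-batch of gradient differences". The sketch also uses full gradients rather than sampled mini-batches to stay deterministic.

```python
import math


def norm(vec):
    """Euclidean norm of a plain Python list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def grad(w, curvature):
    """Gradient of the toy loss 0.5 * sum(c_i * w_i**2) (illustrative only)."""
    return [c * x for c, x in zip(curvature, w)]


def inner_loop(w, curvature, gamma=0.1, max_inner=20):
    """Hypothetical inner loop with an adaptive step size.

    Continues while ||v||^2 >= gamma * ||v0||^2 and the step count is below
    max_inner (assumption: at most max_inner inner steps are taken).
    """
    lipschitz = max(curvature)      # gradient Lipschitz constant of the toy loss
    v0 = grad(w, curvature)         # outer loop: direction from the full gradient
    if norm(v0) == 0.0:
        return list(w), 0           # already stationary, nothing to update
    v = list(v0)
    steps = 0
    while norm(v) ** 2 >= gamma * norm(v0) ** 2 and steps < max_inner:
        # Claim 6 reading: (||v|| / ||v0||) multiplied by 1 / L.
        lr = (norm(v) / norm(v0)) / lipschitz
        w_next = [x - lr * d for x, d in zip(w, v)]
        # Direction update from the difference of current and previous gradients.
        diff = [a - b for a, b in zip(grad(w_next, curvature), grad(w, curvature))]
        v = [d + e for d, e in zip(v, diff)]
        w = w_next
        steps += 1
    return w, steps


w_final, steps = inner_loop([1.0, 1.0], [1.0, 2.0])
print(w_final, steps)
```

Because the step size scales with the current direction norm, it shrinks as the direction shrinks and never exceeds 1/L in this toy setting. The loop exits once the squared direction norm falls below the `gamma` fraction of its initial value.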
Physics · mapped topic
Combinations of networks · CPC title
Probabilistic or stochastic networks · CPC title
Backpropagation, e.g. using gradient descent · CPC title