What technology area does this patent fall under?

Primary CPC classification G06N3/0472. Mapped technology areas include Physics.

When was this patent published?

Publication date Jun 22, 2023 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Federated learning for training machine learning models

US2023196081A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2023196081-A1
Application number	US-202117557096-A
Country	US
Kind code	A1
Filing date	Dec 21, 2021
Priority date	Dec 21, 2021
Publication date	Jun 22, 2023
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An approach to federated learning of a machine learning model may be provided. The approach may include broadcasting hyperparameters of a machine learning model to one or more client computing devices from a primary device associated with an outer loop or an inner loop. A gradient for the loss function may be calculated at the client device if previous gradients have been sufficiently large. If gradients exceeds a threshold, the client can send the mini-batch of gradients or the difference of the mini-batch of gradients back to the primary device. A search direction may be calculated based on the full gradient of the loss function for an outer loop or the mini-batch of gradient differences for an inner loop. A learning rate step may be calculated from the search direction. The hyperparameter may be updated for the inner loop based on the learning rate.
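The client-side reporting rule described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the patent's implementation: the function names `l2_norm`, `client_report`, and `average_reports` are hypothetical, the L2 norm is assumed as the measure of gradient size, and returning a zero vector below the threshold follows the wording of claim 4.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a plain list of floats (assumed measure of gradient size)."""
    return math.sqrt(sum(x * x for x in vec))


def client_report(gradient, threshold):
    """Return the gradient if its norm exceeds the threshold, else a zero vector.

    Hypothetical helper: the zero-vector fallback mirrors claim 4.
    """
    if l2_norm(gradient) > threshold:
        return list(gradient)
    return [0.0] * len(gradient)


def average_reports(reports):
    """Average the vectors returned by all clients, component by component."""
    n = len(reports)
    return [sum(column) / n for column in zip(*reports)]


if __name__ == "__main__":
    reports = [
        client_report([3.0, 4.0], threshold=1.0),  # norm 5.0, gradient is sent
        client_report([0.1, 0.0], threshold=1.0),  # norm 0.1, zero vector is sent
    ]
    print(average_reports(reports))
```

Under this reading, a client with a small gradient contributes only zeros to the server-side average, so its report carries no new information for that round.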

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method for training machine learning models through federated, the method comprising: initializing, by a processor, one or more machine learning model parameters for an outer loop update; broadcasting, by the processor, the one or more machine learning model parameters to a plurality of clients; determining, by the processor, a gradient for the one or more machine learning model parameters from the loss function at the plurality of clients; responsive to the gradient being above a threshold, returning, by the processor, the gradient to the server; determining, by the processor, the condition for invoking an inner loop based on the magnitude of the search directions in the inner loop and outer loop; determining, by the processor, the search direction based on the full gradient for an outer loop and the mini-batch of gradient differences for an inner loop; averaging, by the processor, the search directions for all clients for an outer loop update; responsive to the squared norm of the updated search direction being greater or equal than the squared norm of an initial search direction multiplied by a predetermined factor between 0 and 1 and the number of inner iterations is less than or equal a predetermined maximum inner loop size; determining, by the processor, a learning rate of the inner loop for the machine learning model, based, at least in part, on the updated search direction; and updating, by the processor, the one or more machine learning model parameters of the inner loop based, at least in part, on the search direction and the learning rate.
2. The computer-implemented method of claim 1, further comprising: broadcasting, by the processor, the updated machine learning model parameter to a client; determining, by the processor, a gradient of the loss function for the updated machine learning model parameter; determining, by the processor, a previous gradient for the previous machine learning model parameter, responsive to the current gradient and the previous gradient being above a threshold, determining a difference between the current gradient and the previous gradient in an inner loop; and determining, by the processor, a next search direction based on the mini-batch of determined differences.
3. The computer-implemented method of claim 2, further comprising: updating, by the processor, the updated machine learning model parameter of an outer loop based at least in part on the second determined search direction.
4. The computer-implemented method of claim 1, further comprising: responsive to the gradient being below the threshold, returning, by the processor, a zero vector to the server; and stopping, by the processor, the update of the machine learning model parameter of the inner loop.
5. The computer-implemented method of claim 2, further comprising: responsive to the current gradient and the previous gradient being below the threshold, returning, by the processor, a zero vector to the server.
6. The computer-implemented method of claim 1, wherein updating the learning rate of the machine learning model, further comprises: determining, by the processor, a ratio of the determined search direction over an initial search direction; and multiplying, by the processor, the ratio against the reciprocal of a Lipschitz gradient constant of one or more gradients.
7. The computer-implemented method of claim 1, wherein the machine learning model parameter is a weight associated with nodes within a deep learning neural network.
8. A computer system for training machine learning models through federated, the system comprising: a memory; and a processor in communication with the memory, the processor being configured to perform operations comprising: initialize one or more machine learning model parameters for an outer loop update; broadcast the one or more machine learning model parameters to a plurality of clients; determine a gradient for the one or more machine learning model parameters from the loss function at the plurality of clients; responsive to the gradient being above a threshold, return the gradient to the server; determine the condition for invoking an inner loop based on the magnitude of the search directions in the inner loop and outer loop; determine the search direction based on the full gradient for an outer loop and the mini-batch of gradient differences for an inner loop; average the search directions for all clients for an outer loop update; responsive to the squared norm of the updated search direction of all the clients being greater or equal than the squared norm of an initial search direction multiplied by a predetermined factor between 0 and 1 and the number of inner iterations is less than or equal a predetermined maximum inner loop size, determine a learning rate of the inner loop for the machine learning model, based, at least in part, on the updated search direction; and update the one or more machine learning model parameters of the inner loop based, at least in part, on the search direction and the learning rate.
9. The computer system of claim 8, further comprising operations to: broadcast the updated machine learning model parameter to a client; determine a gradient of the loss function for the updated machine learning model parameter; determine a previous gradient for the previous machine learning model parameter; responsive to the current gradient and the previous gradient being above a threshold, determine a difference between the current gradient and the previous gradient in an inner loop; and determine a next search direction based on the mini-batch of determined differences.
10. The computer system of claim 9, further comprising operations to: update the updated machine learning model parameter of an outer loop based at least in part on the second determined search direction.
11. The computer system of claim 8, further comprising operations to: responsive to the gradient being below the threshold, return a zero vector to the server; and stop, the update of the machine learning model parameter of the inner loop.
12. The computer system of claim 9, further comprising operations to: responsive to the current gradient and the previous gradient being below the threshold, return a zero vector to the server.
13. The computer system of claim 8, wherein updating the learning rate of the machine learning model, further comprises: determine a ratio of the determined search direction over an initial search direction; and multiply the ratio against the reciprocal of a Lipschitz gradient constant of one or more gradients.
14. The computer system of claim 8: wherein the machine learning model parameter is a weight associated with nodes within a deep learning neural network.
15. A computer program product for training machine learning models through federated, the computer program product comprising one or more computer readable storage devices and program instructions sorted on the one or more computer readable storage device, the program instructions executable by a processor to cause the processors to perform a function, the function comprising: initialize one or more machine learning model parameters for an outer loop update; broadcast the one or more machine learning model parameters to a plurality of clients; determine a gradient for the one or more machine learning model parameters from the loss function at the plurality of clients; responsive to the gradient being above a threshold, return the gradient to the server; det
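The inner-loop test in claim 1 and the learning-rate rule in claim 6 can be illustrated with a small sketch. Everything named here is an assumption made for illustration: the helper names, the use of Euclidean norms for the "ratio of the determined search direction over an initial search direction", and the descent-style update `w - lr * v` (the claims only say the parameters are updated based on the search direction and the learning rate).

```python
import math


def squared_norm(vec):
    """Squared Euclidean norm of a plain list of floats."""
    return sum(x * x for x in vec)


def continue_inner_loop(v, v0, factor, inner_iters, max_inner_iters):
    """Inner-loop condition as worded in claim 1.

    True when ||v||^2 >= factor * ||v0||^2 (factor between 0 and 1) and the
    number of inner iterations does not exceed the maximum inner loop size.
    """
    return (squared_norm(v) >= factor * squared_norm(v0)
            and inner_iters <= max_inner_iters)


def learning_rate(v, v0, lipschitz):
    """Claim 6 as read here: (||v|| / ||v0||) * (1 / L). Norms are an assumption."""
    norm_v0 = math.sqrt(squared_norm(v0))
    if norm_v0 == 0.0 or lipschitz <= 0.0:
        raise ValueError("need a nonzero initial direction and a positive constant")
    return (math.sqrt(squared_norm(v)) / norm_v0) / lipschitz


def update_parameters(w, v, lr):
    """Assumed descent step: move the parameters against the search direction."""
    return [wi - lr * vi for wi, vi in zip(w, v)]


if __name__ == "__main__":
    v0, v = [3.0, 4.0], [1.5, 2.0]
    if continue_inner_loop(v, v0, factor=0.1, inner_iters=3, max_inner_iters=5):
        lr = learning_rate(v, v0, lipschitz=2.0)
        print(lr, update_parameters([1.0, 1.0], v, lr))
```

With this reading, the step size shrinks as the current search direction becomes small relative to the initial one, and the inner loop ends once that ratio drops below the chosen factor or the iteration budget is used up.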

Assignees

IBM

Inventors

Classifications

G06N3/0454
Physics · mapped topic
G06N3/0472 · Primary
Physics · mapped topic
G06N3/045
Combinations of networks · CPC title
G06N3/047 · Primary
Probabilistic or stochastic networks · CPC title
G06N3/084 · Primary
Backpropagation, e.g. using gradient descent · CPC title

Patent family

Related publications grouped by family.

View patent family 86768435

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2023196081A1 cover?: An approach to federated learning of a machine learning model may be provided. The approach may include broadcasting hyperparameters of a machine learning model to one or more client computing devices from a primary device associated with an outer loop or an inner loop. A gradient for the loss function may be calculated at the client device if previous gradients have been sufficiently large. If…
Who is the assignee on this patent?: IBM