Federated learning for training machine learning models

US12488223B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12488223-B2
Application number  US-202117557096-A
Country             US
Kind code           B2
Filing date         Dec 21, 2021
Priority date       Dec 21, 2021
Publication date    Dec 2, 2025
Grant date          Dec 2, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An approach to federated learning of a machine learning model may be provided. The approach may include broadcasting hyperparameters of a machine learning model to one or more client computing devices from a primary device associated with an outer loop or an inner loop. A gradient for the loss function may be calculated at the client device if previous gradients have been sufficiently large. If gradients exceeds a threshold, the client can send the mini-batch of gradients or the difference of the mini-batch of gradients back to the primary device. A search direction may be calculated based on the full gradient of the loss function for an outer loop or the mini-batch of gradient differences for an inner loop. A learning rate step may be calculated from the search direction. The hyperparameter may be updated for the inner loop based on the learning rate.
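
The client-side gating step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names (client_report, least_squares_grad, average_reports), the use of NumPy, the least-squares loss, and the choice to compare the Euclidean norm of the gradient against the threshold are all assumptions made for illustration. The zero-vector fallback follows the dependent claims quoted on this page.

```python
import numpy as np


def least_squares_grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w.

    The loss is a stand-in chosen for this sketch; the patent text
    only refers to "a loss function".
    """
    residual = X @ w - y
    return X.T @ residual / len(y)


def client_report(grad, threshold):
    """Return the gradient if it is large enough, otherwise a zero vector.

    Assumption: "above a gradient threshold" is read as the Euclidean
    norm of the gradient exceeding `threshold`.
    """
    if np.linalg.norm(grad) > threshold:
        return grad
    return np.zeros_like(grad)


def average_reports(reports):
    """Server side: average the vectors returned by all clients."""
    return np.mean(np.stack(reports), axis=0)


# Two hypothetical clients: one with a large gradient, one with a small one.
large = client_report(np.array([3.0, 4.0]), threshold=1.0)
small = client_report(np.array([0.1, 0.0]), threshold=1.0)
direction = average_reports([large, small])
```

In this reading, a client whose gradient falls below the threshold still answers the server, but its zero vector contributes nothing to the averaged direction.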

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for training machine learning models through federated learning, the method comprising:
    initializing one or more machine learning model parameters for an outer loop update, wherein the one or more machine learning model parameters includes a weight associated with nodes within a deep learning neural network;
    broadcasting, by a primary server, the one or more machine learning model parameters to a plurality of clients;
    determining a first gradient for the one or more machine learning model parameters from a loss function at the plurality of clients;
    responsive to the first gradient being above a gradient threshold, returning the first gradient to the primary server;
    determining a condition for invoking an inner loop update based on a magnitude of an initial search direction for the inner loop being larger than a previous search direction for the outer loop;
    determining an updated search direction based on a mini-batch of gradient differences for an inner loop update;
    averaging the search directions for all clients for the outer loop update;
    responsive to a squared norm of the updated search direction being greater or equal than a squared norm of the initial search direction multiplied by a predetermined factor between 0 and 1 and a count of inner loop iterations is less than or equal a predetermined maximum inner loop size:
    determining a learning rate of the inner loop update for the machine learning model, based, at least in part, on the updated search direction by:
    determining a ratio of the updated search direction over the initial search direction; and
    multiplying the ratio against a reciprocal of a Lipschitz gradient constant of the first gradient; and
    updating the one or more machine learning model parameters of the inner loop based, at least in part, on the updated search direction and the determined learning rate.

2. The computer-implemented method of claim 1, further comprising:
    broadcasting the updated one or more machine learning model parameters to a client;
    determining a second gradient of the loss function for the updated one or more machine learning model parameters;
    identifying determining a previous the first gradient for the previous one or more machine learning model parameters;
    responsive to the second gradient and the first gradient being above the gradient threshold, determining a difference between the second gradient and the first gradient in an inner loop; and
    determining a next search direction based on a mini-batch of the determined differences.

3. The computer-implemented method of claim 2, further comprising:
    updating, by the processor, the updated one or more machine learning model parameters of an outer loop based at least in part on the next search direction.

4. The computer-implemented method of claim 1, further comprising:
    responsive to the first gradient being below the threshold:
    returning a first zero vector to the primary server; and
    stopping an update of the machine learning model parameter of the inner loop.

5. The computer-implemented method of claim 2, further comprising:
    responsive to the second gradient being below the gradient threshold, returning a second zero vector to the primary server.

6. A computer system for training machine learning models through federated learning, the system comprising:
    a memory; and
    a processor in communication with the memory, the processor being configured to perform operations comprising:
    initializing one or more machine learning model parameters for an outer loop update, wherein the one or more machine learning model parameters includes a weight associated with nodes within a deep learning neural network;
    broadcasting, by a primary server, the one or more machine learning model parameters to a plurality of clients;
    determining a first gradient for the one or more machine learning model parameters from a loss function at the plurality of clients;
    responsive to the first gradient being above a gradient threshold, returning the first gradient to the primary server;
    determining a condition for invoking an inner loop update based on a magnitude of an initial search direction for the inner loop being larger than a previous search direction for the outer loop;
    determining an updated search direction based on a mini-batch of gradient differences for an inner loop update;
    averaging the search directions for all clients for the outer loop update;
    responsive to a squared norm of the updated search direction being greater or equal than a squared norm of the initial search direction multiplied by a predetermined factor between 0 and 1 and a count of inner loop iterations is less than or equal a predetermined maximum inner loop size;
    determining a learning rate of the inner loop update for the machine learning model, based, at least in part, on the updated search direction by:
    determining a ratio of the updated search direction over the initial search direction; and
    multiplying the ratio against a reciprocal of a Lipschitz gradient constant of the first gradient; and
    updating the one or more machine learning model parameters of the inner loop based, at least in part, on the updated search direction and the determined learning rate.

7. The computer system of claim 6, the processor being configured to perform operations comprising:
    broadcasting the updated one or more machine learning model parameters to a client;
    determining a second gradient of the loss function for the updated one or more machine learning model parameters;
    identifying the first gradient for the one or more machine learning model parameters;
    responsive to the second gradient and the first gradient being above the gradient threshold, determining a difference between the second gradient and the first gradient in an inner loop; and
    determining a next search direction based on a mini-batch of the determined differences.

8. The computer system of claim 7, the processor being configured to perform operations comprising:
    updating the updated one or more machine learning model parameters of an outer loop based at least in part on the next search direction.

9. The computer system of claim 6, the processor being configured to perform operations comprising:
    responsive to the first gradient being below the threshold;
    returning a first zero vector to the primary server; and
    stopping an update of the machine learning model parameter of the inner loop.

10. The computer system of claim 7, the processor being configured to perform operations comprising:
    responsive to the second gradient being below the gradient threshold, returning a second zero vector to the primary server.

11. A computer program product for training machine learning models through federated learning, the computer program product comprising a computer readable storage medium and program instructions stored thereon, the program instructions executable by a processor to cause the processors to perform a function, the function comprising:
    initializing one or more machine learning model parameters for an outer loop update, wherein the one or more machine learning model parameters includes a weight associated with nodes within a deep learning neural network;
    broadcasting, by a primary server, the one or more machine learning model parameters to a plurality of clients;
    determining a first gradient for the one or more machine learning model parameters from a loss function at the plurality of clients;
    responsive to the first gradient being above a gradient threshold, returning the first gradient to the primary server;
    determining a condition for invok
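
The inner-loop learning-rate rule and continuation test recited in claim 1 can be sketched in Python as follows. This is one possible reading for illustration only, not the claimed implementation. Assumptions: the "ratio of the updated search direction over the initial search direction" is taken as a ratio of Euclidean norms, the parameter update is a plain descent step, and every name (learning_rate, keep_going, inner_step, gamma, max_inner, lipschitz) is hypothetical.

```python
import numpy as np


def learning_rate(v_t, v_0, lipschitz):
    """Ratio of direction norms multiplied by the reciprocal of L.

    Assumption: the claim's "ratio" is ||v_t|| / ||v_0||.
    """
    ratio = np.linalg.norm(v_t) / np.linalg.norm(v_0)
    return ratio * (1.0 / lipschitz)


def keep_going(v_t, v_0, gamma, count, max_inner):
    """Continuation test for the inner loop.

    True while ||v_t||^2 >= gamma * ||v_0||^2 (gamma between 0 and 1)
    and the iteration count has not exceeded max_inner.
    """
    return bool(v_t @ v_t >= gamma * (v_0 @ v_0) and count <= max_inner)


def inner_step(w, v_t, v_0, lipschitz):
    """One parameter update. Assumption: a descent step w - eta * v_t."""
    eta = learning_rate(v_t, v_0, lipschitz)
    return w - eta * v_t


# Hypothetical values: ||v_0|| = 5, ||v_t|| = 2.5, Lipschitz constant L = 2.
v_0 = np.array([3.0, 4.0])
v_t = np.array([0.0, 2.5])
eta = learning_rate(v_t, v_0, lipschitz=2.0)
w_next = inner_step(np.array([1.0, 1.0]), v_t, v_0, lipschitz=2.0)
```

Under these assumptions the step size shrinks as the search direction shrinks relative to its starting value, and the loop stops once the direction has decayed past the chosen factor or the iteration budget is used up.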

Assignees

IBM

Inventors

Classifications

  • Combinations of networks · CPC title

  • Distributed learning, e.g. federated learning · CPC title

  • G06N3/047 · Primary

    Probabilistic or stochastic networks · CPC title

  • G06N3/084 · Primary

    Backpropagation, e.g. using gradient descent · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12488223B2 cover?
An approach to federated learning of a machine learning model may be provided. The approach may include broadcasting hyperparameters of a machine learning model to one or more client computing devices from a primary device associated with an outer loop or an inner loop. A gradient for the loss function may be calculated at the client device if previous gradients have been sufficiently large. If…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/047. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 2, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).