Communication efficient federated learning
US-11763197-B2 · Sep 19, 2023 · US
US12340308B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12340308-B2 |
| Application number | US-202217695325-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 15, 2022 |
| Priority date | Aug 6, 2020 |
| Publication date | Jun 24, 2025 |
| Grant date | Jun 24, 2025 |
Official abstract text for this publication.
Methods and systems for training a neural network include collecting model exemplar information from edge devices, each model exemplar having been trained using information local to the respective edge devices. The collected model exemplar information is aggregated together using federated averaging. Global model exemplars are trained using federated constrained clustering. The trained global exemplars are transmitted to respective edge devices.
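The aggregation step in the abstract can be illustrated with a minimal sketch. It assumes each edge device reports the same number of exemplars, that exemplars at the same index correspond to each other across devices, and that every device is weighted equally; none of these details are stated in the abstract. The function name `federated_average` and the sample values are hypothetical. The sketch covers only the averaging step, not the federated constrained clustering that the abstract also mentions.

```python
from typing import List

# One device's exemplar set: k exemplars, each a d-dimensional vector.
Exemplars = List[List[float]]


def federated_average(edge_exemplars: List[Exemplars]) -> Exemplars:
    """Element-wise mean of the exemplar sets reported by edge devices.

    Assumption (not from the source): exemplar j on one device lines up
    with exemplar j on every other device, and devices are weighted equally.
    """
    if not edge_exemplars:
        raise ValueError("need at least one edge device")
    k = len(edge_exemplars[0])
    d = len(edge_exemplars[0][0])
    for exemplars in edge_exemplars:
        if len(exemplars) != k or any(len(row) != d for row in exemplars):
            raise ValueError("all devices must report exemplars of the same shape")
    m = len(edge_exemplars)
    # Average coordinate t of exemplar j over all m devices.
    return [
        [sum(dev[j][t] for dev in edge_exemplars) / m for t in range(d)]
        for j in range(k)
    ]


# Two hypothetical devices, each holding two 2-dimensional exemplars.
device_a = [[0.0, 2.0], [4.0, 6.0]]
device_b = [[2.0, 4.0], [6.0, 8.0]]
global_exemplars = federated_average([device_a, device_b])
print(global_exemplars)  # [[1.0, 3.0], [5.0, 7.0]]
```

Only the exemplar values cross the network in this sketch; the raw data each device collected stays local, which matches the abstract's description of training on information local to each edge device.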
Opening claim text (preview).
What is claimed is:

1. A method for training a neural network, comprising:

training an edge model exemplar at an edge device, using an initialized global model exemplar, based on information collected at the edge device, including optimizing an objective function:

$$\min_{\theta, C} \; -\frac{1}{n}\sum_{i=1}^{n} KL(p_i \,\|\, q_i) - \alpha^{T} \log\left(\frac{1}{n}\sum_{i=1}^{n} q_i\right) + \frac{1}{n}\sum_{i=1}^{n} M(X_i)$$

where $n$ is a number of multivariate time series segments $X_i$, $\theta$ is a set of parameters for the neural network to be learned, $C$ is a set of edge model exemplars, $KL(\cdot)$ is a Kullback-Leibler divergence, $p_i$ is a target cluster membership vector for an $i$-th locally gathered information, $q_i$ is a cluster membership vector for the $i$-th locally gathered information, $\alpha$ is a prior distribution over the edge model exemplars, and $M(X_i)$ is a term that preserves local similarity of an original feature space;

transmitting the edge model exemplar to a server without transmitting the information collected at the edge device;

receiving an updated global model exemplar that is based on the edge model exemplar and at least one other model exemplar from another edge device; and

retraining the edge model exemplar using the updated global model exemplar.

2. The method of claim 1, wherein the updated global model exemplar is a federated average of the edge model exemplar and the at least one other model exemplar.

3. The method of claim 2, wherein the federated average is an element-wise average of exemplars.

4. The method of claim 1, further comprising repeating the transmitting, receiving, and retraining based on additional information collected at the edge device.

5. The method of claim 1, wherein the edge model exemplar includes the neural network, which includes a bidirectional long-short term memory layer.

6. The method of claim 1, further comprising determining an anomaly score using the retrained edge model exemplar based on the information gathered at the edge device.

7. The method of claim 6, wherein determining the anomaly score is based on a similarity between new information and existing exemplars.

8. The method of claim 1, wherein the retrained edge model exemplar recognizes operating conditions from cyber-physical systems associated with a plurality of edge devices.

9. A system for training a neural network, comprising: a hardware processor; and a memory that stores a computer program, which, when executed by the hardware processor, causes the hardware processor to: train an edge model exemplar at an edge device, using an initialized global model exemplar, based on information collected at the edge device, including optimization of an objective function: $\min_{\theta, C} \; -\frac{1}{n}\sum_{i=1}^{n} KL(p_i \,\|\, q_i) - \dots$
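To make the three terms of the objective function in claim 1 concrete, the following sketch evaluates the expression for given inputs. It follows the signs exactly as they appear in the claim preview. The preview does not say how the membership vectors or the local-similarity term M(X i) are computed, so they are passed in as precomputed numbers; the function names `kl_divergence` and `claim_objective` and the sample values are hypothetical. The sketch only evaluates the expression and does not perform the minimization over the network parameters and exemplars.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; entries with p_j == 0 add 0."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0.0)


def claim_objective(
    p: Sequence[Sequence[float]],   # target cluster membership vectors p_i
    q: Sequence[Sequence[float]],   # cluster membership vectors q_i
    alpha: Sequence[float],         # prior over the exemplars
    m_values: Sequence[float],      # precomputed M(X_i), one per segment
) -> float:
    """Evaluate the claim 1 expression, with signs as printed in the preview.

    Assumption (not from the source): every entry of q is strictly
    positive, so the logarithms are defined.
    """
    n = len(q)
    k = len(alpha)
    # (1/n) * sum_i KL(p_i || q_i)
    kl_term = sum(kl_divergence(p[i], q[i]) for i in range(n)) / n
    # alpha^T log((1/n) * sum_i q_i), with the log taken element-wise
    mean_q = [sum(q[i][j] for i in range(n)) / n for j in range(k)]
    prior_term = sum(alpha[j] * math.log(mean_q[j]) for j in range(k))
    # (1/n) * sum_i M(X_i)
    locality_term = sum(m_values) / n
    return -kl_term - prior_term + locality_term


# Hypothetical toy input: two segments, two exemplars, p equal to q.
q_toy = [[0.5, 0.5], [0.5, 0.5]]
value = claim_objective(p=q_toy, q=q_toy, alpha=[0.5, 0.5], m_values=[0.0, 0.0])
print(value)
```

With these toy inputs the KL term and the local-similarity term are both zero, so the result is just the prior term, which equals log 2.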
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
Distributed learning, e.g. federated learning · CPC title
Architecture, e.g. interconnection topology · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title