Distributed machine learning using network measurements

US12165022B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12165022-B2
Application number: US-202017791628-A
Country: US
Kind code: B2
Filing date: Jan 10, 2020
Priority date: Jan 10, 2020
Publication date: Dec 10, 2024
Grant date: Dec 10, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method performed by a central server node in a distributed machine learning environment is provided. The method includes: managing distributed machine learning for a plurality of local client nodes, such that a first set of the plurality of local client nodes are assigned to assist training of a first central model and a second set of the plurality of local client nodes are assigned to assist training of a second central model; obtaining information regarding network conditions for the plurality of local client nodes; clustering the plurality of local client nodes into one or more clusters based at least in part on the information regarding network conditions; re-assigning a local client node in the first set to the second set based on the clustering; and sending to the local client node a message including model weights for the second central model.
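The clustering and re-assignment steps in the abstract can be illustrated with a minimal sketch. The patent text shown here does not name a clustering algorithm or a message format, so everything below is an assumption made for illustration: a one-dimensional k-means over a single delay value per client, the hypothetical helper names `cluster_by_delay` and `reassign`, the mapping from cluster index to central model, and the dictionary used as the message.

```python
from statistics import mean


def cluster_by_delay(delays, k=2, iterations=20):
    """Group client ids into k clusters by delay (simple 1-D k-means).

    Assumption: one delay value (ms) per client stands in for the
    "information regarding network conditions".
    """
    ids = sorted(delays)
    values = sorted(delays.values())
    # Spread the initial centroids evenly across the sorted delays.
    centroids = [values[i * (len(values) - 1) // max(k - 1, 1)] for i in range(k)]
    assignment = {}
    for _ in range(iterations):
        # Assign each client to the nearest centroid.
        assignment = {
            cid: min(range(k), key=lambda j: abs(delays[cid] - centroids[j]))
            for cid in ids
        }
        # Move each centroid to the mean delay of its members.
        for j in range(k):
            members = [delays[c] for c in ids if assignment[c] == j]
            if members:
                centroids[j] = mean(members)
    return assignment


def reassign(model_of, cluster_of, model_for_cluster, weights):
    """Move clients to the model of their cluster; return messages to send."""
    messages = []
    for cid, cluster in cluster_of.items():
        target = model_for_cluster[cluster]
        if model_of[cid] != target:
            model_of[cid] = target
            # Hypothetical message layout carrying the new model's weights.
            messages.append((cid, {"model": target, "weights": weights[target]}))
    return messages


# Illustrative data: clients a, b, c start on model m1; d, e on model m2.
delays_ms = {"a": 10, "b": 12, "c": 210, "d": 200, "e": 190}
model_of = {"a": "m1", "b": "m1", "c": "m1", "d": "m2", "e": "m2"}
weights = {"m1": [0.1, 0.2], "m2": [0.3, 0.4]}

cluster_of = cluster_by_delay(delays_ms, k=2)
messages = reassign(model_of, cluster_of, {0: "m1", 1: "m2"}, weights)
```

In this made-up data set, client `c` has a delay close to that of `d` and `e`, so it lands in their cluster and is the only client that receives a message with the second model's weights.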

First claim

Opening claim text (preview).

The invention claimed is:

1. A method performed by a central server node in a distributed machine learning environment, the method comprising: obtaining first information regarding network conditions for a plurality of local client nodes, wherein the plurality of local client nodes includes a first client node and the first information comprises a first network performance indicator for the first client node; managing distributed machine learning for the plurality of local client nodes, wherein the managing comprises assigning a first set of the plurality of local client nodes to assist training of a first central model and assigning a second set of the plurality of local client nodes to assist training of a second central model; after assigning the first set to assist training the first central model and assigning the second set to assist training the second central model, obtaining second information regarding network conditions for the plurality of local client nodes, wherein the second information comprises a second network performance indicator for the first client node; based on the first and second network performance indicators, determining a change in network performance for the first network node; determining that the change in network performance for the first network node is greater than a threshold; as a result of determining that the change in network performance for the first network node is greater than the threshold, clustering the plurality of local client nodes into one or more clusters based at least in part on the second information regarding network conditions; re-assigning a local client node in the first set to the second set based on the clustering; and sending to the local client node a message including model weights for the second central model.

2. The method of claim 1, wherein obtaining the first information and/or the second information regarding network conditions for the plurality of local client nodes comprises performing passive monitoring of the network conditions.

3. The method of claim 2, wherein performing passive monitoring of the network conditions comprises computing one-way and/or round-trip delay times based on messaging between the central server node and the plurality of local client nodes relating to model weight computations.

4. The method of claim 3, further comprising estimating network conditions based on the obtained information regarding network conditions.

5. The method of claim 4, wherein estimating network conditions comprises estimating statistics for the network conditions over a time window, wherein the statistics include one or more of a mean, a median, a percentile, a standard deviation, a minimum, and a maximum, and wherein the network conditions include one or more of delay, delay jitter, and packet loss.

6. A method performed by a central server node in a distributed machine learning environment, the method comprising: sending a first message to a local client node assigned to assist training of a central model, the first message indicating to the local client node that the local client node is to participate in a first round of distributed machine learning and to compute updated model weights for the central model; receiving a second message from the local client node comprising the updated model weights for the central model; computing a delay measurement based on one or more of the first message and the second message; identifying a change in a network condition of the local client node based at least in part on the delay measurement; and managing distributed machine learning based at least in part on the identified change in a network condition of the local client node, wherein managing distributed machine learning comprises comparing the change in the network condition to a threshold and determining whether to include the local node in a second round of distributed machine learning based on the comparison.

7. The method of claim 6, wherein managing distributed machine learning based at least in part on the identified change in a network condition of the local client node comprises: determining to include the local client node in the second round of distributed machine learning; and in response to the determining, sending a third message to the local client node, the third message indicating to the local client node that the local client node is to participate in the second round of distributed machine learning and to compute updated model weights for the central model.

8. The method of claim 6, wherein managing distributed machine learning based at least in part on the identified change in a network condition of the local client node comprises determining not to include the local client node in the second round of distributed machine learning.

9. The method of claim 6, wherein managing distributed machine learning based at least in part on the identified change in a network condition of the local client node comprises: in response to the identified change in a network condition of the local client node, clustering the local client node and one or more additional local client nodes based at least in part on the delay measurement; and determining, based at least in part on the clustering, to re-assign the local client node to another central model different from the central model.

10. The method of claim 6, wherein computing a delay measurement based on one or more of the first message and the second message comprises computing a round-trip delay based on both the first message and the second message.

11. The method of claim 6, wherein computing a delay measurement based on one or more of the first message and the second message comprises computing a one-way delay based on the second message.

12. The method of claim 6, further comprising: sending additional messages to the local client node indicating to the local client node that the local client node is to participate in additional rounds of distributed machine learning and to compute updated model weights for the central model; receiving additional messages from the local client node comprising the updated model weights for the central model; computing, for each round of the additional rounds of distributed machine learning, a delay measurement based on the additional messages sent to and received from the local client node; and computing one or more of latency, throughput, and jitter based on the delay measurements, wherein identifying a change in a network condition of the local client node based at least in part on the delay measurement is further based at least in part on the one or more of latency, throughput, and jitter.

13. The method of claim 6, wherein the first message further comprises initial model weights, and wherein the initial model weights are the same initial model weights that the central server node sends to other local client nodes participating in the first round of distributed machine learning.

14. A central server node comprising: a memory; and a processor, wherein said processor is configured to: obtain first information regarding network conditions for a plurality of local client nodes, wherein the plurality of local client nodes includes a first client node and the first information comprises a first network performance indicator for the first client node; manage distributed machine learning for the plurality of local client nodes, wherein the managing comprises assigning a first set of the plurality of local client nodes to assist training of a first central model and assigning a second set of the plurality of local client nodes to assist training of a se
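The passive delay measurement, windowed statistics, and threshold comparison described in claims 3, 5, 6, and 10 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the class name `DelayMonitor`, the function `include_in_next_round`, the use of server-side millisecond timestamps, and the policy of excluding a client when the change exceeds the threshold are all hypothetical. The claims only require that the change be compared to a threshold. The round-trip value here also includes the client's local computation time, which a real system might subtract.

```python
import statistics
from collections import deque


class DelayMonitor:
    """Tracks round-trip delays for one client over a sliding window of rounds."""

    def __init__(self, window=5):
        # Only the most recent `window` samples are kept.
        self.samples = deque(maxlen=window)

    def record_round(self, sent_at_ms, received_at_ms):
        """Record one round: time between sending the round request (first
        message) and receiving the updated weights (second message)."""
        rtt = received_at_ms - sent_at_ms
        self.samples.append(rtt)
        return rtt

    def stats(self):
        """Summary statistics over the current window."""
        s = list(self.samples)
        return {
            "mean": statistics.mean(s),
            "median": statistics.median(s),
            "min": min(s),
            "max": max(s),
            "stdev": statistics.pstdev(s),
        }


def include_in_next_round(previous_delay_ms, current_delay_ms, threshold_ms):
    """Assumed policy: keep the client only if its delay changed by no more
    than the threshold since the previous round."""
    change = abs(current_delay_ms - previous_delay_ms)
    return change <= threshold_ms


# Illustrative timestamps (ms) for three rounds with one client.
monitor = DelayMonitor(window=5)
r1 = monitor.record_round(0, 200)
r2 = monitor.record_round(1000, 1250)
r3 = monitor.record_round(2000, 2900)

summary = monitor.stats()
keep_after_round_2 = include_in_next_round(r1, r2, threshold_ms=300)
keep_after_round_3 = include_in_next_round(r2, r3, threshold_ms=300)
```

With these made-up timestamps, the delay moves from 200 ms to 250 ms between the first two rounds, which stays under the 300 ms threshold, and then jumps to 900 ms, which exceeds it.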

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12165022B2 cover?
A method performed by a central server node in a distributed machine learning environment is provided. The method includes: managing distributed machine learning for a plurality of local client nodes, such that a first set of the plurality of local client nodes are assigned to assist training of a first central model and a second set of the plurality of local client nodes are assigned to assist…
Who is the assignee on this patent?
Ericsson Telefon Ab L M
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 10, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).