US12165022B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12165022-B2 |
| Application number | US-202017791628-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 10, 2020 |
| Priority date | Jan 10, 2020 |
| Publication date | Dec 10, 2024 |
| Grant date | Dec 10, 2024 |
Official abstract text for this publication.
A method performed by a central server node in a distributed machine learning environment is provided. The method includes: managing distributed machine learning for a plurality of local client nodes, such that a first set of the plurality of local client nodes are assigned to assist training of a first central model and a second set of the plurality of local client nodes are assigned to assist training of a second central model; obtaining information regarding network conditions for the plurality of local client nodes; clustering the plurality of local client nodes into one or more clusters based at least in part on the information regarding network conditions; re-assigning a local client node in the first set to the second set based on the clustering; and sending to the local client node a message including model weights for the second central model.
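The re-assignment flow in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the abstract does not name a clustering algorithm, so a two-cluster 1-D k-means over a single delay value is assumed here, and `ClientNode`, `two_means_1d`, `reassign`, the model names, and the message layout are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ClientNode:
    node_id: str
    delay_ms: float  # assumed network-condition feature (e.g. mean delay)
    model: str       # central model the node currently assists


def two_means_1d(values, iterations=20):
    """Split 1-D values into two clusters; returns a 0/1 label per value."""
    centers = [min(values), max(values)]  # deterministic initialisation
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to the nearer center.
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        # Move each center to the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = mean(members)
    return labels


def reassign(nodes, weights_by_model, cluster_to_model=("model_a", "model_b")):
    """Cluster nodes by delay; return one message per node that changes model."""
    labels = two_means_1d([n.delay_ms for n in nodes])
    messages = []
    for node, label in zip(nodes, labels):
        target = cluster_to_model[label]
        if node.model != target:
            node.model = target
            # The message carries the weights of the newly assigned model.
            messages.append({"to": node.node_id, "model": target,
                             "weights": weights_by_model[target]})
    return messages


nodes = [
    ClientNode("n1", 12.0, "model_a"),
    ClientNode("n2", 15.0, "model_a"),
    ClientNode("n3", 180.0, "model_a"),  # delay has degraded since assignment
    ClientNode("n4", 200.0, "model_b"),
]
weights = {"model_a": [0.1, 0.2], "model_b": [0.3, 0.4]}
messages = reassign(nodes, weights)
```

In this toy run only `n3` changes cluster, so it is the only node that receives a message, and that message carries the weights for `model_b`. A real deployment would presumably cluster on several network features at once.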
Opening claim text (preview).
The invention claimed is:
1. A method performed by a central server node in a distributed machine learning environment, the method comprising: obtaining first information regarding network conditions for a plurality of local client nodes, wherein the plurality of local client nodes includes a first client node and the first information comprises a first network performance indicator for the first client node; managing distributed machine learning for the plurality of local client nodes, wherein the managing comprises assigning a first set of the plurality of local client nodes to assist training of a first central model and assigning a second set of the plurality of local client nodes to assist training of a second central model; after assigning the first set to assist training the first central model and assigning the second set to assist training the second central model, obtaining second information regarding network conditions for the plurality of local client nodes, wherein the second information comprises a second network performance indicator for the first client node; based on the first and second network performance indicators, determining a change in network performance for the first network node; determining that the change in network performance for the first network node is greater than a threshold; as a result of determining that the change in network performance for the first network node is greater than the threshold, clustering the plurality of local client nodes into one or more clusters based at least in part on the second information regarding network conditions; re-assigning a local client node in the first set to the second set based on the clustering; and sending to the local client node a message including model weights for the second central model.
2. The method of claim 1, wherein obtaining the first information and/or the second information regarding network conditions for the plurality of local client nodes comprises performing passive monitoring of the network conditions.
3. The method of claim 2, wherein performing passive monitoring of the network conditions comprises computing one-way and/or round-trip delay times based on messaging between the central server node and the plurality of local client nodes relating to model weight computations.
4. The method of claim 3, further comprising estimating network conditions based on the obtained information regarding network conditions.
5. The method of claim 4, wherein estimating network conditions comprises estimating statistics for the network conditions over a time window, wherein the statistics include one or more of a mean, a median, a percentile, a standard deviation, a minimum, and a maximum, and wherein the network conditions include one or more of delay, delay jitter, and packet loss.
6. A method performed by a central server node in a distributed machine learning environment, the method comprising: sending a first message to a local client node assigned to assist training of a central model, the first message indicating to the local client node that the local client node is to participate in a first round of distributed machine learning and to compute updated model weights for the central model; receiving a second message from the local client node comprising the updated model weights for the central model; computing a delay measurement based on one or more of the first message and the second message; identifying a change in a network condition of the local client node based at least in part on the delay measurement; and managing distributed machine learning based at least in part on the identified change in a network condition of the local client node, wherein managing distributed machine learning comprises comparing the change in the network condition to a threshold and determining whether to include the local node in a second round of distributed machine learning based on the comparison.
7. The method of claim 6, wherein managing distributed machine learning based at least in part on the identified change in a network condition of the local client node comprises: determining to include the local client node in the second round of distributed machine learning; and in response to the determining, sending a third message to the local client node, the third message indicating to the local client node that the local client node is to participate in the second round of distributed machine learning and to compute updated model weights for the central model.
8. The method of claim 6, wherein managing distributed machine learning based at least in part on the identified change in a network condition of the local client node comprises determining not to include the local client node in the second round of distributed machine learning.
9. The method of claim 6, wherein managing distributed machine learning based at least in part on the identified change in a network condition of the local client node comprises: in response to the identified change in a network condition of the local client node, clustering the local client node and one or more additional local client nodes based at least in part on the delay measurement; and determining, based at least in part on the clustering, to re-assign the local client node to another central model different from the central model.
10. The method of claim 6, wherein computing a delay measurement based on one or more of the first message and the second message comprises computing a round-trip delay based on both the first message and the second message.
11. The method of claim 6, wherein computing a delay measurement based on one or more of the first message and the second message comprises computing a one-way delay based on the second message.
12. The method of claim 6, further comprising: sending additional messages to the local client node indicating to the local client node that the local client node is to participate in additional rounds of distributed machine learning and to compute updated model weights for the central model; receiving additional messages from the local client node comprising the updated model weights for the central model; computing, for each round of the additional rounds of distributed machine learning, a delay measurement based on the additional messages sent to and received from the local client node; and computing one or more of latency, throughput, and jitter based on the delay measurements, wherein identifying a change in a network condition of the local client node based at least in part on the delay measurement is further based at least in part on the one or more of latency, throughput, and jitter.
13. The method of claim 6, wherein the first message further comprises initial model weights, and wherein the initial model weights are the same initial model weights that the central server node sends to other local client nodes participating in the first round of distributed machine learning.
14. A central server node comprising: a memory; and a processor, wherein said processor is configured to: obtain first information regarding network conditions for a plurality of local client nodes, wherein the plurality of local client nodes includes a first client node and the first information comprises a first network performance indicator for the first client node; manage distributed machine learning for the plurality of local client nodes, wherein the managing comprises assigning a first set of the plurality of local client nodes to assist training of a first central model and assigning a second set of the plurality of local client nodes to assist training of a se
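The passive delay monitoring described in claims 3, 5, 6, and 10 might look roughly like the sketch below. Several things here are assumptions rather than claim language: timestamps are integer milliseconds on the server clock, the round-trip delay is taken as the plain difference between sending the round request and receiving the updated weights (so it also includes the client's compute time), and a node is dropped from the next round when its delay increases by more than the threshold. The claims only say that the change is compared to a threshold. `DelayMonitor` and `include_in_next_round` are hypothetical names.

```python
import statistics
from collections import deque


class DelayMonitor:
    """Tracks round-trip delays for one client node over a sliding window."""

    def __init__(self, window=5):
        # Only the most recent `window` samples are kept.
        self.samples = deque(maxlen=window)

    def record_round(self, sent_at_ms, received_at_ms):
        """Store and return the delay between the round request and the reply."""
        rtt = received_at_ms - sent_at_ms
        self.samples.append(rtt)
        return rtt

    def stats(self):
        """Summary statistics over the current window (a subset of claim 5)."""
        data = list(self.samples)
        return {
            "mean": statistics.mean(data),
            "median": statistics.median(data),
            "stdev": statistics.pstdev(data),
            "min": min(data),
            "max": max(data),
        }


def include_in_next_round(previous_rtt, current_rtt, threshold_ms):
    """Assumed policy: exclude the node if its delay grew by more than the threshold."""
    change = current_rtt - previous_rtt
    return change <= threshold_ms


monitor = DelayMonitor(window=5)
# (sent, received) server-side timestamps for four training rounds.
rounds = [(0, 40), (1000, 1042), (2000, 2038), (3000, 3300)]
rtts = [monitor.record_round(sent, received) for sent, received in rounds]

summary = monitor.stats()
keep_node = include_in_next_round(rtts[-2], rtts[-1], threshold_ms=100)
```

With these sample timestamps the last round's delay jumps from 38 ms to 300 ms, so `keep_node` is `False` under the assumed policy. A one-way delay (claim 11) would instead need a timestamp carried in the client's reply and clocks that are at least loosely synchronized.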
Delays · CPC title
using time related information in packets, e.g. by adding timestamps · CPC title
Round trip delays · CPC title
considering the load · CPC title
Grid computing · CPC title