Efficient training of neural networks
US-2018075347-A1 · Mar 15, 2018 · US
US10657461B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10657461-B2 |
| Application number | US-201716335695-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 7, 2017 |
| Priority date | Sep 26, 2016 |
| Publication date | May 19, 2020 |
| Grant date | May 19, 2020 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
The present disclosure provides efficient communication techniques for transmission of model updates within a machine learning framework, such as, for example, a federated learning framework in which a high-quality centralized model is trained on training data distributed overt a large number of clients each with unreliable network connections and low computational power. In an example federated learning setting, in each of a plurality of rounds, each client independently updates the model based on its local data and communicates the updated model back to the server, where all the client-side updates are used to update a global model. The present disclosure provides systems and methods that reduce communication costs. In particular, the present disclosure provides at least: structured update approaches in which the model update is restricted to be small and sketched update approaches in which the model update is compressed before sending to the server.
Opening claim text (preview).
What is claimed is: 1. A computer-implemented method for communication efficient machine learning, the method comprising: obtaining, by a client computing device, global values for a set of parameters of a machine-learned model; training, by the client computing device, the machine-learned model based at least in part on a local dataset to obtain an update matrix that is descriptive of updated values for the set of parameters of the machine-learned model, wherein the update matrix is restricted to be a low-rank matrix, and wherein the local dataset is stored locally by the client computing device; and communicating, by the client computing device, information descriptive of the update matrix to a server computing device for use by the server computing device in computation of a global update to the machine-learned model, wherein: training, by the client computing device, the machine-learned model based at least in part on the local dataset to obtain the update matrix comprises: defining, by the client computing device, the update matrix as a product of a first matrix and a second matrix, wherein the first matrix comprises fixed values and the second matrix comprises optimizable variables, and wherein the fixed values of the first matrix are known to the server computing device; and training, by the client computing device, machine-learned model based at least in part on the local dataset to obtain the second matrix; and communicating, by the client computing device, information descriptive of the update matrix to the server computing device comprises communicating, by the client computing device, information descriptive of the second matrix to the server computing device without sending the first matrix from the client computing device to the server computing device. 2. The computer-implemented method of claim 1 , further comprising, prior to training, by the client computing device, the machine-learned model: generating, by the client computing device, the first matrix based at least in part on a seed and a pseudo-random number generator, wherein both the client computing device and the server computing device have knowledge of the seed such that the first matrix is reproducible by the server computing device. 3. The computer-implemented method of claim 1 , wherein the update matrix is restricted to be a sparse matrix. 4. The computer-implemented method of claim 1 , wherein training, by the client computing device, the machine-learned model based at least in part on the local dataset comprises training, by the client computing device, the machine-learned model based at least in part on the local dataset such that updated values are determined only for a pre-selected portion of the set of parameters, the update matrix descriptive of only the updated values for the pre-selected portion of the set of parameters. 5. The computer-implemented method of claim 4 , further comprising, prior to training, by the client computing device, the machine-learned model: generating, by the client computing device, a parameter mask that specifies which of the set of parameters are included in the pre-selected portion of the set of parameters. 6. The computer-implemented method of claim 5 , wherein generating, by the client computing device, the parameter mask comprises generating, by the client computing device, the parameter mask based at least in part on a seed and a pseudo-random number generator, wherein both the client computing device and the server computing device have knowledge of the seed such that the parameter mask is reproducible by the server computing device. 7. The computer-implemented method of claim 1 , wherein the update matrix describes the updated values for the set of parameters or respective differences between the updated values and the global values. 8. A client computing device, comprising: at least one processor; and at least one non-transitory computer-readable medium that stores instructions that, when executed by the at least one processor, cause the client computing device to perform operations comprising: obtaining global values for a set of parameters of a machine-learned model; training the machine-learned model based at least in part on a local dataset to obtain an update matrix that is descriptive of updated values for the set of parameters of the machine-learned model, wherein the update matrix is restricted to be a low-rank matrix, and wherein the local dataset is stored locally by the client computing device; and communicating information descriptive of the update matrix to a server computing device for use by the server computing device in computation of a global update to the machine-learned model, wherein: training the machine-learned model based at least in part on the local dataset to obtain the update matrix comprises: defining the update matrix as a product of a first matrix and a second matrix, wherein the first matrix comprises fixed values and the second matrix comprises optimizable variables, and wherein the fixed values of the first matrix are known to the server computing device; and training machine-learned model based at least in part on the local dataset to obtain the second matrix; and communicating information descriptive of the update matrix to the server computing device comprises communicating, by the client computing device, information descriptive of the second matrix to the server computing device without sending the first matrix from the client computing device to the server computing device. 9. At least one non-transitory computer-readable medium that stores instructions that, when executed by a client computing device, cause the client computing device to perform operations comprising: obtaining global values for a set of parameters of a machine-learned model; training the machine-learned model based at least in part on a local dataset to obtain an update matrix that is descriptive of updated values for the set of parameters of the machine-learned model, wherein the update matrix is restricted to be a low-rank matrix, and wherein the local dataset is stored locally by the client computing device; and communicating information descriptive of the update matrix to a server computing device for use by the server computing device in computation of a global update to the machine-learned model, wherein: training the machine-learned model based at least in part on the local dataset to obtain the update matrix comprises: defining the update matrix as a product of a first matrix and a second matrix, wherein the first matrix comprises fixed values and the second matrix comprises optimizable variables, and wherein the fixed values of the first matrix are known to the server computing device; and training machine-learned model based at least in part on the local dataset to obtain the second matrix; and communicating information descriptive of the update matrix to the server computing device comprises communicating, by the client computing device, information descriptive of the second matrix to the server computing device without sending the first matrix from the client computing device to the server computing device.
Learning methods · CPC title
Pseudo-random number generators · CPC title
for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
Machine learning · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.