Systems and methods of distributed optimization

US10402469B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10402469-B2
Application numberUS-201615045707-A
CountryUS
Kind codeB2
Filing dateFeb 17, 2016
Priority dateOct 16, 2015
Publication dateSep 3, 2019
Grant dateSep 3, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods of determining a global model are provided. In particular, one or more local updates can be received from a plurality of user devices. Each local update can be determined by the respective user device based at least in part on one or more data examples stored on the user device. The one or more data examples stored on the plurality of user devices are distributed on an uneven basis, such that no user device includes a representative sample of the overall distribution of data examples. The local updates can then be aggregated to determine a global model.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method of updating a global model based on unevenly distributed data, the method comprising: providing, by one or more computing devices, a current global model to a plurality of user devices; receiving, by the one or more computing devices, a plurality of local updates to the current global model from the plurality of user devices, each local update being determined by the respective user device through performance of a respective number of iterations of a gradient descent training technique on the current global model with respect to one or more data examples stored on the respective user device, wherein, for each of the respective number of training iterations, the respective user device employs a respective device-specific stepsize that controls an amount of change to one or more parameters of the current global model at each iteration, wherein, for each respective user device, the respective device-specific stepsize is inversely proportional to a number of data examples stored on the respective user device, and wherein the one or more data examples stored on the plurality of user devices are distributed on an uneven basis, such that no user device includes a representative sample of an overall distribution of data examples; aggregating, by the one or more computing devices, the received local updates to determine an updated global model; and transmitting, by the one or more computing devices, data descriptive of the updated global model to at least one of the plurality of user devices to be used by the at least one of the plurality of user devices to generate predictions. 2. The computer-implemented method of claim 1 , wherein at least one of the local updates comprise a gradient vector obtained through performance of the respective number of iterations of a gradient descent training technique on the current global model with respect to the data stored on the respective user device. 3. The computer-implemented method of claim 1 , wherein the size of each local update is independent from the size of the data used to determine the local update. 4. The computer-implemented method of claim 1 , wherein, for each respective user device, the respective number of iterations of the gradient descent training technique are determined at least in part by randomly sampling the data examples stored on the respective user device. 5. The computer-implemented method of claim 1 , wherein the one or more local updates are determined by applying, by each user device, a respective device-specific diagonal scaling matrix, wherein the device-specific diagonal scaling matrix for each user device describes, on a coordinate-by-coordinate basis, a ratio of a global appearance frequency of the coordinate to a local appearance frequency of the coordinate on the user device. 6. The computer-implemented method of claim 1 , wherein aggregating, by the one or more computing devices, the received local updates to determine a global model comprises applying, by the one or more computing devices, a respective weighting term to each local update, the respective weighting term for each local update proportional to the number of data examples stored on the user device from which such local update was received. 7. The computer-implemented method of claim 1 , wherein aggregating, by the one or more computing devices, the received local updates to determine a global model comprises scaling the received local updates on a per-coordinate basis using, by the one or more computing devices, a diagonal matrix that describes, on a coordinate-by-coordinate basis, a ratio of the number of user devices to a number of user devices that contain at least one datapoint that is non-zero for such coordinate. 8. The computer-implemented method of claim 1 , wherein aggregating, by the one or more computing devices, the received local updates to determine a global model comprises aggregating the received local updates for at least one iteration. 9. The computer-implemented method of claim 8 , wherein the at least one iteration is determined based at least in part on a threshold. 10. The computer-implemented method of claim 9 , wherein the threshold is determined based at least in part on an amount of time required for communication of the one or more local updates. 11. The computer-implemented method of claim 1 , wherein the number of data examples stored on each user device is smaller than the total number of user devices. 12. The computer-implemented method of claim 1 , further comprising providing a gradient of a loss function to each of the one or more user devices. 13. The computer-implemented method of claim 1 , wherein aggregating, by the one or more computing devices, the received local updates to determine a global model comprises determining a weighted average of the received local updates. 14. A computer-implemented method of updating a local machine learning model based on unevenly distributed data, the method comprising: determining, by a user device, a local model update based at least in part on a gradient vector of a loss function and one or more locally stored data examples, wherein the distribution of the one or more locally stored data examples is not representative of an overall distribution of the data examples used to train a global machine learning model, wherein determining, by the user device, the local model update comprises applying, by the user device, a device-specific diagonal scaling matrix that describes, on a coordinate-by-coordinate basis, a ratio of a global appearance frequency of the coordinate to a local appearance frequency of the coordinate; providing, by the user device, the local model update to a central computing device for use in determination of an update to the global machine learning model, the update to the global machine learning model being determined based on aggregation of the local model update with one or more additional local model updates received from one or more additional user devices; after determination of the update to the global machine learning model, receiving, by the user device, the global machine learning model from the central computing device; and employing, by the user device, the global machine learning model to produce predictions. 15. The computer-implemented method of claim 14 , wherein determining, by the one or more computing devices, a local model update comprises determining the local model update based at least in part on one or more stochastic iterations, each stochastic iteration having a device-specific stepsize that is inversely proportional to a number of the locally stored data examples stored on the user device. 16. The computer-implemented method of claim 15 , wherein the one or more stochastic iterations are determined at least in part by randomly sampling the locally stored data examples stored on the user device. 17. A computing system, comprising: one or more processors; and one or more memory devices, the one or more memory devices storing computer-readable instructions that when executed by the one or more processors cause the one or more processors to perform operations, the operations comprising: determining a local model update associated with an objective function based at least in part on one or more local data examples stored by the computing system, the local model update being determined by performing a number of iterations of a gradient descent training technique on a local version of a model with respect to the one or more local data examples stored by the computing system, each of the number

Assignees

Inventors

Classifications

  • CAD in a network environment, e.g. collaborative CAD or distributed simulation · CPC title

  • Computer-aided design [CAD] · CPC title

  • for solving equations {, e.g. nonlinear equations, general mathematical optimization problems (optimization specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title

  • G06N20/00Primary

    Machine learning · CPC title

  • G06F17/17Primary

    Function evaluation by approximation methods, e.g. inter- or extrapolation, smoothing, least mean square method ({G06F17/18 takes precedence } ; interpolation for numerical control G05B19/18) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10402469B2 cover?
Systems and methods of determining a global model are provided. In particular, one or more local updates can be received from a plurality of user devices. Each local update can be determined by the respective user device based at least in part on one or more data examples stored on the user device. The one or more data examples stored on the plurality of user devices are distributed on an uneve…
Who is the assignee on this patent?
Google Inc, Google Llc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Sep 03 2019 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).