Method and apparatus for managing recommendation models
US-9218605-B2 · Dec 22, 2015 · US
US9218573B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9218573-B1 |
| Application number | US-201313826327-A |
| Country | US |
| Kind code | B1 |
| Filing date | Mar 14, 2013 |
| Priority date | May 22, 2012 |
| Publication date | Dec 22, 2015 |
| Grant date | Dec 22, 2015 |
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a model using parameter server shards. One of the methods includes receiving, at a parameter server shard configured to maintain values of a disjoint partition of the parameters of the model, a succession of respective requests for parameter values from each of a plurality of replicas of the model; in response to each request, downloading a current value of each requested parameter to the replica from which the request was received; receiving a succession of uploads, each upload including respective delta values for each of the parameters in the partition maintained by the shard; and updating values of the parameters in the partition maintained by the parameter server shard repeatedly based on the uploads of delta values to generate current parameter values.
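The shard-side behaviour described in the abstract can be sketched as follows. A shard serves current values on request and applies each upload of delta values to its own partition. This is a minimal single-process illustration under stated assumptions, not the patented implementation. The class and method names (`ParameterServerShard`, `owns`, `download`, `upload`) are hypothetical. Parameters are assumed to be scalar floats keyed by name. A lock stands in for handling concurrent replica requests. The update applies the plain learning-rate rule recited in claim 9.

```python
import threading
from typing import Dict, Iterable


class ParameterServerShard:
    """Holds one disjoint partition of a model's parameters (hypothetical sketch)."""

    def __init__(self, initial_values: Dict[str, float], learning_rate: float = 0.1):
        self._values = dict(initial_values)  # only this shard's partition
        self._lr = learning_rate
        self._lock = threading.Lock()  # replicas may call download/upload concurrently

    def owns(self, name: str) -> bool:
        """True if this parameter belongs to this shard's partition."""
        return name in self._values

    def download(self, names: Iterable[str]) -> Dict[str, float]:
        """Answer one replica request with the current value of each requested parameter."""
        with self._lock:
            return {n: self._values[n] for n in names}

    def upload(self, deltas: Dict[str, float]) -> None:
        """Apply one upload: p_u = p_c - lr * delta (KeyError if a name is not owned)."""
        with self._lock:
            for n, d in deltas.items():
                self._values[n] -= self._lr * d
```

For example, a shard created with `{"w0": 1.0, "w1": -2.0}` and `learning_rate=0.5` that receives the upload `{"w0": 2.0}` will afterwards return `{"w0": 0.0, "w1": -2.0}` for a download of both names. Parameters that no upload touches keep their current values.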
Claims (preview)
What is claimed is:
1. A system for training a model having parameters by determining a respective parameter value for each of the parameters of the model, the system comprising: a plurality of identical model replicas, wherein each of the plurality of replicas is an identical instance of the model with possibly different parameter values for the parameters of the model, wherein each model replica executes on a respective computing unit, wherein each model replica is configured to operate independently of each other model replica, and wherein each model replica is further configured to perform repeatedly the following operations: receiving, from at least one of a plurality of parameter server shards, current values of one or more of the parameters of the model, wherein each parameter server shard is configured to maintain values of a respective disjoint partition of the parameters of the model; computing respective delta values for each of a plurality of the parameters of the model by performing one or more iterations of a training process; and providing, for each of the plurality of parameters, the delta value for the parameter to the parameter server shard that is configured to maintain the respective partition that includes the parameter.
2. The system of claim 1, wherein the training process is a stochastic gradient descent process.
3. The system of claim 2, wherein: performing one or more iterations of the stochastic gradient descent process comprises obtaining a respective batch of training data; and computing the respective delta values for each of the plurality of parameters comprises computing a gradient of an objective function for the model based on the initial values and the batch of training data.
4. The system of claim 1, wherein: performing one or more iterations of the training process comprises obtaining a respective batch of training data; and computing the respective delta values for each of the plurality of parameters comprises computing a gradient of an objective function for the model based on the initial values and the batch of training data.
5. The system of claim 3, wherein each model replica obtains a different sequence of training data.
6. The system of claim 3, wherein each model replica obtains different training data.
7. The system of claim 3, wherein receiving current values of one or more of the plurality of parameters comprises: identifying one or more parameters for which current values are necessary to perform the one or more iterations of the training process; identifying one or more parameter server shards that are configured to maintain values of the one or more parameters; and requesting parameter values only from the one or more parameter server shards.
8. The system of claim 1, further comprising: the plurality of parameter server shards, wherein each shard is configured to perform repeatedly the following operations asynchronously with respect to every other shard: receive a succession of respective requests for parameter values from each of the plurality of replicas of the model; in response to each request, download a current value of each requested parameter to the replica from which the request was received; receive, from each of the plurality of replicas, a succession of uploads, each upload including respective delta values for each of the parameters in the partition maintained by the shard; and update values of the parameters in the partition maintained by the parameter server shard repeatedly based on the uploads of delta values to generate current parameter values.
9. The system of claim 8, wherein the updated value of a parameter ($p_u$) satisfies: $p_u = p_c - \alpha \times \Delta p_r$, wherein $p_c$ is a current value of the parameter, $\alpha$ is a learning rate, and $\Delta p_r$ is a received delta value for the parameter.
10. The system of claim 9, wherein the learning rate is an adaptive learning rate that varies between parameters.
11. The system of claim 9, wherein the learning rate is an adaptive learning rate that varies between iterations of the training process.
12. A method for training a model having parameters by determining a respective parameter value for each of the parameters of the model, the method comprising: receiving, from at least one of a plurality of parameter server shards and at a model replica of a plurality of model replicas, current values of one or more of the parameters of the model, wherein each parameter server shard is configured to maintain values of a respective disjoint partition of the parameters of the model, and wherein each of the plurality of replicas is an identical instance of the model with possibly different parameter values for the parameters of the model; computing, by the model replica, respective delta values for each of a plurality of the parameters of the model by performing one or more iterations of a training process; and providing, by the model replica and for each of the plurality of parameters, the delta value for the parameter to the parameter server shard that is configured to maintain the respective partition that includes the parameter.
13. The method of claim 12, wherein the training process is a stochastic gradient descent process.
14. The method of claim 13, wherein: performing one or more iterations of the stochastic gradient descent process comprises obtaining a respective batch of training data; and computing the respective delta values for each of the plurality of parameters comprises computing a gradient of an objective function for the model based on the initial values and the batch of training data.
15. The method of claim 12, wherein: performing one or more iterations of the training process comprises obtaining a respective batch of training data; and computing the respective delta values for each of the plurality of parameters comprises computing a gradient of an objective function for the model based on the initial values and the batch of training data.
16. The method of claim 14, wherein each model replica obtains a different sequence of training data.
17. The method of claim 14, wherein each model replica obtains different training data.
18. The method of claim 14, wherein receiving current values of one or more of the plurality of parameters comprises: identifying one or more parameters for which current values are necessary to perform the one or more iterations of the training process; identifying one or more parameter server shards that are configured to maintain values of the one or more parameters; and requesting parameter values only from the one or more parameter server shards.
19. The method of claim 14, further comprising: receiving, at a parameter server shard of the plurality of parameter server shards, a succession of respective requests for parameter values from each of the plurality of replicas of the model; in response to each request, downloading, by the parameter server shard, a current value of each requested parameter to the replica from which the request was received; receiving, at the parameter server shard and from each of the plurality of replicas, a succession of uploads, each upload including respective delta values for each of the parameters in the partition maintained by the shard; and updating, by the parameter server shard, values of the parameters in the partition maintained by the parameter server shard repeatedly based on the uploads of delta values to generate current parameter values.
20. The method of cla
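The replica side of claims 1 to 7 can be illustrated with a short sketch. The claims recite three steps: fetch only the parameters a batch needs, from the shards that own them (claim 7); compute delta values as a stochastic gradient on a batch (claims 2 to 4); and send each delta back to its owning shard. This is a minimal sketch under several assumptions. The model is a sparse linear model with squared-error loss, and threads stand in for separate computing units. The names `Shard`, `route`, `replica_step` and `run_replica` are hypothetical. A stripped-down shard using the claim 9 update is redefined here so the block runs on its own.

```python
import random
import threading
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, float], float]  # (sparse features, target)


class Shard:
    """Minimal shard: owns a disjoint set of names, applies p <- p - lr * delta."""

    def __init__(self, names: List[str], lr: float):
        self.values = {n: 0.0 for n in names}
        self.lr = lr
        self.lock = threading.Lock()

    def download(self, names: List[str]) -> Dict[str, float]:
        with self.lock:
            return {n: self.values[n] for n in names}

    def upload(self, deltas: Dict[str, float]) -> None:
        with self.lock:
            for n, d in deltas.items():
                self.values[n] -= self.lr * d


def route(names: List[str], shards: List[Shard]) -> Dict[int, List[str]]:
    """Group parameter names by the index of the single shard that owns each one."""
    plan: Dict[int, List[str]] = {}
    for n in names:
        idx = next(i for i, s in enumerate(shards) if n in s.values)
        plan.setdefault(idx, []).append(n)
    return plan


def replica_step(batch: List[Example], shards: List[Shard]) -> None:
    # 1. Identify only the parameters this batch touches (its non-missing features).
    needed = sorted({f for x, _ in batch for f in x})
    plan = route(needed, shards)
    # 2. Request current values only from the shards that own those parameters.
    w: Dict[str, float] = {}
    for idx, names in plan.items():
        w.update(shards[idx].download(names))
    # 3. Delta values = gradient of the batch mean of 0.5 * (w.x - y)^2.
    grad = {n: 0.0 for n in needed}
    for x, y in batch:
        err = sum(w[f] * v for f, v in x.items()) - y
        for f, v in x.items():
            grad[f] += err * v / len(batch)
    # 4. Send each delta to the shard whose partition contains that parameter.
    for idx, names in plan.items():
        shards[idx].upload({n: grad[n] for n in names})


def run_replica(data: List[Example], shards: List[Shard], steps: int, seed: int) -> None:
    rng = random.Random(seed)  # each replica draws its own sequence of batches
    for _ in range(steps):
        replica_step(rng.sample(data, 4), shards)


if __name__ == "__main__":
    true_w = {"a": 2.0, "b": -1.0, "c": 0.5}
    gen = random.Random(0)
    data = []
    for _ in range(200):
        x = {f: gen.uniform(-1, 1) for f in true_w if gen.random() < 0.7}
        data.append((x, sum(true_w[f] * v for f, v in x.items())))
    shards = [Shard(["a", "b"], lr=0.2), Shard(["c"], lr=0.2)]
    threads = [threading.Thread(target=run_replica, args=(data, shards, 2000, s))
               for s in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print({n: round(v, 2) for s in shards for n, v in s.values.items()})
```

The replicas run without a barrier, so one replica may compute its gradient from values that another replica has already changed. The sketch tolerates this staleness rather than preventing it. That is one possible reading of the independent, asynchronous operation recited in claims 1 and 8.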
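Claims 10 and 11 require an adaptive learning rate that varies between parameters and between iterations, but they do not name a scheme. AdaGrad is used below purely as one well-known example that satisfies both conditions; the source does not say it is the patented method. Each parameter's rate is $\alpha / \sqrt{\epsilon + \sum \Delta p_r^2}$, which shrinks as that parameter accumulates squared deltas. The class name `AdaGradShard` and its fields are hypothetical. The class is single-threaded for brevity.

```python
import math
from typing import Dict


class AdaGradShard:
    """Shard whose learning rate adapts per parameter and per update (AdaGrad-style)."""

    def __init__(self, initial: Dict[str, float], alpha: float = 0.1, eps: float = 1e-8):
        self.values = dict(initial)
        self.alpha = alpha
        self.eps = eps  # keeps the rate finite before any delta has been seen
        self.sq_sum = {n: 0.0 for n in initial}  # running sum of squared deltas

    def rate(self, name: str) -> float:
        """Current effective learning rate for one parameter."""
        return self.alpha / math.sqrt(self.eps + self.sq_sum[name])

    def upload(self, deltas: Dict[str, float]) -> None:
        for n, d in deltas.items():
            self.sq_sum[n] += d * d
            # Same form as claim 9, p_u = p_c - alpha * delta, with a per-parameter alpha
            self.values[n] -= self.rate(n) * d
```

After uploads of `{"w0": 3.0, "w1": 4.0}` and then `{"w0": 4.0}` with `alpha=1.0`, the rate for `w0` falls from about 1/3 to 0.2, while `w1` stays at 0.25. So the rate differs both across parameters and across iterations.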
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Probabilistic graphical models, e.g. probabilistic networks · CPC title
based on the proximity to a decision surface, e.g. support vector machines · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Supervised learning · CPC title