Optimized training of linear machine learning models

US10318882B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10318882-B2
Application number  US-201414484201-A
Country             US
Kind code           B2
Filing date         Sep 11, 2014
Priority date       Sep 11, 2014
Publication date    Jun 11, 2019
Grant date          Jun 11, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An indication of a data source to be used to train a linear prediction model is obtained. The model is to generate predictions using respective parameters assigned to a plurality of features derived from observation records of the data source. The parameter values are stored in a parameter vector. During a particular learning iteration of the training phase of the model, one or more features for which parameters are to be added to the parameter vector are identified. In response to a triggering condition, parameters for one or more features are removed from the parameter vector based on an analysis of relative contributions of the features represented in the parameter vector to the model's predictions. After the parameters are removed, at least one parameter is added to the parameter vector.
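
The flow in the abstract (add parameters as new features appear, prune when a triggering condition is met, then keep adding) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the names `train` and `prune`, the logistic-loss gradient update, the population threshold `max_params` as the trigger, and the use of absolute weight as a stand-in for a feature's "relative contribution". It is not the patented implementation.

```python
import heapq
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prune(weights, target_size):
    """Remove the smallest-magnitude weights until only target_size remain.

    Assumption: |weight| approximates a feature's contribution to predictions.
    """
    n_remove = len(weights) - target_size
    if n_remove <= 0:
        return []
    victims = heapq.nsmallest(n_remove, weights, key=lambda f: abs(weights[f]))
    for feature in victims:
        del weights[feature]
    return victims


def train(records, max_params=1000, prune_fraction=0.5, lr=0.1, epochs=5):
    """Train a sparse logistic model; records are (feature dict, 0/1 label) pairs."""
    weights = {}  # sparse parameter vector: feature name -> weight
    target_size = int(max_params * (1.0 - prune_fraction))
    for _ in range(epochs):
        for features, label in records:
            z = sum(weights.get(f, 0.0) * x for f, x in features.items())
            error = sigmoid(z) - label
            for f, x in features.items():
                # The first update for an unseen feature adds a new parameter.
                weights[f] = weights.get(f, 0.0) - lr * error * x
            # Hypothetical triggering condition: parameter count exceeds a cap.
            if len(weights) > max_params:
                prune(weights, target_size)
    return weights
```

In this sketch a pruned feature is not banned: if it appears again in a later record, its parameter is simply added back, starting from zero.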

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: one or more computing devices configured to: receive, at a machine learning service of a provider network, an indication of a data source to be used for generating a linear prediction model, wherein, to generate a prediction, the linear prediction model is to utilize respective weights assigned to individual ones of a plurality of features derived from observation records of the data source, wherein the respective weights are stored in a parameter vector of the linear prediction model and updated in-memory during a machine training phase of the linear prediction model; determine, based at least in part on examination of a particular set of observation records of the data source, respective weights for one or more features to be added to the parameter vector during a particular learning iteration of a plurality of learning iterations of the training phase of the linear prediction model, wherein the addition increases memory consumption during the machine training phase; check, during one or more of the plurality of learning iterations, for a triggering condition to prune the parameter vector; in response to a determination that the triggering condition has been met during the training phase, identify one or more pruning victims from a set of features whose weights are included in the parameter vector, based at least in part on a quantile analysis of the weights, wherein the quantile analysis is performed without a sort operation; and remove at least a particular weight corresponding to a particular pruning victim of the one or more pruning victims from the parameter vector, wherein the removal reduces memory consumption during the training phase; and generate, during a post-training-phase prediction run of the linear prediction model, a prediction using at least one feature for which a weight is determined after the particular weight of the particular pruning victim is removed from the parameter vector.

2. The system as recited in claim 1, wherein the triggering condition is based at least in part on a population of the parameter vector.

3. The system as recited in claim 1, wherein the triggering condition is based at least in part on a goal indicated by a client.

4. The system as recited in claim 1, wherein the one or more computing devices are further configured to: during a subsequent learning iteration of the plurality of learning iterations, performed after the particular learning iteration, determine that a weight for the particular pruning victim is to be re-added to the parameter vector; and add the weight corresponding to the particular pruning victim to the parameter vector.

5. The system as recited in claim 1, wherein a first feature of the one or more features whose weights are to be added to the parameter vector during the particular learning iteration is derived from one or more variables of the observation records of the data source via a transformation that comprises a use of one or more of: (a) a quantile bin function, (b) a Cartesian product function, (c) a bi-gram function, (d) an n-gram function, (e) an orthogonal sparse bigram function, (f) a calendar function, (g) an image processing function, (h) an audio processing function, (i) a bio-informatics processing function, (j) a natural language processing function or (k) a video processing function.

6. A method, comprising: performing, by one or more computing devices: receiving an indication of a data source to be used for training a machine learning model, wherein, to generate a prediction, the machine learning model is to utilize respective parameters assigned to individual ones of a plurality of features derived from observation records of the data source, wherein the respective parameters are stored in a parameter vector of the machine learning model and updated in-memory during a training phase of the machine learning model; identifying one or more features for which respective parameters are to be added to the parameter vector during a particular learning iteration of a plurality of learning iterations of the training phase of the machine learning model, wherein the addition increases memory consumption during the training phase; checking, during one or more of the plurality of learning iterations, for a triggering condition to prune the parameter vector; in response to determining that the triggering condition has been met in the training phase, removing respective parameters of one or more pruning victim features from the parameter vector, wherein the removal reduces memory consumption during the training phase, and wherein the one or more pruning victim features are selected based at least in part on an analysis of relative contributions of features whose parameters are included in the parameter vector to predictions made using the machine learning model; and generating, during a post-training-phase prediction run of the machine learning model, a particular prediction using at least one feature for which a parameter is determined after the one or more pruning victim features are selected.

7. The method as recited in claim 6, wherein the analysis of relative contributions comprises a quantile analysis of weights included in the parameter vector.

8. The method as recited in claim 6, wherein the analysis of relative contributions (a) does not comprise a sort operation and (b) does not comprise copying values of the parameters included in the parameter vector.

9. The method as recited in claim 6, wherein said determining that the triggering condition has been met comprises determining that a population of the parameter vector exceeds a threshold.

10. The method as recited in claim 6, wherein the triggering condition is based at least in part on a resource capacity constraint of a server of a machine learning service.

11. The method as recited in claim 6, wherein the triggering condition is based at least in part on a goal indicated by a client.

12. The method as recited in claim 6, further comprising performing, by the one or more computing devices: during a subsequent learning iteration of the plurality of learning iterations, performed after the particular learning iteration, determining that a parameter for a particular feature which was previously selected as a pruning victim feature is to be re-added to the parameter vector; and adding the parameter for the particular feature to the parameter vector.

13. The method as recited in claim 6, wherein a first feature of the one or more features for which respective parameters are to be added to the parameter vector during the particular learning iteration is determined from one or more variables of observation records of the data source via a transformation that comprises a use of one or more of: (a) a quantile bin function, (b) a Cartesian product function, (c) a bi-gram function, (d) an n-gram function, (e) an orthogonal sparse bigram function, (f) a calendar function, (g) an image processing function, (h) an audio processing function, (i) a bio-informatics processing function, (j) a natural language processing function, or (k) a video processing function.

14. The method as recited in claim 6, further comprising performing, by the one or more computing devices: implementing a stochastic gradient descent technique to update, during the particular learning iteration, one or more previously-generated parameters included in the parameter vector.

15. The method as recited in claim 6, wherein the machine learning model comprises a generalized linear model.

16. The method as recited in claim 6, furth
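
Claims 1, 7 and 8 refer to a quantile analysis of the weights that avoids a sort operation and avoids copying parameter values. The claim preview does not say how that analysis is carried out, so the following is only one possible sketch: a fixed-size histogram over absolute weights gives an approximate quantile boundary in two passes. The function names, the `num_bins` parameter, and the histogram technique itself are assumptions, not details taken from the patent.

```python
def estimate_abs_weight_quantile(weights, q, num_bins=64):
    """Approximate the q-quantile of |weight| without sorting.

    weights: dict mapping feature name -> weight (the parameter vector).
    Returns the upper edge of the histogram bin in which the cumulative
    count first reaches q * len(weights).
    """
    if not weights:
        raise ValueError("parameter vector is empty")
    # Pass 1: largest magnitude, read in place (no copy of the values).
    max_abs = max(abs(w) for w in weights.values())
    if max_abs == 0.0:
        return 0.0
    # Pass 2: count weights per equal-width bin over [0, max_abs].
    counts = [0] * num_bins
    for w in weights.values():
        idx = min(int(abs(w) / max_abs * num_bins), num_bins - 1)
        counts[idx] += 1
    target = q * len(weights)
    cumulative = 0
    for i, count in enumerate(counts):
        cumulative += count
        if cumulative >= target:
            return (i + 1) * max_abs / num_bins
    return max_abs


def prune_below_quantile(weights, q, num_bins=64):
    """Delete features whose |weight| is below the estimated boundary."""
    threshold = estimate_abs_weight_quantile(weights, q, num_bins)
    # Only feature names are collected here, not the weight values.
    victims = [f for f, w in weights.items() if abs(w) < threshold]
    for feature in victims:
        del weights[feature]
    return victims
```

The boundary is approximate: every weight in the bin that crosses the target is treated alike, so slightly more than a fraction `q` of the parameters may be removed. Memory use is bounded by `num_bins` regardless of the size of the parameter vector, which is the reason a histogram was chosen for this illustration.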

Assignees

Amazon Tech Inc

Classifications

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • based on the proximity to a decision surface, e.g. support vector machines · CPC title

  • Physics · mapped topic

  • in which an application is distributed across nodes in the network (software deployment G06F8/60; multiprogramming arrangements G06F9/46) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 11, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).