Tuning hyper-parameters of a computer-executable learning algorithm

US9330362B2 · US · B2

Patent metadata
Field | Value
Publication number | US-9330362-B2
Application number | US-201313894429-A
Country | US
Kind code | B2
Filing date | May 15, 2013
Priority date | May 15, 2013
Publication date | May 3, 2016
Grant date | May 3, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Technologies pertaining to tuning a hyper-parameter configuration of a learning algorithm are described. The learning algorithm learns parameters of a predictive model based upon the hyper-parameter configuration. Candidate hyper-parameter configurations are identified, and statistical hypothesis tests are undertaken over respective pairs of candidate hyper-parameter configurations to identify, for each pair of candidate hyper-parameter configurations, which of the two configurations is associated with better predictive performance. The technologies described herein take into consideration the stochastic nature of training data, validation data, and evaluation functions.
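The pair-wise comparison described in the abstract can be illustrated with a short sketch. This is not the patent's implementation: the function names `t_two_sided_p` and `compare_configs`, the "higher score is better" convention, and the default significance level of 0.05 are all assumptions made for illustration. The sketch assumes each candidate configuration has been evaluated over the same sequence of trials, so the scores can be paired, and it applies a paired t-test (the test named in claim 9) using only the Python standard library.

```python
import math
import statistics


def t_two_sided_p(t_stat: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value for Student's t, integrating the density with Simpson's rule."""
    upper = abs(t_stat)
    if upper == 0.0:
        return 1.0
    # Log of the normalising constant of the t density with `df` degrees of freedom.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x: float) -> float:
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    h = upper / steps  # `steps` must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_mass = 2 * total * h / 3  # P(-|t| <= T <= |t|), using symmetry
    return max(0.0, 1.0 - central_mass)


def compare_configs(scores_a, scores_b, alpha=0.05):
    """Paired t-test over per-trial scores of two candidate configurations.

    Returns "a" or "b" for the significantly better configuration
    (assumption: higher score is better), or None if the difference
    is not significant at level `alpha`.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_diff = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        # All differences identical: no variance to test against.
        if mean_diff == 0.0:
            return None
        return "a" if mean_diff > 0 else "b"
    t_stat = mean_diff / (sd / math.sqrt(n))
    if t_two_sided_p(t_stat, n - 1) >= alpha:
        return None
    return "a" if mean_diff > 0 else "b"


# Hypothetical validation accuracies from six paired trials.
config_a = [0.81, 0.83, 0.80, 0.84, 0.82, 0.85]
config_b = [0.78, 0.79, 0.77, 0.80, 0.79, 0.80]
winner = compare_configs(config_a, config_b)
```

Returning `None` when the test is inconclusive leaves the caller free to run more trials rather than picking a winner on noise, which is one way to respect the stochastic nature of the data that the abstract emphasizes.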

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: receiving an indication that a hyper-parameter of a computer-executable learning algorithm is to be tuned, the hyper-parameter being a parameter of the computer-executable learning algorithm, the computer-executable learning algorithm configured to learn other parameters of a predictive model based upon the hyper-parameter; and assigning a value to the hyper-parameter based upon a pair-wise statistical hypothesis test undertaken over observations pertaining to a pair of candidate hyper-parameter values, the observations generated based upon outputs of a pair of predictive models that respectively correspond to the pair of candidate hyper-parameter values, the outputs based upon a common set of labeled validation data.

2. The method of claim 1, wherein the indication that the hyper-parameter is to be tuned is an indication that a hyper-parameter configuration for the computer-executable learning algorithm is to be tuned, the hyper-parameter configuration comprising a plurality of hyper-parameters; and assigning respective values to the plurality of hyper-parameters based upon pair-wise statistical hypothesis tests undertaken over observations pertaining to respective predictive models having parameters that are based upon respective hyper-parameter configurations.

3. The method of claim 1, further comprising: identifying candidate hyper-parameter values from a finite space of discrete values, wherein the value assigned to the hyper-parameter is selected from the candidate hyper-parameter values.

4. The method of claim 1, identifying candidate hyper-parameter values from a continuous space, wherein the value assigned to the hyper-parameter is selected from the candidate hyper-parameter values.

5. The method of claim 1, wherein assigning the value to the hyper-parameter is based upon executing a plurality of trials, wherein after each trial in the plurality of trials a respective statistical hypothesis test is undertaken over observations pertaining to pairs of candidate hyper-parameter configurations.

6. The method of claim 1, wherein assigning the value to the hyper-parameter comprises: evaluating the observations based upon a predefined value of statistical significance; and selecting the value for the hyper-parameter based upon the evaluating of the observations.

7. The method of claim 1, further comprising using direct-search derivative-free optimization in connection with assigning the value to the hyper-parameter.

8. The method of claim 1, further comprising using power analysis in connection with performing a threshold number of evaluations over the candidate pair of hyper-parameter configurations.

9. The method of claim 1, wherein assigning the value to the hyper-parameter comprises performing a paired t-test.

10. A system, comprising: at least one processor; and memory that stores instructions that, when executed by the at least one processor, causes the at least one processor to perform acts comprising: receiving an indication that a hyper-parameter configuration of a learning algorithm is to be ascertained, the learning algorithm configured to learn parameters of a predictive model; and employing statistical hypothesis testing in connection with identifying a value for a hyper-parameter included in the hyper-parameter configuration, wherein the value for the hyper-parameter is useable by the learning algorithm to learn the parameters of the predictive model, and the statistical hypothesis testing is undertaken over observations pertaining to outputs of the predictive model with respect to a set of validation data.

11. The system of claim 10, wherein the hyper-parameter is one of a learning rate or a regularization coefficient.

12. The system of claim 10, the acts further comprising generating the observations based upon outputs of the predictive model over the set of validation data.

13. The system of claim 12, the observations being indicative of performance of the predictive model over the set of validation data.

14. The system of claim 13, the acts further comprising: comparing observations pertaining to a pair of candidate hyper-parameter configurations; and performing the statistical hypothesis testing based upon the observations.

15. The system of claim 14, wherein comparing the observations is based upon a level of statistical significance to be considered when performing the statistical hypothesis testing.

16. The system of claim 14, the acts further comprising causing multiple trials to be undertaken with respect to a pair of candidate hyper-parameter configurations.

17. The system of claim 11, the acts further comprising executing a search over a continuous numerical space to identify candidate hyper-parameter configurations.

18. The system of claim 11, the acts further comprising performing cross-validation in connection with identifying the value for the hyper-parameter.

19. The system of claim 11, the acts further comprising validating performance of the predictive model by way of bootstrapping.

20. A computer-readable storage medium comprising instructions that, when executed by a processor, causes the processor to perform acts comprising: receiving an indication that a hyper-parameter configuration of a computer-executable learning algorithm is to be tuned, the computer-executable learning algorithm configured to learn parameters of a predictive model based upon the hyper-parameter configuration and a set of training data; responsive to receiving the indication, receiving a pair of candidate hyper-parameter configurations of the computer-executable learning algorithm; receiving a desired statistical significance level to use when evaluating performance of a pair predictive models that respectively correspond to the pair of candidate hyper-parameter configurations; receiving an indication of a number of initial evaluations to perform with respect to the pair of candidate hyper-parameter configurations; executing a pair-wise statistical hypothesis test over a pair of observations output by an evaluation function that respectively correspond to the pair of candidate hyper-parameter configurations, the statistical hypothesis test executed based upon the statistical significance level and the number of initial evaluations to perform; and assigning the hyper-parameter configuration to the computer-executable learning algorithm based upon the executing of the pair-wise statistical hypothesis test.
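Claim 8 mentions power analysis in connection with a threshold number of evaluations, but the claim text shown here does not give a formula. As one possible reading, the sketch below uses the textbook normal-approximation sample-size formula for a two-sided paired test, n = ((z_{1-α/2} + z_{power}) · σ_d / δ)², where δ is the smallest score difference worth detecting and σ_d is the standard deviation of the paired differences. The function name `evaluations_needed`, its parameters, and the default values are hypothetical and not taken from the patent.

```python
import math
from statistics import NormalDist


def evaluations_needed(min_diff: float, sd_of_diffs: float,
                       alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate number of paired evaluations for a two-sided paired test.

    Uses the normal approximation; an exact t-based calculation would
    give a slightly larger number for small samples.
    """
    if min_diff <= 0 or sd_of_diffs <= 0:
        raise ValueError("min_diff and sd_of_diffs must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(((z_alpha + z_power) * sd_of_diffs / min_diff) ** 2)


# Hypothetical numbers: detect a 0.01 accuracy gap when paired
# differences have a standard deviation of about 0.02.
n_trials = evaluations_needed(min_diff=0.01, sd_of_diffs=0.02)
```

With these assumed inputs the estimate is 32 paired evaluations. Halving the detectable difference roughly quadruples the requirement, which is why a threshold of this kind is useful as a budget before committing to a comparison.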

Assignees

Microsoft Corp; Microsoft Technology Licensing LLC

Inventors

Classifications

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • based on the proximity to a decision surface, e.g. support vector machines · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Physics · mapped topic

  • Extracting rules from data · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9330362B2 cover?
Technologies pertaining to tuning a hyper-parameter configuration of a learning algorithm are described. The learning algorithm learns parameters of a predictive model based upon the hyper-parameter configuration. Candidate hyper-parameter configurations are identified, and statistical hypothesis tests are undertaken over respective pairs of candidate hyper-parameter configurations to identify,…
Who is the assignee on this patent?
Microsoft Corp, Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06N99/005. Mapped technology areas include Physics.
When was this patent published?
Publication date May 3, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).