Efficient determination of optimized learning settings of neural networks

US2017228639A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2017228639-A1
Application number  US-201615017248-A
Country             US
Kind code           A1
Filing date         Feb 5, 2016
Priority date       Feb 5, 2016
Publication date    Aug 10, 2017
Grant date          (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Optimized learning settings of neural networks are efficiently determined by an apparatus including a processor and one or more computer readable mediums collectively including instructions that, when executed by the processor, cause the processor to train a first neural network with a learning setting; extract tentative weight data from the first neural network with the learning setting; calculate an evaluation value of the first neural network with the learning setting; and generate a predictive model for predicting an evaluation value of a second neural network with a new setting based on tentative weight data of the second neural network by using a relationship between the tentative weight data of the first neural network and the evaluation value of the first neural network.
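The four steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method disclosed in the patent: a single trainable weight stands in for a neural network, the learning setting is a learning rate, the evaluation value is the final mean squared error (lower is better), and the predictive model is an ordinary least-squares line. The names `train`, `fit_line`, `predict_evaluation`, `SNAPSHOT`, and `FULL`, and all numeric values, are hypothetical.

```python
# Toy data for y = 2x; a single weight w stands in for a neural network (assumption).
XS = [1.0, 2.0, 3.0, 4.0]
YS = [2.0 * x for x in XS]

def loss(w):
    """Mean squared error, used here as the evaluation value (lower is better)."""
    return sum((w * x - y) ** 2 for x, y in zip(XS, YS)) / len(XS)

def train(learning_rate, iters):
    """Plain gradient descent from w = 0; returns the weight after every iteration."""
    w, history = 0.0, []
    for _ in range(iters):
        grad = sum(2 * (w * x - y) * x for x, y in zip(XS, YS)) / len(XS)
        w -= learning_rate * grad
        history.append(w)
    return history

def fit_line(features, targets):
    """Ordinary least squares for targets ~ slope * feature + intercept."""
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    sxx = sum((f - mean_f) ** 2 for f in features)
    sxy = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    slope = sxy / sxx
    return slope, mean_t - slope * mean_f

SNAPSHOT, FULL = 3, 20  # hypothetical: snapshot iteration and full training length

# Steps 1-3: train "first" networks, extract tentative weights, compute evaluation values.
first_settings = [0.001, 0.005, 0.01, 0.02]
tentative, evaluations = [], []
for lr in first_settings:
    history = train(lr, FULL)
    tentative.append(history[SNAPSHOT - 1])  # tentative weight data
    evaluations.append(loss(history[-1]))    # evaluation value after further training

# Step 4: predictive model relating tentative weights to evaluation values.
slope, intercept = fit_line(tentative, evaluations)

def predict_evaluation(tentative_weight):
    """Estimate the final evaluation value from a tentative weight alone."""
    return slope * tentative_weight + intercept

# A "second" network with a new setting is trained only up to the snapshot.
new_setting = 0.015
predicted = predict_evaluation(train(new_setting, SNAPSHOT)[-1])
print(f"predicted evaluation value for lr={new_setting}: {predicted:.3f}")
```

A straight line is a crude fit for this toy data, so the predicted value is only useful for ranking settings against each other; any regressor could take its place.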

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a processor; and one or more computer readable mediums collectively including instructions that, when executed by the processor, cause the processor to: train a first neural network with a learning setting; extract tentative weight data from the first neural network with the learning setting; calculate an evaluation value of the first neural network with the learning setting; and generate a predictive model for predicting an evaluation value of a second neural network with a new setting based on tentative weight data of the second neural network by using a relationship between the tentative weight data of the first neural network and the evaluation value of the first neural network.

2. The apparatus of claim 1, wherein the calculating the evaluation value calculates the evaluation value of the first neural network with the learning setting and weight data further trained from the tentative weight data.

3. The apparatus of claim 2, wherein the instructions further cause the processor to: train the second neural network with the new setting; and estimate the evaluation value of the second neural network with the new setting by using the predictive model before completion of training the second neural network with the new setting.

4. The apparatus of claim 3, wherein the instructions further cause the processor to: terminate the training of the second neural network with the new setting in response to the evaluation value of the second neural network with the new setting not satisfying a criterion.

5. The apparatus of claim 4, wherein the instructions further cause the processor to: generate a plurality of new settings; train a plurality of neural networks, each neural network including a respective new setting among the plurality of new settings; terminate the training of at least one neural network among the plurality of neural networks that does not satisfy the criterion; and select one setting based on performances of neural networks of which training is not terminated.

6. The apparatus of claim 5, wherein the instructions further cause the processor to: update the predictive model based on the neural networks of which training is not terminated.

7. The apparatus of claim 2, wherein the training of the first neural network with the learning setting includes a plurality of iterations, and wherein the tentative weight data of the first neural network with the learning setting is updated in each of the plurality of iterations.

8. The apparatus of claim 7, wherein the generation of the predictive model includes generating a function to estimate the evaluation value from the tentative weight data at two or more iterations of the plurality of iterations.

9. The apparatus of claim 8, wherein the two or more iterations are not consecutive.

10. The apparatus of claim 8, wherein the function is operable to estimate the evaluation value from differences between the tentative weight data of a first iteration of the plurality of iterations and the tentative weight data of a second iteration of the plurality of iterations.

11. The apparatus of claim 10, wherein generating the predictive model further normalizes the tentative weight data of a first iteration of the plurality of iterations and the tentative weight data of a second iteration of the plurality of iterations.

12. The apparatus of claim 1, wherein generating the predictive model further normalizes the difference between the tentative weight data of a first iteration of the plurality of iterations and the tentative weight data of a second iteration of the plurality of iterations.

13. The apparatus of claim 12, wherein the tentative weight data is extracted only from the last convolutional layer.

14. The apparatus of claim 1, wherein the first and second neural networks are convolutional neural networks, and at least part of the tentative weight data is extracted from a last convolutional layer.

15. A computer-implemented method comprising: training a first neural network with a learning setting; extracting tentative weight data from the first neural network with the learning setting; calculating an evaluation value of the first neural network with the learning setting; and generating a predictive model for predicting an evaluation value of a second neural network with a new setting based on tentative weight data of the second neural network by using a relationship between the tentative weight data of the first neural network and the evaluation value of the first neural network.

16. The computer-implemented method of claim 15, wherein the calculating the evaluation value including calculating the evaluation value of the first neural network with the learning setting and weight data further trained from the tentative weight data.

17. The computer-implemented method of claim 16, further comprising: training the second neural network with the new setting; and estimating the evaluation value of the second neural network with the new setting by using the predictive model before completion of training the second neural network with the new setting.

18. The computer-implemented method of claim 17, further comprising: terminating the training of the second neural network with the new setting in response to the evaluation value of the second neural network with the new setting not satisfying a criterion.

19. The computer-implemented method of claim 18, further comprising: generating a plurality of new settings; training a plurality of neural networks, each neural network including a respective new setting among the plurality of new settings; terminating the training of at least one neural network among the plurality of neural networks that does not satisfy the criterion; and selecting one setting based on performances of neural networks of which training is not terminated.

20. The computer-implemented method of claim 19, further comprising: updating the predictive model based on the neural networks of which training is not terminated.

21. A computer program product comprising one or more computer readable mediums collectively including instructions that, when executed by the processor, cause the processor to: train a first neural network with a learning setting; extract tentative weight data from the first neural network with the learning setting; calculate an evaluation value of the first neural network with the learning setting; and generate a predictive model for predicting an evaluation value of a second neural network with a new setting based on tentative weight data of the second neural network by using a relationship between the tentative weight data of the first neural network and the evaluation value of the first neural network.

22. The computer program product of claim 21, wherein the calculating the evaluation value calculates the evaluation value of the first neural network with the learning setting and weight data further trained from the tentative weight data.

23. The computer program product of claim 22, wherein the instructions further cause the processor to: train the second neural network with the new setting; and estimate the evaluation value of the second neural network with the new setting by using the predictive model before completion of training the second neural network with the new setting.
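The dependent claims on early termination (claims 3 to 5) and on weight differences between iterations (claims 8 to 12) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: a two-parameter linear model replaces the neural networks, the feature is the distance between the weights at two non-consecutive iterations divided by the norm of the later weights (one possible normalization, not necessarily the one in the patent), the predictive model is a least-squares line, and the criterion is "predicted evaluation value no worse than the median of the candidates". The names `EARLY`, `LATE`, `FULL`, `feature`, and `fit_line` are hypothetical.

```python
import math
import statistics

XS = [0.0, 1.0, 2.0, 3.0]
YS = [2.0 * x + 1.0 for x in XS]  # toy target: y = 2x + 1

def loss(weights):
    """Mean squared error, used as the evaluation value (lower is better)."""
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in zip(XS, YS)) / len(XS)

def train(learning_rate, iters):
    """Gradient descent on [w, b]; returns the weight vector after every iteration."""
    w, b, history = 0.0, 0.0, []
    for _ in range(iters):
        errors = [w * x + b - y for x, y in zip(XS, YS)]
        grad_w = sum(2 * e * x for e, x in zip(errors, XS)) / len(XS)
        grad_b = sum(2 * e for e in errors) / len(XS)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        history.append([w, b])
    return history

EARLY, LATE, FULL = 2, 6, 40  # hypothetical; EARLY and LATE are not consecutive

def feature(history):
    """Normalized size of the weight change between iterations EARLY and LATE."""
    early, late = history[EARLY - 1], history[LATE - 1]
    return math.dist(early, late) / math.hypot(*late)

def fit_line(features, targets):
    """Ordinary least squares for targets ~ slope * feature + intercept."""
    n = len(features)
    mean_f, mean_t = sum(features) / n, sum(targets) / n
    sxx = sum((f - mean_f) ** 2 for f in features)
    sxy = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    slope = sxy / sxx
    return slope, mean_t - slope * mean_f

# First networks: train fully, then fit the predictive model.
first_settings = [0.005, 0.02, 0.05, 0.1]
histories = [train(lr, FULL) for lr in first_settings]
slope, intercept = fit_line([feature(h) for h in histories],
                            [loss(h[-1]) for h in histories])

# Second networks: train only up to LATE and estimate the evaluation value early.
new_settings = [0.01, 0.03, 0.07, 0.09]
predicted = {lr: slope * feature(train(lr, LATE)) + intercept for lr in new_settings}

# Terminate candidates whose estimate does not satisfy the criterion.
threshold = statistics.median(predicted.values())
survivors = [lr for lr in new_settings if predicted[lr] <= threshold]
terminated = [lr for lr in new_settings if predicted[lr] > threshold]

# Only survivors are trained to completion (re-run from scratch here for brevity;
# training is deterministic, so this matches continuing from iteration LATE).
final_loss = {lr: loss(train(lr, FULL)[-1]) for lr in survivors}
selected = min(final_loss, key=final_loss.get)
print("terminated early:", terminated, "| selected setting:", selected)
```

The median threshold guarantees that at least one candidate survives, which keeps the final selection well defined. Refitting the line with the survivors' completed runs would correspond to the model update described in claims 6 and 20.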

Assignees

IBM

Inventors

Classifications

  • G06N3/045 (Primary)

    Combinations of networks · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • G06N3/08 (Primary)

    Learning methods · CPC title

  • Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title

  • Supervised learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017228639A1 cover?
Optimized learning settings of neural networks are efficiently determined by an apparatus including a processor and one or more computer readable mediums collectively including instructions that, when executed by the processor, cause the processor to train a first neural network with a learning setting; extract tentative weight data from the first neural network with the learning setting; calcu…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/045. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 10, 2017 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).