What technology area does this patent fall under?

Primary CPC classification G06N3/045. Mapped technology areas include Physics.

When was this patent published?

Publication date Aug 10, 2017 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Efficient determination of optimized learning settings of neural networks

US2017228639A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2017228639-A1
Application number	US-201615017248-A
Country	US
Kind code	A1
Filing date	Feb 5, 2016
Priority date	Feb 5, 2016
Publication date	Aug 10, 2017
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Optimized learning settings of neural networks are efficiently determined by an apparatus including a processor and one or more computer readable mediums collectively including instructions that, when executed by the processor, cause the processor to train a first neural network with a learning setting; extract tentative weight data from the first neural network with the learning setting; calculate an evaluation value of the first neural network with the learning setting; and generate a predictive model for predicting an evaluation value of a second neural network with a new setting based on tentative weight data of the second neural network by using a relationship between the tentative weight data of the first neural network and the evaluation value of the first neural network.
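
The four steps in the abstract (train, extract tentative weights, evaluate, fit a predictive model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the "neural network" is a one-neuron linear model, the learning setting is a learning rate, the evaluation value is the final mean squared error (lower is better), and the predictive model is a k-nearest-neighbour average chosen only for brevity. All names (`DATA`, `train`, `fit_predictor`) are hypothetical.

```python
import math

# Toy regression data for y = 2x + 1 (assumed stand-in for a real training set).
DATA = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]


def mse(w, b):
    """Evaluation value used in this sketch: mean squared error (lower is better)."""
    return sum((w * x + b - y) ** 2 for x, y in DATA) / len(DATA)


def train(lr, total_iters=200, snapshot_iter=10):
    """Train with learning rate `lr`; return (tentative_weights, evaluation_value).

    The tentative weights are the weights captured at `snapshot_iter`;
    the evaluation value is computed after `total_iters` iterations.
    """
    w, b = 0.0, 0.0
    tentative = None
    for it in range(1, total_iters + 1):
        gw = sum(2 * (w * x + b - y) * x for x, y in DATA) / len(DATA)
        gb = sum(2 * (w * x + b - y) for x, y in DATA) / len(DATA)
        w, b = w - lr * gw, b - lr * gb
        if it == snapshot_iter:
            tentative = (w, b)
    return tentative, mse(w, b)


def fit_predictor(samples, k=2):
    """Build a predictor from (tentative_weights, evaluation_value) pairs.

    Assumption: a k-nearest-neighbour average in weight space stands in
    for whatever predictive model a real system would use.
    """
    def predict(weights):
        nearest = sorted(samples, key=lambda s: math.dist(s[0], weights))[:k]
        return sum(ev for _, ev in nearest) / len(nearest)
    return predict


# First networks: fully trained under several learning settings.
samples = [train(lr) for lr in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3]]
predict = fit_predictor(samples)

# Second network: trained only up to the snapshot, then estimated.
new_tentative, _ = train(0.05, total_iters=10)
estimate = predict(new_tentative)
actual = train(0.05)[1]
print(f"predicted evaluation {estimate:.4f}, actual after full training {actual:.4f}")
```

In this toy setup the second network runs for 10 iterations instead of 200 before its evaluation value is estimated, which is the saving the abstract refers to as "efficiently determined".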

First claim

Opening claim text (preview).

What is claimed is:
1. An apparatus comprising: a processor; and one or more computer readable mediums collectively including instructions that, when executed by the processor, cause the processor to: train a first neural network with a learning setting; extract tentative weight data from the first neural network with the learning setting; calculate an evaluation value of the first neural network with the learning setting; and generate a predictive model for predicting an evaluation value of a second neural network with a new setting based on tentative weight data of the second neural network by using a relationship between the tentative weight data of the first neural network and the evaluation value of the first neural network.
2. The apparatus of claim 1, wherein the calculating the evaluation value calculates the evaluation value of the first neural network with the learning setting and weight data further trained from the tentative weight data.
3. The apparatus of claim 2, wherein the instructions further cause the processor to: train the second neural network with the new setting; and estimate the evaluation value of the second neural network with the new setting by using the predictive model before completion of training the second neural network with the new setting.
4. The apparatus of claim 3, wherein the instructions further cause the processor to: terminate the training of the second neural network with the new setting in response to the evaluation value of the second neural network with the new setting not satisfying a criterion.
5. The apparatus of claim 4, wherein the instructions further cause the processor to: generate a plurality of new settings; train a plurality of neural networks, each neural network including a respective new setting among the plurality of new settings; terminate the training of at least one neural network among the plurality of neural networks that does not satisfy the criterion; and select one setting based on performances of neural networks of which training is not terminated.
6. The apparatus of claim 5, wherein the instructions further cause the processor to: update the predictive model based on the neural networks of which training is not terminated.
7. The apparatus of claim 2, wherein the training of the first neural network with the learning setting includes a plurality of iterations, and wherein the tentative weight data of the first neural network with the learning setting is updated in each of the plurality of iterations.
8. The apparatus of claim 7, wherein the generation of the predictive model includes generating a function to estimate the evaluation value from the tentative weight data at two or more iterations of the plurality of iterations.
9. The apparatus of claim 8, wherein the two or more iterations are not consecutive.
10. The apparatus of claim 8, wherein the function is operable to estimate the evaluation value from differences between the tentative weight data of a first iteration of the plurality of iterations and the tentative weight data of a second iteration of the plurality of iterations.
11. The apparatus of claim 10, wherein generating the predictive model further normalizes the tentative weight data of a first iteration of the plurality of iterations and the tentative weight data of a second iteration of the plurality of iterations.
12. The apparatus of claim 1, wherein generating the predictive model further normalizes the difference between the tentative weight data of a first iteration of the plurality of iterations and the tentative weight data of a second iteration of the plurality of iterations.
13. The apparatus of claim 12, wherein the tentative weight data is extracted only from the last convolutional layer.
14. The apparatus of claim 1, wherein the first and second neural networks are convolutional neural networks, and at least part of the tentative weight data is extracted from a last convolutional layer.
15. A computer-implemented method comprising: training a first neural network with a learning setting; extracting tentative weight data from the first neural network with the learning setting; calculating an evaluation value of the first neural network with the learning setting; and generating a predictive model for predicting an evaluation value of a second neural network with a new setting based on tentative weight data of the second neural network by using a relationship between the tentative weight data of the first neural network and the evaluation value of the first neural network.
16. The computer-implemented method of claim 15, wherein the calculating the evaluation value including calculating the evaluation value of the first neural network with the learning setting and weight data further trained from the tentative weight data.
17. The computer-implemented method of claim 16, further comprising: training the second neural network with the new setting; and estimating the evaluation value of the second neural network with the new setting by using the predictive model before completion of training the second neural network with the new setting.
18. The computer-implemented method of claim 17, further comprising: terminating the training of the second neural network with the new setting in response to the evaluation value of the second neural network with the new setting not satisfying a criterion.
19. The computer-implemented method of claim 18, further comprising: generating a plurality of new settings; training a plurality of neural networks, each neural network including a respective new setting among the plurality of new settings; terminating the training of at least one neural network among the plurality of neural networks that does not satisfy the criterion; and selecting one setting based on performances of neural networks of which training is not terminated.
20. The computer-implemented method of claim 19, further comprising: updating the predictive model based on the neural networks of which training is not terminated.
21. A computer program product comprising one or more computer readable mediums collectively including instructions that, when executed by the processor, cause the processor to: train a first neural network with a learning setting; extract tentative weight data from the first neural network with the learning setting; calculate an evaluation value of the first neural network with the learning setting; and generate a predictive model for predicting an evaluation value of a second neural network with a new setting based on tentative weight data of the second neural network by using a relationship between the tentative weight data of the first neural network and the evaluation value of the first neural network.
22. The computer program product of claim 21, wherein the calculating the evaluation value calculates the evaluation value of the first neural network with the learning setting and weight data further trained from the tentative weight data.
23. The computer program product of claim 22, wherein the instructions further cause the processor to: train the second neural network with the new setting; and estimate the evaluation value of the second neural network with the new setting by using the predictive model before completion of training the second neural network with the new setting.
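
The dependent claims add two ideas that a short sketch may make easier to follow: estimating the evaluation value from the difference between weight snapshots at two non-consecutive iterations (claims 8 to 12), and terminating candidate runs whose estimate fails a criterion before selecting among the survivors (claims 3 to 5). The code below is an illustration under assumptions, not the claimed implementation. The toy model, the snapshot iterations, the normalization (difference norm divided by the later snapshot's norm), the nearest-neighbour predictor, and the threshold `CRITERION` are all hypothetical choices, as are the function names.

```python
import math

# Toy regression data for y = 2x + 1 (assumed stand-in for a real training set).
DATA = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
EARLY, LATE, TOTAL = 5, 15, 200  # two non-consecutive snapshots, full budget
CRITERION = 0.1                  # assumed threshold on the predicted MSE


def mse(w):
    """Evaluation value in this sketch: mean squared error (lower is better)."""
    return sum((w[0] * x + w[1] - y) ** 2 for x, y in DATA) / len(DATA)


def run(lr, iters, w=(0.0, 0.0)):
    """Run `iters` gradient-descent iterations on y = w[0]*x + w[1], starting at `w`."""
    for _ in range(iters):
        gw = sum(2 * (w[0] * x + w[1] - y) * x for x, y in DATA) / len(DATA)
        gb = sum(2 * (w[0] * x + w[1] - y) for x, y in DATA) / len(DATA)
        w = (w[0] - lr * gw, w[1] - lr * gb)
    return w


def snapshots(lr):
    """Tentative weights at iterations EARLY and LATE."""
    w_early = run(lr, EARLY)
    return w_early, run(lr, LATE - EARLY, w_early)


def feature(w_early, w_late):
    """Normalized difference: norm of the change divided by the later snapshot's norm."""
    return math.dist(w_early, w_late) / math.hypot(*w_late)


def fit_predictor(samples, k=2):
    """Average the evaluation values of the k samples with the closest feature."""
    def predict(f):
        nearest = sorted(samples, key=lambda s: abs(s[0] - f))[:k]
        return sum(ev for _, ev in nearest) / len(nearest)
    return predict


# Reference runs: trained to completion to build the predictor.
samples = []
for lr in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3]:
    w_early, w_late = snapshots(lr)
    samples.append((feature(w_early, w_late), mse(run(lr, TOTAL - LATE, w_late))))
predict = fit_predictor(samples)

# Candidate runs: stop those whose predicted evaluation fails the criterion.
terminated, finished = [], {}
for lr in [0.002, 0.02, 0.2]:
    w_early, w_late = snapshots(lr)
    if predict(feature(w_early, w_late)) > CRITERION:
        terminated.append(lr)  # training stops after LATE iterations
        continue
    # Survivors continue training from the tentative weights.
    finished[lr] = mse(run(lr, TOTAL - LATE, w_late))

best_lr = min(finished, key=finished.get)
print("terminated early:", terminated)
print("selected setting:", best_lr)
```

A run that is terminated costs 15 iterations here instead of 200. Claim 6 would correspond to appending the surviving runs to `samples` and refitting the predictor, which this sketch omits.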

Assignees

IBM

Inventors

Classifications

G06N3/045 · Primary
Combinations of networks · CPC title
G06N7/01
Probabilistic graphical models, e.g. probabilistic networks · CPC title
G06N3/08 · Primary
Learning methods · CPC title
G06N3/0985
Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title
G06N3/09
Supervised learning · CPC title

Patent family

Related publications grouped by family.

View patent family 59497841

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017228639A1 cover?: Optimized learning settings of neural networks are efficiently determined by an apparatus including a processor and one or more computer readable mediums collectively including instructions that, when executed by the processor, cause the processor to train a first neural network with a learning setting; extract tentative weight data from the first neural network with the learning setting; calcu…
Who is the assignee on this patent?: IBM