Freeze-out as a regularizer in training neural networks
US-2021232909-A1 · Jul 29, 2021 · US
US2021406683A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2021406683-A1 |
| Application number | US-202117226279-A |
| Country | US |
| Kind code | A1 |
| Filing date | Apr 9, 2021 |
| Priority date | Jun 25, 2020 |
| Publication date | Dec 30, 2021 |
| Grant date | — |
Abstract
A process includes starting a learning process for building a model including multiple layers each including a parameter. The learning process executes iterations, each including calculating output error of the model using training data and updating the parameter value based on the output error. The process also includes selecting two or more candidate layers representing candidates for layers, where the updating is to be suppressed, based on results of a first iteration of the learning process. The process also includes calculating, based on the number of iterations executed up to the first iteration, a ratio value which becomes larger when the number of iterations executed is greater, and determining, amongst the candidate layers, one or more layers, where the updating is to be suppressed at a second iteration following the first iteration. The number of one or more layers is determined according to the ratio value.
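The abstract's schedule (a ratio that grows with the iteration count, which then sets how many candidate layers are frozen) can be sketched in Python. This is a minimal illustration under stated assumptions, not the applicant's implementation. The function names, the logistic parameters `midpoint` and `steepness`, and rounding the count down are hypothetical choices. The multiplication of the candidate count by the ratio mirrors claim 2, the sigmoid shape mirrors claim 3, and freezing the candidates nearest the input first is one of the options named in claim 11.

```python
import math


def sigmoid_ratio(iteration: int, midpoint: float, steepness: float) -> float:
    """Ratio in (0, 1) that increases with the number of iterations executed.

    Hypothetical schedule: a logistic (sigmoid) curve centred on `midpoint`,
    written in a numerically stable form to avoid overflow in math.exp.
    """
    z = steepness * (iteration - midpoint)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def layers_to_freeze(candidates: list[int], iteration: int,
                     midpoint: float = 1000.0,
                     steepness: float = 0.01) -> list[int]:
    """Return the candidate layers whose update is skipped next iteration.

    Count = len(candidates) * ratio, rounded down (rounding is an assumption).
    Assumption: layers are indexed from the input, so the smallest indices
    (closest to the input) are frozen first.
    """
    ratio = sigmoid_ratio(iteration, midpoint, steepness)
    count = int(len(candidates) * ratio)
    return sorted(candidates)[:count]
```

With four candidates and the default parameters, no layer is frozen at iteration 0, two are frozen at iteration 1000 (where the ratio is exactly 0.5), and three at iteration 2000, because the logistic ratio approaches 1 without reaching it. All layers not returned keep updating, in the spirit of claim 4.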
What is claimed is:

1. A non-transitory computer-readable recording medium storing therein a computer program that causes a computer to execute a process comprising: starting a learning process for building a model including a plurality of layers which each include a parameter, the learning process executing iterations, each of which includes calculating output error of the model using training data and updating a value of the parameter of each of the plurality of layers based on the output error; selecting, amongst the plurality of layers, two or more candidate layers representing candidates for layers, in each of which the updating of the value of the parameter is to be suppressed, based on execution results of a first iteration of the learning process; and calculating, based on a number of the iterations executed up to the first iteration, a ratio value which increases with an increase in the number of the iterations executed, and determining, amongst the two or more candidate layers, one or more layers, in each of which the updating of the value of the parameter is to be suppressed at a second iteration following the first iteration, a number of the one or more layers being determined according to the ratio value.

2. The non-transitory computer-readable recording medium according to claim 1, wherein: the number of the one or more layers determined according to the ratio value is calculated by multiplying a number of the two or more candidate layers by the ratio value.

3. The non-transitory computer-readable recording medium according to claim 1, wherein: the ratio value corresponding to the number of the iterations executed is calculated based on a sigmoid curve.

4. The non-transitory computer-readable recording medium according to claim 1, wherein: the updating of the value of the parameter is performed at the second iteration in each remaining layer other than the one or more layers whose number is determined according to the ratio value amongst the two or more candidate layers.

5. The non-transitory computer-readable recording medium according to claim 1, wherein: each of the iterations of the learning process includes calculating an error gradient indicating a gradient of the output error with respect to the parameter and updating the value of the parameter based on the error gradient, and the selecting of the two or more candidate layers includes monitoring each of the plurality of layers for an inter-iteration change in the error gradient and selecting each of the two or more candidate layers whose inter-iteration change is below a threshold.

6. The non-transitory computer-readable recording medium according to claim 1, wherein: the model is a multi-layer neural network.

7. The non-transitory computer-readable recording medium according to claim 1, wherein: each of the iterations of the learning process includes calculating an error gradient indicating a gradient of the output error with respect to the parameter and updating the value of the parameter based on the error gradient, the process further includes calculating, for each of the plurality of layers, an average of the error gradients across the iterations executed up to the first iteration, and each of the one or more layers whose number is determined according to the ratio value is determined based on the average of the error gradients.

8. The non-transitory computer-readable recording medium according to claim 1, wherein: each of the iterations of the learning process includes calculating an error gradient indicating a gradient of the output error with respect to the parameter and updating the value of the parameter based on the error gradient, the process further includes monitoring each of the plurality of layers for an inter-iteration change in the error gradient and calculating, for each of the plurality of layers, an average of the inter-iteration changes across the iterations executed up to the first iteration, and each of the one or more layers whose number is determined according to the ratio value is determined based on the average of the inter-iteration changes.

9. The non-transitory computer-readable recording medium according to claim 1, wherein: the plurality of layers is classified into a plurality of blocks, each including two or more layers, and each of the one or more layers whose number is determined according to the ratio value is determined based on identity of a block to which the layer belongs.

10. The non-transitory computer-readable recording medium according to claim 1, wherein: each of the one or more layers whose number is determined according to the ratio value is determined based on spacing of the one or more layers.

11. The non-transitory computer-readable recording medium according to claim 1, wherein: each of the one or more layers whose number is determined according to the ratio value is determined based on proximity thereof to an input of the model.

12. A learning method comprising: starting, by a processor, a learning process for building a model including a plurality of layers which each include a parameter, the learning process executing iterations, each of which includes calculating output error of the model using training data and updating a value of the parameter of each of the plurality of layers based on the output error; selecting, by the processor, amongst the plurality of layers, two or more candidate layers representing candidates for layers, in each of which the updating of the value of the parameter is to be suppressed, based on execution results of a first iteration of the learning process; and calculating, by the processor, based on a number of the iterations executed up to the first iteration, a ratio value which increases with an increase in the number of the iterations executed, and determining, amongst the two or more candidate layers, one or more layers, in each of which the updating of the value of the parameter is to be suppressed at a second iteration following the first iteration, a number of the one or more layers being determined according to the ratio value.

13. An information processing apparatus comprising: a memory configured to store training data and a model including a plurality of layers which each include a parameter; and a processor configured to execute a process including: starting a learning process executing iterations, each of which includes calculating output error of the model using the training data and updating a value of the parameter of each of the plurality of layers based on the output error, selecting, amongst the plurality of layers, two or more candidate layers representing candidates for layers, in each of which the updating of the value of the parameter is to be suppressed, based on execution results of a first iteration of the learning process, and calculating, based on a number of the iterations executed up to the first iteration, a ratio value which increases with an increase in the number of the iterations executed, and determining, amongst the two or more candidate layers, one or more layers, in each of which the updating of the value of the parameter is to be suppressed at a second iteration following the first iteration, a number of the one or more layers being determined according to the ratio value.
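Claims 5, 8 and 2 together describe how candidates are found (inter-iteration change in the error gradient below a threshold), how they are ranked (average inter-iteration change), and how many are frozen (candidate count times the ratio). The Python sketch below is a hedged illustration, not the claimed implementation. It assumes each layer's error gradient is reduced to one scalar per iteration (for example its norm); that reduction, the class and function names, rounding the count down, and the tie-break by layer index are all hypothetical choices.

```python
from dataclasses import dataclass, field


@dataclass
class LayerGradStats:
    """Per-layer history of a scalar gradient summary (assumed: gradient norm)."""
    norms: list[float] = field(default_factory=list)

    def record(self, grad_norm: float) -> None:
        self.norms.append(grad_norm)

    def last_change(self) -> float:
        # Inter-iteration change between the two most recent iterations.
        if len(self.norms) < 2:
            return float("inf")  # not enough history: never a candidate
        return abs(self.norms[-1] - self.norms[-2])

    def mean_change(self) -> float:
        # Average inter-iteration change over all recorded iterations (claim 8).
        if len(self.norms) < 2:
            return float("inf")
        diffs = [abs(b - a) for a, b in zip(self.norms, self.norms[1:])]
        return sum(diffs) / len(diffs)


def select_candidates(stats: dict[int, LayerGradStats],
                      threshold: float) -> list[int]:
    """Layers whose latest inter-iteration change is below the threshold (claim 5)."""
    return sorted(i for i, s in stats.items() if s.last_change() < threshold)


def rank_by_mean_change(stats: dict[int, LayerGradStats],
                        candidates: list[int], count: int) -> list[int]:
    """Pick `count` candidates with the smallest average change (ties: lower index)."""
    return sorted(candidates, key=lambda i: (stats[i].mean_change(), i))[:count]


def choose_frozen_layers(stats: dict[int, LayerGradStats],
                         threshold: float, ratio: float) -> list[int]:
    """Layers to skip at the next iteration; all others keep updating (claim 4)."""
    candidates = select_candidates(stats, threshold)
    count = int(len(candidates) * ratio)  # claim 2; rounding down is an assumption
    return rank_by_mean_change(stats, candidates, count)
```

For example, a layer whose gradient norm went 1.0, 0.9, 0.88 has a latest change of about 0.02 and qualifies under a threshold of 0.05, while one that went 2.0, 1.0, 0.5 does not. Ranking by average change rather than by the latest change alone favours layers that have been stable over the whole history, not just for one step.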