Method of outputting prediction result using neural network, method of generating neural network, and apparatus therefor
US-2020134427-A1 · Apr 30, 2020 · US
US2022188636A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2022188636-A1 |
| Application number | US-202117551065-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 14, 2021 |
| Priority date | Dec 14, 2020 |
| Publication date | Jun 16, 2022 |
| Grant date | — |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network using meta pseudo-labels. One of the methods includes training a student neural network using pseudo-labels generated by a teacher neural network that is being trained jointly with the student neural network.
Opening claim text (preview).
What is claimed is:
1. A method performed by one or more computers and for training a student neural network having a plurality of student parameters to perform a machine learning task, the method comprising:
  training the student neural network jointly with a teacher neural network, wherein the teacher neural network has a plurality of teacher parameters, the joint training comprising repeatedly performing the following:
  obtaining a first plurality of unlabeled training inputs;
  processing each of the unlabeled training inputs in the first plurality of unlabeled training inputs using the teacher neural network and in accordance with current values of the teacher parameters to generate a respective teacher output for the machine learning task for each of the unlabeled training inputs;
  generating, for each of the unlabeled training inputs, a respective pseudo-label for the unlabeled training input from the respective teacher output for the unlabeled training input;
  training the student neural network to determine updated values of the student parameters from current values of the student parameters by optimizing a student objective function that measures, for each of the unlabeled training inputs in the first plurality of unlabeled training inputs, an error between (i) a respective student output for the unlabeled training input generated by processing the unlabeled training input using the student neural network in accordance with the current values of the student parameters and (ii) the respective pseudo-label for the unlabeled training input;
  obtaining a first plurality of labeled training inputs and, for each labeled training input in the first plurality of labeled training inputs, a respective ground truth output for the machine learning task; and
  training the teacher neural network to determine updated values of the teacher parameters by optimizing a teacher objective function that includes a first term that measures, for each of the labeled training inputs in the first plurality of labeled training inputs, an error between (i) a respective student output for the labeled training input generated by processing the labeled training input using the student neural network in accordance with the updated values of the student parameters and (ii) the respective ground truth output for the labeled training input.
2. The method of claim 1, wherein the total number of teacher parameters of the teacher neural network is greater than the total number of student parameters of the student neural network.
3. The method of claim 1, wherein the teacher objective function also includes a supervised learning term that measures, for each labeled training input in a second plurality of labeled training inputs, an error between (i) a respective teacher output for the labeled training input generated by processing the labeled training input using the teacher neural network in accordance with the current values of the teacher parameters and (ii) a respective ground truth output for the labeled training input.
4. The method of claim 3, wherein the first plurality of labeled training inputs are the same as the second plurality of labeled training inputs.
5. The method of claim 1, wherein the teacher objective function also includes a semi-supervised learning term that measures, for a second plurality of unlabeled training inputs, a performance of the teacher neural network in accordance with the current values of the teacher parameters on a semi-supervised learning task as measured on the second plurality of unlabeled training inputs.
6. The method of claim 5, wherein the first plurality of unlabeled training inputs are the same as the second plurality of unlabeled training inputs.
7. The method of claim 1, wherein, for each of the unlabeled training inputs, the respective pseudo-label for the unlabeled training input is the same as the respective teacher output for the unlabeled training input.
8. The method of claim 7, wherein training the teacher neural network to determine updated values of the teacher parameters comprises computing, through backpropagation, a gradient of the teacher objective function with respect to the teacher parameters.
9. The method of claim 1, wherein each teacher and student output for the machine learning task specifies a respective probability distribution over a plurality of classes and wherein generating, for each of the unlabeled training inputs, the respective pseudo-label for the unlabeled training input comprises:
  selecting one of the classes using the probability distribution specified by the teacher output; and
  generating a pseudo-label that identifies the sampled class as the ground-truth output for the unlabeled training input.
10. The method of claim 9, wherein selecting one of the classes using the probability distribution specified by the teacher output comprises:
  sampling one of the classes from the probability distribution specified by the teacher output.
11. The method of claim 9, wherein training the teacher neural network to determine updated values of the teacher parameters comprises computing an approximate gradient of the first term of the teacher objective function with respect to the teacher parameters.
12. The method of claim 11, wherein computing an approximate gradient of the first term of the teacher objective function comprises:
  computing a first student gradient with respect to the student parameters of the student objective function evaluated at the current values of the student parameters and for the first plurality of unlabeled training inputs;
  computing a second student gradient with respect to the student parameters of the first term of the teacher objective function evaluated at the updated values of the student parameters and for the first plurality of labeled training inputs;
  computing a teacher gradient with respect to the teacher parameters of a second objective that measures, for each of the first plurality of unlabeled training inputs, an error between (i) the respective pseudo-label for the unlabeled training input and (ii) the respective teacher output for the unlabeled training input generated by the teacher neural network in accordance with the current values of the teacher parameters; and
  computing the approximation from the first student gradient, the second student gradient, and the teacher gradient.
13. The method of claim 12, wherein computing the approximation from the first student gradient, the second student gradient, and the teacher gradient comprises:
  determining a feedback coefficient from the first and second student gradients; and
  multiplying the teacher gradient by the feedback coefficient.
14. The method of claim 1, further comprising: after the joint training, further training the student neural network on a third plurality of labeled training inputs through supervised learning.
15. The method of claim 1, further comprising: before the joint training, training the teacher neural network on a fourth plurality of labeled training inputs through supervised learning.
16. One or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations for training a student neural network having a plurality of student parameters to perform a machine learning task, the operations comprising:
  training the student neural network jointly with a teacher neural network, wherein the teacher neural network has a plurality of teacher parameters, the joint training comprising
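Illustrative sketch (not part of the patent text). One iteration of the joint training recited in claims 1 and 9 to 13 might look roughly like the following. It assumes both networks are tiny linear softmax classifiers, that the error measure is cross-entropy, that each update is a single gradient-descent step, and that the feedback coefficient of claim 13 is the dot product of the two student gradients scaled by the student learning rate. The claims fix none of these choices, and all function and parameter names are hypothetical.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, x):
    # W holds one weight row per class; returns class probabilities for input x.
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])

def mean_grad(W, xs, ys):
    # Mean gradient of cross-entropy(one-hot y, softmax(W x)) with respect to W.
    g = [[0.0] * len(W[0]) for _ in W]
    for x, y in zip(xs, ys):
        p = predict(W, x)
        for k in range(len(W)):
            err = p[k] - (1.0 if k == y else 0.0)
            for j, xi in enumerate(x):
                g[k][j] += err * xi / len(xs)
    return g

def dot(A, B):
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def add_scaled(W, G, scale):
    # Returns W + scale * G without mutating either argument.
    return [[w + scale * g for w, g in zip(rw, rg)] for rw, rg in zip(W, G)]

def joint_step(teacher, student, x_unl, x_lab, y_lab, lr_s, lr_t, rng):
    classes = range(len(teacher))
    # Claims 9-10: sample a hard pseudo-label from each teacher distribution.
    pseudo = [rng.choices(classes, weights=predict(teacher, x))[0] for x in x_unl]
    # Claim 12: first student gradient (pseudo-label loss at current student params).
    g1 = mean_grad(student, x_unl, pseudo)
    # Claim 1: student update on the pseudo-labeled batch.
    new_student = add_scaled(student, g1, -lr_s)
    # Claim 12: second student gradient (labeled loss at UPDATED student params).
    g2 = mean_grad(new_student, x_lab, y_lab)
    # Claim 13: feedback coefficient. The dot-product form is an assumption.
    h = lr_s * dot(g2, g1)
    # Claim 12: teacher gradient of the loss between pseudo-labels and teacher outputs.
    gt = mean_grad(teacher, x_unl, pseudo)
    # Claim 13: scale the teacher gradient by the coefficient, then descend.
    new_teacher = add_scaled(teacher, gt, -lr_t * h)
    return new_teacher, new_student, h

rng = random.Random(0)
teacher = [[0.1, -0.2], [0.0, 0.3]]
student = [[0.0, 0.0], [0.0, 0.0]]
x_unl = [[1.0, 0.5], [-1.0, 2.0], [0.3, -0.7]]
x_lab, y_lab = [[1.0, 0.0], [0.0, 1.0]], [0, 1]
teacher, student, h = joint_step(teacher, student, x_unl, x_lab, y_lab, 0.1, 0.1, rng)
```

Under these assumptions, a positive coefficient means the student's pseudo-label step also reduced its loss on the labeled batch, so the teacher is nudged toward the labels it sampled; a negative coefficient pushes it away from them. The optional teacher terms of claims 3 and 5 and the pre-training and fine-tuning of claims 14 and 15 are omitted here.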
Combinations of networks · CPC title
Validation; Performance evaluation; Active pattern learning techniques · CPC title
characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling · CPC title
Backpropagation, e.g. using gradient descent · CPC title
Non-supervised learning, e.g. competitive learning · CPC title