Meta pseudo-labels

US2022188636A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2022188636-A1
Application number  US-202117551065-A
Country             US
Kind code           A1
Filing date         Dec 14, 2021
Priority date       Dec 14, 2020
Publication date    Jun 16, 2022
Grant date          (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network using meta pseudo-labels. One of the methods includes training a student neural network using pseudo-labels generated by a teacher neural network that is being trained jointly with the student neural network.
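The abstract's central idea is that the student is fit to labels the teacher produces rather than to ground truth. The following is a minimal sketch of that step only, not the patent's implementation. It shows two ways of turning a teacher output into a pseudo-label, both of which appear as options in the dependent claims (using the teacher output as-is, or sampling one class from it), and a student error measured against the pseudo-label. All function names are hypothetical, and cross-entropy is an assumed choice of error measure.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution over classes."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_pseudo_label(teacher_probs):
    """Pseudo-label equal to the teacher output itself."""
    return list(teacher_probs)


def sampled_pseudo_label(teacher_probs, rng):
    """Pseudo-label that names one class sampled from the teacher distribution."""
    k = rng.choices(range(len(teacher_probs)), weights=teacher_probs)[0]
    return [1.0 if i == k else 0.0 for i in range(len(teacher_probs))]


def cross_entropy(target, student_probs):
    """Error between a pseudo-label (target) and a student output.

    Terms with zero target weight are skipped so log(0) is never evaluated.
    """
    return -sum(t * math.log(p) for t, p in zip(target, student_probs) if t > 0.0)


# Hypothetical teacher and student scores for one unlabeled input, three classes.
teacher_probs = softmax([2.0, 0.5, -1.0])
student_probs = softmax([0.3, 0.1, 0.0])
rng = random.Random(0)

soft_loss = cross_entropy(soft_pseudo_label(teacher_probs), student_probs)
hard_loss = cross_entropy(sampled_pseudo_label(teacher_probs, rng), student_probs)
```

With a soft pseudo-label the teacher output enters the student loss directly, so the loss stays differentiable with respect to the teacher. A sampled label breaks that path, which is why the claims that use sampling also describe an approximate teacher gradient.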

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed by one or more computers and for training a student neural network having a plurality of student parameters to perform a machine learning task, the method comprising:

  training the student neural network jointly with a teacher neural network, wherein the teacher neural network has a plurality of teacher parameters, the joint training comprising repeatedly performing the following:

    obtaining a first plurality of unlabeled training inputs;

    processing each of the unlabeled training inputs in the first plurality of unlabeled training inputs using the teacher neural network and in accordance with current values of the teacher parameters to generate a respective teacher output for the machine learning task for each of the unlabeled training inputs;

    generating, for each of the unlabeled training inputs, a respective pseudo-label for the unlabeled training input from the respective teacher output for the unlabeled training input;

    training the student neural network to determine updated values of the student parameters from current values of the student parameters by optimizing a student objective function that measures, for each of the unlabeled training inputs in the first plurality of unlabeled training inputs, an error between (i) a respective student output for the unlabeled training input generated by processing the unlabeled training input using the student neural network in accordance with the current values of the student parameters and (ii) the respective pseudo-label for the unlabeled training input;

    obtaining a first plurality of labeled training inputs and, for each labeled training input in the first plurality of labeled training inputs, a respective ground truth output for the machine learning task; and

    training the teacher neural network to determine updated values of the teacher parameters by optimizing a teacher objective function that includes a first term that measures, for each of the labeled training inputs in the first plurality of labeled training inputs, an error between (i) a respective student output for the labeled training input generated by processing the labeled training input using the student neural network in accordance with the updated values of the student parameters and (ii) the respective ground truth output for the labeled training input.

2. The method of claim 1, wherein the total number of teacher parameters of the teacher neural network is greater than the total number of student parameters of the student neural network.

3. The method of claim 1, wherein the teacher objective function also includes a supervised learning term that measures, for each labeled training input in a second plurality of labeled training inputs, an error between (i) a respective teacher output for the labeled training input generated by processing the labeled training input using the teacher neural network in accordance with the current values of the teacher parameters and (ii) a respective ground truth output for the labeled training input.

4. The method of claim 3, wherein the first plurality of labeled training inputs are the same as the second plurality of labeled training inputs.

5. The method of claim 1, wherein the teacher objective function also includes a semi-supervised learning term that measures, for a second plurality of unlabeled training inputs, a performance of the teacher neural network in accordance with the current values of the teacher parameters on a semi-supervised learning task as measured on the second plurality of unlabeled training inputs.

6. The method of claim 5, wherein the first plurality of unlabeled training inputs are the same as the second plurality of unlabeled training inputs.

7. The method of claim 1, wherein, for each of the unlabeled training inputs, the respective pseudo-label for the unlabeled training input is the same as the respective teacher output for the unlabeled training input.

8. The method of claim 7, wherein training the teacher neural network to determine updated values of the teacher parameters comprises computing, through backpropagation, a gradient of the teacher objective function with respect to the teacher parameters.

9. The method of claim 1, wherein each teacher and student output for the machine learning task specifies a respective probability distribution over a plurality of classes and wherein generating, for each of the unlabeled training inputs, the respective pseudo-label for the unlabeled training input comprises:

    selecting one of the classes using the probability distribution specified by the teacher output; and

    generating a pseudo-label that identifies the sampled class as the ground-truth output for the unlabeled training input.

10. The method of claim 9, wherein selecting one of the classes using the probability distribution specified by the teacher output comprises: sampling one of the classes from the probability distribution specified by the teacher output.

11. The method of claim 9, wherein training the teacher neural network to determine updated values of the teacher parameters comprises computing an approximate gradient of the first term of the teacher objective function with respect to the teacher parameters.

12. The method of claim 11, wherein computing an approximate gradient of the first term of the teacher objective function comprises:

    computing a first student gradient with respect to the student parameters of the student objective function evaluated at the current values of the student parameters and for the first plurality of unlabeled training inputs;

    computing a second student gradient with respect to the student parameters of the first term of the teacher objective function evaluated at the updated values of the student parameters and for the first plurality of labeled training inputs;

    computing a teacher gradient with respect to the teacher parameters of a second objective that measures, for each of the first plurality of unlabeled training inputs, an error between (i) the respective pseudo-label for the unlabeled training input and (ii) the respective teacher output for the unlabeled training input generated by the teacher neural network in accordance with the current values of the teacher parameters; and

    computing the approximation from the first student gradient, the second student gradient, and the teacher gradient.

13. The method of claim 12, wherein computing the approximation from the first student gradient, the second student gradient, and the teacher gradient comprises:

    determining a feedback coefficient from the first and second student gradients; and

    multiplying the teacher gradient by the feedback coefficient.

14. The method of claim 1, further comprising: after the joint training, further training the student neural network on a third plurality of labeled training inputs through supervised learning.

15. The method of claim 1, further comprising: before the joint training, training the teacher neural network on a fourth plurality of labeled training inputs through supervised learning.

16. One or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations for training a student neural network having a plurality of student parameters to perform a machine learning task, the operations comprising: training the student neural network jointly with a teacher neural network, wherein the teacher neural network has a plurality of teacher parameters, the joint training comprising
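Claims 9 to 13 describe one joint update in which the teacher learns from how well the freshly updated student does on labeled data. The sketch below walks through that update for two small linear softmax classifiers in plain Python. It is an illustration under stated assumptions, not the claimed implementation: the linear models, cross-entropy as the error measure, plain gradient descent, batch-mean gradients, and every function and parameter name are hypothetical. In particular, the claims only say the feedback coefficient is determined "from the first and second student gradients"; taking it as the student learning rate times their dot product is an assumption made here.

```python
import math
import random


def predict(weights, x):
    """Linear softmax classifier; weights holds one row of feature weights per class."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_ce_grad(weights, xs, labels):
    """Batch-mean gradient of cross-entropy(label, softmax(W x)) with respect to W."""
    grad = [[0.0] * len(weights[0]) for _ in weights]
    for x, y in zip(xs, labels):
        probs = predict(weights, x)
        for k, p in enumerate(probs):
            err = p - (1.0 if k == y else 0.0)
            for j, xj in enumerate(x):
                grad[k][j] += err * xj / len(xs)
    return grad


def step(weights, grad, scale):
    """Return weights - scale * grad without modifying the inputs."""
    return [[w - scale * g for w, g in zip(rw, rg)] for rw, rg in zip(weights, grad)]


def dot(a, b):
    """Dot product of two gradients stored as nested lists of equal shape."""
    return sum(u * v for ra, rb in zip(a, b) for u, v in zip(ra, rb))


def joint_step(teacher, student, unlabeled, labeled_x, labeled_y, rng,
               lr_student=0.1, lr_teacher=0.1):
    # Teacher outputs on unlabeled inputs, then one sampled class per input.
    pseudo = [rng.choices(range(len(teacher)), weights=predict(teacher, x))[0]
              for x in unlabeled]

    # First student gradient: student objective on pseudo-labels, current parameters.
    g1 = mean_ce_grad(student, unlabeled, pseudo)
    new_student = step(student, g1, lr_student)

    # Second student gradient: labeled-data error at the updated student parameters.
    g2 = mean_ce_grad(new_student, labeled_x, labeled_y)

    # Feedback coefficient (assumed form: learning rate times dot product).
    h = lr_student * dot(g1, g2)

    # Teacher gradient of the error between pseudo-labels and teacher outputs,
    # scaled by the feedback coefficient to form the approximate gradient.
    g_teacher = mean_ce_grad(teacher, unlabeled, pseudo)
    new_teacher = step(teacher, g_teacher, lr_teacher * h)
    return new_teacher, new_student, h


# Hypothetical toy data: three classes, two features.
teacher0 = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.1]]
student0 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
unlabeled = [[1.0, 0.5], [-0.5, 1.0], [0.3, -0.7]]
labeled_x, labeled_y = [[1.0, 0.0], [0.0, 1.0]], [0, 1]

teacher1, student1, h = joint_step(teacher0, student0, unlabeled,
                                   labeled_x, labeled_y, random.Random(0))
```

Under this assumed form, a positive coefficient means the pseudo-label step and the labeled-data gradient point the same way, so the teacher is nudged toward the labels it sampled; a negative coefficient nudges it away from them.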

Assignees

Google LLC

Inventors

Classifications

  • Combinations of networks

  • Validation; Performance evaluation; Active pattern learning techniques

  • characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling

  • Backpropagation, e.g. using gradient descent

  • Non-supervised learning, e.g. competitive learning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022188636A1 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network using meta pseudo-labels. One of the methods includes training a student neural network using pseudo-labels generated by a teacher neural network that is being trained jointly with the student neural network.
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06V10/774. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 16, 2022 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).