Meta pseudo-labels

US12561557B2 · US · B2

Patent metadata
Publication number: US-12561557-B2
Application number: US-202117551065-A
Country: US
Kind code: B2
Filing date: Dec 14, 2021
Priority date: Dec 14, 2020
Publication date: Feb 24, 2026
Grant date: Feb 24, 2026

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network using meta pseudo-labels. One of the methods includes training a student neural network using pseudo-labels generated by a teacher neural network that is being trained jointly with the student neural network.
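The joint teacher-student loop in the abstract can be illustrated with a toy example. The sketch below is a minimal, hypothetical illustration, not the patented implementation: it assumes linear softmax classifiers for both networks (weights `W_t` and `W_s`), plain gradient descent with hypothetical learning rates `lr_s` and `lr_t`, hard pseudo-labels sampled from the teacher's distribution (the option recited in claim 8), and a first-order "feedback coefficient" `h` as one possible reading of claims 9 and 10. All function names are illustrative.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(W, X, y):
    """Mean cross-entropy of a linear softmax model (logits = X @ W) on hard labels y."""
    p = softmax(X @ W)
    return -np.mean(np.log(p[np.arange(len(y)), y]))


def ce_grad(W, X, y):
    """Gradient of cross_entropy(W, X, y) with respect to W."""
    p = softmax(X @ W)
    p[np.arange(len(y)), y] -= 1.0  # softmax probabilities minus one-hot targets
    return X.T @ p / len(y)


def sample_pseudo_labels(W_t, X_u, rng):
    """Sample one class per unlabeled input from the teacher's output distribution."""
    probs = softmax(X_u @ W_t)
    return np.array([rng.choice(probs.shape[1], p=row) for row in probs])


def mpl_step(W_t, W_s, X_u, X_l, y_l, rng, lr_s=0.1, lr_t=0.1):
    """One hypothetical joint step: the student learns from the teacher's
    pseudo-labels, then the teacher learns from the updated student's
    performance on labeled data."""
    # Teacher labels the unlabeled batch.
    pseudo = sample_pseudo_labels(W_t, X_u, rng)
    # Student gradient on the pseudo-labeled batch, at the current student weights.
    g_u = ce_grad(W_s, X_u, pseudo)
    W_s_new = W_s - lr_s * g_u
    # Gradient of the labeled loss, at the *updated* student weights.
    g_l = ce_grad(W_s_new, X_l, y_l)
    # Feedback coefficient: first-order estimate of how much the student step
    # reduced the student's loss on the labeled batch.
    h = lr_s * np.sum(g_l * g_u)
    # Teacher gradient of CE between its own sampled pseudo-labels and its
    # outputs, scaled by the feedback coefficient.
    g_t = ce_grad(W_t, X_u, pseudo)
    W_t_new = W_t - lr_t * h * g_t
    return W_t_new, W_s_new, pseudo, h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_u = rng.normal(size=(64, 5))  # toy "unlabeled" inputs
    X_l = rng.normal(size=(16, 5))  # toy labeled inputs
    y_l = rng.integers(0, 3, size=16)
    W_t = 0.1 * rng.normal(size=(5, 3))
    W_s = 0.1 * rng.normal(size=(5, 3))
    for _ in range(100):
        W_t, W_s, _, _ = mpl_step(W_t, W_s, X_u, X_l, y_l, rng)
    print("student labeled loss:", cross_entropy(W_s, X_l, y_l))
```

Note that the teacher is never trained directly on the labeled targets in this sketch; its only learning signal is `h`, which is positive when its pseudo-labels pushed the student toward lower labeled loss. Claims 3 and 5 describe adding supervised or semi-supervised terms to the teacher objective, which this toy version omits.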

Claims (preview)

What is claimed is:

1. A method performed by one or more computers and for training a student neural network having a plurality of student parameters to perform a machine learning task, the method comprising:
  training the student neural network jointly with a teacher neural network, wherein the teacher neural network has a plurality of teacher parameters, the joint training comprising repeatedly performing the following:
  obtaining a first plurality of unlabeled training inputs;
  processing each of the unlabeled training inputs in the first plurality of unlabeled training inputs using the teacher neural network and in accordance with current values of the teacher parameters to generate a respective teacher output for the machine learning task for each of the unlabeled training inputs;
  generating by the teacher neural network, for each of the unlabeled training inputs, a respective pseudo-label for the unlabeled training input from the respective teacher output for the unlabeled training input;
  training the student neural network to determine updated values of the student parameters from current values of the student parameters by optimizing a student objective function that measures, for each of the unlabeled training inputs in the first plurality of unlabeled training inputs, an error between (i) a respective student output for the unlabeled training input generated by processing the unlabeled training input using the student neural network in accordance with the current values of the student parameters and (ii) the respective pseudo-label for the unlabeled training input, wherein each teacher and student output for the machine learning task specifies a respective probability distribution over a plurality of classes and wherein generating, for each of the unlabeled training inputs, the respective pseudo-label for the unlabeled training input comprises:
    selecting one of the classes using the probability distribution specified by the teacher output; and
    generating a pseudo-label that identifies the selected class as the ground-truth output for the unlabeled training input;
  obtaining a first plurality of labeled training inputs and, for each labeled training input in the first plurality of labeled training inputs, a respective ground truth output for the machine learning task; and
  training the teacher neural network to determine updated values of the teacher parameters using student outputs generated by the student neural network for the labeled training inputs, wherein training the teacher neural network comprises optimizing a teacher objective function that includes a first term that measures, for each of the labeled training inputs in the first plurality of labeled training inputs, an error between (i) a respective student output for the labeled training input generated by processing the labeled training input using the student neural network in accordance with the updated values of the student parameters and (ii) the respective ground truth output for the labeled training input, and wherein optimizing the teacher objective comprises [computing] an approximate gradient of the first term of the teacher objective function with respect to the teacher parameters.

2. The method of claim 1, wherein the total number of teacher parameters of the teacher neural network is greater than the total number of student parameters of the student neural network.

3. The method of claim 1, wherein the teacher objective function also includes a supervised learning term that measures, for each labeled training input in a second plurality of labeled training inputs, an error between (i) a respective teacher output for the labeled training input generated by processing the labeled training input using the teacher neural network in accordance with the current values of the teacher parameters and (ii) a respective ground truth output for the labeled training input.

4. The method of claim 3, wherein the first plurality of labeled training inputs are the same as the second plurality of labeled training inputs.

5. The method of claim 1, wherein the teacher objective function also includes a semi-supervised learning term that measures, for a second plurality of unlabeled training inputs, a performance of the teacher neural network in accordance with the current values of the teacher parameters on a semi-supervised learning task as measured on the second plurality of unlabeled training inputs.

6. The method of claim 5, wherein the first plurality of unlabeled training inputs are the same as the second plurality of unlabeled training inputs.

7. The method of claim 1, wherein, for each of the unlabeled training inputs, the respective pseudo-label for the unlabeled training input is the same as the respective teacher output for the unlabeled training input.

8. The method of claim 1, wherein selecting one of the classes using the probability distribution specified by the teacher output comprises: sampling one of the classes from the probability distribution specified by the teacher output.

9. The method of claim 1, wherein computing an approximate gradient of the first term of the teacher objective function comprises:
  computing a first student gradient with respect to the student parameters of the student objective function evaluated at the current values of the student parameters and for the first plurality of unlabeled training inputs;
  computing a second student gradient with respect to the student parameters of the first term of the teacher objective function evaluated at the updated values of the student parameters and for the first plurality of labeled training inputs;
  computing a teacher gradient with respect to the teacher parameters of a second objective that measures, for each of the first plurality of unlabeled training inputs, an error between (i) the respective pseudo-label for the unlabeled training input and (ii) the respective teacher output for the unlabeled training input generated by the teacher neural network in accordance with the current values of the teacher parameters; and
  computing the approximation from the first student gradient, the second student gradient, and the teacher gradient.

10. The method of claim 9, wherein computing the approximation from the first student gradient, the second student gradient, and the teacher gradient comprises:
  determining a feedback coefficient from the first and second student gradients; and
  multiplying the teacher gradient by the feedback coefficient.

11. The method of claim 1, further comprising: after the joint training, further training the student neural network on a third plurality of labeled training inputs through supervised learning.

12. The method of claim 1, further comprising: before the joint training, training the teacher neural network on a fourth plurality of labeled training inputs through supervised learning.

13. One or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations for training a student neural network having a plurality of student parameters to perform a machine learning task, the operations comprising: training the student neural network jointly with a teacher neural network, wherein the teacher neural network has a plurality of teacher parameters, the joint training comprising repeatedly performing the following: obtaining a first plurality of unlabeled training inputs; processing each of the unlabeled training inputs in the first plurality of unlabeled training inputs using the teacher neural network and in accordance with current values
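Claims 9 and 10 name the ingredients of the approximate gradient without fixing a formula. One consistent reading, written as a first-order update, is sketched below. It assumes cross-entropy (CE) losses averaged over each batch, a single gradient-descent step for each network with hypothetical learning rates η_S and η_T, unlabeled inputs x_u with hard pseudo-labels ŷ_u sampled from the teacher T (as in claim 8), and labeled pairs (x_l, y_l) evaluated by the student S; these symbols are illustrative and do not appear in the claims.

```latex
\begin{aligned}
\hat{y}_u &\sim T(x_u;\theta_T) \\
\theta_S' &= \theta_S - \eta_S \,\nabla_{\theta_S}\mathrm{CE}\big(\hat{y}_u,\, S(x_u;\theta_S)\big) \\
h &= \eta_S \,\Big(\nabla_{\theta_S'}\mathrm{CE}\big(y_l,\, S(x_l;\theta_S')\big)\Big)^{\top}
      \nabla_{\theta_S}\mathrm{CE}\big(\hat{y}_u,\, S(x_u;\theta_S)\big) \\
\theta_T' &= \theta_T - \eta_T \, h \,\nabla_{\theta_T}\mathrm{CE}\big(\hat{y}_u,\, T(x_u;\theta_T)\big)
\end{aligned}
```

In this reading, the two gradient factors of h are the second and first student gradients of claim 9, h plays the role of the feedback coefficient of claim 10, and the last line multiplies the teacher gradient by it. A first-order Taylor expansion around θ'_S gives h ≈ CE(y_l, S(x_l; θ_S)) − CE(y_l, S(x_l; θ'_S)), so h is positive when learning from the pseudo-labels lowered the student's labeled loss, and the teacher step then makes those pseudo-labels more likely.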

Assignee

Google LLC

Classifications (CPC titles)

  • Combinations of networks

  • characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling

  • Validation; Performance evaluation; Active pattern learning techniques

  • Convolutional networks [CNN, ConvNet]

  • Hyperparameter optimisation; Meta-learning; Learning-to-learn

Frequently asked questions

What does patent US12561557B2 cover?
Training a student neural network on pseudo-labels generated by a teacher neural network that is trained jointly with the student. The teacher is updated based on how the student, after learning from those pseudo-labels, performs on labeled data.
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
It was published on Feb 24, 2026, with kind code B2. Legal status and post-grant events are not shown on this page.