Method and apparatus for generating training data to train student model using teacher model
US-2019034764-A1 · Jan 31, 2019 · US
US2021034985A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2021034985-A1 |
| Application number | US-202017046014-A |
| Country | US |
| Kind code | A1 |
| Filing date | Feb 25, 2020 |
| Priority date | Mar 22, 2019 |
| Publication date | Feb 4, 2021 |
| Grant date | — |
Official abstract text for this publication.
Generating soft labels used for training a unified model is achieved by unification of models having respective target classes with distillation. A collection of samples is prepared. Predictions are generated by individual trained models. Individual trained models have an individual class set to form a unified class set that includes target classes. The unified soft labels are estimated for each sample over the target classes in the unified class set from the predictions using a relation connecting a first output of each individual trained model and a second output of the unified model. The unified soft labels are output to train a unified model having the unified class set.
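The estimation step in the abstract can be illustrated with a small sketch. It assumes the relation of claim 3 (each individual prediction p_i equals the unified distribution q renormalized over that model's class set L_i) and the cross-entropy objective of claims 5 to 7, with q parameterized through exponentials of temporary variables u_l as in claim 6. The function name `estimate_unified_soft_labels`, the plain gradient-descent solver, the step size, and the iteration count are illustrative assumptions, not taken from the publication.

```python
import numpy as np

def estimate_unified_soft_labels(preds, class_sets, n_classes, lr=0.5, n_iter=2000):
    """Estimate q over the unified class set for one sample.

    preds[i]      : probability vector of individual model i over class_sets[i]
    class_sets[i] : indices (into the unified class set) of model i's classes
    Hypothetical helper: minimizes sum_i CE(p_i, q normalized over L_i)
    by gradient descent on u, where q is proportional to exp(u).
    """
    u = np.zeros(n_classes)
    for _ in range(n_iter):
        grad = np.zeros(n_classes)
        for p_i, idx in zip(preds, class_sets):
            z = u[idx] - u[idx].max()            # stabilized softmax over L_i
            q_hat = np.exp(z) / np.exp(z).sum()  # q normalized over L_i
            grad[idx] += q_hat - p_i             # gradient of the cross-entropy term
        u -= lr * grad
    q = np.exp(u - u.max())                      # transform u back into q
    return q / q.sum()

# Toy example: two models whose class sets overlap on classes 1 and 2.
q_true = np.array([0.1, 0.2, 0.3, 0.4])
class_sets = [[0, 1, 2], [1, 2, 3]]
preds = [q_true[s] / q_true[s].sum() for s in class_sets]
q_est = estimate_unified_soft_labels(preds, class_sets, n_classes=4)
print(np.round(q_est, 3))
```

In this toy setup the two individual predictions are exactly consistent with one underlying distribution, so the estimate should approach it. With real models the predictions generally disagree, and the result is a compromise that minimizes the summed cross-entropy.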
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for generating unified soft labels, the computer-implemented method comprising: preparing a collection of samples; obtaining, for each sample, a plurality of predictions generated by a plurality of individual trained models, each individual trained model having an individual class set to form at least partially a unified class set including a plurality of target classes; estimating, for each sample, unified soft labels over the target classes in the unified class set from the plurality of the predictions using a relation connecting a first output of each individual trained model and a second output of the unified model; and outputting the unified soft labels to train a unified model having the unified class set.

2. The computer-implemented method of claim 1, wherein the computer-implemented method comprises: feeding each sample into the unified model to infer predicted values over the target classes in the unified class set as the second output; updating the unified model based on a loss function between the unified soft labels and the predicted values for each sample; and storing the unified model updated.

3. The computer-implemented method of claim 1, wherein the relation indicates equivalence between each predicted value in the first output of one individual trained model and a corresponding predicted value in the second output of the unified model normalized by the individual class set of the one individual trained model.

4. The computer-implemented method of claim 2, wherein the loss function is weighted by weightings over the target classes in the unified class set, each weighting for one target class being computed in a manner based on a statistic of the unified soft labels on the one target class through the collection.

5. The computer-implemented method of claim 1, wherein the unified soft labels are estimated by solving a problem of optimizing an objective function with respect to a distribution q corresponding to the second output of the unified model, the objective function measuring an error between a plurality of reference distributions p_i corresponding to the plurality of the predictions and a plurality of normalized distributions q̂_i each obtained by normalizing the distribution q over target classes in each individual class set.

6. The computer-implemented method of claim 5, wherein the distribution q is obtained by: solving a convex problem with temporary variables u_l each given to each target class l in the unified class set L_U, the distribution q being represented by a set of exponential functions of respective temporary variables u_l in the convex problem; and transforming solved temporary variables u_l into the distribution q.

7. The computer-implemented method of claim 5, wherein the objective function is a cross-entropy function.

8. The computer-implemented method of claim 1, wherein the unified soft labels are estimated by solving a problem of optimizing an objective function with respect to at least an output vector u representing the second output of the unified model as variables in a manner based on matrix factorization, the unified soft labels being represented in a form of probability or logits.

9. The computer-implemented method of claim 8, wherein the output vector u is represented in a form of probability and the problem of optimizing the objective function is solved further with respect to a normalization vector v representing normalization factors for the individual trained models, the objective function measuring an error between a probability matrix P representing the plurality of the predictions p_i in a form of probability with missing entries and a product of the output vector u and the normalization vector v, with a mask matrix M representing an existence of a missing class in the individual class sets.

10. The computer-implemented method of claim 8, wherein the output vector u is represented in a form of logits and the problem of optimizing the objective function is solved further with respect to a logit scaling vector v and a logit shift vector c, the objective function measuring an error between a logit matrix Z representing the plurality of the predictions p_i in a form of logits with missing entries and a product of the output vector u and the logit scaling vector v shifted by the logit shift vector c, with a mask matrix M representing an existence of missing class in the individual class sets.

11. The computer-implemented method of claim 8, wherein the output vector u is represented in a form of logits and the problem of optimizing the objective function is solved further with respect to a logit shift vector c, the objective function measuring an error between a logit matrix Z representing the plurality of the predictions p_i in a form of logits with missing entries and a product of the output vector u and the fixed scaling vector v shifted by the logit shift vector c, with a mask matrix M representing an existence of missing class in the individual class sets.

12. The computer-implemented method of claim 1, wherein each of the unified model and the individual trained models is selected from a group consisting of neural network based classification models, decision tree or forest based classification models, and support vector machine based classification models.

13. The computer-implemented method of claim 1, wherein the computer-implemented method comprises: receiving, for each of the plurality of the individual trained models, (i) a content of individual trained model itself or (ii) a soft label collection obtained by feeding each sample into the individual trained model, together with a content of the individual class set.

14. A computer-implemented method for training a unified model, the computer-implemented method comprising: preparing a collection of samples; obtaining, for each sample, a plurality of predictions generated by a plurality of individual trained models, each individual trained model having an individual class set to form at least partially a unified class set including the plurality of target classes; and updating the unified model having the unified class set using the plurality of the predictions for each sample and a relation connecting a first output of each individual trained model and a second output of the unified model over the target classes in the unified class set.

15. The computer-implemented method of claim 14, wherein the computer-implemented method comprises: feeding each sample into the unified model to infer predicted values over the target classes in the unified class set as the second output, the unified model being updated using the predicted values; and storing the unified model updated.

16. The computer-implemented method of claim 15, wherein the unified model includes a neural network and the unified model is updated by back-propagating a loss throughout the neural network, the loss measuring an error between a plurality of reference distributions p_i corresponding to the plurality of the predictions and a plurality of normalized distributions q̂_i obtained by normalizing a distribution q over target classes in each individual class set, the distribution q being obtained as the predicted values inferred by the unified model.

17. The computer-implemented method of claim 15, wherein the unified model includes a neural network and the unified model is updated by back-propagating a loss throughout the neural network, the loss being obtained by solving a
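As an illustration of the probability-space matrix factorization recited in claim 9, the sketch below fits a rank-one product of an output vector u and a normalization vector v to a probability matrix P with missing entries, using a mask matrix M. The claims do not name a solver. The alternating least-squares updates, the squared-error objective, the function name `factorize_soft_labels`, and the final clipping and renormalization are all assumptions made for this example.

```python
import numpy as np

def factorize_soft_labels(P, M, n_iter=200, eps=1e-12):
    """Fit M * P ~ M * (u v^T) and return u normalized to sum to 1.

    P : (n_classes, n_models) predictions, zero where a class is missing
    M : (n_classes, n_models) mask, 1 if the model covers the class, else 0
    Hypothetical helper: alternating least squares on the masked squared error.
    """
    n_classes, n_models = P.shape
    v = np.ones(n_models)
    u = np.ones(n_classes)
    MP = M * P
    for _ in range(n_iter):
        # Closed-form least-squares update of each u_l with v held fixed.
        u = MP @ v / np.maximum(M @ (v ** 2), eps)
        # Closed-form least-squares update of each v_i with u held fixed.
        v = MP.T @ u / np.maximum(M.T @ (u ** 2), eps)
    u = np.clip(u, 0.0, None)   # keep the result a valid probability vector
    return u / u.sum()

# Toy example: 4 unified classes, 2 models covering {0,1,2} and {1,2,3}.
q_true = np.array([0.1, 0.2, 0.3, 0.4])
M = np.array([[1, 0], [1, 1], [1, 1], [0, 1]], dtype=float)
P = M * q_true[:, None]
P = P / P.sum(axis=0, keepdims=True)   # each column is one model's prediction
print(np.round(factorize_soft_labels(P, M), 3))
```

Under the relation of claim 3, each entry of v plays the role of the reciprocal of the unified probability mass that falls inside one model's class set, which is why a single rank-one product can explain every observed column of P when the predictions are consistent.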
Incorporation of unlabelled data, e.g. multiple instance learning [MIL] · CPC title
Backpropagation, e.g. using gradient descent · CPC title
Pattern recognition · CPC title
of results relating to different input data, e.g. multimodal recognition · CPC title
based on naturality criteria, e.g. with non-negative factorisation or negative correlation · CPC title