Unification of models having respective target classes with distillation

US2021034985A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2021034985-A1
Application number: US-202017046014-A
Country: US
Kind code: A1
Filing date: Feb 25, 2020
Priority date: Mar 22, 2019
Publication date: Feb 4, 2021
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Generating soft labels used for training a unified model is achieved by unification of models having respective target classes with distillation. A collection of samples is prepared. Predictions are generated by individual trained models. Individual trained models have an individual class set to form a unified class set that includes target classes. The unified soft labels are estimated for each sample over the target classes in the unified class set from the predictions using a relation connecting a first output of each individual trained model and a second output of the unified model. The unified soft labels are output to train a unified model having the unified class set.
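The "relation" named in the abstract can be illustrated with a short sketch. It assumes the form described in claim 3: an individual model's prediction is treated as equivalent to the unified model's prediction renormalized over that model's own class set. The function name `normalize_over_class_set` and the toy numbers are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def normalize_over_class_set(q, class_set):
    """Restrict a unified distribution q to class_set and renormalize it.

    q         : probabilities over all target classes in the unified class set
    class_set : indices of the classes one individual model was trained on
    """
    q = np.asarray(q, dtype=float)
    idx = np.array(sorted(class_set))
    sub = q[idx]
    return sub / sub.sum()

# Hypothetical unified class set with four target classes (indices 0..3).
q = [0.1, 0.2, 0.3, 0.4]

# Hypothetical individual model that only knows classes 0 and 1.
q_hat = normalize_over_class_set(q, {0, 1})
print(q_hat)
```

Under this assumed relation, the individual model would be expected to predict the same ratio between classes 0 and 1 as the unified model does, even though it knows nothing about classes 2 and 3.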

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for generating unified soft labels, the computer-implemented method comprising: preparing a collection of samples; obtaining, for each sample, a plurality of predictions generated by a plurality of individual trained models, each individual trained model having an individual class set to form at least partially a unified class set including a plurality of target classes; estimating, for each sample, unified soft labels over the target classes in the unified class set from the plurality of the predictions using a relation connecting a first output of each individual trained model and a second output of the unified model; and outputting the unified soft labels to train a unified model having the unified class set.

2. The computer-implemented method of claim 1, wherein the computer-implemented method comprises: feeding each sample into the unified model to infer predicted values over the target classes in the unified class set as the second output; updating the unified model based on a loss function between the unified soft labels and the predicted values for each sample; and storing the unified model updated.

3. The computer-implemented method of claim 1, wherein the relation indicates equivalence between each predicted value in the first output of one individual trained model and a corresponding predicted value in the second output of the unified model normalized by the individual class set of the one individual trained model.

4. The computer-implemented method of claim 2, wherein the loss function is weighted by weightings over the target classes in the unified class set, each weighting for one target class being computed in a manner based on a statistic of the unified soft labels on the one target class through the collection.

5. The computer-implemented method of claim 1, wherein the unified soft labels are estimated by solving a problem of optimizing an objective function with respect to a distribution q corresponding to the second output of the unified model, the objective function measuring an error between a plurality of reference distributions p_i corresponding to the plurality of the predictions and a plurality of normalized distributions q̂_i each obtained by normalizing the distribution q over target classes in each individual class set.

6. The computer-implemented method of claim 5, wherein the distribution q is obtained by: solving a convex problem with temporary variables u_l each given to each target class l in the unified class set L_U, the distribution q being represented by a set of exponential functions of respective temporary variables u_l in the convex problem; and transforming solved temporary variables u_l into the distribution q.

7. The computer-implemented method of claim 5, wherein the objective function is a cross-entropy function.

8. The computer-implemented method of claim 1, wherein the unified soft labels are estimated by solving a problem of optimizing an objective function with respect to at least an output vector u representing the second output of the unified model as variables in a manner based on matrix factorization, the unified soft labels being represented in a form of probability or logits.

9. The computer-implemented method of claim 8, wherein the output vector u is represented in a form of probability and the problem of optimizing the objective function is solved further with respect to a normalization vector v representing normalization factors for the individual trained models, the objective function measuring an error between a probability matrix P representing the plurality of the predictions p_i in a form of probability with missing entries and a product of the output vector u and the normalization vector v, with a mask matrix M representing an existence of a missing class in the individual class sets.

10. The computer-implemented method of claim 8, wherein the output vector u is represented in a form of logits and the problem of optimizing the objective function is solved further with respect to a logit scaling vector v and a logit shift vector c, the objective function measuring an error between a logit matrix Z representing the plurality of the predictions p_i in a form of logits with missing entries and a product of the output vector u and the logit scaling vector v shifted by the logit shift vector c, with a mask matrix M representing an existence of missing class in the individual class sets.

11. The computer-implemented method of claim 8, wherein the output vector u is represented in a form of logits and the problem of optimizing the objective function is solved further with respect to a logit shift vector c, the objective function measuring an error between a logit matrix Z representing the plurality of the predictions p_i in a form of logits with missing entries and a product of the output vector u and the fixed scaling vector v shifted by the logit shift vector c, with a mask matrix M representing an existence of missing class in the individual class sets.

12. The computer-implemented method of claim 1, wherein each of the unified model and the individual trained models is selected from a group consisting of neural network based classification models, decision tree or forest based classification models, and support vector machine based classification models.

13. The computer-implemented method of claim 1, wherein the computer-implemented method comprises: receiving, for each of the plurality of the individual trained models, (i) a content of individual trained model itself or (ii) a soft label collection obtained by feeding each sample into the individual trained model, together with a content of the individual class set.

14. A computer-implemented method for training a unified model, the computer-implemented method comprising: preparing a collection of samples; obtaining, for each sample, a plurality of predictions generated by a plurality of individual trained models, each individual trained model having an individual class set to form at least partially a unified class set including the plurality of target classes; and updating the unified model having the unified class set using the plurality of the predictions for each sample and a relation connecting a first output of each individual trained model and a second output of the unified model over the target classes in the unified class set.

15. The computer-implemented method of claim 14, wherein the computer-implemented method comprises: feeding each sample into the unified model to infer predicted values over the target classes in the unified class set as the second output, the unified model being updated using the predicted values; and storing the unified model updated.

16. The computer-implemented method of claim 15, wherein the unified model includes a neural network and the unified model is updated by back-propagating a loss throughout the neural network, the loss measuring an error between a plurality of reference distributions p_i corresponding to the plurality of the predictions and a plurality of normalized distributions q̂_i obtained by normalizing a distribution q over target classes in each individual class set, the distribution q being obtained as the predicted values inferred by the unified model.

17. The computer-implemented method of claim 15, wherein the unified model includes a neural network and the unified model is updated by back-propagating a loss throughout the neural network, the loss being obtained by solving a …
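The estimation step of claims 5 to 7 can be sketched for a single sample as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the objective is taken to be the cross-entropy between each individual prediction and the unified distribution renormalized over that model's class set, the distribution is parameterized by one temporary variable per target class through an exponential, and plain gradient descent stands in for whatever convex solver is actually used. The function name, learning rate, step count, and toy data are all hypothetical.

```python
import numpy as np

def estimate_unified_soft_labels(preds, class_sets, num_classes,
                                 lr=0.5, steps=2000):
    """Estimate unified soft labels q for one sample.

    preds[i]      : probabilities from individual model i, ordered like
                    sorted(class_sets[i])
    class_sets[i] : indices (within the unified class set) known to model i
    """
    u = np.zeros(num_classes)  # temporary variables, one per target class
    index_sets = [np.array(sorted(s)) for s in class_sets]
    for _ in range(steps):
        grad = np.zeros(num_classes)
        for p, idx in zip(preds, index_sets):
            # q renormalized over this model's class set (softmax of u[idx])
            z = u[idx] - u[idx].max()
            q_hat = np.exp(z) / np.exp(z).sum()
            # gradient of the cross-entropy term with respect to u[idx]
            grad[idx] += q_hat - np.asarray(p, dtype=float)
        u -= lr * grad
    # transform the solved temporary variables into a distribution
    q = np.exp(u - u.max())
    return q / q.sum()

# Toy data (assumed): two models with overlapping class sets whose
# predictions are consistent with q = [0.1, 0.2, 0.3, 0.4].
class_sets = [{0, 1, 2}, {1, 2, 3}]
preds = [np.array([1, 2, 3]) / 6.0, np.array([2, 3, 4]) / 9.0]
q = estimate_unified_soft_labels(preds, class_sets, num_classes=4)
print(np.round(q, 3))
```

Because the two toy class sets overlap, the individual predictions pin down a single unified distribution, and the sketch should recover roughly the distribution the toy data was built from. With disjoint class sets the relative mass between the sets would be left undetermined by this objective.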

Assignees

IBM

Inventors

Classifications

  • Incorporation of unlabelled data, e.g. multiple instance learning [MIL] · CPC title

  • G06N3/084 (Primary)

    Backpropagation, e.g. using gradient descent · CPC title

  • G06F18/00 (Primary)

    Pattern recognition · CPC title

  • of results relating to different input data, e.g. multimodal recognition · CPC title

  • based on naturality criteria, e.g. with non-negative factorisation or negative correlation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021034985A1 cover?
Generating soft labels used for training a unified model is achieved by unification of models having respective target classes with distillation. A collection of samples is prepared. Predictions are generated by individual trained models. Individual trained models have an individual class set to form a unified class set that includes target classes. The unified soft labels are estimated for eac…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06V10/7753. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 4, 2021 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).