Learning efficient object detection models with knowledge distillation
US-2018268292-A1 · Sep 20, 2018 · US
US11526693B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11526693-B1 |
| Application number | US-202016865167-A |
| Country | US |
| Kind code | B1 |
| Filing date | May 1, 2020 |
| Priority date | May 1, 2020 |
| Publication date | Dec 13, 2022 |
| Grant date | Dec 13, 2022 |
Official abstract text for this publication.
Disclosed are systems and methods for training an ensemble of machine learning models with a focus on feature engineering. For example, the training of the models encourages each machine learning model of the ensemble to rely on a different set of input features from the training data samples used to train the machine learning models of the ensemble. However, instead of telling each model explicitly which features to learn, in accordance with the disclosed implementations, ML models of the ensemble may be trained sequentially, with each new model trained to disregard input features learned by previously trained ML models of the ensemble and learn based on other features included in the training data samples.
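As a rough illustration of the sequential scheme described in the abstract, the following Python sketch trains two softmax-regression models on synthetic data. Everything in it is assumed for illustration and is not taken from the patent: the linear models, the masking helper `distilled_from` that stands in for distilled samples, the uniform-output penalty that stands in for the feature-based diversification component, and the weight `lam`.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(X, y, n_classes, X_distilled=None, lam=2.0, lr=0.1, steps=500):
    """Gradient descent on cross-entropy, plus an optional penalty (assumed form):
    cross-entropy between the uniform distribution and the predictions on
    distilled samples, which discourages reliance on the features they carry."""
    W = np.zeros((X.shape[1], n_classes))
    Y = np.eye(n_classes)[y]
    for _ in range(steps):
        grad = X.T @ (softmax(X @ W) - Y) / len(X)
        if X_distilled is not None:
            uniform = np.full((len(X_distilled), n_classes), 1.0 / n_classes)
            p = softmax(X_distilled @ W)
            grad += lam * X_distilled.T @ (p - uniform) / len(X_distilled)
        W -= lr * grad
    return W

def distilled_from(W, X, k=1):
    """Hypothetical stand-in for distillation: keep only the k input features
    the trained model weights most heavily and zero out the rest."""
    importance = np.abs(W).sum(axis=1)
    learned = importance >= np.sort(importance)[-k]
    return X * learned, learned

# Synthetic data: both features predict the label, feature 0 more strongly.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400)
sign = 2 * y - 1
X = np.column_stack([2.0 * sign, 1.0 * sign]) + rng.normal(size=(400, 2))

W1 = train(X, y, n_classes=2)                           # first model, initial loss
X_distilled, learned = distilled_from(W1, X)            # what the first model relied on
W2 = train(X, y, n_classes=2, X_distilled=X_distilled)  # second model, updated loss

accuracy2 = np.mean(softmax(X @ W2).argmax(axis=1) == y)
```

In this toy setting the second model ends up with much smaller weights on the feature the first model favoured and has to classify mostly from the other feature. A third model could be added the same way by concatenating the distilled samples from both earlier models.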
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method to train each of a plurality of machine learning models of an ensemble of machine learning models, comprising: training a first machine learning model of the plurality of machine learning models using an initial loss function and training data that includes a plurality of training images to produce a first trained machine learning model; determining, based at least in part on the training, a plurality of distilled images corresponding to a first plurality of features of the training images learned by the first machine learning model; generating, based at least in part on the distilled images, a feature-based diversification component that, when used to train a second machine learning model of the plurality of machine learning models of the ensemble, causes the second machine learning model to be agnostic to the first plurality of features of the training images learned by the first machine learning model; generating an updated loss function that includes the feature-based diversification component; and training the second machine learning model of the plurality of machine learning models using the updated loss function and training data that includes the plurality of training images and the plurality of distilled images to produce a second trained machine learning model that is trained to be agnostic to the first plurality of features learned by the first machine learning model.
2. The computer-implemented method of claim 1, wherein generating the feature-based diversification component includes: obtaining a first embedding vector from the first machine learning model for a training image of the plurality of training images; and generating a distilled image of the plurality of distilled images by iteratively modifying the distilled image to shorten a distance between a second embedding vector of the distilled image and the first embedding vector.
3. The computer-implemented method of claim 1, wherein the training data includes: a first plurality of in-distribution training images, each of the first plurality of in-distribution training images corresponding to a class of a plurality of classes.
4. The computer-implemented method of claim 1, further comprising: providing a first image corresponding to a first class of a plurality of classes to the ensemble that includes the first trained machine learning model and the second trained machine learning model; and receiving, from the ensemble, an ensemble result that indicates that the first image corresponds to the first class of the plurality of classes.
5. The computer-implemented method of claim 1, further comprising: providing a first image that does not correspond to any class of a plurality of classes to the ensemble that includes the first trained machine learning model and the second trained machine learning model; and receiving, from the ensemble, an ensemble result that indicates that the first image does not correspond to any class of the plurality of classes.
6. A computing system, comprising: one or more processors; and a memory storing program instructions that when executed by the one or more processors, cause the one or more processors to at least: train a first machine learning model of a plurality of machine learning models of an ensemble using an initial loss function and training data that includes a plurality of training data samples to produce a first trained machine learning model; determine, for each of at least some of the training data samples, a plurality of distilled data samples corresponding to a first plurality of features of the at least some of the training data samples learned by the first machine learning model; generate a feature-based diversification component that, when used to train a second machine learning model of the plurality of machine learning models of the ensemble, causes the second machine learning model to be agnostic to the first plurality of features learned by the first machine learning model; and train the second machine learning model of the plurality of machine learning models of the ensemble using training data that includes the plurality of training data samples and the plurality of distilled data samples to produce a second trained machine learning model that is trained to be agnostic to the first plurality of features learned by the first machine learning model.
7. The computing system of claim 6, wherein: the program instructions that, when executed by the one or more processors to generate the feature-based diversification component, further cause the one or more processors to at least generate the feature-based diversification component based at least in part on the distilled data samples; and wherein the program instructions that, when executed by the one or more processors to cause the processors to train the second machine learning model further include instructions that, when executed by the one or more processors, further cause the one or more processors to at least train the second machine learning model of the plurality of machine learning models using an updated loss function that includes the feature-based diversification component and training data that includes the plurality of training data samples and the plurality of distilled data samples.
8. The computing system of claim 6, wherein the program instructions that, when executed by the one or more processors, further cause the one or more processors to at least: subsequent to training the second machine learning model: determine, for each of at least some of the training data samples, a second plurality of distilled data samples corresponding to a second plurality of features of the at least some of the training data samples learned by the second machine learning model, wherein: the second plurality of features are different than the first plurality of features; and the second plurality of distilled data samples are different than the plurality of distilled data samples; and train a third machine learning model of the plurality of machine learning models using training data that includes the plurality of training data samples, the plurality of distilled data samples, and the second plurality of distilled data samples to produce a third trained machine learning model that is trained to be agnostic to the first plurality of features and the second plurality of features.
9. The computing system of claim 6, wherein a second loss function that is different than the initial loss function is used in training the second machine learning model.
10. The computing system of claim 9, wherein the second loss function includes a cross-entropy loss and the feature-based diversification component is determined based at least in part on the distilled data samples.
11. The computing system of claim 6, wherein the program instructions that, when executed by the one or more processors, further cause the one or more processors to at least: receive an input data sample to the ensemble; determine, with the first trained machine learning model, for each class of a plurality of classes, a first probability that the input data sample corresponds with the class; determine, with the second trained machine learning model, for each class of the plurality of classes, a second probability that the input data sample corresponds with the class; determine, based at least in part on the first probabilities and the second probabilities, that the input data sample corresponds to a first class of the plurality of classes; and produce an ensemble result indicating that the input data sample corresponds to the first class.
12. The computing system
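Two steps in the claims above lend themselves to a small worked example: producing a distilled sample by iteratively shortening an embedding distance (claim 2), and combining per-model class probabilities into an ensemble result (claims 4, 5, and 11). The Python sketch below is only one possible reading. The linear embedding `P`, the squared L2 distance, the averaging rule, and the `threshold` used to report "no class" are all assumptions made for illustration; the claims do not specify them.

```python
import numpy as np

# Hypothetical embedding: a fixed linear map standing in for the
# embedding produced by the first trained model.
P = np.array([[1.0, 0.0, 0.5, 0.0],
              [0.0, 1.0, 0.0, 0.5]])

def embed(x):
    return P @ x

def distill(target_embedding, steps=100, lr=0.1):
    """Start from a blank sample and repeatedly nudge it so that its
    embedding moves closer to the target embedding (squared L2 distance)."""
    x = np.zeros(P.shape[1])
    for _ in range(steps):
        residual = embed(x) - target_embedding
        x -= lr * 2.0 * P.T @ residual  # gradient of ||P x - target||^2
    return x

def ensemble_result(probabilities, threshold=0.6):
    """Average the per-model class probabilities and return the winning class
    index, or None when no class reaches the (assumed) confidence threshold."""
    mean = np.mean(probabilities, axis=0)
    best = int(np.argmax(mean))
    return best if mean[best] >= threshold else None

training_sample = np.array([1.0, -2.0, 0.5, 3.0])
target = embed(training_sample)
distilled = distill(target)
distance = np.linalg.norm(embed(distilled) - target)

agree = ensemble_result([[0.9, 0.1], [0.8, 0.2]])     # both models favour class 0
disagree = ensemble_result([[0.9, 0.1], [0.1, 0.9]])  # models contradict each other
```

Here the distilled sample reproduces the target embedding without reproducing the original sample, so it carries only what the embedding captures. In the second part, the ensemble reports class 0 when the two models agree and `None` when they contradict each other, which is one simple way to flag an input that fits none of the classes.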
Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection · CPC title
Multiple classes · CPC title
characterised by the process organisation or structure, e.g. boosting cascade · CPC title
Ensemble learning · CPC title
Physics · mapped topic