Sequential ensemble model training for open sets

US11526693B1 · US · B1

Patent metadata
Publication number: US-11526693-B1
Application number: US-202016865167-A
Country: US
Kind code: B1
Filing date: May 1, 2020
Priority date: May 1, 2020
Publication date: Dec 13, 2022
Grant date: Dec 13, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed are systems and methods for training an ensemble of machine learning models with a focus on feature engineering. For example, the training of the models encourages each machine learning model of the ensemble to rely on a different set of input features from the training data samples used to train the machine learning models of the ensemble. However, instead of telling each model explicitly which features to learn, in accordance with the disclosed implementations, ML models of the ensemble may be trained sequentially, with each new model trained to disregard input features learned by previously trained ML models of the ensemble and learn based on other features included in the training data samples.
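One way to picture "trained to disregard input features learned by previously trained ML models" is as an extra term added to the loss of each later model. The sketch below is illustrative only and is not taken from the patent text: it assumes the extra term is a KL divergence from the uniform distribution, evaluated on distilled samples produced from an earlier model, so that the new model is penalized for being confident on inputs that carry only the earlier model's features. The names `diversification_penalty`, `updated_loss`, and `weight` are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Standard cross-entropy for one sample with an integer class label."""
    return -math.log(max(probs[label], 1e-12))

def diversification_penalty(probs):
    """ASSUMED form, not from the patent: KL(uniform || probs).
    It is zero when the prediction is uniform and grows with confidence."""
    k = len(probs)
    u = 1.0 / k
    return sum(u * math.log(u / max(p, 1e-12)) for p in probs)

def updated_loss(train_logits, train_labels, distilled_logits, weight=1.0):
    """Mean cross-entropy on training samples plus a weighted mean penalty
    on distilled samples derived from a previously trained model."""
    ce = sum(cross_entropy(softmax(z), y)
             for z, y in zip(train_logits, train_labels)) / len(train_labels)
    if distilled_logits:
        div = sum(diversification_penalty(softmax(z))
                  for z in distilled_logits) / len(distilled_logits)
    else:
        div = 0.0  # the first model in the sequence has no distilled samples
    return ce + weight * div

# Same training prediction, two different behaviours on a distilled sample.
loss_confident = updated_loss([[2.0, 0.0]], [0], [[4.0, 0.0]])
loss_agnostic = updated_loss([[2.0, 0.0]], [0], [[0.0, 0.0]])
```

Under this assumed penalty, a second model that stays confident on the distilled sample pays a higher loss than one that is indifferent to it, which is the direction of pressure the abstract describes. With an empty `distilled_logits` list the function reduces to plain cross-entropy, matching the role of an initial loss for the first model.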

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method to train each of a plurality of machine learning models of an ensemble of machine learning models, comprising: training a first machine learning model of the plurality of machine learning models using an initial loss function and training data that includes a plurality of training images to produce a first trained machine learning model; determining, based at least in part on the training, a plurality of distilled images corresponding to a first plurality of features of the training images learned by the first machine learning model; generating, based at least in part on the distilled images, a feature-based diversification component that, when used to train a second machine learning model of the plurality of machine learning models of the ensemble, causes the second machine learning model to be agnostic to the first plurality of features of the training images learned by the first machine learning model; generating an updated loss function that includes the feature-based diversification component; and training the second machine learning model of the plurality of machine learning models using the updated loss function and training data that includes the plurality of training images and the plurality of distilled images to produce a second trained machine learning model that is trained to be agnostic to the first plurality of features learned by the first machine learning model.

2. The computer-implemented method of claim 1, wherein generating the feature-based diversification component includes: obtaining a first embedding vector from the first machine learning model for a training image of the plurality of training images; and generating a distilled image of the plurality of distilled images by iteratively modifying the distilled image to shorten a distance between a second embedding vector of the distilled image and the first embedding vector.

3. The computer-implemented method of claim 1, wherein the training data includes: a first plurality of in-distribution training images, each of the first plurality of in-distribution training images corresponding to a class of a plurality of classes.

4. The computer-implemented method of claim 1, further comprising: providing a first image corresponding to a first class of a plurality of classes to the ensemble that includes the first trained machine learning model and the second trained machine learning model; and receiving, from the ensemble, an ensemble result that indicates that the first image corresponds to the first class of the plurality of classes.

5. The computer-implemented method of claim 1, further comprising: providing a first image that does not correspond to any class of a plurality of classes to the ensemble that includes the first trained machine learning model and the second trained machine learning model; and receiving, from the ensemble, an ensemble result that indicates that the first image does not correspond to any class of the plurality of classes.

6. A computing system, comprising: one or more processors; and a memory storing program instructions that when executed by the one or more processors, cause the one or more processors to at least: train a first machine learning model of a plurality of machine learning models of an ensemble using an initial loss function and training data that includes a plurality of training data samples to produce a first trained machine learning model; determine, for each of at least some of the training data samples, a plurality of distilled data samples corresponding to a first plurality of features of the at least some of the training data samples learned by the first machine learning model; generate a feature-based diversification component that, when used to train a second machine learning model of the plurality of machine learning models of the ensemble, causes the second machine learning model to be agnostic to the first plurality of features learned by the first machine learning model; and train the second machine learning model of the plurality of machine learning models of the ensemble using training data that includes the plurality of training data samples and the plurality of distilled data samples to produce a second trained machine learning model that is trained to be agnostic to the first plurality of features learned by the first machine learning model.

7. The computing system of claim 6, wherein: the program instructions that, when executed by the one or more processors to generate the feature-based diversification component, further cause the one or more processors to at least generate the feature-based diversification component based at least in part on the distilled data samples; and wherein the program instructions that, when executed by the one or more processors to cause the processors to train the second machine learning model further include instructions that, when executed by the one or more processors, further cause the one or more processors to at least train the second machine learning model of the plurality of machine learning models using an updated loss function that includes the feature-based diversification component and training data that includes the plurality of training data samples and the plurality of distilled data samples.

8. The computing system of claim 6, wherein the program instructions that, when executed by the one or more processors, further cause the one or more processors to at least: subsequent to training the second machine learning model: determine, for each of at least some of the training data samples, a second plurality of distilled data samples corresponding to a second plurality of features of the at least some of the training data samples learned by the second machine learning model, wherein: the second plurality of features are different than the first plurality of features; and the second plurality of distilled data samples are different than the plurality of distilled data samples; and train a third machine learning model of the plurality of machine learning models using training data that includes the plurality of training data samples, the plurality of distilled data samples, and the second plurality of distilled data samples to produce a third trained machine learning model that is trained to be agnostic to the first plurality of features and the second plurality of features.

9. The computing system of claim 6, wherein a second loss function that is different than the initial loss function is used in training the second machine learning model.

10. The computing system of claim 9, wherein the second loss function includes a cross-entropy loss and the feature-based diversification component is determined based at least in part on the distilled data samples.

11. The computing system of claim 6, wherein the program instructions that, when executed by the one or more processors, further cause the one or more processors to at least: receive an input data sample to the ensemble; determine, with the first trained machine learning model, for each class of a plurality of classes, a first probability that the input data sample corresponds with the class; determine, with the second trained machine learning model, for each class of the plurality of classes, a second probability that the input data sample corresponds with the class; determine, based at least in part on the first probabilities and the second probabilities, that the input data sample corresponds to a first class of the plurality of classes; and produce an ensemble result indicating that the input data sample corresponds to the first class.

12. The computing system
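Two steps in the claims lend themselves to a small worked sketch: the iterative distillation of claim 2 (modify an input until its embedding approaches that of a training image) and the ensemble decision of claims 5 and 11 (combine per-class probabilities, or report that no class fits). The code below is a toy illustration under stated assumptions, not the patented implementation: the embedding is a hypothetical linear map `weights`, the distance is squared Euclidean, the update is plain gradient descent, and the open-set rule is a hypothetical confidence threshold on averaged probabilities.

```python
def embed(weights, x):
    """Toy linear embedding (assumption): one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def distill(weights, target_embedding, start, steps=200, lr=0.05):
    """Iteratively modify `start` so its embedding moves toward the target."""
    x = list(start)
    for _ in range(steps):
        residual = [e - t for e, t in zip(embed(weights, x), target_embedding)]
        # Gradient of ||W x - t||^2 with respect to x is 2 * W^T (W x - t).
        grad = [2.0 * sum(weights[r][c] * residual[r] for r in range(len(weights)))
                for c in range(len(x))]
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x

def ensemble_decision(per_model_probs, threshold=0.5):
    """Average class probabilities across models. Returns the winning class
    index, or None when no class reaches `threshold` (assumed open-set rule)."""
    n = len(per_model_probs)
    num_classes = len(per_model_probs[0])
    avg = [sum(p[c] for p in per_model_probs) / n for c in range(num_classes)]
    best = max(range(num_classes), key=avg.__getitem__)
    return best if avg[best] >= threshold else None

# The toy "first model" only looks at the first two of three input values.
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0]]
training_image = [0.8, 0.2, 0.5]
target = embed(weights, training_image)          # first embedding vector
distilled = distill(weights, target, start=[0.0, 0.0, 0.0])

known = ensemble_decision([[0.9, 0.1], [0.7, 0.3]])
unknown = ensemble_decision([[0.9, 0.05, 0.05], [0.05, 0.9, 0.05]], threshold=0.6)
```

In this toy setup the distilled input recovers the two values the first model uses and leaves the third untouched, which is the sense in which a distilled sample carries only the features that model learned. The second `ensemble_decision` call shows two models disagreeing strongly enough that no averaged class probability clears the threshold, so the result is `None`.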

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection · CPC title

  • Multiple classes · CPC title

  • characterised by the process organisation or structure, e.g. boosting cascade · CPC title

  • Ensemble learning · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11526693B1 cover?
Disclosed are systems and methods for training an ensemble of machine learning models with a focus on feature engineering. For example, the training of the models encourages each machine learning model of the ensemble to rely on a different set of input features from the training data samples used to train the machine learning models of the ensemble. However, instead of telling each model explic…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F18/2431. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 13, 2022 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).