Who is the assignee on this patent?

Home Depot Product Authority Llc

What technology area does this patent fall under?

Primary CPC classification: G06N20/20 (ensemble learning). Mapped technology areas include Physics.

When was this patent published?

Publication date: June 27, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

This page lists 5 related publications: citations found in our corpus, or other publications sharing the same primary CPC classification.

Optimizing training data for image classification

US11687841B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11687841-B2
Application number	US-202016894344-A
Country	US
Kind code	B2
Filing date	Jun 5, 2020
Priority date	Jun 6, 2019
Publication date	Jun 27, 2023
Grant date	Jun 27, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for machine learning-based classification may include training a machine learning model with a full training data set, the full training data set comprising a plurality of data points, to generate a first model state of the machine learning model, generating respective embeddings for the data points in the full training data set with the first model state of the machine learning model, applying a clustering algorithm to the respective embeddings to generate one or more clusters of the embeddings, identifying outlier embeddings from the one or more clusters of the embeddings, generating a reduced training data set comprising the full training data set less the data points associated with the outlier embeddings, training the machine learning model with the reduced training data set to a second model state, and applying the second model state to one or more data sets to classify the one or more data sets.
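The abstract describes a two-pass training loop: train, embed, cluster, drop outliers, retrain, classify. The sketch below is a minimal, dependency-free Python illustration under loudly stated assumptions, not the patented implementation. The "model" is a hypothetical stand-in (per-feature standardization plus nearest-class-centroid classification), its embedding is simply the standardized feature vector, clustering is plain k-means seeded with the class centroids, and an embedding counts as an outlier when it lies more than `factor` times the median distance from its nearest cluster center. The names `ModelState`, `train`, `kmeans`, `refine_and_retrain` and the `factor` rule are all illustrative inventions. The sketch also retrains from scratch, whereas claim 8 covers continuing training from the first model state.

```python
import math
import statistics
from dataclasses import dataclass, field

Point = tuple[float, ...]


@dataclass
class ModelState:
    """Hypothetical stand-in for a trained model: feature scaling plus class centroids."""
    mean: list[float]
    std: list[float]
    centroids: dict[str, list[float]] = field(default_factory=dict)

    def embed(self, x: Point) -> list[float]:
        # Embedding = features standardized with statistics learned in this state.
        return [(v - m) / s for v, m, s in zip(x, self.mean, self.std)]

    def classify(self, x: Point) -> str:
        # Predict the class whose centroid is nearest in embedding space.
        e = self.embed(x)
        return min(self.centroids, key=lambda c: math.dist(e, self.centroids[c]))


def train(data: list[tuple[Point, str]]) -> ModelState:
    """'Train' the toy model: learn scaling statistics, then per-class centroids."""
    xs = [x for x, _ in data]
    dims = range(len(xs[0]))
    mean = [statistics.fmean(x[d] for x in xs) for d in dims]
    std = [statistics.pstdev([x[d] for x in xs]) or 1.0 for d in dims]
    state = ModelState(mean, std)
    by_class: dict[str, list[list[float]]] = {}
    for x, label in data:
        by_class.setdefault(label, []).append(state.embed(x))
    state.centroids = {c: [statistics.fmean(col) for col in zip(*es)]
                       for c, es in by_class.items()}
    return state


def kmeans(points: list[list[float]], init: list[list[float]],
           iters: int = 20) -> list[list[float]]:
    """Plain Lloyd's k-means starting from the given centers."""
    centers = [list(c) for c in init]
    for _ in range(iters):
        groups: list[list[list[float]]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its members; keep it if the group is empty.
        centers = [[statistics.fmean(col) for col in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers


def refine_and_retrain(data: list[tuple[Point, str]], factor: float = 3.0):
    """Train, embed, cluster, drop outliers, retrain (the loop in the abstract)."""
    first = train(data)                                   # first model state
    embs = [first.embed(x) for x, _ in data]              # embeddings from that state
    centers = kmeans(embs, list(first.centroids.values()))
    dists = [min(math.dist(e, c) for c in centers) for e in embs]
    cutoff = factor * statistics.median(dists)            # hypothetical "remote" rule
    reduced = [pt for pt, d in zip(data, dists) if d <= cutoff]
    return train(reduced), reduced                        # second model state


if __name__ == "__main__":
    data = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"), ((1.0, 1.0), "a"),
            ((10.0, 10.0), "b"), ((10.0, 11.0), "b"), ((11.0, 10.0), "b"), ((11.0, 11.0), "b"),
            ((5.0, 40.0), "a")]                           # last point sits far from both groups
    second, reduced = refine_and_retrain(data)
    print(len(reduced), second.classify((0.5, 0.5)), second.classify((10.5, 10.5)))
```

In an image setting one would more plausibly take embeddings from an intermediate layer of a trained network (the classifications below include convolutional networks); the toy model here only keeps the example self-contained.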

First claim

Opening claim text (preview).

What is claimed is:

1. A method for machine learning-based classification, the method comprising: training a machine learning model with a full training data set, the full training data set comprising a plurality of data points, to generate a first model state of the machine learning model; generating respective embeddings for the data points in the full training data set with the first model state of the machine learning model; applying a clustering algorithm to the respective embeddings to generate a plurality of clusters of the embeddings; identifying outlier embeddings from the plurality of clusters of the embeddings; generating a reduced training data set comprising the full training data set less the data points associated with the outlier embeddings, the reduced training data set including the data points associated with embeddings in the plurality of clusters; training the machine learning model with the reduced training data set to a second model state; and applying the second model state to one or more data sets to classify the one or more data sets.

2. The method of claim 1, wherein applying the second model state to classify one or more data sets comprises applying the second model state to classify one or more images.

3. The method of claim 1, further comprising: applying a distance learning algorithm to the respective embeddings to create a distanced embeddings set; wherein applying a clustering algorithm to the respective embeddings comprises applying the clustering algorithm to the distanced embeddings set.

4. The method of claim 1, wherein identifying outlier embeddings from the plurality of clusters of the embeddings comprises: designating embeddings that are remote from all of the plurality of clusters as outlier embeddings.

5. The method of claim 1, wherein identifying outlier embeddings from the plurality of clusters of the embeddings comprises: designating embeddings that are remote from a single cluster of embeddings as outlier embeddings.

6. The method of claim 1, wherein identifying outlier embeddings from the plurality of clusters of the embeddings comprises: determining a respective category associated with each of the embeddings; determining a respective category associated with each cluster of embeddings; and designating embeddings that are remote from a cluster of embeddings associated with the category with which the embeddings are associated as outlier embeddings.

7. The method of claim 1, wherein identifying outlier embeddings from the plurality of clusters of the embeddings comprises: identifying at least a predetermined percentage of embeddings as outlier embeddings; identifying at least a predetermined quantity of embeddings as outlier embeddings; or identifying embeddings that are a predetermined distance from one of the plurality of clusters as outlier embeddings.

8. The method of claim 1, wherein training the machine learning model with the reduced training data set comprises training the first model state of the machine learning model with the reduced training data set.

9. A system for machine learning-based classification, the system comprising: a processor; and a non-transitory, computer-readable memory storing instructions that, when executed by the processor, cause the processor to: obtain training data comprising a full training data set; train a machine learning model with the full training data set to a first model state; generate respective embeddings for the data points in the full training data set with the first model state of the machine learning model; apply a clustering algorithm to the respective embeddings to generate a plurality of clusters of the embeddings; identify outlier embeddings from the plurality of clusters of the embeddings; generate a reduced training data set comprising the full training data set less the data points associated with the outlier embeddings, the reduced training data set including the data points associated with embeddings in the plurality of clusters; train the machine learning model with the reduced training data set to a second model state; and apply the second model state to one or more data sets to classify the one or more data sets.

10. The system of claim 9, wherein applying the second model state to classify one or more data sets comprises applying the second model state to classify one or more images.

11. The system of claim 9, wherein the memory stores further instructions that, when executed by the processor, cause the processor to: apply a distance learning algorithm to the respective embeddings to create a distanced embeddings set; wherein applying a clustering algorithm to the respective embeddings comprises applying the clustering algorithm to the distanced embeddings set.

12. The system of claim 9, wherein identifying outlier embeddings from the plurality of clusters of the embeddings comprises: designating embeddings that are remote from all of the plurality of clusters as outlier embeddings.

13. The system of claim 9, wherein identifying outlier embeddings from the plurality of clusters of the embeddings comprises: designating embeddings that are remote from a single cluster of embeddings as outlier embeddings.

14. The system of claim 9, wherein identifying outlier embeddings from the plurality of clusters of the embeddings comprises: determining a respective category associated with each of the embeddings; determining a respective category associated with each cluster of embeddings; and designating embeddings that are remote from a cluster of embeddings associated with the category with which the embeddings are associated as outlier embeddings.

15. The system of claim 9, wherein identifying outlier embeddings from the plurality of clusters of the embeddings comprises: identifying at least a predetermined percentage of embeddings as outlier embeddings; identifying at least a predetermined quantity of embeddings as outlier embeddings; or identifying embeddings that are a predetermined distance from one of the plurality of clusters as outlier embeddings.

16. The system of claim 9, wherein training the machine learning model with the reduced training data set comprises training the first model state of the machine learning model with the reduced training data set.

17. A machine learning-based method of classifying a plurality of images, the method comprising: training a machine learning model with a full training data set, the full training data set comprising a plurality of paired images and classes, to generate a first model state of the machine learning model; generating respective embeddings for the images in the full training data set with the first model state of the machine learning model; applying a clustering algorithm to the respective embeddings to generate a plurality of clusters of the embeddings; identifying outlier embeddings from the plurality of clusters of the embeddings; generating a reduced training data set comprising the full training data set less the images associated with the outlier embeddings, the reduced training data set including the images associated with the embeddings in the plurality of clusters; training the machine learning model with the reduced training data set to a second model state; and applying the second model state to one or more unclassified images to classify the one or more unclassified images.

18. The method of claim 17, wherein training the machine learning model with the reduced training data set comprises training the firs…
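Claims 4 to 7 (mirrored by claims 12 to 15) list alternative ways to decide which embeddings count as outliers. The Python sketch below gives one possible reading of claims 4, 6 and 7, assuming the embeddings and cluster centers already exist (for example from a k-means run, optionally after the distance-learning step of claim 3). Euclidean distance, the majority-vote rule for assigning a category to each cluster, and the helper names `nearest`, `dist_to_nearest`, `cluster_categories`, `dist_to_own_category` and `select_outliers` are all assumptions for illustration; the claims do not fix a metric, a cluster-labeling method, or any threshold values. Claim 5 is not modeled.

```python
import math
from collections import Counter

Vec = tuple[float, ...]


def nearest(e: Vec, centers: list[Vec]) -> int:
    """Index of the cluster center closest to embedding e."""
    return min(range(len(centers)), key=lambda i: math.dist(e, centers[i]))


def dist_to_nearest(embs: list[Vec], centers: list[Vec]) -> list[float]:
    """Claim 4 reading: an embedding is remote from *all* clusters when even
    its nearest center is far away."""
    return [math.dist(e, centers[nearest(e, centers)]) for e in embs]


def cluster_categories(embs: list[Vec], labels: list[str], centers: list[Vec]):
    """Claim 6 step: give each cluster the majority label of its members
    (majority vote is an assumption; the claim only says 'determining')."""
    votes = [Counter() for _ in centers]
    for e, y in zip(embs, labels):
        votes[nearest(e, centers)][y] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


def dist_to_own_category(embs: list[Vec], labels: list[str], centers: list[Vec]):
    """Claim 6 reading: distance to the nearest cluster carrying the embedding's
    own category (infinite if no cluster has that category)."""
    cats = cluster_categories(embs, labels, centers)
    return [min((math.dist(e, c) for c, k in zip(centers, cats) if k == y),
                default=math.inf)
            for e, y in zip(embs, labels)]


def select_outliers(dists: list[float], *, fraction=None, count=None, max_dist=None):
    """Claim 7 reading: flag a percentage, a quantity, or everything beyond a distance."""
    order = sorted(range(len(dists)), key=lambda i: dists[i], reverse=True)
    if fraction is not None:
        return set(order[:math.ceil(fraction * len(dists))])
    if count is not None:
        return set(order[:count])
    if max_dist is not None:
        return {i for i, d in enumerate(dists) if d > max_dist}
    raise ValueError("pass one of fraction, count, or max_dist")


if __name__ == "__main__":
    centers = [(0.0, 0.0), (10.0, 0.0)]
    embs = [(0.0, 1.0), (1.0, 0.0), (0.0, -1.0), (10.0, 1.0), (9.0, 0.0),
            (9.5, 0.5),    # labeled "a" but sits inside the "b" cluster
            (4.0, 20.0)]   # far from every cluster
    labels = ["a", "a", "a", "b", "b", "a", "b"]
    print(select_outliers(dist_to_nearest(embs, centers), max_dist=3.0))
    print(select_outliers(dist_to_own_category(embs, labels, centers), max_dist=3.0))
```

The two readings differ on the mislabeled point: a "remote from all clusters" test (claim 4) ignores it because it sits close to some cluster, while the category-aware test (claim 6) flags it because it is far from the cluster of its own category.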

Assignees

Home Depot Product Authority Llc

Inventors

Classifications

G06N3/0464 · Convolutional networks [CNN, ConvNet]
G06N3/09 · Supervised learning
G06V10/764 · using classification, e.g. of video objects
G06N20/20 (primary) · Ensemble learning
G06V10/763 · Non-hierarchical techniques, e.g. based on statistics of modelling distributions

Patent family

Related publications grouped by family.

Patent family ID: 73651632

Related patents

Image based content search and recommendations

Semi-supervised hybrid clustering/classification system

Predicting immunotherapy response in non-small cell lung cancer with serial quantitative vessel tortuosity

Method of training neural network, and recognition method and apparatus using neural network

Cluster-trained machine learning for image processing