Multimodal Image Classifier using Textual and Visual Embeddings
US-2021264203-A1 · Aug 26, 2021 · US
US11587139B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11587139-B2 |
| Application number | US-202016779545-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 31, 2020 |
| Priority date | Jan 31, 2020 |
| Publication date | Feb 21, 2023 |
| Grant date | Feb 21, 2023 |
Official abstract text for this publication.
A system including one or more processors and one or more non-transitory computer-readable media storing computing instructions configured to run on the one or more processors and perform receiving from an item catalog database a respective item description and respective attribute values for each item of a set of items; generating text embeddings using a text embedding model to represent the respective item description and the respective attribute values; generating a graph of the set of items from the item catalog database connected by a set of edges; training the text embedding model and a machine learning model using a neural loss function based on the graph; and automatically determining, based on the machine learning model, as trained, a gender label for each first item in which the gender classification is unlabeled and in which a respective quantity of respective attribute values for the each first item is at least a predetermined threshold. Other embodiments are disclosed.
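The graph and threshold steps in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation. It assumes browsing sessions arrive as lists of item IDs and that an unlabeled item is represented by a `None` label. The function names `build_coview_graph`, `edge_kind`, and `route_unlabeled` are hypothetical. The default threshold of 5 is taken from claim 3, and the fallback to an image-based model for items below the threshold follows claim 2.

```python
from collections import Counter
from itertools import combinations


def build_coview_graph(sessions):
    """Return {(item_a, item_b): co_view_count}, with item_a < item_b.

    Assumption: each session is an iterable of item IDs viewed together.
    """
    edges = Counter()
    for viewed in sessions:
        # Count each unordered pair once per session.
        for a, b in combinations(sorted(set(viewed)), 2):
            edges[(a, b)] += 1
    return dict(edges)


def edge_kind(edge, labels):
    """Classify an edge by how many of its two endpoints carry a label."""
    labeled = sum(1 for item in edge if labels.get(item) is not None)
    kinds = ("unlabeled-unlabeled", "labeled-unlabeled", "labeled-labeled")
    return kinds[labeled]


def route_unlabeled(attribute_counts, labels, threshold=5):
    """Pick a model for each unlabeled item from its attribute-value count.

    Items with at least `threshold` attribute values go to the text-based
    model; the rest fall back to the image-based model.
    """
    routes = {}
    for item, count in attribute_counts.items():
        if labels.get(item) is None:
            routes[item] = "text" if count >= threshold else "image"
    return routes


sessions = [["a", "b", "c"], ["a", "b"], ["c", "d"]]
labels = {"a": "female", "b": None, "c": "male", "d": None}
graph = build_coview_graph(sessions)
routes = route_unlabeled({"a": 9, "b": 7, "d": 2}, labels)
```

Here the pair `("a", "b")` appears in two sessions, so its edge weight is 2, and item `"d"` has too few attribute values for the text model, so it is routed to the image model.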
Opening claim text (preview).
What is claimed:
1. A system comprising: one or more processors; and one or more non-transitory computer-readable media storing computing instructions configured to run on the one or more processors and perform: receiving from an item catalog database a respective item description and respective attribute values for each item of a set of items, wherein a gender classification of the respective attribute values for the each item of the set of items is either labeled or unlabeled; generating text embeddings using a text embedding model to represent the respective item description and the respective attribute values for the each item of the set of items; generating a graph of the set of items from the item catalog database connected by a set of edges, wherein each pair of items of the set of items that is connected by a respective edge of the set of edges in the graph has been viewed together in one or more respective sessions, the respective edge comprises a weight comprising a co-view count, and the set of edges comprises (a) one or more unlabeled-unlabeled edges, (b) one or more labeled-unlabeled edges, and (c) one or more labeled-labeled edges; training the text embedding model and a machine learning model using a neural loss function based on the graph; and automatically determining, based on the machine learning model, as trained, a gender label for each first item of the set of items in which the gender classification is unlabeled and in which a respective quantity of respective attribute values for the each first item is at least a predetermined threshold.
2. The system of claim 1, wherein the computing instructions are further configured to perform: determining, based on an image embedding model, as trained, a gender label for each second item of the set of items that does not meet the predetermined threshold.
3. The system of claim 1, wherein the predetermined threshold is 5.
4. The system of claim 1, wherein the computing instructions are further configured to perform: transforming an image into a second vector representing the image using a residual neural network (“ResNet”).
5. The system of claim 1, wherein the computing instructions are further configured to perform: training an image embedding model based on images of items from the item catalog database using loss equations to minimize a distance between text representations and image representations for the items.
6. The system of claim 5, wherein the images depict items of clothing from the item catalog database.
7. The system of claim 1, wherein: the text embedding model is a Bidirectional Encoder Representations from Transformers (“BERT”); and an output from the text embedding model comprises a vector representation.
8. The system of claim 1, wherein training the text embedding model and the machine learning model using the neural loss function based on the graph further comprises: training the machine learning model with the neural loss function based on first distances between first text embeddings for first pairs of nodes connected by the one or more labeled-labeled edges, second distances between second text embeddings for second pairs of nodes connected by the one or more labeled-unlabeled edges, third distances between third text embeddings for third pairs of nodes connected by the one or more unlabeled-unlabeled edges, and a softmax loss cost function for fourth text embeddings of nodes of the graph that are labeled.
9. The system of claim 1, wherein the gender classification, when labeled, comprises one of: a male gender label; a female gender label; or a unisex gender label.
10. The system of claim 1, wherein the computing instructions are further configured to perform: receiving a selection of an anchor item from a user, the anchor item comprising a first gender label; determining one or more recommended items that match the first gender label based on the gender labels determined by the machine learning model; and sending instructions to display the one or more recommended items to the user.
11. A method being implemented via execution of computing instructions configured to run at one or more processors and stored at one or more non-transitory computer-readable media, the method comprising: receiving from an item catalog database a respective item description and respective attribute values for each item of a set of items, wherein a gender classification of the respective attribute values for the each item of the set of items is either labeled or unlabeled; generating text embeddings using a text embedding model to represent the respective item description and the respective attribute values for the each item of the set of items; generating a graph of the set of items from the item catalog database connected by a set of edges, wherein each pair of items of the set of items that is connected by a respective edge of the set of edges in the graph has been viewed together in one or more respective sessions, the respective edge comprises a weight comprising a co-view count, and the set of edges comprises (a) one or more unlabeled-unlabeled edges, (b) one or more labeled-unlabeled edges, and (c) one or more labeled-labeled edges; training the text embedding model and a machine learning model using a neural loss function based on the graph; and automatically determining, based on the machine learning model, as trained, a gender label for each first item of the set of items in which the gender classification is unlabeled and in which a respective quantity of respective attribute values for the each first item is at least a predetermined threshold.
12. The method of claim 11, further comprising: determining, based on an image embedding model, as trained, a gender label for each second item of the set of items that does not meet the predetermined threshold.
13. The method of claim 11, wherein the predetermined threshold is 5.
14. The method of claim 11, further comprising: transforming an image into a second vector representing the image using a residual neural network (“ResNet”).
15. The method of claim 11, further comprising: training an image embedding model based on images of items from the item catalog database using loss equations to minimize a distance between text representations and image representations for the items.
16. The method of claim 15, wherein the images depict items of clothing from the item catalog database.
17. The method of claim 11, wherein: the text embedding model is a Bidirectional Encoder Representations from Transformers (“BERT”); and an output from the text embedding model comprises a vector representation.
18. The method of claim 11, wherein training the text embedding model and the machine learning model using the neural loss function based on the graph further comprises: training the machine learning model with the neural loss function based on first distances between first text embeddings for first pairs of nodes connected by the one or more labeled-labeled edges, second distances between second text embeddings for second pairs of nodes connected by the one or more labeled-unlabeled edges, third distances between third text embeddings for third pairs of nodes connected by the one or more unlabeled-unlabeled edges, and a softmax loss cost function for fourth text embeddings of nodes of the graph that are labeled.
19. The method of claim 11, wherein the gender classification, when labeled, comprises one of: a male gender label; a female gender label; or a unisex gende
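Claims 8 and 18 describe a loss with four ingredients: embedding distances over labeled-labeled, labeled-unlabeled, and unlabeled-unlabeled edges, plus a softmax loss on labeled nodes. One plausible way to combine them is sketched below. The claims do not give the exact formula, so several choices here are assumptions: squared Euclidean distance as the distance measure, the co-view count used as a multiplier on each distance, and the per-edge-type coefficients in `alphas`. The names `graph_loss`, `squared_distance`, and `softmax_cross_entropy` are hypothetical, and the sketch computes only the loss value, not gradients or a training loop.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def softmax_cross_entropy(logits, target_index):
    """Softmax loss for one labeled node, computed with a stable log-sum-exp."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target_index]


def graph_loss(embeddings, logits, labels, edges, alphas=(1.0, 1.0, 1.0)):
    """Supervised softmax loss plus weighted embedding distances per edge type.

    embeddings: item -> embedding vector
    logits:     labeled item -> class scores
    labels:     item -> class index, or None when unlabeled
    edges:      (item_a, item_b) -> co-view count
    alphas:     assumed coefficients for (labeled-labeled,
                labeled-unlabeled, unlabeled-unlabeled) edges
    """
    supervised = sum(
        softmax_cross_entropy(logits[item], target)
        for item, target in labels.items()
        if target is not None
    )
    # Index = number of labeled endpoints: 0 = UU, 1 = LU, 2 = LL.
    per_type = [0.0, 0.0, 0.0]
    for (a, b), weight in edges.items():
        labeled = (labels.get(a) is not None) + (labels.get(b) is not None)
        per_type[labeled] += weight * squared_distance(embeddings[a], embeddings[b])
    alpha_ll, alpha_lu, alpha_uu = alphas
    return (
        supervised
        + alpha_ll * per_type[2]
        + alpha_lu * per_type[1]
        + alpha_uu * per_type[0]
    )


embeddings = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 2.0]}
labels = {"a": 0, "b": None, "c": 1}
logits = {"a": [0.0, 0.0, 0.0], "c": [0.0, 0.0, 0.0]}
edges = {("a", "b"): 2, ("a", "c"): 1}
loss = graph_loss(embeddings, logits, labels, edges)
```

Separate coefficients per edge type let a practitioner trust labeled-labeled neighbors more than unlabeled-unlabeled ones; setting all three to zero reduces the loss to plain supervised training.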
Supervised learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
Learning methods · CPC title
Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title