Data augmentation using machine translation capabilities of language models
US-12354011-B2 · Jul 8, 2025 · US
US12592059B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12592059-B2 |
| Application number | US-202217983327-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 8, 2022 |
| Priority date | Nov 8, 2022 |
| Publication date | Mar 31, 2026 |
| Grant date | Mar 31, 2026 |
Official abstract text for this publication.
A system may receive, via a first user interface associated with an online marketplace, a multi-modality request to retrieve a listing for an item, the multi-modality request comprising at least a first image and a first natural language text associated with the item. The system may generate an item embedding based on inputting the first image and the first natural language text to a machine learning model and may generate a first vector associated with the first image and the first natural language text included in the multi-modality request. The system may cause presentation, via the first user interface associated with the online marketplace, of one or more listings for the item retrieved based at least in part on a similarity metric between the first vector and a second vector of a plurality of vectors associated with a plurality of listings.
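The retrieval step in the abstract, ranking listings by a similarity metric between a query vector and stored listing vectors, can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not name the similarity metric, so cosine similarity is an assumption here, and the function names, listing IDs, and three-dimensional vectors are hypothetical stand-ins for embeddings that the machine learning model would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_listings(query_vector, listing_vectors, top_k=3):
    """Return the top_k (listing_id, score) pairs, most similar first."""
    scored = [
        (listing_id, cosine_similarity(query_vector, vector))
        for listing_id, vector in listing_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical precomputed listing vectors; in the described system these
# would come from the model applied to each listing's image and title.
listing_vectors = {
    "listing-a": [0.9, 0.1, 0.0],
    "listing-b": [0.0, 1.0, 0.0],
    "listing-c": [0.7, 0.3, 0.1],
}

# Hypothetical "first vector" for the image-plus-text request.
query_vector = [1.0, 0.0, 0.0]

results = retrieve_listings(query_vector, listing_vectors, top_k=2)
for listing_id, score in results:
    print(f"{listing_id}: {score:.3f}")
```

A linear scan like this is only practical for small catalogs; a marketplace-scale system would more likely use an approximate nearest-neighbor index, which the abstract neither requires nor excludes.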
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method, comprising: generating training data comprising training titles and training images, one or more training titles having a masked title portion and one or more training images have a masked image portion, the generating of the training data comprising: masking a portion of a first training image; generating a predicted portion for the first training image based at least in part on masking the portion of the first training image and a title of the first training image; and generating a predicted listing category for the first training image based at least in part on the predicted portion for the first training image; training a machine learning model based on the training data; receiving, via a first user interface associated with an online marketplace, a multi-modality request to retrieve a listing for an item, the multi-modality request comprising at least a first image and a first natural language text associated with the item; generating, by the machine learning model, a first image token based at least in part on the first image and a first title token based at least in part on the first natural language text; generating, by the machine learning model, a first vector based on the first image token and the first title token; and causing presentation, via the first user interface associated with the online marketplace, of one or more listings for the multi-modality request based at least in part on a similarity metric between the first vector and each vector of a plurality of vectors associated with a plurality of listings.

2. The computer-implemented method of claim 1, wherein generating the training data further comprises: masking a portion of the title of the first training image; generating a predicted portion for a title of the listing based at least in part on the portion of the title of the first training image and the first training image; and generating a predicted listing category based at least in part on the predicted portion for the title.

3. The computer-implemented method of claim 1, further comprising: determining, by the machine learning model, a similarity metric between the first vector and a plurality of vectors associated with a plurality of categories; and associating, by the machine learning model, the first vector with a first category based at least in part on a similarity metric between the first vector and one or more vectors classified by the machine learning model as being associated with the first category.

4. The computer-implemented method of claim 3, further comprising: generating, by the machine learning model, a second vector associated with a second image and a second natural language text included in a received multi-modality query; associating, by the machine learning model, the second vector with the first category; and comparing the second vector with a plurality of vectors that include the first vector, wherein the first vector is associated with the first category based at least in part on the first vector and the second vector satisfying a similarity metric.

5. The computer-implemented method of claim 1, wherein causing presentation of the one or more listings for the item comprises: causing presentation, via a second user interface associated with the online marketplace, of one or more listings for the item from a first category, wherein the one or more listings that comprise a second image that is different than the first image, a second natural language text that differs from the first natural language text, or both.

6. The computer-implemented method of claim 1, further comprising: comparing the portion of the training image with the portion of the training title based at least in part on reconstructing the portion of the training image and reconstructing the portion of the training title; and classifying the listing for the item using the training image and the training title based at least in part on the portion of the training image being associated with the portion of the training title.

7. The computer-implemented method of claim 1, further comprising: associating the first vector with a product category based at least in part on the similarity metric between the first vector and vectors classified as being associated with the product category.

8. An apparatus, comprising: a processor; memory coupled with the processor; and instructions stored in the memory and executable by the processor to cause the apparatus to: generating training data comprising training titles and training images, one or more training titles having a masked title portion and one or more training images have a masked image portion, the generating of the training data comprising: masking a portion of a first training image: masking a portion of a first training image; generating a predicted portion for the first training image based at least in part on masking the portion of the first training image and a title of the first training image; and generating a predicted listing category for the first training image based at least in part on the predicted portion for the first training image; train a machine learning model based on the training data; receive, via a first user interface associated with an online marketplace, a multi-modality request to retrieve a listing for an item, the multi-modality request comprising at least a first image and a first natural language text associated with the item; generate a first image token based at least in part on the first image and a first title token based at least in part on the first natural language text; generate, by the machine learning model, a first vector based on the first image token and the first title token; and cause presentation, via the first user interface associated with the online marketplace, of one or more listings for the multi-modality request based at least in part on a similarity metric between the first vector and each vector of a plurality of vectors associated with a plurality of listings.

9. The apparatus of claim 8, wherein generating the training data further comprises: mask a portion of the title of the first training image; generate a predicted portion for a title of the listing based at least in part on the portion of the title of the first training image and the first training image; and generate a predicted listing category based at least in part on the predicted portion for the title.

10. The apparatus of claim 8, wherein the instructions are further executable by the processor to cause the apparatus to: determine, by the machine learning model, a similarity metric between the first vector and a plurality of vectors associated with a plurality of categories; and associate, by the machine learning model, the first vector with a first category based at least in part on a similarity metric between the first vector and one or more vectors classified by the machine learning model as being associated with the first category.

11. The apparatus of claim 10, wherein the instructions are further executable by the processor to cause the apparatus to: generate, by the machine learning model, a second vector associated with a second image and a second natural language text included in a received multi-modality query; associate, by the machine learning model, the second vector with the first category; and compare the second vector with a plurality of vectors that include the first vector, wherein the first vector is associated with the first category based at least in part on the first vector and the second vector satisfying a similarity metric.
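The training-data step recited in claims 1 and 2, masking a portion of a training image and a portion of its title, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the claims do not specify how a portion is chosen, so random selection at a fixed ratio is used; the `[MASK]` placeholder, the patch strings, and the helper name `mask_sequence` are hypothetical. Predicting the masked portion and the listing category is left to the model and is not shown.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder for a masked title token


def mask_sequence(items, placeholder, mask_ratio=0.25, rng=None):
    """Replace a random subset of items with a placeholder.

    Returns the masked sequence and a dict mapping each masked position to
    its original value, which would serve as the prediction target.
    """
    if not items:
        return [], {}
    rng = rng or random.Random()
    # Mask at least one item, and never more than the sequence holds.
    n_mask = min(len(items), max(1, round(len(items) * mask_ratio)))
    positions = set(rng.sample(range(len(items)), n_mask))
    masked = [placeholder if i in positions else item
              for i, item in enumerate(items)]
    targets = {i: items[i] for i in sorted(positions)}
    return masked, targets


rng = random.Random(0)  # seeded so repeated runs mask the same positions

# Hypothetical training pair: title tokens plus image patches (named here by
# strings; in practice each patch would be an array of pixels).
title_tokens = ["vintage", "leather", "messenger", "bag"]
image_patches = ["patch-0", "patch-1", "patch-2", "patch-3"]

masked_title, title_targets = mask_sequence(title_tokens, MASK, 0.25, rng)
masked_patches, patch_targets = mask_sequence(image_patches, None, 0.5, rng)

print(masked_title, title_targets)
print(masked_patches, patch_targets)
```

Keeping the original values in `targets` is what lets a model be scored on reconstructing the hidden portion from the remaining title tokens and image patches.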
graphically representing goods, e.g. 3D product representation · CPC title
using classification, e.g. of video objects · CPC title
Proximity, similarity or dissimilarity measures · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title
Information retrieval or Information management · CPC title