Global embedding learning from different modalities

US12592059B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12592059-B2
Application number  US-202217983327-A
Country             US
Kind code           B2
Filing date         Nov 8, 2022
Priority date       Nov 8, 2022
Publication date    Mar 31, 2026
Grant date          Mar 31, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system may receive, via a first user interface associated with an online marketplace, a multi-modality request to retrieve a listing for an item, the multi-modality request comprising at least a first image and a first natural language text associated with the item. The system may generate an item embedding based on inputting the first image and the first natural language text to a machine learning model and may generate a first vector associated with the first image and the first natural language text included in the multi-modality request. The system may cause presentation, via the first user interface associated with the online marketplace, of one or more listings for the item retrieved based at least in part on a similarity metric between the first vector and a second vector of a plurality of vectors associated with a plurality of listings.
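The retrieval step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes cosine similarity as the similarity metric (the abstract does not name one) and uses small hand-written vectors in place of embeddings produced by the machine learning model. The names `cosine_similarity`, `rank_listings`, and the listing IDs are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_listings(query_vector, listing_vectors, top_k=2):
    """Score every listing vector against the query vector, best first."""
    scored = [
        (listing_id, cosine_similarity(query_vector, vector))
        for listing_id, vector in listing_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins: the "first vector" for the image + text request,
# and the stored vectors for a few listings (hypothetical values).
query_vector = [0.9, 0.1, 0.0]
listing_vectors = {
    "listing-a": [1.0, 0.0, 0.0],
    "listing-b": [0.0, 1.0, 0.0],
    "listing-c": [0.7, 0.7, 0.0],
}

top = rank_listings(query_vector, listing_vectors)
for listing_id, score in top:
    print(f"{listing_id}: {score:.3f}")
```

In a production setting the exhaustive scan would likely be replaced by an approximate nearest-neighbor index, but the ranking principle is the same.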

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: generating training data comprising training titles and training images, one or more training titles having a masked title portion and one or more training images have a masked image portion, the generating of the training data comprising: masking a portion of a first training image; generating a predicted portion for the first training image based at least in part on masking the portion of the first training image and a title of the first training image; and generating a predicted listing category for the first training image based at least in part on the predicted portion for the first training image; training a machine learning model based on the training data; receiving, via a first user interface associated with an online marketplace, a multi-modality request to retrieve a listing for an item, the multi-modality request comprising at least a first image and a first natural language text associated with the item; generating, by the machine learning model, a first image token based at least in part on the first image and a first title token based at least in part on the first natural language text; generating, by the machine learning model, a first vector based on the first image token and the first title token; and causing presentation, via the first user interface associated with the online marketplace, of one or more listings for the multi-modality request based at least in part on a similarity metric between the first vector and each vector of a plurality of vectors associated with a plurality of listings.

2. The computer-implemented method of claim 1, wherein generating the training data further comprises: masking a portion of the title of the first training image; generating a predicted portion for a title of the listing based at least in part on the portion of the title of the first training image and the first training image; and generating a predicted listing category based at least in part on the predicted portion for the title.

3. The computer-implemented method of claim 1, further comprising: determining, by the machine learning model, a similarity metric between the first vector and a plurality of vectors associated with a plurality of categories; and associating, by the machine learning model, the first vector with a first category based at least in part on a similarity metric between the first vector and one or more vectors classified by the machine learning model as being associated with the first category.

4. The computer-implemented method of claim 3, further comprising: generating, by the machine learning model, a second vector associated with a second image and a second natural language text included in a received multi-modality query; associating, by the machine learning model, the second vector with the first category; and comparing the second vector with a plurality of vectors that include the first vector, wherein the first vector is associated with the first category based at least in part on the first vector and the second vector satisfying a similarity metric.

5. The computer-implemented method of claim 1, wherein causing presentation of the one or more listings for the item comprises: causing presentation, via a second user interface associated with the online marketplace, of one or more listings for the item from a first category, wherein the one or more listings that comprise a second image that is different than the first image, a second natural language text that differs from the first natural language text, or both.

6. The computer-implemented method of claim 1, further comprising: comparing the portion of the training image with the portion of the training title based at least in part on reconstructing the portion of the training image and reconstructing the portion of the training title; and classifying the listing for the item using the training image and the training title based at least in part on the portion of the training image being associated with the portion of the training title.

7. The computer-implemented method of claim 1, further comprising: associating the first vector with a product category based at least in part on the similarity metric between the first vector and vectors classified as being associated with the product category.

8. An apparatus, comprising: a processor; memory coupled with the processor; and instructions stored in the memory and executable by the processor to cause the apparatus to: generating training data comprising training titles and training images, one or more training titles having a masked title portion and one or more training images have a masked image portion, the generating of the training data comprising: masking a portion of a first training image: masking a portion of a first training image; generating a predicted portion for the first training image based at least in part on masking the portion of the first training image and a title of the first training image; and generating a predicted listing category for the first training image based at least in part on the predicted portion for the first training image; train a machine learning model based on the training data; receive, via a first user interface associated with an online marketplace, a multi-modality request to retrieve a listing for an item, the multi-modality request comprising at least a first image and a first natural language text associated with the item; generate a first image token based at least in part on the first image and a first title token based at least in part on the first natural language text; generate, by the machine learning model, a first vector based on the first image token and the first title token; and cause presentation, via the first user interface associated with the online marketplace, of one or more listings for the multi-modality request based at least in part on a similarity metric between the first vector and each vector of a plurality of vectors associated with a plurality of listings.

9. The apparatus of claim 8, wherein generating the training data further comprises: mask a portion of the title of the first training image; generate a predicted portion for a title of the listing based at least in part on the portion of the title of the first training image and the first training image; and generate a predicted listing category based at least in part on the predicted portion for the title.

10. The apparatus of claim 8, wherein the instructions are further executable by the processor to cause the apparatus to: determine, by the machine learning model, a similarity metric between the first vector and a plurality of vectors associated with a plurality of categories; and associate, by the machine learning model, the first vector with a first category based at least in part on a similarity metric between the first vector and one or more vectors classified by the machine learning model as being associated with the first category.

11. The apparatus of claim 10, wherein the instructions are further executable by the processor to cause the apparatus to: generate, by the machine learning model, a second vector associated with a second image and a second natural language text included in a received multi-modality query; associate, by the machine learning model, the second vector with the first category; and compare the second vector with a plurality of vectors that include the first vector, wherein the first vector is associated with the first category based at least in part on the first vector and the second vector satisfying a similarity metric.
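The training-data step recited in claims 1 and 2 (masking a portion of a training image and a portion of its title) can be sketched roughly as below. This is an illustration under stated assumptions, not the method as implemented by the patentee: it assumes the image has already been split into a list of patches and the title into word tokens, and the mask ratio, the `[MASK]` placeholder, and the function names `mask_sequence` and `make_training_example` are all hypothetical. Predicting the masked portions and the listing category is the model's job and is not shown.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder for a hidden patch or token


def mask_sequence(items, mask_ratio, rng):
    """Return a masked copy of `items` and the sorted masked positions."""
    if not items:
        return [], []
    # Hide at least one element so every example has a prediction target.
    n_mask = max(1, round(len(items) * mask_ratio))
    positions = sorted(rng.sample(range(len(items)), n_mask))
    masked = list(items)
    for pos in positions:
        masked[pos] = MASK
    return masked, positions


def make_training_example(patches, title_tokens, mask_ratio=0.25, seed=0):
    """Mask part of the image patches and part of the title tokens."""
    rng = random.Random(seed)  # seeded for reproducibility
    masked_patches, patch_pos = mask_sequence(patches, mask_ratio, rng)
    masked_title, title_pos = mask_sequence(title_tokens, mask_ratio, rng)
    return {
        "masked_patches": masked_patches,
        "masked_title": masked_title,
        # Ground truth the model would be trained to reconstruct.
        "patch_targets": {p: patches[p] for p in patch_pos},
        "title_targets": {p: title_tokens[p] for p in title_pos},
    }


patches = ["patch0", "patch1", "patch2", "patch3"]
title_tokens = ["red", "leather", "handbag", "with", "strap", "new", "in", "box"]
example = make_training_example(patches, title_tokens)
print(example["masked_patches"])
print(example["masked_title"])
```

Keeping the original values alongside the masked inputs lets a training loop compare the model's predicted portion against the hidden patch or token.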

Assignees

Ebay Inc

Inventors

Classifications

  • graphically representing goods, e.g. 3D product representation · CPC title

  • using classification, e.g. of video objects · CPC title

  • Proximity, similarity or dissimilarity measures · CPC title

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

  • Information retrieval or Information management · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12592059B2 cover?
A system may receive, via a first user interface associated with an online marketplace, a multi-modality request to retrieve a listing for an item, the multi-modality request comprising at least a first image and a first natural language text associated with the item. The system may generate an item embedding based on inputting the first image and the first natural language text to a machine le…
Who is the assignee on this patent?
Ebay Inc
What technology area does this patent fall under?
Primary CPC classification G06V10/774. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 31, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).