What technology area does this patent fall under?

Primary CPC classification G06V10/774. Mapped technology areas include Physics.

When was this patent published?

Publication date Mar 31, 2026 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Global embedding learning from different modalities

US12592059B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12592059-B2
Application number	US-202217983327-A
Country	US
Kind code	B2
Filing date	Nov 8, 2022
Priority date	Nov 8, 2022
Publication date	Mar 31, 2026
Grant date	Mar 31, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system may receive, via a first user interface associated with an online marketplace, a multi-modality request to retrieve a listing for an item, the multi-modality request comprising at least a first image and a first natural language text associated with the item. The system may generate an item embedding based on inputting the first image and the first natural language text to a machine learning model and may generate a first vector associated with the first image and the first natural language text included in the multi-modality request. The system may cause presentation, via the first user interface associated with the online marketplace, of one or more listings for the item retrieved based at least in part on a similarity metric between the first vector and a second vector of a plurality of vectors associated with a plurality of listings.
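The retrieval step described in the abstract can be pictured as a nearest-neighbour ranking over listing vectors. The sketch below is illustrative only and is not taken from the patent: the abstract does not name the similarity metric, so cosine similarity is an assumption, and `rank_listings`, the listing IDs, and the toy three-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_listings(query_vector, listing_vectors, top_k=2):
    """Return the top_k (listing_id, score) pairs, best match first."""
    scored = [
        (listing_id, cosine_similarity(query_vector, vector))
        for listing_id, vector in listing_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical listing vectors; a real system would obtain these from the
# trained model rather than write them by hand.
listing_vectors = {
    "listing-a": [0.9, 0.1, 0.0],
    "listing-b": [0.0, 1.0, 0.0],
    "listing-c": [0.7, 0.7, 0.0],
}

# Stand-in for the "first vector" built from the query image and text.
query_vector = [1.0, 0.0, 0.0]

top = rank_listings(query_vector, listing_vectors)
for listing_id, score in top:
    print(listing_id, round(score, 3))
```

With a large catalogue, an exhaustive scan like this would normally be replaced by an approximate nearest-neighbour index; the linear scan is used here only to keep the example self-contained.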

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: generating training data comprising training titles and training images, one or more training titles having a masked title portion and one or more training images have a masked image portion, the generating of the training data comprising: masking a portion of a first training image; generating a predicted portion for the first training image based at least in part on masking the portion of the first training image and a title of the first training image; and generating a predicted listing category for the first training image based at least in part on the predicted portion for the first training image; training a machine learning model based on the training data; receiving, via a first user interface associated with an online marketplace, a multi-modality request to retrieve a listing for an item, the multi-modality request comprising at least a first image and a first natural language text associated with the item; generating, by the machine learning model, a first image token based at least in part on the first image and a first title token based at least in part on the first natural language text; generating, by the machine learning model, a first vector based on the first image token and the first title token; and causing presentation, via the first user interface associated with the online marketplace, of one or more listings for the multi-modality request based at least in part on a similarity metric between the first vector and each vector of a plurality of vectors associated with a plurality of listings.

2. The computer-implemented method of claim 1, wherein generating the training data further comprises: masking a portion of the title of the first training image; generating a predicted portion for a title of the listing based at least in part on the portion of the title of the first training image and the first training image; and generating a predicted listing category based at least in part on the predicted portion for the title.

3. The computer-implemented method of claim 1, further comprising: determining, by the machine learning model, a similarity metric between the first vector and a plurality of vectors associated with a plurality of categories; and associating, by the machine learning model, the first vector with a first category based at least in part on a similarity metric between the first vector and one or more vectors classified by the machine learning model as being associated with the first category.

4. The computer-implemented method of claim 3, further comprising: generating, by the machine learning model, a second vector associated with a second image and a second natural language text included in a received multi-modality query; associating, by the machine learning model, the second vector with the first category; and comparing the second vector with a plurality of vectors that include the first vector, wherein the first vector is associated with the first category based at least in part on the first vector and the second vector satisfying a similarity metric.

5. The computer-implemented method of claim 1, wherein causing presentation of the one or more listings for the item comprises: causing presentation, via a second user interface associated with the online marketplace, of one or more listings for the item from a first category, wherein the one or more listings that comprise a second image that is different than the first image, a second natural language text that differs from the first natural language text, or both.

6. The computer-implemented method of claim 1, further comprising: comparing the portion of the training image with the portion of the training title based at least in part on reconstructing the portion of the training image and reconstructing the portion of the training title; and classifying the listing for the item using the training image and the training title based at least in part on the portion of the training image being associated with the portion of the training title.

7. The computer-implemented method of claim 1, further comprising: associating the first vector with a product category based at least in part on the similarity metric between the first vector and vectors classified as being associated with the product category.

8. An apparatus, comprising: a processor; memory coupled with the processor; and instructions stored in the memory and executable by the processor to cause the apparatus to: generating training data comprising training titles and training images, one or more training titles having a masked title portion and one or more training images have a masked image portion, the generating of the training data comprising: masking a portion of a first training image: masking a portion of a first training image; generating a predicted portion for the first training image based at least in part on masking the portion of the first training image and a title of the first training image; and generating a predicted listing category for the first training image based at least in part on the predicted portion for the first training image; train a machine learning model based on the training data; receive, via a first user interface associated with an online marketplace, a multi-modality request to retrieve a listing for an item, the multi-modality request comprising at least a first image and a first natural language text associated with the item; generate a first image token based at least in part on the first image and a first title token based at least in part on the first natural language text; generate, by the machine learning model, a first vector based on the first image token and the first title token; and cause presentation, via the first user interface associated with the online marketplace, of one or more listings for the multi-modality request based at least in part on a similarity metric between the first vector and each vector of a plurality of vectors associated with a plurality of listings.

9. The apparatus of claim 8, wherein generating the training data further comprises: mask a portion of the title of the first training image; generate a predicted portion for a title of the listing based at least in part on the portion of the title of the first training image and the first training image; and generate a predicted listing category based at least in part on the predicted portion for the title.

10. The apparatus of claim 8, wherein the instructions are further executable by the processor to cause the apparatus to: determine, by the machine learning model, a similarity metric between the first vector and a plurality of vectors associated with a plurality of categories; and associate, by the machine learning model, the first vector with a first category based at least in part on a similarity metric between the first vector and one or more vectors classified by the machine learning model as being associated with the first category.

11. The apparatus of claim 10, wherein the instructions are further executable by the processor to cause the apparatus to: generate, by the machine learning model, a second vector associated with a second image and a second natural language text included in a received multi-modality query; associate, by the machine learning model, the second vector with the first category; and compare the second vector with a plurality of vectors that include the first vector, wherein the first vector is associated with the first category based at least in part on the first vector and the second vector satisfying a similarity metric.
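The training-data step recited in claims 1 and 2 (masking a portion of a training image and a portion of its title) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: the claims do not specify a mask ratio, a patching scheme, or a mask placeholder, so `MASK_TOKEN`, `mask_sequence`, the ratios, and the toy title and patches are all hypothetical. The model that predicts the masked portions and the listing category is omitted.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical placeholder for a masked title token


def mask_sequence(items, mask_value, mask_ratio, rng):
    """Hide a random subset of items.

    Returns the masked copy and a dict mapping each masked position to the
    original item, which a model would be trained to predict.
    """
    if not items:
        return [], {}
    n_mask = min(len(items), max(1, round(len(items) * mask_ratio)))
    positions = set(rng.sample(range(len(items)), n_mask))
    masked = [mask_value if i in positions else item
              for i, item in enumerate(items)]
    targets = {i: items[i] for i in sorted(positions)}
    return masked, targets


rng = random.Random(0)  # fixed seed so the example is repeatable

# Hypothetical training example: a tokenised title and an image that has
# already been split into four flattened patches of pixel values.
title_tokens = ["red", "leather", "wallet", "mens"]
image_patches = [[10, 12], [200, 198], [55, 60], [0, 3]]

masked_title, title_targets = mask_sequence(title_tokens, MASK_TOKEN, 0.25, rng)
masked_patches, patch_targets = mask_sequence(image_patches, None, 0.5, rng)

print(masked_title, title_targets)
print(masked_patches, patch_targets)
```

In a full pipeline the unmasked modality would serve as context for reconstructing the masked one, for example using the intact title to help predict the hidden image patches, with the predicted listing category derived from that reconstruction as the claims describe.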

Assignees

eBay Inc.

Classifications

G06Q30/0643
graphically representing goods, e.g. 3D product representation · CPC title
G06V10/764
using classification, e.g. of video objects · CPC title
G06V10/761
Proximity, similarity or dissimilarity measures · CPC title
G06F40/284
Lexical analysis, e.g. tokenisation or collocates · CPC title
Y10S706/934
Information retrieval or Information management · CPC title

Patent family

Related publications grouped by family.

View patent family 88558640
