What technology area does this patent fall under?

Primary CPC classification G06Q30/0613. Mapped technology areas include Physics.

When was this patent published?

Publication date: Aug 8, 2023 (kind code B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Interactive retrieval using visual semantic matching

US11720942B1 · US · B1

Patent metadata
Field	Value
Publication number	US-11720942-B1
Application number	US-202016915361-A
Country	US
Kind code	B1
Filing date	Jun 29, 2020
Priority date	Nov 12, 2019
Publication date	Aug 8, 2023
Grant date	Aug 8, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are generally described for interactive image retrieval using visual semantic matching. Image data and text data are encoded into a single shared visual semantic embedding space. A prediction model is trained using reference inputs, target outputs, and modification text describing changes to the reference inputs to obtain the target outputs. The prediction model can be used to perform image-to-text, text-to-image, and interactive retrieval.
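The retrieval idea in the abstract can be illustrated with a minimal sketch: once images and text live in one shared embedding space, image-to-text and text-to-image lookup both reduce to finding the nearest stored vector. Everything concrete here is an assumption made for illustration and does not come from the patent: the helper names `distance` and `retrieve`, the use of Euclidean distance, and the toy 2-D vectors. For interactive retrieval, the query vector would instead be the output of the trained prediction model, which is not modeled here.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query_vec, candidates):
    """Return the key of the candidate whose vector is closest to query_vec."""
    return min(candidates, key=lambda key: distance(query_vec, candidates[key]))

# Toy 2-D embeddings, invented for illustration; real encoders would
# produce these vectors from image data and text data.
image_vecs = {"red_dress.jpg": [1.0, 0.0], "blue_jeans.jpg": [0.0, 1.0]}
text_vecs = {"a red dress": [0.9, 0.1], "blue denim jeans": [0.1, 0.9]}

# Text-to-image: look up the image nearest to a text embedding.
best_image = retrieve(text_vecs["a red dress"], image_vecs)

# Image-to-text: look up the description nearest to an image embedding.
best_text = retrieve(image_vecs["blue_jeans.jpg"], text_vecs)

print(best_image, "|", best_text)
```

Because both modalities share one space, the same `retrieve` function serves both directions; only the query and the candidate set change.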

First claim

Opening claim text (preview).

What is claimed is:

1. A method of interactive shopping assistance, said method comprising: training a machine learning product prediction model based at least in part on: determining, using at least one processor, a predicted output vector based on an encoding of a reference image input and an encoding of a modification text input describing a modification to the reference image input that results in a target image output; determining, using the at least one processor, a target output vector based at least in part on an encoding of the target image output; and determining, using the at least one processor, a compositional matching loss based at least in part on a difference between the predicted output vector and the target output vector; receiving image data from a user, the image data representing an image of an article of clothing; receiving a modification input from the user, the modification input describing a desired modification to the article of clothing; and processing the image data and the modification input with the machine learning product prediction model to identify a target product corresponding to the desired modification to the article of clothing; and sending image data of the target product to the user.

2. The method according to claim 1, further comprising: training an embedding model based at least in part on: determining a first representation of reference image data; determining a second representation of a reference textual description describing the reference image data; determining a third representation of negative reference image data different than the reference image data; determining a fourth representation of a negative reference description describing the negative reference image data; and determining an embedding loss based at least in part on the first representation, the second representation, the third representation, and the fourth representation; and generating the encoding of the reference image input using the embedding model; and generating the encoding of the target image output using the embedding model.

3. The method according to claim 1, further comprising: for each product of a plurality of products in product catalog, determining a corresponding product vector in an embedding space based at least in part on an encoding of a corresponding image of the product; processing the image data and the modification input with the machine learning product prediction model to generate a predicted output vector in the embedding space, the predicted output vector corresponding to the desired modification to the article of clothing; and identifying the target product of the plurality of products by determining that a first product vector corresponding to the target product is closest of all of the product vectors to the predicted output vector in the embedding space.

4. A method, comprising: training a machine learning prediction model based at least in part on: determining, using at least one processor, a predicted output vector based at least in part on an encoding of a reference input and an encoding of a modification input describing a modification to the reference input; determining, using the at least one processor, a target output vector based at least in part on an encoding of a target output; and determining, using the at least one processor, a compositional matching loss based at least in part on a difference between the predicted output vector and the target output vector; receiving a query modification input describing a modification to a query reference input; and processing the query modification input and the query reference input with the machine learning prediction model to generate a predicted query output vector.

5. The method according to claim 4, further comprising: for each of a plurality of objects in a database, determining a corresponding result vector based at least in part on an encoding of the object; and identifying a first object of the plurality of objects by determining that a first result vector corresponding to the first object is closest of all of the result vectors corresponding to the plurality of objects in the database to the predicted query output vector in an embedding space.

6. The method according to claim 4, wherein the training the machine learning prediction model comprises: determining the predicted output vector based at least in part on an encoding of a reference image data input and an encoding of a modification text input describing the modification to the reference image data input; and determining the target output vector based at least in part on an encoding of a target image data output.

7. The method according to claim 6, further comprising: generating the encoding of the reference image data input by sending the reference image data input to a convolutional neural network (CNN) and applying an image projection model; and generating the encoding of the target image data output by sending the target image data output to a CNN and applying the image projection model.

8. The method according to claim 6, further comprising: generating the encoding of the modification text input by sending the modification text input to a long short term memory (LSTM) and applying a text projection model.

9. The method according to claim 4, further comprising: determining the compositional matching loss according to: $L = L_{vse} + L_{im} + L_{tm}$ wherein $L$ represents the compositional matching loss, $L_{vse}$ represents an embedding loss, $L_{im}$ represents a compositional image matching loss, and $L_{tm}$ represents a compositional text matching loss.

10. The method according to claim 4, further comprising: training an embedding model based at least in part on: determining a first representation of reference image data; determining a second representation of a reference description describing the reference image data; determining a third representation of negative reference image data different than the reference image data; determining a fourth representation of a negative reference description describing the negative reference image data; and determining an embedding loss based at least in part on the first representation, the second representation, the third representation, and the fourth representation; and generating the encoding of the reference input using the embedding model; and generating the encoding of the target output using the embedding model.

11. The method according to claim 10, further comprising: determining the embedding loss based at least in part on a first difference between the first representation and the fourth representation, and a second difference between the second representation and the third representation.

12. The method according to claim 10, further comprising: determining the embedding loss according to: $L_{vse} = [d(v,t) - d(v,t^{-}) + m]_{+} + [d(v,t) - d(v^{-},t) + m]_{+}$ wherein $L_{vse}$ represents the embedding loss, $v$ represents the first representation of the reference image data, $t$ represents the second representation of the reference description, $v^{-}$ represents the third representation of the negative reference image data, and $t^{-}$ represents the fourth representation of the negative reference description, $d$ represents a distance between representations in an embedding space defined by the embedding model, and $m$ represents a margin.

13
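The loss terms in claims 9 and 12 can be sketched in a few lines. This is an illustration under stated assumptions, not the patented implementation: the trailing "+" in the claim 12 formula is read as the hinge operation max(0, x), the distance d is taken to be Euclidean, the margin default of 0.2 is arbitrary, and the function names are hypothetical. The image and text matching terms of claim 9 are passed in as plain numbers because the preview text does not define them.

```python
import math

def d(a, b):
    """Distance between two embeddings (Euclidean is an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hinge(x):
    """Positive part, i.e. the assumed meaning of the trailing '+' in claim 12."""
    return max(0.0, x)

def embedding_loss(v, t, v_neg, t_neg, m=0.2):
    """Embedding loss as read from claim 12.

    v, t         : matching image and text representations
    v_neg, t_neg : negative image and negative text representations
    m            : margin (0.2 is an arbitrary illustrative default)
    """
    image_anchor_term = hinge(d(v, t) - d(v, t_neg) + m)
    text_anchor_term = hinge(d(v, t) - d(v_neg, t) + m)
    return image_anchor_term + text_anchor_term

def compositional_matching_loss(l_vse, l_im, l_tm):
    """Sum from claim 9: embedding loss plus image and text matching losses."""
    return l_vse + l_im + l_tm

# A well-separated pair: the match is closer than either negative by more
# than the margin, so both hinge terms vanish.
good = embedding_loss(v=[1.0, 0.0], t=[1.0, 0.0], v_neg=[0.0, 1.0], t_neg=[0.0, 1.0])

# A badly embedded pair: the negatives sit on top of the anchors, so both
# terms are penalized.
bad = embedding_loss(v=[1.0, 0.0], t=[0.0, 1.0], v_neg=[0.0, 1.0], t_neg=[1.0, 0.0])

print(good, bad)
```

The loss is zero only when the matching pair is closer than each negative by at least the margin, which is what pushes matching images and descriptions together in the shared space during training.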

Assignees

Amazon Tech Inc

Inventors

Classifications

G06N3/045
Combinations of networks · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06Q30/0613 (Primary)
using intermediate agents · CPC title

Patent family

Related publications grouped by family.

View patent family 87522316

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11720942B1 cover?: Techniques are generally described for interactive image retrieval using visual semantic matching. Image data and text data are encoded into a single shared visual semantic embedding space. A prediction model is trained using reference inputs, target outputs, and modification text describing changes to the reference inputs to obtain the target outputs. The prediction model can be used to perfor…
Who is the assignee on this patent?: Amazon Tech Inc

Related patents

Cloth Warping Using Multi-Scale Patch Adversarial Loss

Mobile device platform for automated visual retail product recognition

Machine learning based identification of visually complementary item collections

Fabric identifying method, apparatus, and system

Intelligent online personal assistant with offline visual search database

Method and apparatus for selectively providing information on objects in a captured image