What technology area does this patent fall under?

Primary CPC classification G06F16/532. Mapped technology areas include Physics.

When was this patent published?

Publication date Feb 27, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus, or other publications sharing the same primary CPC).

Performing image search based on user input using neural networks

US11914635B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11914635-B2
Application number	US-202117455786-A
Country	US
Kind code	B2
Filing date	Nov 19, 2021
Priority date	Nov 19, 2021
Publication date	Feb 27, 2024
Grant date	Feb 27, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for image searching are described. The systems and methods include receiving a search query comprising user input for a reference image; converting the user input for the reference image to a preference statement using a machine learning model; encoding the preference statement in an embedding space to obtain an encoded preference statement; combining the encoded preference statement with an encoded reference image representing the reference image in the embedding space to obtain a multi-modal search encoding; and performing a search operation using the multi-modal search encoding to retrieve a second image, wherein the second image differs from the reference image based on the user input for the reference image.
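The retrieval step in the abstract can be illustrated with a small sketch. It assumes the preference statement and the reference image have already been encoded as vectors in the same embedding space. The toy 3-dimensional vectors, the weighted-sum fusion (`combine_encodings`, `text_weight`), and the cosine-similarity ranking in `search` are illustrative assumptions: this page does not say how the patent combines the two encodings or scores candidate images.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def combine_encodings(text_emb: Sequence[float], image_emb: Sequence[float],
                      text_weight: float = 0.5) -> Vector:
    """Hypothetical fusion rule: weighted sum of the unit-normalized encoded
    preference statement and encoded reference image, re-normalized."""
    t = l2_normalize(text_emb)
    i = l2_normalize(image_emb)
    fused = [text_weight * a + (1.0 - text_weight) * b for a, b in zip(t, i)]
    return l2_normalize(fused)


def search(query: Sequence[float], index: Dict[str, Sequence[float]],
           k: int = 1, exclude: frozenset = frozenset()) -> List[Tuple[str, float]]:
    """Score every indexed image embedding against the query by cosine similarity
    and return the top-k (image_id, score) pairs, skipping ids in `exclude`."""
    q = l2_normalize(query)
    scored = [
        (image_id, sum(a * b for a, b in zip(q, l2_normalize(emb))))
        for image_id, emb in index.items()
        if image_id not in exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy usage: the reference image points along axis 0 and the
# preference statement ("something brighter") along axis 1.
reference = [1.0, 0.0, 0.0]
preference = [0.0, 1.0, 0.0]
index = {
    "reference": [1.0, 0.0, 0.0],
    "brighter_variant": [0.7, 0.7, 0.0],
    "unrelated": [0.0, 0.0, 1.0],
}
query = combine_encodings(preference, reference)
results = search(query, index, k=2, exclude=frozenset({"reference"}))
print(results[0][0])  # brighter_variant
```

Excluding the reference image is one simple way to reflect the abstract's requirement that the retrieved image differ from the reference image; a large image collection would more likely be served by an approximate nearest-neighbor index than by this linear scan.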

First claim

Opening claim text (preview).

What is claimed is:

1. A method for image searching, comprising: receiving a user input for a reference image, wherein the user input comprises a critique of the reference image, and wherein the critique comprises a natural language statement expressing a negative opinion relating to the reference image; modifying the user input by generating a preference statement based on the critique using a machine learning language model, wherein the preference statement comprises a natural language statement of a positive preference corresponding to the reference image, and wherein the positive preference is an inversion of the negative opinion; encoding, using a multi-modal encoder, the preference statement in a multi-modal embedding space to obtain an encoded preference statement; combining the encoded preference statement with an encoded reference image representing the reference image in the multi-modal embedding space to obtain a multi-modal search encoding; and performing a search operation using the multi-modal search encoding to retrieve a two-dimensional image, wherein the two-dimensional image differs from the reference image based on the user input for the reference image.

2. The method of claim 1, further comprising: receiving an additional user input for the two-dimensional image; generating an additional preference statement based on the additional user input using the machine learning language model; and retrieving a third image based on the additional preference statement.

3. The method of claim 1, further comprising: determining that an additional user input does not comprise an image critique; bypassing a preference statement generation process based on the determination; and retrieving a third image based on the additional user input.

4. The method of claim 1, further comprising: identifying the reference image in response to a search query; generating a caption text based on the search query and the preference statement; and encoding the caption text in the multi-modal embedding space to obtain an encoded caption, wherein the multi-modal search encoding further comprises the encoded caption.

5. The method of claim 1, further comprising: determining that the user input for the reference image comprises the critique of the reference image using an intent classifier.

6. The method of claim 1, further comprising: encoding, using the multi-modal encoder, the reference image in the multi-modal embedding space to obtain the encoded reference image.

7. The method of claim 1, further comprising: comparing each of a plurality of encoded images to the multi-modal search encoding to obtain a similarity score for each of the plurality of encoded images; and selecting the two-dimensional image from among the plurality of encoded images based on a similarity score corresponding to the two-dimensional image.

8. The method of claim 1, further comprising: retrieving a plurality of images based on the multi-modal search encoding; receiving a subsequent user input for an image of the plurality of images; and retrieving a plurality of additional images based on the subsequent user input.

9. The method of claim 1, wherein: a loss function for the machine learning language model is computed by comparing the preference statement to a corresponding preference statement from a plurality of ground truth preference statements; and the machine learning language model is trained by updating parameters of the machine learning model based on the loss function.

10. A non-transitory computer readable medium storing instructions that, when executed by a processor, cause the processor to: receive, for training a machine learning language model, training data comprising an input statement and a ground truth preference statement corresponding to the input statement, wherein the input statement comprises a critique comprising a natural language statement expressing a negative opinion and wherein the ground truth preference statement comprises a natural language statement expressing a positive preference that is an inversion of the negative opinion; modify the input statement by generating a preference statement based on the critique using the machine learning language model, wherein the preference statement comprises a natural language statement of a positive preference corresponding to the negative opinion; and train the machine learning language model using the preference statement and the ground truth preference statement to generate a trained machine learning model, the trained machine learning model being configured to perform a search operation to retrieve a two-dimensional image that matches a query preference statement corresponding to a user input.

11. The non-transitory computer readable medium of claim 10, wherein the instructions further cause the processor to: compute a loss function for the machine learning language model by comparing the preference statement to the ground truth preference statement, wherein the machine learning language model is trained by updating parameters of the machine learning language model based on the loss function.

12. The non-transitory computer readable medium of claim 10, wherein the instructions further cause the processor to: receive multi-modal training data for a multi-modal encoder, wherein the multi-modal training data comprises images and image descriptions; and train the multi-modal encoder using the multi-modal training data to encode the query preference statement to obtain an encoded preference statement in a multi-modal embedding space.

13. The non-transitory computer readable medium of claim 10, wherein the instructions further cause the processor to: train an intent classifier to determine whether a text comprises a critique of an image.

14. The non-transitory computer readable medium of claim 10, wherein the instructions further cause the processor to: generate a caption text based on the query preference statement and a search query for the image that matches the query preference statement; and encode the caption text in a multi-modal embedding space to obtain an encoded caption, wherein the two-dimensional image that matches the query preference statement is retrieved based at least in part on the encoded caption.

15. The non-transitory computer readable medium of claim 10, wherein: a search operation is performed to retrieve a two-dimensional image that matches one or more query preference statements; a plurality of two-dimensional images are retrieved based on the search operation; a user selection is received identifying one of the plurality of two-dimensional images; and parameters of the machine learning language model are updated based on the user selection.

16. A system comprising: one or more processors; and one or more memory components coupled with the one or more processors, the one or more processors configured to: receive a user input for a reference image, wherein the user input comprises a critique of the reference image, and wherein the critique comprises a negative natural language statement expressing a negative opinion relating to the reference image; modify the user input by generating a preference statement based on the critique using a machine learning language model, wherein the preference statement comprises a natural language statement of a positive preference corresponding to the reference image, and wherein the positive preference is an inversion of the negative opinion; encode the preference statement in a multi-modal embedding space to obtain an encoded prefere…
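Claims 1, 3, and 5 together describe a routing step: an intent classifier decides whether the user input is a critique, critiques are rewritten into a positive preference statement, and other inputs bypass that rewriting. The sketch below shows only that control flow. The keyword test (`is_critique`), the `ANTONYMS` table, and the template rewrite (`invert_critique`) are hypothetical stand-ins for the trained intent classifier and machine learning language model that the claims recite; they are not the patent's method.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for the intent classifier (claim 5): a keyword check.
NEGATIVE_CUES = ("too ", "not ", "don't like", "dislike")

# Hypothetical stand-in for the language model's inversion (claim 1):
# map the adjective in "too <adjective>" to its opposite.
ANTONYMS = {"dark": "brighter", "bright": "darker", "busy": "simpler", "small": "larger"}


@dataclass
class RoutedQuery:
    text: str           # text that would be passed on to the multi-modal encoder
    was_critique: bool  # True if the preference-statement step ran


def is_critique(text: str) -> bool:
    """Return True if the text contains one of the toy negative cues."""
    lowered = text.lower()
    return any(cue in lowered for cue in NEGATIVE_CUES)


def invert_critique(critique: str) -> str:
    """Turn a negative opinion into a positive preference statement."""
    match = re.search(r"\btoo (\w+)", critique.lower())
    if match and match.group(1) in ANTONYMS:
        return f"I would like something {ANTONYMS[match.group(1)]}"
    # Fallback when the toy rules cannot invert the critique.
    return f"I would like something different: {critique}"


def route_user_input(user_input: str) -> RoutedQuery:
    """Rewrite critiques into preference statements; bypass the rewrite otherwise (claim 3)."""
    if is_critique(user_input):
        return RoutedQuery(invert_critique(user_input), True)
    return RoutedQuery(user_input, False)


print(route_user_input("This photo is too dark"))
# RoutedQuery(text='I would like something brighter', was_critique=True)
print(route_user_input("a red car at sunset"))
# RoutedQuery(text='a red car at sunset', was_critique=False)
```

In the claimed flow, the resulting text would then be encoded in the multi-modal embedding space and combined with the encoded reference image before the search operation.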

Assignees

Adobe Inc

Inventors

Classifications

G06F16/532 · Primary
Query formulation, e.g. graphical querying · CPC title
G06F16/53 · Primary
Querying · CPC title

Patent family

Related publications grouped by family.

View patent family 86383829

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11914635B2 cover?: Systems and methods for image searching are described. The systems and methods include receiving a search query comprising user input for a reference image; converting the user input for the reference image to a preference statement using a machine learning model; encoding the preference statement in an embedding space to obtain an encoded preference statement; combining the encoded preference …
Who is the assignee on this patent?: Adobe Inc


Related patents

Digital design of an area

Generating user-customized items using a visually-aware image generation network

Generating labels for images associated with a user

Real-time mobile device capture and generation of art-styled ar/vr content
