Digital design of an area
US-2021209261-A1 · Jul 8, 2021 · US
US11914635B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11914635-B2 |
| Application number | US-202117455786-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 19, 2021 |
| Priority date | Nov 19, 2021 |
| Publication date | Feb 27, 2024 |
| Grant date | Feb 27, 2024 |
Official abstract text for this publication.
Systems and methods for image searching are described. The systems and methods include receiving a search query comprising user input for a reference image; converting the user input for the reference image to a preference statement using a machine learning model; encoding the preference statement in an embedding space to obtain an encoded preference statement; combining the encoded preference statement with an encoded reference image representing the reference image in the embedding space to obtain a multi-modal search encoding; and performing a search operation using the multi-modal search encoding to retrieve a second image, wherein the second image differs from the reference image based on the user input for the reference image.
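The combination and search steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed implementation: the abstract does not say how the two encodings are combined, so the weighted sum below is an assumption, as are the function names (`combine_encodings`, `search`), the `text_weight` parameter, the use of cosine similarity, and the three-dimensional toy vectors, which stand in for the output of a real multi-modal encoder.

```python
import math


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def combine_encodings(image_vec, text_vec, text_weight=0.5):
    """Blend an encoded reference image with an encoded preference statement.

    ASSUMPTION: a weighted sum of unit vectors. The source only says the two
    encodings are combined in the same embedding space.
    """
    img = normalize(image_vec)
    txt = normalize(text_vec)
    mixed = [(1 - text_weight) * i + text_weight * t for i, t in zip(img, txt)]
    return normalize(mixed)


def cosine_similarity(a, b):
    """Cosine similarity, computed as the dot product of unit vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def search(query_vec, index, exclude=None, top_k=1):
    """Rank indexed images by similarity to the query and return the best names."""
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in index.items()
        if name != exclude
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical toy index; the axes loosely mean (sofa, brightness, outdoors).
INDEX = {
    "dark_sofa": [1.0, 0.1, 0.0],
    "bright_sofa": [1.0, 0.9, 0.0],
    "bright_garden": [0.0, 1.0, 1.0],
}

reference = INDEX["dark_sofa"]
# Hand-made stand-in for the encoding of "I prefer a brighter image".
preference = [0.0, 1.0, 0.0]

query = combine_encodings(reference, preference)
results = search(query, INDEX, exclude="dark_sofa")
print(results)
```

In this toy setup the query keeps the subject of the reference image while moving toward the stated preference, so the retrieved image differs from the reference along the dimension the user asked about. Raising `text_weight` would let the preference dominate the reference image.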
Opening claim text (preview).
What is claimed is:
1. A method for image searching, comprising: receiving a user input for a reference image, wherein the user input comprises a critique of the reference image, and wherein the critique comprises a natural language statement expressing a negative opinion relating to the reference image; modifying the user input by generating a preference statement based on the critique using a machine learning language model, wherein the preference statement comprises a natural language statement of a positive preference corresponding to the reference image, and wherein the positive preference is an inversion of the negative opinion; encoding, using a multi-modal encoder, the preference statement in a multi-modal embedding space to obtain an encoded preference statement; combining the encoded preference statement with an encoded reference image representing the reference image in the multi-modal embedding space to obtain a multi-modal search encoding; and performing a search operation using the multi-modal search encoding to retrieve a two-dimensional image, wherein the two-dimensional image differs from the reference image based on the user input for the reference image.
2. The method of claim 1, further comprising: receiving an additional user input for the two-dimensional image; generating an additional preference statement based on the additional user input using the machine learning language model; and retrieving a third image based on the additional preference statement.
3. The method of claim 1, further comprising: determining that an additional user input does not comprise an image critique; bypassing a preference statement generation process based on the determination; and retrieving a third image based on the additional user input.
4. The method of claim 1, further comprising: identifying the reference image in response to a search query; generating a caption text based on the search query and the preference statement; and encoding the caption text in the multi-modal embedding space to obtain an encoded caption, wherein the multi-modal search encoding further comprises the encoded caption.
5. The method of claim 1, further comprising: determining that the user input for the reference image comprises the critique of the reference image using an intent classifier.
6. The method of claim 1, further comprising: encoding, using the multi-modal encoder, the reference image in the multi-modal embedding space to obtain the encoded reference image.
7. The method of claim 1, further comprising: comparing each of a plurality of encoded images to the multi-modal search encoding to obtain a similarity score for each of the plurality of encoded images; and selecting the two-dimensional image from among the plurality of encoded images based on a similarity score corresponding to the two-dimensional image.
8. The method of claim 1, further comprising: retrieving a plurality of images based on the multi-modal search encoding; receiving a subsequent user input for an image of the plurality of images; and retrieving a plurality of additional images based on the subsequent user input.
9. The method of claim 1, wherein: a loss function for the machine learning language model is computed by comparing the preference statement to a corresponding preference statement from a plurality of ground truth preference statements; and the machine learning language model is trained by updating parameters of the machine learning model based on the loss function.
10. A non-transitory computer readable medium storing instructions that, when executed by a processor, cause the processor to: receive, for training a machine learning language model, training data comprising an input statement and a ground truth preference statement corresponding to the input statement, wherein the input statement comprises a critique comprising a natural language statement expressing a negative opinion and wherein the ground truth preference statement comprises a natural language statement expressing a positive preference that is an inversion of the negative opinion; modify the input statement by generating a preference statement based on the critique using the machine learning language model, wherein the preference statement comprises a natural language statement of a positive preference corresponding to the negative opinion; and train the machine learning language model using the preference statement and the ground truth preference statement to generate a trained machine learning model, the trained machine learning model being configured to perform a search operation to retrieve a two-dimensional image that matches a query preference statement corresponding to a user input.
11. The non-transitory computer readable medium of claim 10, wherein the instructions further cause the processor to: compute a loss function for the machine learning language model by comparing the preference statement to the ground truth preference statement, wherein the machine learning language model is trained by updating parameters of the machine learning language model based on the loss function.
12. The non-transitory computer readable medium of claim 10, wherein the instructions further cause the processor to: receive multi-modal training data for a multi-modal encoder, wherein the multi-modal training data comprises images and image descriptions; and train the multi-modal encoder using the multi-modal training data to encode the query preference statement to obtain an encoded preference statement in a multi-modal embedding space.
13. The non-transitory computer readable medium of claim 10, wherein the instructions further cause the processor to: train an intent classifier to determine whether a text comprises a critique of an image.
14. The non-transitory computer readable medium of claim 10, wherein the instructions further cause the processor to: generate a caption text based on the query preference statement and a search query for the image that matches the query preference statement; and encode the caption text in a multi-modal embedding space to obtain an encoded caption, wherein the two-dimensional image that matches the query preference statement is retrieved based at least in part on the encoded caption.
15. The non-transitory computer readable medium of claim 10, wherein: a search operation is performed to retrieve a two-dimensional image that matches one or more query preference statements; a plurality of two-dimensional images are retrieved based on the search operation; a user selection is received identifying one of the plurality of two-dimensional images; and parameters of the machine learning language model are updated based on the user selection.
16. A system comprising: one or more processors; and one or more memory components coupled with the one or more processors, the one or more processors configured to: receive a user input for a reference image, wherein the user input comprises a critique of the reference image, and wherein the critique comprises a negative natural language statement expressing a negative opinion relating to the reference image; modify the user input by generating a preference statement based on the critique using a machine learning language model, wherein the preference statement comprises a natural language statement of a positive preference corresponding to the reference image, and wherein the positive preference is an inversion of the negative opinion; encode the preference statement in a multi-modal embedding space to obtain an encoded prefere
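The routing described in claims 1, 3, and 5 (classify the input, invert a critique into a preference statement, and bypass that step for non-critiques) could look roughly like the sketch below. Everything here is a hypothetical stand-in: the keyword list replaces a trained intent classifier, the lookup table replaces the machine learning language model, and the names `is_critique`, `build_search_text`, and `toy_generate_preference` do not come from the patent.

```python
from typing import Callable

# ASSUMPTION: a keyword list as a toy substitute for a trained intent classifier.
NEGATIVE_CUES = ("too ", "don't like", "do not like", "hate", "not enough")


def is_critique(user_input: str) -> bool:
    """Return True when the text contains one of the toy negative cues."""
    text = user_input.lower()
    return any(cue in text for cue in NEGATIVE_CUES)


def build_search_text(
    user_input: str, generate_preference: Callable[[str], str]
) -> str:
    """Pick the text that would be encoded for the search.

    A critique is converted to a positive preference statement; any other
    input bypasses preference generation and is used as-is.
    """
    if is_critique(user_input):
        return generate_preference(user_input)
    return user_input


# ASSUMPTION: a lookup table standing in for the language model's inversion
# of a negative opinion into a positive preference.
TOY_INVERSIONS = {
    "this is too dark": "I prefer a brighter image",
}


def toy_generate_preference(critique: str) -> str:
    """Look up a canned inversion; fall back to the original text."""
    return TOY_INVERSIONS.get(critique.lower().strip(), critique)


print(build_search_text("This is too dark", toy_generate_preference))
print(build_search_text("show me red sofas", toy_generate_preference))
```

The first call is treated as a critique and rewritten as a preference; the second is passed through unchanged, which mirrors the bypass in claim 3. In a real system the returned text would then be encoded and combined with the encoded reference image before the search runs.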
Query formulation, e.g. graphical querying · CPC title
Querying · CPC title