Cloth Warping Using Multi-Scale Patch Adversarial Loss
US-2021133919-A1 · May 6, 2021 · US
US11720942B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11720942-B1 |
| Application number | US-202016915361-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jun 29, 2020 |
| Priority date | Nov 12, 2019 |
| Publication date | Aug 8, 2023 |
| Grant date | Aug 8, 2023 |
Official abstract text for this publication.
Techniques are generally described for interactive image retrieval using visual semantic matching. Image data and text data are encoded into a single shared visual semantic embedding space. A prediction model is trained using reference inputs, target outputs, and modification text describing changes to the reference inputs to obtain the target outputs. The prediction model can be used to perform image-to-text, text-to-image, and interactive retrieval.
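The retrieval step the abstract describes can be pictured as a nearest-neighbor search in the shared embedding space. The sketch below is illustrative only. The catalog entries, the 2-D vectors, and the helper names (`l2_distance`, `compose`, `retrieve`) are hypothetical, and the vector addition in `compose` is an assumed stand-in for the trained prediction model, whose composition function is not specified here.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def compose(reference_vec, modification_vec):
    # ASSUMPTION: plain vector addition stands in for the trained
    # prediction model; the source does not specify this function.
    return [r + m for r, m in zip(reference_vec, modification_vec)]

def retrieve(predicted_vec, catalog):
    """Return the catalog key whose vector is closest to predicted_vec."""
    return min(catalog, key=lambda name: l2_distance(catalog[name], predicted_vec))

# Hypothetical 2-D embeddings; real encoders would produce these vectors.
catalog = {
    "red_dress": [1.0, 0.0],
    "blue_dress": [0.0, 1.0],
    "blue_jacket": [-1.0, 1.0],
}
reference = catalog["red_dress"]
modification = [-1.0, 1.0]  # toy encoding of a text such as "make it blue"

predicted = compose(reference, modification)
best_match = retrieve(predicted, catalog)
print(best_match)
```

With these toy vectors the predicted point coincides with the `blue_dress` entry, so that entry is returned. In a real system the catalog vectors would be computed once and searched with an approximate nearest-neighbor index rather than a linear scan.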
Opening claim text (preview).
What is claimed is:
1. A method of interactive shopping assistance, said method comprising: training a machine learning product prediction model based at least in part on: determining, using at least one processor, a predicted output vector based on an encoding of a reference image input and an encoding of a modification text input describing a modification to the reference image input that results in a target image output; determining, using the at least one processor, a target output vector based at least in part on an encoding of the target image output; and determining, using the at least one processor, a compositional matching loss based at least in part on a difference between the predicted output vector and the target output vector; receiving image data from a user, the image data representing an image of an article of clothing; receiving a modification input from the user, the modification input describing a desired modification to the article of clothing; and processing the image data and the modification input with the machine learning product prediction model to identify a target product corresponding to the desired modification to the article of clothing; and sending image data of the target product to the user.
2. The method according to claim 1, further comprising: training an embedding model based at least in part on: determining a first representation of reference image data; determining a second representation of a reference textual description describing the reference image data; determining a third representation of negative reference image data different than the reference image data; determining a fourth representation of a negative reference description describing the negative reference image data; and determining an embedding loss based at least in part on the first representation, the second representation, the third representation, and the fourth representation; and generating the encoding of the reference image input using the embedding model; and generating the encoding of the target image output using the embedding model.
3. The method according to claim 1, further comprising: for each product of a plurality of products in product catalog, determining a corresponding product vector in an embedding space based at least in part on an encoding of a corresponding image of the product; processing the image data and the modification input with the machine learning product prediction model to generate a predicted output vector in the embedding space, the predicted output vector corresponding to the desired modification to the article of clothing; and identifying the target product of the plurality of products by determining that a first product vector corresponding to the target product is closest of all of the product vectors to the predicted output vector in the embedding space.
4. A method, comprising: training a machine learning prediction model based at least in part on: determining, using at least one processor, a predicted output vector based at least in part on an encoding of a reference input and an encoding of a modification input describing a modification to the reference input; determining, using the at least one processor, a target output vector based at least in part on an encoding of a target output; and determining, using the at least one processor, a compositional matching loss based at least in part on a difference between the predicted output vector and the target output vector; receiving a query modification input describing a modification to a query reference input; and processing the query modification input and the query reference input with the machine learning prediction model to generate a predicted query output vector.
5. The method according to claim 4, further comprising: for each of a plurality of objects in a database, determining a corresponding result vector based at least in part on an encoding of the object; and identifying a first object of the plurality of objects by determining that a first result vector corresponding to the first object is closest of all of the result vectors corresponding to the plurality of objects in the database to the predicted query output vector in an embedding space.
6. The method according to claim 4, wherein the training the machine learning prediction model comprises: determining the predicted output vector based at least in part on an encoding of a reference image data input and an encoding of a modification text input describing the modification to the reference image data input; and determining the target output vector based at least in part on an encoding of a target image data output.
7. The method according to claim 6, further comprising: generating the encoding of the reference image data input by sending the reference image data input to a convolutional neural network (CNN) and applying an image projection model; and generating the encoding of the target image data output by sending the target image data output to a CNN and applying the image projection model.
8. The method according to claim 6, further comprising: generating the encoding of the modification text input by sending the modification text input to a long short term memory (LSTM) and applying a text projection model.
9. The method according to claim 4, further comprising: determining the compositional matching loss according to:
$$L = L_{vse} + L_{im} + L_{tm}$$
wherein $L$ represents the compositional matching loss, $L_{vse}$ represents an embedding loss, $L_{im}$ represents a compositional image matching loss, and $L_{tm}$ represents a compositional text matching loss.
10. The method according to claim 4, further comprising: training an embedding model based at least in part on: determining a first representation of reference image data; determining a second representation of a reference description describing the reference image data; determining a third representation of negative reference image data different than the reference image data; determining a fourth representation of a negative reference description describing the negative reference image data; and determining an embedding loss based at least in part on the first representation, the second representation, the third representation, and the fourth representation; and generating the encoding of the reference input using the embedding model; and generating the encoding of the target output using the embedding model.
11. The method according to claim 10, further comprising: determining the embedding loss based at least in part on a first difference between the first representation and the fourth representation, and a second difference between the second representation and the third representation.
12. The method according to claim 10, further comprising: determining the embedding loss according to:
$$L_{vse} = \left[ d(v, t) - d(v, t^{-}) + m \right]_{+} + \left[ d(v, t) - d(v^{-}, t) + m \right]_{+}$$
wherein $L_{vse}$ represents the embedding loss, $v$ represents the first representation of the reference image data, $t$ represents the second representation of the reference description, $v^{-}$ represents the third representation of the negative reference image data, and $t^{-}$ represents the fourth representation of the negative reference description, $d$ represents a distance between representations in an embedding space defined by the embedding model, and $m$ represents a margin.
13.
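As a rough illustration of the embedding loss recited in claim 12, the sketch below evaluates the two margin terms for one image/description pair and one pair of negatives. It rests on several assumptions that the claim text does not fix: `d` is taken to be Euclidean distance, the trailing "+" on each term is read as the positive part max(0, x), and the margin value 0.2 and all vectors are made-up toy inputs. The function names are hypothetical.

```python
import math

def distance(a, b):
    # ASSUMPTION: Euclidean distance; the claim only says "a distance".
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def positive_part(x):
    # ASSUMPTION: the trailing "+" on each term is read as max(0, x).
    return max(0.0, x)

def embedding_loss(v, t, v_neg, t_neg, margin=0.2):
    """Two-term margin loss over one positive pair and its negatives.

    v, t         : representations of the reference image and its description
    v_neg, t_neg : representations of the negative image and negative description
    """
    d_pos = distance(v, t)
    text_term = positive_part(d_pos - distance(v, t_neg) + margin)
    image_term = positive_part(d_pos - distance(v_neg, t) + margin)
    return text_term + image_term

# Toy vectors: the matching pair coincides and the negatives are far away,
# so both terms are clipped to zero.
easy = embedding_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0], [3.0, 4.0])

# Negatives identical to the positives: each term equals the margin.
hard = embedding_loss([0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
print(easy, hard)
```

The loss is zero once every negative sits at least `margin` farther away than the matching pair, which is the behavior a margin-based ranking objective is meant to reward.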
Combinations of networks · CPC title
Supervised learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
using intermediate agents · CPC title