Machine learning selection of images
US-2024256597-A1 · Aug 1, 2024 · US
US12254597B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12254597-B2 |
| Application number | US-202217709221-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 30, 2022 |
| Priority date | Mar 30, 2022 |
| Publication date | Mar 18, 2025 |
| Grant date | Mar 18, 2025 |
Official abstract text for this publication.
An item recommendation system receives a set of recommendable items and a request to select, from the set of recommendable items, a contrast group. The item recommendation system selects a contrast group from the set of recommendable items by applying an image modification model to the set of recommendable items. The image modification model includes an item selection model configured to determine an unbiased conversion rate for each item of the set of recommendable items and select a recommended item from the set of recommendable items having a greatest unbiased conversion rate. The image modification model includes a contrast group selection model configured to select, for the recommended item, a contrast group comprising the recommended item and one or more contrast items. The item recommendation system transmits the contrast group responsive to the request.
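The selection flow in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: the abstract does not say how the unbiased conversion rate is computed or how contrast items are chosen, so the sketch takes precomputed rates as input and, purely as a placeholder, treats the next-highest-rated items as contrast items. The function name `select_contrast_group` and the `group_size` parameter are hypothetical.

```python
from typing import Dict, List


def select_contrast_group(rates: Dict[str, float], group_size: int) -> List[str]:
    """Return the recommended item followed by its contrast items.

    `rates` maps item id -> unbiased conversion rate. The rates are assumed
    to be computed elsewhere; the abstract does not describe the estimator.
    """
    if not rates:
        raise ValueError("no recommendable items")
    if group_size < 2:
        raise ValueError("a contrast group needs at least one contrast item")

    # Rank items from highest to lowest unbiased conversion rate.
    ranked = sorted(rates, key=lambda item: rates[item], reverse=True)

    # The recommended item is the one with the greatest rate.
    recommended = ranked[0]

    # ASSUMPTION: contrast items are simply the next-best items.
    # The source does not state the actual selection criterion.
    contrast_items = ranked[1:group_size]
    return [recommended] + contrast_items


# Example with made-up rates.
rates = {"a": 0.02, "b": 0.05, "c": 0.03}
group = select_contrast_group(rates, group_size=2)
```

Here `group` holds the recommended item first, followed by one contrast item. Any other contrast criterion could replace the slice without changing the rest of the flow.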
Opening claim text (preview).
What is claimed is:
1. A method, comprising: receiving, by a request module, an input text and a request for a blended image; generating, by a contrastive language-image pre-training (“CLIP”) module for the input text, an input text CLIP code; selecting, by an initial latent code selection module, an initial latent code from among a set of latent codes, the selection based on the initial latent code having a corresponding CLIP code with a greatest semantic similarity to the input text CLIP code; generating, by a latent code blending module, a blended image latent code by blending the initial latent code with an input image latent code determined for an input image; and generating, by a latent code generator module, the blended image from the blended image latent code; and transmitting, by the request module responsive to the request, the blended image.
2. The method of claim 1, wherein the input text specifies target features for modifying the input image.
3. The method of claim 1, wherein each latent code of the set of latent codes has a corresponding CLIP code.
4. The method of claim 3, wherein each latent code of the set of latent codes and its corresponding CLIP code is generated from an image of a set of images.
5. The method of claim 1, wherein the latent code generator module is further configured to generate the input image latent code based on the input image.
6. The method of claim 1, wherein the latent code blending module comprises a StyleGAN synthesis network and wherein the latent code generator module comprises a StyleGAN encoder.
7. The method of claim 1, wherein the initial latent code comprises a first set of layers, wherein the input image latent code comprises a second set of layers, wherein each layer of the first set of layers corresponds to a respective layer of the second set of layers, and wherein blending the initial latent code with the input image latent code comprises, blending each layer of the first set of layers with the corresponding respective layer of the second set of layers.
8. A system comprising: a request module configured to receive an input text, an input image, and a request for a blended image; a contrastive language-image pre-training (“CLIP”) module configured to generate, for the input text, an input text CLIP code; an initial latent code selection module configured to select an initial latent code from among a set of latent codes, the selection based on the initial latent code having a corresponding CLIP code with a greatest semantic similarity to the input text CLIP code; a latent code blending module configured to generate a blended image latent code by blending the initial latent code with an input image latent code determined for the input image; and a latent code generator module configured to generate the blended image from the blended image latent code, wherein the request module is further configured to transmit the blended image responsive to the request.
9. The system of claim 8, wherein the input text specifies target features for modifying the input image.
10. The system of claim 8, wherein each latent code of the set of latent codes has a corresponding CLIP code.
11. The system of claim 10, wherein each latent code of the set of latent codes and its corresponding CLIP code is generated from an image of a set of images.
12. The system of claim 8, wherein the latent code generator module is further configured to generate the input image latent code based on the input image.
13. The system of claim 8, wherein the latent code blending module comprises a StyleGAN synthesis network and wherein the latent code generator module comprises a StyleGAN encoder.
14. The system of claim 8, wherein the initial latent code comprises a first set of scales, wherein the input image latent code comprises a second set of scales, wherein each scale of the first set of scales corresponds to a respective scale of the second set of scales, and wherein blending the initial latent code with the input image latent code comprises, blending each scale of the first set of scales with the corresponding respective scale of the second set of scales.
15. A non-transitory computer-readable medium storing executable instructions, which when executed by a processing device, cause the processing device to perform operations comprising: receiving an input image, an input text, and a request for a blended image, wherein the input text specifies target features for modifying the input image; generating the blended image by applying an image modification model to an input image and the input text, wherein the image modification model comprises: a contrastive language-image pre-training (“CLIP”) model configured to generate, for the input text, an input text CLIP code; an initial latent code selection model configured to select an initial latent code from among a set of latent codes, the selection based on the initial latent code having a corresponding CLIP code with a greatest semantic similarity to the input text CLIP code; a latent code blending model configured to generate a blended image latent code by blending the initial latent code with an input image latent code determined for the input image; and a latent code generator model configured to generate the blended image from the blended image latent code; and transmitting, responsive to the request, the blended image.
16. The non-transitory computer-readable medium of claim 15, wherein each latent code of the set of latent codes has a corresponding CLIP code.
17. The non-transitory computer-readable medium of claim 16, wherein each latent code of the set of latent codes has a corresponding CLIP code.
18. The non-transitory computer-readable medium of claim 15, wherein the latent code generator model is further configured to generate the input image latent code based on the input image.
19. The non-transitory computer-readable medium of claim 15, wherein the latent code blending model comprises a StyleGAN synthesis network and wherein the latent code generator model comprises a StyleGAN encoder.
20. The non-transitory computer-readable medium of claim 15, wherein the initial latent code comprises a first set of layers, wherein the input image latent code comprises a second set of layers, wherein each layer of the first set of layers corresponds to a respective layer of the second set of layers, and wherein blending the initial latent code with the input image latent code comprises, blending, each layer of the first set of layers with the corresponding respective layer of the second set of layers.
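The selection and blending steps recited in claims 1 and 7 can be illustrated with a small sketch. It is not the patented implementation, and several things are assumptions: "semantic similarity" is modeled as cosine similarity, blending is modeled as per-layer linear interpolation with hypothetical `weights`, and the CLIP codes and latent codes are plain lists of floats standing in for the outputs of a real CLIP model and StyleGAN encoder. The final step, generating an image from the blended latent code, is omitted. All function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
LatentCode = List[List[float]]  # one vector per layer (assumed layout)


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_initial_latent(
    text_code: Vector,
    clip_codes: Sequence[Vector],
    latent_codes: Sequence[LatentCode],
) -> LatentCode:
    """Pick the latent code whose CLIP code is most similar to the text code."""
    if not latent_codes or len(clip_codes) != len(latent_codes):
        raise ValueError("need one CLIP code per latent code")
    best = max(
        range(len(clip_codes)),
        key=lambda i: cosine_similarity(text_code, clip_codes[i]),
    )
    return latent_codes[best]


def blend_latents(
    initial: LatentCode,
    input_latent: LatentCode,
    weights: Sequence[float],
) -> LatentCode:
    """Blend corresponding layers: w * initial + (1 - w) * input (assumed rule)."""
    if not (len(initial) == len(input_latent) == len(weights)):
        raise ValueError("layer counts and weight count must match")
    blended = []
    for init_layer, in_layer, w in zip(initial, input_latent, weights):
        blended.append([w * a + (1.0 - w) * b for a, b in zip(init_layer, in_layer)])
    return blended


# Toy data: two candidate latent codes, each with two 2-dimensional layers.
text_code = [1.0, 0.0]
clip_codes = [[0.0, 1.0], [0.9, 0.1]]
latent_codes = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [4.0, 4.0]],
]
input_latent = [[0.0, 0.0], [0.0, 0.0]]
weights = [1.0, 0.5]  # hypothetical per-layer blend weights

initial = select_initial_latent(text_code, clip_codes, latent_codes)
blended = blend_latents(initial, input_latent, weights)
```

In this toy run the second candidate is selected because its CLIP code points most nearly in the direction of the text code. A weight of 1.0 keeps a layer entirely from the selected latent code, while 0.5 averages it with the input image's layer. Per-layer weights are one way to realize the layer-by-layer correspondence that claim 7 describes; the claims themselves do not fix a blending formula.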
Combinations of networks · CPC title
Training; Learning · CPC title
Artificial neural networks [ANN] · CPC title
Image fusion; Image merging · CPC title
involving graphical user interfaces [GUIs] · CPC title