Personalized text-to-image generation

US2024355022A1 · US · A1

Patent metadata
Field                Value
Publication number   US-2024355022-A1
Application number   US-202318476504-A
Country              US
Kind code            A1
Filing date          Sep 28, 2023
Priority date        Apr 20, 2023
Publication date     Oct 24, 2024
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One or more aspects of a method, apparatus, and non-transitory computer readable medium include obtaining an input description and an input image depicting a subject, encoding the input description using a text encoder of an image generation model to obtain a text embedding, and encoding the input image using a subject encoder of the image generation model to obtain a subject embedding. A guidance embedding is generated by combining the subject embedding and the text embedding, and then an output image is generated based on the guidance embedding using a diffusion model of the image generation model. The output image depicts aspects of the subject and the input description.
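As a rough illustration of the "combining" step in the abstract, the sketch below splices a subject embedding into a sequence of per-token text embeddings at the position of a placeholder token. This follows the replacement variant mentioned in claim 2. The toy encoders, the placeholder string `<subject>`, the embedding width, and all function names are assumptions made for illustration; none of them come from the publication, and a real system would use learned encoders and pass the result to a diffusion model.

```python
from typing import List

EMBED_DIM = 4              # hypothetical embedding width, chosen only for the demo
PLACEHOLDER = "<subject>"  # hypothetical identifier token in the prompt


def toy_text_encoder(tokens: List[str]) -> List[List[float]]:
    """Stand-in for a text encoder: one deterministic vector per token."""
    return [[float((len(tok) + i) % 7) for i in range(EMBED_DIM)] for tok in tokens]


def toy_subject_encoder(pixels: List[float]) -> List[float]:
    """Stand-in for a subject encoder: pools an 'image' into one vector."""
    mean = sum(pixels) / len(pixels)
    return [mean * (i + 1) for i in range(EMBED_DIM)]


def build_guidance_embedding(
    tokens: List[str],
    text_emb: List[List[float]],
    subject_emb: List[float],
) -> List[List[float]]:
    """Replace the placeholder token's embedding with the subject embedding."""
    if PLACEHOLDER not in tokens:
        raise ValueError("prompt has no placeholder token")
    idx = tokens.index(PLACEHOLDER)
    guidance = [row[:] for row in text_emb]  # copy so the input is left untouched
    guidance[idx] = list(subject_emb)
    return guidance


tokens = ["a", "photo", "of", PLACEHOLDER, "on", "a", "beach"]
text_emb = toy_text_encoder(tokens)
subject_emb = toy_subject_encoder([0.2, 0.4, 0.6, 0.8])
guidance = build_guidance_embedding(tokens, text_emb, subject_emb)
```

In this sketch the guidance embedding keeps the same sequence length as the text embedding, so it could be fed to whatever conditioning input the text embedding would normally use.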

First claim

Opening claim text (preview).

What is claimed is:

1. A method of generating an image, comprising: obtaining an input description and an input image depicting a subject; encoding the input description using a text encoder of an image generation model to obtain a text embedding; encoding the input image using a subject encoder of the image generation model to obtain a subject embedding; generating a guidance embedding by combining the subject embedding and the text embedding; and generating an output image based on the guidance embedding using a diffusion model of the image generation model, wherein the output image depicts one or more aspects of the input image and the input description.

2. The method of claim 1, wherein: the guidance embedding is generated by replacing an identifier in the text embedding with the subject embedding.

3. The method of claim 1, further comprising: encoding the input image to obtain a feature embedding representing a subject identity of the input image, wherein the output image is generated based on the feature embedding.

4. The method of claim 3, further comprising: introducing the feature embedding into an adapter layer of the image generation model to preserve subject identity in the output image.

5. The method of claim 1, wherein: the text embedding is generated using a first multi-modal encoder and the subject embedding is generated using a second multi-modal encoder.

6. The method of claim 1, further comprising: applying a balance factor and a renormalization factor to the subject embedding.

7. The method of claim 6, wherein: the balance factor is less than 1.

8. A method of training an image generation model, comprising: obtaining a training data set including a training image; and training an image generation model including a subject encoder and a diffusion model based on the training set, wherein the subject encoder is trained to encode an input image depicting a subject to obtain a subject embedding and the diffusion model is trained to generate an output image depicting the subject based on the subject embedding.

9. The method of claim 8, further comprising: generating a feature embedding; and applying a balance factor and a renormalization factor to the feature embedding.

10. The method of claim 9, further comprising: setting the balance factor to one.

11. The method of claim 8, further comprising: masking out a background of the training image.

12. The method of claim 11, further comprising: performing augmentations to the training image to obtain additional training data.

13. The method of claim 12, wherein: the diffusion model of the image generation model includes a U-net with adapter layers, wherein parameters of the U-net are fixed during the training.

14. The method of claim 13, further comprising: obtaining a latent noisy image from a ground-truth image, wherein the training is based on the latent noisy image and the ground-truth image.

15. An apparatus comprising: one or more processors; one or more memories including instructions executable by the one or more processors; an image generation model comprising parameters stored in the one or more memories, wherein the image generation model is configured to receive a plurality of images as input, and is trained to generate a new image based on a feature embedding and a text embedding generated from the plurality of images and an input description.

16. The apparatus of claim 15, wherein: the image generation model comprises a diffusion model including a U-net with one or more adapter layers.

17. The apparatus of claim 16, wherein: parameters of cross-attention layers of the U-net are fixed during training, and the adapter layers are trainable during the training.

18. The apparatus of claim 17, wherein: the text embedding is generated using a multi-modal encoder.

19. The apparatus of claim 18, wherein: the image generation model is further configured to apply a balance factor and a renormalization factor to the feature embedding.

20. The apparatus of claim 19, wherein: the diffusion model is pre-trained and the adapter layers are trained using a plurality of training images and a text description.
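Several claims (6, 7, 9, 10, and 19) mention applying a balance factor and a renormalization factor to an embedding, but this page gives no formulas for either. The sketch below shows one plausible reading, offered purely as an assumption: the renormalization factor rescales the embedding to a reference L2 norm (for example, a typical text-token norm), and the balance factor then scales the result, using 1 during training (claim 10) and a value below 1 at generation time (claim 7). The function names, the `target_norm` parameter, and the order of operations are hypothetical and not taken from the publication.

```python
import math
from typing import List


def l2_norm(vec: List[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def renormalization_factor(embedding: List[float], target_norm: float) -> float:
    """Assumed definition: scalar that brings the embedding to target_norm."""
    norm = l2_norm(embedding)
    if norm == 0.0:
        raise ValueError("cannot renormalize a zero vector")
    return target_norm / norm


def apply_balance_and_renorm(
    embedding: List[float], balance: float, target_norm: float
) -> List[float]:
    """Assumed combination: multiply by renormalization factor, then by balance."""
    r = renormalization_factor(embedding, target_norm)
    return [balance * r * x for x in embedding]


subject_emb = [3.0, 4.0, 0.0, 0.0]  # toy embedding with L2 norm 5
text_norm = 10.0                    # hypothetical reference norm of text tokens

# Balance factor of one, as in the training claim.
train_emb = apply_balance_and_renorm(subject_emb, balance=1.0, target_norm=text_norm)
# Balance factor below one, as in the generation claim.
infer_emb = apply_balance_and_renorm(subject_emb, balance=0.5, target_norm=text_norm)
```

Under these assumptions the embedding sits at the reference norm during training and at a fraction of it during generation, which would reduce the subject's weight relative to the text tokens. Other definitions of the two factors are equally consistent with the claim language shown here.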

Assignees

Adobe Inc

Inventors

Classifications

  • Two-dimensional [2D] image generation · CPC title

  • G06T9/00 · Primary

    Image coding (bandwidth or redundancy reduction for static pictures H04N1/41; coding or decoding of static colour picture signals H04N1/64; methods or arrangements for coding, decoding, compressing or decompressing digital video signals H04N19/00) · CPC title

  • involving foreground-background segmentation · CPC title

  • Training; Learning · CPC title

  • G06T11/60 · Primary

    Creating or editing images; Combining images with text · CPC title


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024355022A1 cover?
One or more aspects of a method, apparatus, and non-transitory computer readable medium include obtaining an input description and an input image depicting a subject, encoding the input description using a text encoder of an image generation model to obtain a text embedding, and encoding the input image using a subject encoder of the image generation model to obtain a subject embedding. A guida…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06T9/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 24, 2024 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).