Implementing portrait editing using a machine learning model

US2026011056A1 · US · A1

Patent metadata

Publication number: US-2026011056-A1
Application number: US-202418763728-A
Country: US
Kind code: A1
Filing date: Jul 3, 2024
Priority date: Jul 3, 2024
Publication date: Jan 8, 2026
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure describes techniques for implementing portrait editing using a machine learning model. An image and a text prompt are input into a first machine learning model. The image comprises a portrait of a subject. The text prompt indicates a target result of editing the image. The first machine learning model is trained to perform portrait editing while preserving untargeted features. An editing mask is generated by the first machine-learning model based on the image. The editing mask indicates a first area for editing and a second area for preserving original content of the image. A mask-guided predicted noise is computed at each timestep and a process of editing the image is guided by the first machine learning model based on the editing mask. An edited image is generated by the first machine learning model. The edited image comprises the target editing result and retains detailed features of the subject.
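The abstract does not give a formula for the mask-guided predicted noise. The sketch below shows one plausible reading, offered as an assumption rather than the patent's method: at a single timestep, blend an edit-conditioned noise prediction with a reconstruction-conditioned one, using the editing mask as the per-pixel weight. The function names, the difference-based mask heuristic, the threshold, and the array shapes are all hypothetical.

```python
import numpy as np


def mask_guided_noise(eps_edit, eps_keep, mask):
    """Blend two noise predictions with a mask in [0, 1].

    mask == 1 marks the area to edit, mask == 0 the area to preserve.
    (Assumed blending rule; the source only names the quantity.)
    """
    mask = np.clip(mask, 0.0, 1.0)
    return mask * eps_edit + (1.0 - mask) * eps_keep


def mask_from_difference(eps_edit, eps_keep, threshold=0.5):
    """Hypothetical mask heuristic: threshold the normalized disagreement
    between the two predictions, averaged over channels."""
    diff = np.abs(eps_edit - eps_keep).mean(axis=0)  # (C, H, W) -> (H, W)
    diff = diff / (diff.max() + 1e-8)
    return (diff > threshold).astype(np.float32)


# Toy predictions for one timestep: 4 latent channels, 8x8 spatial grid.
rng = np.random.default_rng(0)
eps_keep = rng.standard_normal((4, 8, 8)).astype(np.float32)
eps_edit = eps_keep.copy()
eps_edit[:, 2:6, 2:6] += 3.0  # pretend the prompt only affects a central patch

mask = mask_from_difference(eps_edit, eps_keep)
eps = mask_guided_noise(eps_edit, eps_keep, mask[None])  # broadcast over channels
```

In this toy setup the blended prediction follows the edit branch inside the patch and the reconstruction branch everywhere else, which is the behavior the abstract attributes to the first and second areas of the mask.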

First claim

Opening claim text (preview).

What is claimed is:

1. A method of implementing portrait editing using a machine learning model, comprising: inputting an image and a text prompt into a first machine learning model, wherein the image comprises a portrait of a subject, wherein the text prompt indicates a target result of editing the image, and wherein the first machine learning model is trained to perform portrait editing while preserving untargeted features; generating an editing mask by the first machine-learning model based on the image, wherein the editing mask indicates a first area for editing and a second area for preserving original content of the image; computing a mask-guided predicted noise at each timestep and guiding a process of editing the image by the first machine learning model based on the editing mask; and generating an edited image by the first machine learning model, wherein the edited image comprises the target editing result and retains detailed features of the subject.

2. The method of claim 1, further comprising: generating training pairs by a second machine learning model, wherein the training pairs are utilized to train the first machine learning model, wherein the training pairs align with a specified editing direction, wherein each training pair comprises a source image and a target image, and wherein the source image and the target image in each training pair comprise a same subject and indicate the specified editing direction.

3. The method of claim 2, further comprising: generating each training pair through a single denoising process by the second machine learning model to enhance identity consistency in the source image and the target image; and generating a single image by the single denoising process, wherein the single image comprises a horizontal concatenation of the source image and the target image.

4. The method of claim 3, further comprising: guiding the single denoising process using a pose image to ensure spatial alignment by featuring a same pose in a left and right parts of the single image.

5. The method of claim 3, further comprising: generating identity embeddings based on a real-world portrait image; and guiding the single denoising process using the identity embeddings.

6. The method of claim 5, further comprising: providing the identity embeddings to the single denoising process by combining the identity embeddings with text embeddings computed from prompts depicting the single image.

7. The method of claim 2, further comprising: generating the training pairs to cover a diverse range of appearances by utilizing diverse real-world portrait images.

8. The method of claim 2, further comprising: training the first machine learning model using the training pairs, wherein the first machine learning model learns pertinent information from the training pairs, and wherein the pertinent information indicates the specified editing direction and preservation of untargeted subject features.

9. The method of claim 8, further comprising: generating spatial embeddings based on the source image in each training pair; concatenating the spatial embeddings with a noisy latent to generate a first concatenation; and inputting the first concatenation into the first machine learning model.

10. The method of claim 9, further comprising: generating target text embeddings based on a target prompt depicting the target image in each training pair; generating image embeddings based on the source image in each training pair and projecting the image embeddings to a space of text embeddings, wherein the image embeddings indicate visual information derived from the source image; concatenating the target text embeddings and the image embeddings to generate a second concatenation; and inputting the second concatenation into a cross-attention layer of the first machine learning model.

11. The method of claim 10, further comprising: enabling the first machine learning model to possess reconstruction capabilities of reconstructing input images by replacing the target text embeddings with source text embeddings and replacing the target image with the source image in a predetermined percentage of time during training, wherein the source text embeddings are generated based on a source prompt depicting the source image in each training pair, and wherein the reconstruction capabilities of the first machine learning model is utilized during an inference phase for mask generation.

12. A system of implementing portrait editing using a machine learning model, comprising: at least one processor; and at least one memory communicatively coupled to the at least one processor and comprising computer-readable instructions that upon execution by the at least one processor cause the at least one processor to perform operations comprising: inputting an image and a text prompt into a first machine learning model, wherein the image comprises a portrait of a subject, wherein the text prompt indicates a target result of editing the image, and wherein the first machine learning model is trained to perform portrait editing while preserving untargeted features; generating an editing mask by the first machine-learning model based on the image, wherein the editing mask indicates a first area for editing and a second area for preserving original content of the image; computing a mask-guided predicted noise at each timestep and guiding a process of editing the image by the first machine learning model based on the editing mask; and generating an edited image by the first machine learning model, wherein the edited image comprises the target editing result and retains detailed features of the subject.

13. The system of claim 12, the operations further comprising: generating training pairs by a second machine learning model, wherein the training pairs are utilized to train the first machine learning model, wherein the training pairs align with a specified editing direction, wherein each training pair comprises a source image and a target image, and wherein the source image and the target image in each training pair comprise a same subject and indicate the specified editing direction.

14. The system of claim 13, the operations further comprising: generating each training pair through a single denoising process by the second machine learning model to enhance identity consistency in the source image and the target image; and generating a single image by the single denoising process, wherein the single image comprises a horizontal concatenation of the source image and the target image.

15. The system of claim 13, the operations further comprising: training the first machine learning model using the training pairs, wherein the first machine learning model learns pertinent information from the training pairs, and wherein the pertinent information indicates the specified editing direction and preservation of untargeted subject features.

16. The system of claim 15, the operations further comprising: generating spatial embeddings based on the source image in each training pair; concatenating the spatial embeddings with a noisy latent to generate a first concatenation; generating target text embeddings based on a target prompt depicting the target image in each training pair; generating image embeddings based on the source image in each training pair and projecting the image embeddings to a space of text embeddings; concatenating the target text embeddings and the image embeddings to generate a second concatenation; and inputting the first
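The training-side steps in claims 3 and 9 to 11 are mostly tensor bookkeeping, so a shape-level sketch may help. The code below is an illustration under stated assumptions, not the patent's implementation: every function name is hypothetical, the concatenation order and axes are guesses, the token counts and embedding widths (77, 768, 1024) are placeholder values, and the 10% reconstruction rate merely stands in for the claim's unspecified "predetermined percentage".

```python
import numpy as np


def split_pair(wide_image):
    """Claim 3: split a horizontally concatenated (C, H, 2W) image
    into its left (source) and right (target) halves."""
    width = wide_image.shape[-1]
    if width % 2 != 0:
        raise ValueError("expected an even width")
    half = width // 2
    return wide_image[..., :half], wide_image[..., half:]


def build_model_input(noisy_latent, spatial_emb):
    """Claim 9, 'first concatenation': join the noisy latent and the spatial
    embeddings along the channel axis (axis and order are assumptions)."""
    return np.concatenate([noisy_latent, spatial_emb], axis=0)


def build_cross_attention_context(text_emb, image_emb, projection):
    """Claim 10, 'second concatenation': project image embeddings to the
    text-embedding width, then append them along the token axis."""
    projected = image_emb @ projection  # (n_img, d_img) @ (d_img, d_text)
    return np.concatenate([text_emb, projected], axis=0)


def pick_training_example(source, target, source_text, target_text,
                          rng, p_reconstruct=0.1):
    """Claim 11: for a fraction of steps, swap in the source image and source
    text so the model also learns to reconstruct its input."""
    if rng.random() < p_reconstruct:
        return source, source_text
    return target, target_text


rng = np.random.default_rng(0)
source_img, target_img = split_pair(rng.standard_normal((3, 64, 128)))
model_in = build_model_input(rng.standard_normal((4, 8, 8)),
                             rng.standard_normal((4, 8, 8)))
context = build_cross_attention_context(rng.standard_normal((77, 768)),
                                        rng.standard_normal((4, 1024)),
                                        rng.standard_normal((1024, 768)))
```

With these placeholder shapes, each half of the pair is (3, 64, 64), the model input has 8 channels, and the cross-attention context holds 81 tokens of width 768.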

Assignees

Lemon Inc

Inventors

Classifications

  • using machine learning, e.g. neural networks · CPC title

  • G06T11/60 (Primary)

    Creating or editing images; Combining images with text · CPC title

  • Machine learning · CPC title

  • Filling planar surfaces by adding surface attributes, e.g. adding colours or textures · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2026011056A1 cover?
The present disclosure describes techniques for implementing portrait editing using a machine learning model. An image and a text prompt are input into a first machine learning model. The image comprises a portrait of a subject. The text prompt indicates a target result of editing the image. The first machine learning model is trained to perform portrait editing while preserving untargeted features. …
Who is the assignee on this patent?
Lemon Inc
What technology area does this patent fall under?
Primary CPC classification G06T11/60. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 8, 2026 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).