Systems and methods for feedback based instructional visual editing

US12494004B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12494004-B2
Application number: US-202318350876-A
Country: US
Kind code: B2
Filing date: Jul 12, 2023
Priority date: Mar 8, 2023
Publication date: Dec 9, 2025
Grant date: Dec 9, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments described herein provide a feedback based instructional image editing framework that employs a diffusion process to follow user instruction for image editing. A diffusion model is fine-tuned using a reward model, which may be trained via human annotation. The training of the reward model may be done by having the image editing model output a number of images, which a human annotator ranks based on their alignment with the original image and a given instruction.
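
The abstract says the reward model may be trained from human rankings of candidate edits but does not state the training objective. As a minimal sketch, assuming a pairwise logistic (Bradley-Terry style) ranking loss, which is a common choice and not confirmed by this page, the ranking signal could be turned into a loss as follows. The function name `pairwise_ranking_loss` and the example scores are hypothetical.

```python
import math
from itertools import combinations


def pairwise_ranking_loss(scores_best_first):
    """Mean of -log(sigmoid(s_better - s_worse)) over all ranked pairs.

    `scores_best_first` holds the reward model's scores for the candidate
    edited images, ordered as the annotator ranked them (best first).
    ASSUMPTION: the pairwise logistic form is illustrative only; the
    source does not specify the reward model's training loss.
    """
    pairs = list(combinations(scores_best_first, 2))
    if not pairs:
        raise ValueError("need at least two ranked candidates")
    total = 0.0
    for better, worse in pairs:
        margin = better - worse
        # Numerically stable softplus(-margin) == -log(sigmoid(margin)).
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(pairs)


# Scores that agree with the human ranking give a lower loss than
# scores that contradict it.
agree = pairwise_ranking_loss([2.0, 1.0, 0.0])
disagree = pairwise_ranking_loss([0.0, 1.0, 2.0])
print(agree < disagree)
```

Minimizing such a loss pushes the reward model to score higher-ranked candidates above lower-ranked ones, which is one way to obtain the reward score used when fine-tuning the diffusion model.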

First claim

Opening claim text (preview).

What is claimed is:

1. A method of training a neural network based instructional image editing model, the method comprising: receiving, via a data interface, a training dataset comprising an input image, an editing instruction, and an edited image; generating a noisy latent image representation of the edited image by gradually adding noise to a latent representation of the edited image; generating, by the neural network based instructional image editing model, an estimated noise from the noisy latent image representation based on the input image and the editing instruction; computing, by a neural network based reward model, a reward score indicative of an alignment level between the edited image and the input image according to the editing instruction; computing a loss objective based on the added noise, the estimated noise, and the reward score; and training the neural network based instructional image editing model based on the computed loss objective via backpropagation.

2. The method of claim 1, further comprising: receiving, via the data interface, a second training dataset comprising a second input image, and a second editing instruction; generating, by the neural network based instructional image editing model, a plurality of candidate edited images based on the second input image and the second editing instruction; displaying the plurality of candidate edited images on a display; receiving an indication of a quality associated with the plurality of candidate edited images; and training the neural network based reward model based on the indication.

3. The method of claim 2, wherein the indication comprises a ranking of the plurality of candidate edited images.

4. The method of claim 1, wherein the neural network based instructional image editing model comprises a series of neural network based denoising models, wherein each neural network based denoising model generates a respective estimated noise from an input image representations, and wherein the estimated noise from the noisy latent image representation is one of the respective estimated noise.

5. The method of claim 1, wherein the computing the loss objective comprises weighting the loss objective based on the reward score.

6. The method of claim 1, further comprising: modifying the editing instruction based on the reward score.

7. The method of claim 6, wherein the modifying comprises appending text to the editing instruction including a value based on the reward score.

8. The method of claim 1, further comprising: scaling and rounding the reward score to an integer over a predetermined range of values.

9. The method of claim 1, wherein the generating the noisy latent image representation of the edited image comprises: encoding, via the encoder, the edited image into a latent representation of the edited image; and adding a generated noise to the latent representation of the edited image.

10. A system for training a neural network based instructional image editing model, the system comprising: a memory that stores the neural network based instructional image editing model and a plurality of processor-executable instructions; a communication interface that receives a training dataset comprising an input image, an editing instruction, and an edited image; and one or more hardware processors that read and execute the plurality of processor-executable instructions from the memory to perform operations comprising: generating a noisy latent image representation of the edited image by gradually adding noise to a latent representation of the edited image; generating, by the neural network based instructional image editing model, an estimated noise from the noisy latent image representation based on the input image and the editing instruction; computing, by a neural network based reward model, a reward score indicative of an alignment level between the edited image and the input image according to the editing instruction; computing a loss objective based on the added noise, the estimated noise, and the reward score; and training the neural network based instructional image editing model based on the computed loss objective via backpropagation.

11. The system of claim 10, the operations further comprising: receiving, via the communication interface, a second training dataset comprising a second input image, and a second editing instruction; generating, by the neural network based instructional image editing model, a plurality of candidate edited images based on the second input image and the second editing instruction; displaying the plurality of candidate edited images on a display; receiving an indication of a quality associated with the plurality of candidate edited images; and training the neural network based reward model based on the indication.

12. The system of claim 11, wherein the indication comprises a ranking of the plurality of candidate edited images.

13. The system of claim 10, wherein the neural network based instructional image editing model comprises a series of neural network based denoising models, wherein each neural network based denoising model generates a respective estimated noise from an input image representations, and wherein the estimated noise from the noisy latent image representation is one of the respective estimated noise.

14. The system of claim 10, wherein the computing the loss objective comprises weighting the loss objective based on the reward score.

15. The system of claim 10, the operations further comprising: modifying the editing instruction based on the reward score.

16. The system of claim 15, wherein the modifying comprises appending text to the editing instruction including a value based on the reward score.

17. The system of claim 10, the operations further comprising: scaling and rounding the reward score to an integer over a predetermined range of values.

18. The system of claim 10, wherein the generating the noisy latent image representation of the edited image comprises: encoding, via the encoder, the edited image into a latent representation of the edited image; and adding a generated noise to the latent representation of the edited image.

19. A non-transitory machine-readable medium comprising a plurality of machine-executable instructions which, when executed by one or more processors, are adapted to cause the one or more processors to perform operations comprising: receiving, via a data interface, a training dataset comprising an input image, an editing instruction, and an edited image; generating a noisy latent image representation of the edited image by gradually adding noise to a latent representation of the edited image; generating, by a neural network based instructional image editing model, an estimated noise from the noisy latent image representation based on the input image and the editing instruction; computing, by a neural network based reward model, a reward score indicative of an alignment level between the edited image and the input image according to the editing instruction; computing a loss objective based on the added noise, the estimated noise, and the reward score; and training the neural network based instructional image editing model based on the computed loss objective via backpropagation.

20. The non-transitory machine-readable medium of claim 19, the operations further comprising: receiving, via the data interface, a second training dataset
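
The training steps recited in claims 1, 5, 7, and 8 can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: the closed-form noising step is the standard diffusion forward process, the loss is a reward-weighted mean squared error (one possible reading of claim 5), and the names `add_noise`, `reward_weighted_loss`, `quantize_reward`, and `append_reward`, along with the appended sentence wording and the 0 to 5 range, are hypothetical. The claims themselves do not fix these details.

```python
import math


def add_noise(latent, noise, alpha_bar):
    """Noisy latent: sqrt(alpha_bar) * z + sqrt(1 - alpha_bar) * eps.

    ASSUMPTION: standard closed-form forward diffusion step; the claim
    only says noise is gradually added to the latent representation.
    """
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * z + b * e for z, e in zip(latent, noise)]


def reward_weighted_loss(added_noise, estimated_noise, reward):
    """Mean squared error between added and estimated noise, scaled by
    the reward score (ASSUMPTION: one way to weight the loss)."""
    mse = sum((a - e) ** 2 for a, e in zip(added_noise, estimated_noise))
    return reward * mse / len(added_noise)


def quantize_reward(reward, low=0.0, high=1.0, levels=5):
    """Scale the reward to [0, levels] and round it to an integer."""
    clipped = min(max(reward, low), high)
    return round((clipped - low) / (high - low) * levels)


def append_reward(instruction, level, levels=5):
    """Append a reward-based value to the instruction text.
    The sentence template is hypothetical."""
    return f"{instruction} The image quality is {level} out of {levels}."


noisy = add_noise([1.0], [0.0], alpha_bar=0.25)
loss = reward_weighted_loss([1.0, 0.0], [0.0, 0.0], reward=0.5)
prompt = append_reward("make the sky purple", quantize_reward(0.62))
print(noisy, loss, prompt)
```

In an actual training loop the estimated noise would come from the image editing model conditioned on the input image and the (possibly reward-augmented) instruction, and the loss would be minimized by backpropagation; those parts are omitted here.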

Assignees

Salesforce Inc

Inventors

Classifications

  • Denoising; Smoothing

  • Training; Learning

  • Artificial neural networks [ANN]

  • using machine learning, e.g. neural networks

  • G06T11/60 (Primary): Creating or editing images; Combining images with text

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12494004B2 cover?
Embodiments described herein provide a feedback based instructional image editing framework that employs a diffusion process to follow user instruction for image editing. A diffusion model is fine-tuned using a reward model, which may be trained via human annotation. The training of the reward model may be done by having the image editing model output a number of images, which a human annotator…
Who is the assignee on this patent?
Salesforce Inc
What technology area does this patent fall under?
Primary CPC classification G06T11/60. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 9, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).