Systems and methods for feedback based instructional visual editing

US12494004B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12494004-B2
Application number	US-202318350876-A
Country	US
Kind code	B2
Filing date	Jul 12, 2023
Priority date	Mar 8, 2023
Publication date	Dec 9, 2025
Grant date	Dec 9, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments described herein provide a feedback based instructional image editing framework that employs a diffusion process to follow user instruction for image editing. A diffusion model is fine-tuned using a reward model, which may be trained via human annotation. The training of the reward model may be done by having the image editing model output a number of images, which a human annotator ranks based on their alignment with the original image and a given instruction.
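The abstract says the reward model may be trained from human rankings of candidate edits, but this page does not state the training objective. As a minimal sketch, assuming a pairwise Bradley-Terry style loss (a common choice for learning from rankings, not confirmed by the source), a ranking can be expanded into ordered pairs and scored as follows. The function name and the score values are hypothetical.

```python
import math
from itertools import combinations


def pairwise_ranking_loss(scores, ranking):
    """Average pairwise preference loss implied by one annotator ranking.

    scores:  reward-model scores, one per candidate image (hypothetical values).
    ranking: candidate indices ordered from best to worst by the annotator.
    """
    if len(ranking) < 2:
        raise ValueError("a ranking needs at least two candidates")
    losses = []
    # combinations() preserves input order, so `better` always outranks `worse`.
    for better, worse in combinations(ranking, 2):
        margin = scores[better] - scores[worse]
        # -log(sigmoid(margin)), written as log(1 + exp(-margin)).
        losses.append(math.log1p(math.exp(-margin)))
    return sum(losses) / len(losses)


# Scores that agree with the ranking give a lower loss than scores that do not.
agree = pairwise_ranking_loss([2.0, 0.5, -1.0], ranking=[0, 1, 2])
disagree = pairwise_ranking_loss([2.0, 0.5, -1.0], ranking=[2, 1, 0])
print(agree < disagree)
```

Minimizing a loss of this shape pushes the reward model to score higher-ranked candidates above lower-ranked ones. In practice the scores would come from a neural network and the loss would be minimized with an automatic differentiation framework.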

First claim

Opening claim text (preview).

What is claimed is:
1. A method of training a neural network based instructional image editing model, the method comprising: receiving, via a data interface, a training dataset comprising an input image, an editing instruction, and an edited image; generating a noisy latent image representation of the edited image by gradually adding noise to a latent representation of the edited image; generating, by the neural network based instructional image editing model, an estimated noise from the noisy latent image representation based on the input image and the editing instruction; computing, by a neural network based reward model, a reward score indicative of an alignment level between the edited image and the input image according to the editing instruction; computing a loss objective based on the added noise, the estimated noise, and the reward score; and training the neural network based instructional image editing model based on the computed loss objective via backpropagation.
2. The method of claim 1, further comprising: receiving, via the data interface, a second training dataset comprising a second input image, and a second editing instruction; generating, by the neural network based instructional image editing model, a plurality of candidate edited images based on the second input image and the second editing instruction; displaying the plurality of candidate edited images on a display; receiving an indication of a quality associated with the plurality of candidate edited images; and training the neural network based reward model based on the indication.
3. The method of claim 2, wherein the indication comprises a ranking of the plurality of candidate edited images.
4. The method of claim 1, wherein the neural network based instructional image editing model comprises a series of neural network based denoising models, wherein each neural network based denoising model generates a respective estimated noise from an input image representations, and wherein the estimated noise from the noisy latent image representation is one of the respective estimated noise.
5. The method of claim 1, wherein the computing the loss objective comprises weighting the loss objective based on the reward score.
6. The method of claim 1, further comprising: modifying the editing instruction based on the reward score.
7. The method of claim 6, wherein the modifying comprises appending text to the editing instruction including a value based on the reward score.
8. The method of claim 1, further comprising: scaling and rounding the reward score to an integer over a predetermined range of values.
9. The method of claim 1, wherein the generating the noisy latent image representation of the edited image comprises: encoding, via the encoder, the edited image into a latent representation of the edited image; and adding a generated noise to the latent representation of the edited image.
10. A system for training a neural network based instructional image editing model, the system comprising: a memory that stores the neural network based instructional image editing model and a plurality of processor-executable instructions; a communication interface that receives a training dataset comprising an input image, an editing instruction, and an edited image; and one or more hardware processors that read and execute the plurality of processor-executable instructions from the memory to perform operations comprising: generating a noisy latent image representation of the edited image by gradually adding noise to a latent representation of the edited image; generating, by the neural network based instructional image editing model, an estimated noise from the noisy latent image representation based on the input image and the editing instruction; computing, by a neural network based reward model, a reward score indicative of an alignment level between the edited image and the input image according to the editing instruction; computing a loss objective based on the added noise, the estimated noise, and the reward score; and training the neural network based instructional image editing model based on the computed loss objective via backpropagation.
11. The system of claim 10, the operations further comprising: receiving, via the communication interface, a second training dataset comprising a second input image, and a second editing instruction; generating, by the neural network based instructional image editing model, a plurality of candidate edited images based on the second input image and the second editing instruction; displaying the plurality of candidate edited images on a display; receiving an indication of a quality associated with the plurality of candidate edited images; and training the neural network based reward model based on the indication.
12. The system of claim 11, wherein the indication comprises a ranking of the plurality of candidate edited images.
13. The system of claim 10, wherein the neural network based instructional image editing model comprises a series of neural network based denoising models, wherein each neural network based denoising model generates a respective estimated noise from an input image representations, and wherein the estimated noise from the noisy latent image representation is one of the respective estimated noise.
14. The system of claim 10, wherein the computing the loss objective comprises weighting the loss objective based on the reward score.
15. The system of claim 10, the operations further comprising: modifying the editing instruction based on the reward score.
16. The system of claim 15, wherein the modifying comprises appending text to the editing instruction including a value based on the reward score.
17. The system of claim 10, the operations further comprising: scaling and rounding the reward score to an integer over a predetermined range of values.
18. The system of claim 10, wherein the generating the noisy latent image representation of the edited image comprises: encoding, via the encoder, the edited image into a latent representation of the edited image; and adding a generated noise to the latent representation of the edited image.
19. A non-transitory machine-readable medium comprising a plurality of machine-executable instructions which, when executed by one or more processors, are adapted to cause the one or more processors to perform operations comprising: receiving, via a data interface, a training dataset comprising an input image, an editing instruction, and an edited image; generating a noisy latent image representation of the edited image by gradually adding noise to a latent representation of the edited image; generating, by a neural network based instructional image editing model, an estimated noise from the noisy latent image representation based on the input image and the editing instruction; computing, by a neural network based reward model, a reward score indicative of an alignment level between the edited image and the input image according to the editing instruction; computing a loss objective based on the added noise, the estimated noise, and the reward score; and training the neural network based instructional image editing model based on the computed loss objective via backpropagation.
20. The non-transitory machine-readable medium of claim 19, the operations further comprising: receiving, via the data interface, a second training dataset
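The claimed training steps (noising a latent, comparing added and estimated noise, and using the reward score either to weight the loss or to modify the instruction) can be illustrated with a small sketch. Everything specific here is an assumption rather than something the claims state: the closed-form noising rule is the standard DDPM form, the weight `exp(reward / beta)` is one possible way to weight by reward, and the appended sentence, the 0 to 5 range, and all function names are hypothetical. The backpropagation step is omitted; in practice it would be handled by an automatic differentiation framework.

```python
import math


def add_noise(latent, noise, alpha_bar):
    """Noise a latent as sqrt(alpha_bar) * z0 + sqrt(1 - alpha_bar) * eps.

    Assumption: standard DDPM closed form; the claims only say noise is added.
    """
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * z + b * e for z, e in zip(latent, noise)]


def quantize_reward(reward, low=0.0, high=1.0, levels=5):
    """Scale a reward from [low, high] and round it to an integer in 0..levels."""
    clipped = min(max(reward, low), high)
    return round((clipped - low) / (high - low) * levels)


def condition_instruction(instruction, level, levels=5):
    """Append a reward-derived value to the instruction (hypothetical wording)."""
    return f"{instruction} The image quality is {level} out of {levels}."


def weighted_noise_loss(true_noise, predicted_noise, reward, beta=1.0):
    """Mean squared noise error scaled by exp(reward / beta) (assumed weighting)."""
    mse = sum((t - p) ** 2 for t, p in zip(true_noise, predicted_noise)) / len(true_noise)
    return math.exp(reward / beta) * mse


# Toy values standing in for an encoded edited image and sampled noise.
latent = [0.2, -0.4, 1.0]
noise = [0.5, 0.1, -0.3]
noisy_latent = add_noise(latent, noise, alpha_bar=0.5)

# Stand-in for the editing model's noise estimate (a real model would see
# noisy_latent, the input image, and the instruction).
predicted = [0.4, 0.0, -0.2]

level = quantize_reward(0.83)
prompt = condition_instruction("Make the sky pink.", level)
loss = weighted_noise_loss(noise, predicted, reward=0.83)
print(prompt)
print(loss)
```

The two uses of the reward are alternatives in the dependent claims: one scales the loss, the other leaves the loss unchanged and conditions the model on a reward value written into the instruction text.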

Assignees

Salesforce Inc

Inventors

Classifications

G06T5/70
Denoising; Smoothing · CPC title
G06T2207/20081
Training; Learning · CPC title
G06T2207/20084
Artificial neural networks [ANN] · CPC title
G06T5/60
using machine learning, e.g. neural networks · CPC title
G06T11/60 · Primary
Creating or editing images; Combining images with text · CPC title

Patent family

Related publications grouped by family.

View patent family 92635685

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12494004B2 cover? Embodiments described herein provide a feedback based instructional image editing framework that employs a diffusion process to follow user instruction for image editing. A diffusion model is fine-tuned using a reward model, which may be trained via human annotation. The training of the reward model may be done by having the image editing model output a number of images, which a human annotator…
Who is the assignee on this patent? Salesforce Inc
What technology area does this patent fall under? Primary CPC classification G06T11/60. Mapped technology areas include Physics.
When was this patent published? Publication date Dec 9, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Generation of story videos corresponding to user input using generative models

Generation of image corresponding to input text using multi-text guided image cropping

Personalized text-to-image generation
