What technology area does this patent fall under?

Primary CPC classification G06T11/60. Mapped technology areas include Physics.

When was this patent published?

Publication date: Aug 21, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (publications cited in our corpus, or other publications sharing the same primary CPC).

Digital video editing based on a target digital image

US2025265752A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2025265752-A1
Application number	US-202418583067-A
Country	US
Kind code	A1
Filing date	Feb 21, 2024
Priority date	Feb 21, 2024
Publication date	Aug 21, 2025
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Digital video editing techniques are described that are based on a target digital image. In one or more implementations, inputs are received. The inputs include a target text prompt, a target digital image depicting a target object, and a source digital video having a plurality of frames depicting a source object. Regions-of-interest are identified in the plurality of frames of the source digital video, respectively, based on the target text prompt and the target digital image using a machine-learning model, e.g., a diffusion model. A plurality of frames of a target digital video are generated as having the target object using a generative machine-learning model. The generating is based on the regions-of-interest, the target digital image, the source digital video, and a source text prompt describing the source digital video.

Claims

Claim text as published (claims 1 to 20).

What is claimed is:

1. A method comprising: receiving, by a processing device, a target text prompt, a target digital image depicting a target object, and a source digital video having a plurality of frames depicting a source object; identifying, by the processing device, regions-of-interest in the plurality of frames of the source digital video, respectively, based on the target text prompt and the target digital image using a machine-learning model; generating, by the processing device, a plurality of frames of a target digital video having the target object using a generative machine-learning model, the generating based on the regions-of-interest, the target digital image, the source digital video, and a source text prompt describing the source digital video; and outputting, by the processing device, the target digital video.

2. The method as described in claim 1, wherein the plurality of frames of the target digital video depicts the target object as following motion exhibited by the source object in the source digital video.

3. The method as described in claim 1, wherein the identifying the regions-of-interest includes forming a plurality of masks defining, respectively, the regions-of-interest.

4. The method as described in claim 3, wherein the forming the plurality of masks is based, at least in part, on the target text prompt and the target digital image.

5. The method as described in claim 1, wherein the machine-learning model, utilized to perform the identifying of the regions-of-interest, is configured as one or more diffusion models.

6. The method as described in claim 5, wherein the one or more diffusion models include: a source denoising branch configured to process the source text prompt; and a target denoising branch configured to process the target text prompt and the target object of the target digital image.

7. The method as described in claim 6, wherein the identifying includes comparing noise differences as a reconstruction loss across respective timesteps between the source denoising branch and the target denoising branch.

8. The method as described in claim 7, wherein the identifying further comprises averaging and binarizing the noise differences to form a plurality of masks defining, respectively, the regions-of-interest.

9. The method as described in claim 1, wherein the generative machine-learning model, utilized to generate the plurality of frames, is configured as one or more diffusion models.

10. The method as described in claim 1, wherein the generating of the plurality of frames of the target digital video includes calculating a latent correction during inference involving inter-frame temporal consistency.

11. The method as described in claim 10, wherein the calculating includes computing inter-frame latent fields by mapping spatial locations of features between the plurality of frames of the target digital video.

12. The method as described in claim 11, further comprising blending the computed inter-frame latent fields at a plurality of timesteps corresponding to the plurality of frames of the target digital video.

13. The method as described in claim 1, wherein the generating of the plurality of frames of the target digital video includes preserving a background of the source digital video by correcting latent noise corresponding to the background based on the regions-of-interest.

14. A computing device comprising: a processing device; and a computer-readable storage medium storing instructions that, in response to execution by the processing device, causes the processing device to perform operations including: receiving a target text prompt, a target digital image depicting a target object, a source digital video having a plurality of frames depicting a source object, and a source text prompt describing the source digital video; generating a plurality of masks defining regions-of-interest in the plurality of frames of the source digital video using a machine-learning model, the generating based on the source digital video, the target object, the target text prompt, and the source text prompt; and generating a plurality of frames of a target digital video having the target object as following motion of the source object using a generative machine-learning model based on the plurality of masks.

15. The computing device as described in claim 14, wherein the machine-learning model utilized to perform the generating of the plurality of masks is configured as one or more diffusion models.

16. The computing device as described in claim 15, wherein the generating of the plurality of masks includes comparing noise differences across respective timesteps between: a source denoising branch of the one or more diffusion models configured to process the source text prompt and frames from the source digital video; and a target denoising branch of the one or more diffusion models configured to process the target text prompt and the target object of the target digital image.

17. The computing device as described in claim 14, wherein the generating the plurality of frames of the target digital video is performed using a generative machine-learning model based on the regions-of-interest, the target digital image, the source digital video, and the source text prompt describing the source digital video.

18. One or more computer-readable storage media storing instructions that, in response to execution by a processing device, causes the processing device to perform operations comprising: receiving a target text prompt, a target digital image depicting a target object, a source digital video having a plurality of frames depicting a source object, and a source text prompt describing the source digital video; generating a plurality of masks defining regions-of-interest in the plurality of frames of the source digital video; and generating a plurality of frames of a target digital video having the target object using a generative machine-learning model, the generating based on the regions-of-interest, the target digital image, the source digital video, and a source text prompt describing the source digital video.

19. The one or more computer-readable storage media as described in claim 18, wherein the generating a plurality of masks is performed using one or more diffusion models by comparing noise differences across respective timesteps between: a source denoising branch of the one or more diffusion models configured to process the source text prompt and frames from the source digital video; and a target denoising branch of the one or more diffusion models configured to process the target text prompt and the target object of the target digital image.

20. The one or more computer-readable storage media as described in claim 18, wherein the generative machine-learning model is configured as a diffusion model.
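Claims 7, 8 and 13 name two concrete steps: forming region-of-interest masks by comparing the noise predicted by a source denoising branch and a target denoising branch across timesteps, then averaging and binarizing those differences; and preserving the source background by correcting latents outside those regions. The NumPy sketch below shows one possible reading of those steps. It is not the patent's implementation: the array layout (timesteps, frames, channels, height, width), the squared-error loss, the per-frame min-max normalization, the 0.5 threshold, the mask-based latent blend, and the function names `masks_from_noise_differences` and `preserve_background` are all illustrative assumptions. The noise predictions themselves would come from the two diffusion branches, which are not modeled here.

```python
import numpy as np


def masks_from_noise_differences(eps_src, eps_tgt, threshold=0.5):
    """Hypothetical sketch: derive per-frame binary region-of-interest masks.

    eps_src, eps_tgt: noise predictions from the source and target denoising
    branches, shape (T, F, C, H, W) = (timesteps, frames, channels, height, width).
    Returns float32 masks of shape (F, H, W): 1 = region-of-interest, 0 = background.
    """
    # Reconstruction-style loss per pixel: squared noise difference,
    # averaged over latent channels -> (T, F, H, W).
    diff = ((eps_src - eps_tgt) ** 2).mean(axis=2)
    # Average the differences across timesteps -> (F, H, W).
    diff = diff.mean(axis=0)
    # Assumption: min-max normalize each frame so one threshold fits all frames.
    lo = diff.min(axis=(1, 2), keepdims=True)
    hi = diff.max(axis=(1, 2), keepdims=True)
    norm = (diff - lo) / np.maximum(hi - lo, 1e-8)
    # Binarize the averaged differences into masks.
    return (norm >= threshold).astype(np.float32)


def preserve_background(z_edit, z_src, masks):
    """Hypothetical latent correction: keep source latents outside the masks.

    z_edit, z_src: latents for the same timestep, shape (F, C, H, W).
    masks: (F, H, W) binary masks from masks_from_noise_differences.
    """
    m = masks[:, None, :, :]  # add a channel axis so the mask broadcasts over C
    return m * z_edit + (1.0 - m) * z_src


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, F, C, H, W = 4, 3, 4, 8, 8
    eps_src = rng.normal(size=(T, F, C, H, W))
    eps_tgt = eps_src.copy()
    eps_tgt[:, :, :, 2:5, 2:5] += 3.0  # branches disagree only in a 3x3 patch
    masks = masks_from_noise_differences(eps_src, eps_tgt)
    print(masks[0].sum())  # 9.0: only the 3x3 patch is marked
```

With synthetic predictions that differ only inside a 3x3 patch, each frame's mask covers exactly those nine pixels, and `preserve_background` then leaves every latent outside the patch equal to the source latent. This masked blend is a common latent-blending pattern; the claims do not state whether the patent's latent noise correction works this way.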

Assignees

Adobe Inc

Inventors

Classifications

G06T2207/20081
Training; Learning · CPC title
G06T2207/20084
Artificial neural networks [ANN] · CPC title
G06T5/70
Denoising; Smoothing · CPC title
G06T5/60
using machine learning, e.g. neural networks · CPC title
G06T11/60 (primary)
Creating or editing images; Combining images with text · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 91759080

