What technology area does this patent fall under?

Primary CPC classification G11B27/031. Mapped technology areas include Physics.

When was this patent published?

Publication date: April 3, 2025 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page: publications linked to this one by citation within our corpus, or others sharing the same primary CPC.

Video editing using image diffusion

US2025111866A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2025111866-A1
Application number	US-202318479626-A
Country	US
Kind code	A1
Filing date	Oct 2, 2023
Priority date	Oct 2, 2023
Publication date	Apr 3, 2025
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are disclosed for editing video using image diffusion. The method may include receiving an input video depicting a target and a prompt including an edit to be made to the target. A keyframe associated with the input video is then identified. The keyframe is edited, using a generative neural network, based on the prompt to generate an edited keyframe. A subsequent frame of the input video is edited using the generative neural network, based on the prompt, features of the edited keyframe, and features of an intervening frame to generate an edited output video.
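The abstract implies an ordering dependency: the keyframe is edited from the prompt alone, and each later frame reuses features from the edited keyframe and from an intervening frame. The Python sketch below shows only that data flow, assuming a new keyframe every `keyframe_interval` frames (one simple way to realise the re-keying of claim 8). The patent text shown here does not say how keyframes are identified or what the generative model's interface is, so `edit_fn`, `FramePlan`, `plan_video_edit`, `edit_video` and `toy_edit` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FramePlan:
    frame: int                # index of the frame being edited
    keyframe: int             # keyframe whose edited features are reused
    preceding: Optional[int]  # immediately preceding frame, None for a keyframe


def plan_video_edit(num_frames: int, keyframe_interval: int) -> List[FramePlan]:
    """Assign each frame the keyframe and preceding frame it is conditioned on.

    Assumption (hypothetical): a new keyframe every `keyframe_interval` frames.
    """
    if num_frames < 0 or keyframe_interval <= 0:
        raise ValueError("num_frames must be >= 0 and keyframe_interval > 0")
    plans = []
    for i in range(num_frames):
        key = i - i % keyframe_interval
        preceding = None if i == key else i - 1
        plans.append(FramePlan(frame=i, keyframe=key, preceding=preceding))
    return plans


# Hypothetical interface: (frame, prompt, keyframe_feats, preceding_feats)
# -> (edited_frame, features captured while editing this frame).
EditFn = Callable[[Any, str, Optional[Any], Optional[Any]], Tuple[Any, Any]]


def edit_video(frames: List[Any], prompt: str, edit_fn: EditFn,
               keyframe_interval: int) -> List[Any]:
    edited: List[Any] = []
    feats: Dict[int, Any] = {}  # frame index -> features captured for it
    for plan in plan_video_edit(len(frames), keyframe_interval):
        # Keyframes are edited from the prompt alone; other frames also get
        # the keyframe's and the preceding frame's captured features.
        key_feats = None if plan.preceding is None else feats[plan.keyframe]
        prev_feats = None if plan.preceding is None else feats[plan.preceding]
        out, feats[plan.frame] = edit_fn(frames[plan.frame], prompt,
                                         key_feats, prev_feats)
        edited.append(out)
    return edited


def toy_edit(frame, prompt, key_feats, prev_feats):
    # Stand-in for a diffusion model: records what it was conditioned on.
    return (frame, prompt, key_feats, prev_feats), f"feats({frame})"


for row in edit_video(["f0", "f1", "f2", "f3"], "make it snowy", toy_edit, 2):
    print(row)
```

In a real pipeline `edit_fn` would run the denoising loop and return whatever attention features it captured; the toy stand-in is enough to show which features each frame receives.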

First claim

Opening claim text (preview).

We claim:
1. A method comprising: receiving an input video depicting a target and a prompt including an edit to be made to the target; identifying a keyframe associated with the input video; editing the keyframe, using an image generation model, based on the prompt to generate an edited keyframe; and editing a subsequent frame of the input video using the image generation model, based on the prompt, features of the edited keyframe, and features of an intervening frame to generate an edited output video.
2. The method of claim 1 wherein the image generation model includes a U-Net architecture.
3. The method of claim 2, wherein editing a subsequent frame of the input video using the image generation model, based on the prompt, features of the edited keyframe, and features of an intervening frame to generate an edited output video, further comprises: injecting features from a self-attention block of the image generation model obtained from processing the keyframe into the self-attention block of the image generation model while processing the subsequent frame.
4. The method of claim 3, wherein editing a subsequent frame of the input video using the image generation model, based on the prompt, features of the edited keyframe, and features of an intervening frame to generate an edited output video, further comprises: injecting features from a self-attention block of the image generation model obtained from processing an immediately preceding frame into the self-attention block of the image generation model while processing the subsequent frame.
5. The method of claim 3, wherein the features are injected into the self-attention block of a decoder of the U-Net architecture.
6. The method of claim 3, wherein the features further include depth features obtained by passing the intervening frame through a depth model, wherein the depth model is a machine learning model trained to generate a depth map for an input image.
7. The method of claim 1, wherein editing a subsequent frame of the input video using the image generation model, based on the prompt, features of the edited keyframe, and features of an intervening frame to generate an edited output video, further comprises: updating a latent space representation of the subsequent frame using a latent space representation of an immediately preceding frame for a first number of diffusion steps.
8. The method of claim 1, further comprising: identifying a new keyframe and processing a second set of frames subsequent to the new keyframe using its features.
9. A non-transitory computer-readable medium storing executable instructions, which when executed by a processing device, cause the processing device to perform operations comprising: receiving an input video depicting a target and a prompt including an edit to be made to the target; identifying a keyframe associated with the input video; editing the keyframe, using an image generation model, based on the prompt to generate an edited keyframe; and editing a subsequent frame of the input video using the image generation model, based on the prompt, features of the edited keyframe, and features of an intervening frame to generate an edited output video.
10. The non-transitory computer-readable medium of claim 9 wherein the image generation model includes a U-Net architecture.
11. The non-transitory computer-readable medium of claim 10, wherein the operation of editing a subsequent frame of the input video using the image generation model, based on the prompt, features of the edited keyframe, and features of an intervening frame to generate an edited output video, further comprises: injecting features from a self-attention block of the image generation model obtained from processing the keyframe into the self-attention block of the image generation model while processing the subsequent frame.
12. The non-transitory computer-readable medium of claim 11, wherein the operation of editing a subsequent frame of the input video using the image generation model, based on the prompt, features of the edited keyframe, and features of an intervening frame to generate an edited output video, further comprises: injecting features from a self-attention block of the image generation model obtained from processing an immediately preceding frame into the self-attention block of the image generation model while processing the subsequent frame.
13. The non-transitory computer-readable medium of claim 11, wherein the features are injected into the self-attention block of a decoder of the U-Net architecture.
14. The non-transitory computer-readable medium of claim 11, wherein the features further include depth features obtained by passing the intervening frame through a depth model, wherein the depth model is a machine learning model trained to generate a depth map for an input image.
15. The non-transitory computer-readable medium of claim 9, wherein the operation of editing a subsequent frame of the input video using the image generation model, based on the prompt, features of the edited keyframe, and features of an intervening frame to generate an edited output video, further comprises: updating a latent space representation of the subsequent frame using a latent space representation of an immediately preceding frame for a first number of diffusion steps.
16. The non-transitory computer-readable medium of claim 9, further comprising: identifying a new keyframe and processing a second set of frames subsequent to the new keyframe using its features.
17. A system comprising: a memory component; and a processing device coupled to the memory component, the processing device to perform operations comprising: receiving a request to edit a video, the request including a digital video and a text prompt describing the edit; generating an edited video using an image diffusion model, wherein feature injection is used for appearance consistency and guided latent updates are used for temporal consistency; and returning the edited video.
18. The system of claim 17, wherein the operation of generating an edited video using an image diffusion model, wherein feature injection is used for appearance consistency and guided latent updates are used for temporal consistency further comprises: identifying a keyframe associated with the video; editing the keyframe, using the image diffusion model, based on the text prompt to generate an edited keyframe; and editing subsequent frames of the video using the image diffusion model, based on the prompt, features of the edited keyframe, and features of intervening frames to generate the edited video.
19. The system of claim 18, wherein the operation of editing subsequent frames of the video using the image diffusion model, based on the prompt, features of the edited keyframe, and features of intervening frames to generate the edited video, further comprises: injecting features from a self-attention block of a decoder of a U-Net architecture of the image diffusion model obtained from processing the keyframe into the self-attention block of the image diffusion model while processing the subsequent frames.
20. The system of claim 18, wherein the operation of editing subsequent frames of the video using the image diffusion model, based on the prompt, features of the edited keyframe, and features of intervening frames to generate the edited video, further comprises: updating a latent space representation of the subsequent fra
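Claims 3 to 5 and 7 name two consistency mechanisms without giving formulas: injecting the keyframe's and the preceding frame's self-attention features while processing a later frame, and updating the later frame's latent from the preceding frame's latent during the first diffusion steps. The NumPy sketch below shows one plausible reading of each, not the patent's implementation. It assumes the injected features serve as the keys and values of the current frame's self-attention (claim 5 places this in the U-Net decoder), and it models the latent update as a plain interpolation with a hypothetical weight `alpha`. The projection matrices, shapes and function names are illustrative.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def injected_self_attention(cur: np.ndarray, key_feats: np.ndarray,
                            prev_feats: np.ndarray, w_q: np.ndarray,
                            w_k: np.ndarray, w_v: np.ndarray) -> np.ndarray:
    """Self-attention for the current frame using injected features.

    cur, key_feats, prev_feats: (tokens, channels) hidden states entering the
    same self-attention block for the current frame, the edited keyframe and
    the immediately preceding frame. Queries come from the current frame;
    keys and values come from the injected keyframe and preceding-frame
    features (an assumed reading of "injecting features").
    """
    q = cur @ w_q
    context = np.concatenate([key_feats, prev_feats], axis=0)
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v


def guided_latent_update(z: np.ndarray, z_prev: np.ndarray, step: int,
                         num_guided_steps: int, alpha: float = 0.2) -> np.ndarray:
    """Pull the current latent toward the preceding frame's latent.

    Applied only during the first `num_guided_steps` denoising steps; `alpha`
    is a hypothetical blending weight, not a value from the patent.
    """
    if step >= num_guided_steps:
        return z
    return (1.0 - alpha) * z + alpha * z_prev


# Toy shapes: 4 tokens with 8 channels per frame.
rng = np.random.default_rng(0)
d = 8
cur, key_f, prev_f = (rng.normal(size=(4, d)) for _ in range(3))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
print(injected_self_attention(cur, key_f, prev_f, w_q, w_k, w_v).shape)  # (4, 8)
```

Limiting the latent update to early steps presumably lets the coarse layout follow the preceding frame while later steps remain free to refine detail; the claims shown do not state how many steps are guided or how strong the update is.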

Assignees

Adobe Inc

Inventors

Classifications

G06T11/00: Two-dimensional [2D] image generation (CPC title)
G11B27/031 (primary): Electronic editing of digitised analogue information signals, e.g. audio or video signals (CPC title)
G06T7/50: Depth or shape recovery (CPC title)
G06T2207/10016: Video; Image sequence (CPC title)

Patent family

Related publications grouped by family.

Patent family ID: 95155420

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025111866A1 cover?: Embodiments are disclosed for editing video using image diffusion. The method may include receiving an input video depicting a target and a prompt including an edit to be made to the target. A keyframe associated with the input video is then identified. The keyframe is edited, using a generative neural network, based on the prompt to generate an edited keyframe. A subsequent frame of the input …
Who is the assignee on this patent?: Adobe Inc

Related patents

Methods and apparatus for augmenting dense depth maps using sparse data

Searching for images using generated images

Utilizing a diffusion prior neural network for text guided digital image editing

Text-Based Real Image Editing with Diffusion Models

Storing entries in and retrieving information from an episodic object memory

Multi-dimensional generative framework for video generation

Video generation with latent diffusion models