Systems and methods for automated movie generation and editing

US2026100203A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2026100203-A1
Application number	US-202519348769-A
Country	US
Kind code	A1
Filing date	Oct 2, 2025
Priority date	Oct 3, 2024
Publication date	Apr 9, 2026
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method to edit a video includes receiving an input video including a sequence of frames and receiving an editing instruction expressed in natural language. The method also includes generating a multimodal condition based on the textual editing instruction and the input video. The multimodal condition may include an embedding of the input video concatenated with an embedding of the textual editing instruction. The method also includes applying, via a video editing model, the multimodal condition to modify visual content of the input video. The method further includes generating an edited video including visual modifications corresponding to the textual editing instruction. The edited video preserves temporal coherence and overall visual fidelity of the input video.
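
As a rough illustration of the concatenation step described in the abstract, the sketch below joins a video embedding and an instruction embedding into one token sequence. The function name `build_multimodal_condition`, the toy values, and the choice to concatenate along the sequence axis are assumptions made for this example; the publication does not specify them here, and real embeddings would come from trained encoders.

```python
from typing import List

Vector = List[float]


def build_multimodal_condition(
    video_tokens: List[Vector], text_tokens: List[Vector]
) -> List[Vector]:
    """Concatenate video and text token embeddings along the sequence axis.

    Assumption: both encoders emit tokens of the same width, so the two
    sequences can simply be placed one after the other.
    """
    if not video_tokens or not text_tokens:
        raise ValueError("both embeddings must contain at least one token")
    width = len(video_tokens[0])
    for token in video_tokens + text_tokens:
        if len(token) != width:
            raise ValueError("all tokens must share one embedding width")
    # Copy each token so the caller's lists are not aliased by the result.
    return [list(t) for t in video_tokens] + [list(t) for t in text_tokens]


# Toy values standing in for encoder outputs (hypothetical, not from the patent).
video_tokens = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # two video tokens
text_tokens = [[0.7, 0.8, 0.9]]                    # one instruction token

condition = build_multimodal_condition(video_tokens, text_tokens)
print(len(condition), len(condition[0]))  # prints: 3 3
```

In this sketch the video tokens come first and the instruction tokens follow, so a downstream model would see both modalities in a single sequence.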

First claim

Opening claim text (preview).

What is claimed is:

1. A method to edit a video, the method comprising: receiving an input video comprising a sequence of frames; receiving an editing instruction expressed in natural language; generating a multimodal condition based on the textual editing instruction and the input video, the multimodal condition comprising an embedding of the input video concatenated with an embedding of the textual editing instruction; applying, via a video editing model, the multimodal condition to modify visual content of the input video; and generating an edited video comprising visual modifications corresponding to the textual editing instruction, the edited video preserving temporal coherence and overall visual fidelity of the input video.

2. The method of claim 1, wherein generating the multimodal condition comprises applying cross-attention between the embedding of the input video and the embedding of the textual editing instruction.

3. The method of claim 1, further comprising: generating the embedding of the input video based on encoding the sequence of frames via a temporal autoencoder.

4. The method of claim 1, further comprising: generating the embedding of the textual editing instruction based on encoding the instruction with a transformer-based language model.

5. The method of claim 1, wherein the video editing model is conditioned on a task embedding corresponding to a type of editing operation comprising one or more of object addition, object removal, background replacement, or attribute modification.

6. The method of claim 1, wherein generating the edited video further comprises animating newly generated content such that spatial and temporal consistency across multiple frames is preserved.

7. The method of claim 1, wherein preserving temporal coherence comprises aligning positional embeddings of the sequence of frames such that edits applied to a first frame are propagated to subsequent frames.

8. The method of claim 1, wherein the visual fidelity is preserved by applying a filtering stage configured to discard edited outputs that are less than a predetermined quality threshold determined by automated image editing metrics.

9. An apparatus to edit a video, comprising: one or more processors; and one or more memories coupled with the one or more processors and storing processor-executable code that, when executed by the one or more processors, is configured to cause the apparatus to: receive an input video comprising a sequence of frames; receive an editing instruction expressed in natural language; generate a multimodal condition based on the textual editing instruction and the input video, the multimodal condition comprising an embedding of the input video concatenated with an embedding of the textual editing instruction; apply, via a video editing model, the multimodal condition to modify visual content of the input video; and generate an edited video comprising visual modifications corresponding to the textual editing instruction, the edited video preserving temporal coherence and overall visual fidelity of the input video.

10. The apparatus of claim 9, wherein execution of the processor-executable code that causes the apparatus to generate the multimodal condition further causes the apparatus to apply cross-attention between the embedding of the input video and the embedding of the textual editing instruction.

11. The apparatus of claim 9, wherein execution of the processor-executable code further causes the apparatus to generate the embedding of the input video based on encoding the sequence of frames via a temporal autoencoder.

12. The apparatus of claim 9, wherein execution of the processor-executable code further causes the apparatus to generate the embedding of the textual editing instruction based on encoding the instruction with a transformer-based language model.

13. The apparatus of claim 9, wherein the video editing model is conditioned on a task embedding corresponding to a type of editing operation comprising one or more of object addition, object removal, background replacement, or attribute modification.

14. The apparatus of claim 9, wherein execution of the processor-executable code that causes the apparatus to generate the edited video further causes the apparatus to animate newly generated content such that spatial and temporal consistency across multiple frames is preserved.

15. The apparatus of claim 9, wherein preserving temporal coherence comprises aligning positional embeddings of the sequence of frames such that edits applied to a first frame are propagated to subsequent frames.

16. The apparatus of claim 9, wherein the visual fidelity is preserved by applying a filtering stage configured to discard edited outputs that are less than a predetermined quality threshold determined by automated image editing metrics.

17. A non-transitory computer-readable medium having program code recorded thereon for editing a video, the program code executed by one or more processors and comprising: program code to receive an input video comprising a sequence of frames; program code to receive an editing instruction expressed in natural language; program code to generate a multimodal condition based on the textual editing instruction and the input video, the multimodal condition comprising an embedding of the input video concatenated with an embedding of the textual editing instruction; program code to apply, via a video editing model, the multimodal condition to modify visual content of the input video; and generate an edited video comprising visual modifications corresponding to the textual editing instruction, the edited video preserving temporal coherence and overall visual fidelity of the input video.

18. The non-transitory computer-readable medium of claim 17, wherein the program code to generate the multimodal condition further comprises program code to apply cross-attention between the embedding of the input video and the embedding of the textual editing instruction.

19. The non-transitory computer-readable medium of claim 17, wherein the program code further comprises program code to generate the embedding of the input video based on encoding the sequence of frames via a temporal autoencoder.

20. The non-transitory computer-readable medium of claim 17, wherein the program code further comprises program code to generate the embedding of the textual editing instruction based on encoding the instruction with a transformer-based language model.
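
Two of the dependent claims can be illustrated with a small sketch: cross-attention between the video and instruction embeddings, and a filtering stage that discards outputs scoring below a quality threshold. The code below is a minimal illustration under stated assumptions, not the claimed implementation. It uses single-head scaled dot-product attention with no learned projections, and the names `cross_attention` and `filter_by_quality`, the toy embeddings, and the example scores are all hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention (assumed form, no projections).

    Each query token (here, a video token) receives a weighted average of the
    value tokens (here, instruction tokens).
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    width = len(values[0])
    output = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        output.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(width)]
        )
    return output


def filter_by_quality(
    candidates: Sequence[str], scores: Sequence[float], threshold: float
) -> List[str]:
    """Discard candidates whose score is less than the threshold."""
    return [c for c, s in zip(candidates, scores) if s >= threshold]


# Toy embeddings (hypothetical): two video tokens attend over three text tokens.
video_tokens = [[1.0, 0.0], [0.0, 1.0]]
text_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = cross_attention(video_tokens, text_tokens, text_tokens)

# Hypothetical quality scores for three candidate edits.
kept = filter_by_quality(["edit_a", "edit_b", "edit_c"], [0.91, 0.42, 0.75], 0.75)
print(kept)  # prints: ['edit_a', 'edit_c']
```

The filter keeps a candidate that scores exactly at the threshold, since only outputs less than the threshold are discarded. How the quality score is computed is left open here; the claims refer only to automated image editing metrics.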

Assignees

Meta Platforms Inc

Classifications

G06T13/40
of characters, e.g. humans, animals or virtual beings · CPC title
G06T11/60
Creating or editing images; Combining images with text · CPC title
G06V10/82
using neural networks · CPC title
G06V20/70
Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title
H04N21/816
involving special video data, e.g. 3D video · CPC title

Patent family

Related publications grouped by family.

View patent family 99313413

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2026100203A1 cover?: A method to edit a video includes receiving an input video including a sequence of frames and receiving an editing instruction expressed in natural language. The method also includes generating a multimodal condition based on the textual editing instruction and the input video. The multimodal condition may include an embedding of the input video concatenated with an embedding of the textual edi…
Who is the assignee on this patent?: Meta Platforms Inc
What technology area does this patent fall under?: Primary CPC classification G11B27/031. Mapped technology areas include Physics.
When was this patent published?: Publication date Apr 9, 2026 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).