What technology area does this patent fall under?

Primary CPC classification H04N7/002. Mapped technology areas include Electricity.

When was this patent published?

Publication date: Feb 25, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Predicting video edits from text-based conversations using neural networks

US12238451B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12238451-B2
Application number	US-202218055301-A
Country	US
Kind code	B2
Filing date	Nov 14, 2022
Priority date	Nov 14, 2022
Publication date	Feb 25, 2025
Grant date	Feb 25, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are disclosed for predicting, using neural networks, editing operations for application to a video sequence based on processing conversational messages by a video editing system. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input including a video sequence and text sentences, the text sentences describing a modification to the video sequence, mapping, by a first neural network, content of the text sentences describing the modification to the video sequence to a candidate editing operation, processing, by a second neural network, the video sequence to predict parameter values for the candidate editing operation, and generating a modified video sequence by applying the candidate editing operation with the predicted parameter values to the video sequence.
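The final step of the abstract, applying a candidate editing operation with predicted parameter values, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `OPERATIONS` table, the `adjust_brightness` function, and especially `predict_brightness_delta`, which is a simple placeholder heuristic standing in for the second neural network and is not the patent's method. Frames are modeled as small grids of 8-bit grayscale values.

```python
def adjust_brightness(frame, delta):
    """Shift every pixel by delta, clamped to the 8-bit range [0, 255]."""
    return [[min(255, max(0, px + delta)) for px in row] for row in frame]


# Hypothetical table of editing operations keyed by name.
OPERATIONS = {"brightness": adjust_brightness}


def predict_brightness_delta(video, target_mean=128):
    """Placeholder for the parameter-prediction network (an assumption):
    returns the shift that moves the mean pixel value to target_mean."""
    pixels = [px for frame in video for row in frame for px in row]
    return int(round(target_mean - sum(pixels) / len(pixels)))


def apply_edit(video, operation, parameter):
    """Apply the named operation with one parameter value to every frame."""
    op = OPERATIONS[operation]
    return [op(frame, parameter) for frame in video]


# Two toy 2x2 frames; the mean pixel value is 45, so the shift is 83.
video = [[[10, 20], [30, 40]], [[50, 60], [70, 80]]]
delta = predict_brightness_delta(video)
edited = apply_edit(video, "brightness", delta)
print(delta, edited[0])
```

In a real system the operation name would come from the text-mapping stage and the parameter from a trained model; the sketch only shows how the two outputs combine into a modified sequence.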

First claim

Opening claim text (preview).

We claim:

1. A computer-implemented method comprising: receiving an input including a video sequence and text sentences, the text sentences describing a modification to the video sequence; mapping, by a first neural network, content of the text sentences describing the modification to the video sequence to a candidate video editing operation; processing, by a second neural network, the video sequence to predict parameter values for the candidate video editing operation; and generating a modified video sequence by applying the candidate video editing operation with the predicted parameter values to the video sequence.

2. The computer-implemented method of claim 1, wherein mapping the content of the text sentences describing the modification to the video sequence to the candidate video editing operation comprises: mapping the content of the text sentences to a reference sentence; and identifying a video editing operation associated with the reference sentence as the candidate video editing operation.

3. The computer-implemented method of claim 2, wherein mapping the content of the text sentences to the reference sentence comprises: generating, by a sentence transformer, sentence features for the text sentences; calculating cosine similarity values between the sentence features for the text sentences and reference sentence features for reference sentences, wherein each reference sentence of the reference sentences is associated with a video editing operation; and identifying the reference sentence having a highest calculated cosine similarity with the sentence features for the text sentences.

4. The computer-implemented method of claim 1, wherein processing the video sequence to predict the parameter values for the candidate video editing operation comprises: for each frame of the video sequence: generating an RGB feature vector and an optical flow feature vector, concatenating the RGB feature vector and the optical flow feature vector to create a concatenated feature vector, and passing the concatenated feature vector through an editing parameters prediction network to predict the parameter values for the candidate video editing operation.

5. The computer-implemented method of claim 4, wherein the predicted parameter values for the candidate video editing operation include mean parameter values and standard deviation parameter values.

6. The computer-implemented method of claim 1, further comprising: receiving a second input including second text sentences, the second text sentences describing a second modification to trim the video sequence by a first amount of time; detecting shot boundaries within the video sequence; determining that an end time of a shot boundary is within the first amount of time; and trimming a second amount of time from the video sequence starting at the end time of the shot boundary, wherein the second amount of time is different from the first amount of time.

7. The computer-implemented method of claim 1, wherein generating the modified video sequence by applying the candidate video editing operation with the predicted parameter values for the candidate video editing operation to the video sequence comprises: adjusting a brightness parameter in response to mapping the text sentences to a brightness editing operation.

8. The computer-implemented method of claim 1, further comprising: receiving a second input including a second video sequence; processing, by the second neural network, the second video sequence to predict parameter values for one or more video editing operations; and generating a modified second video sequence by applying the one or more video editing operations with the predicted parameter values to the second video sequence.

9. A non-transitory computer-readable storage medium storing executable instructions, which when executed by a processing device, cause the processing device to perform operations comprising: receiving an input including a video sequence and text sentences, the text sentences describing a modification to the video sequence; mapping, by a first neural network, content of the text sentences describing the modification to the video sequence to a candidate video editing operation; processing, by a second neural network, the video sequence to predict parameter values for the candidate video editing operation; and generating a modified video sequence by applying the candidate video editing operation with the predicted parameter values to the video sequence.

10. The non-transitory computer-readable storage medium of claim 9, wherein to map the content of the text sentences describing the modification to the video sequence to the candidate video editing operation the instructions further cause the processing device to perform operations comprising: mapping the content of the text sentences to a reference sentence; and identifying a video editing operation associated with the reference sentence as the candidate video editing operation.

11. The non-transitory computer-readable storage medium of claim 10, wherein to map the content of the text sentences to the reference sentence the instructions further cause the processing device to perform operations comprising: generating, by a sentence transformer, sentence features for the text sentences; calculating cosine similarity values between the sentence features for the text sentences and reference sentence features for reference sentences, wherein each reference sentence of the reference sentences is associated with a video editing operation; and identifying the reference sentence having a highest calculated cosine similarity with the sentence features for the text sentences.

12. The non-transitory computer-readable storage medium of claim 9, wherein to process the video sequence to predict the parameter values for the candidate video editing operation the instructions further cause the processing device to perform operations comprising: for each frame of the video sequence: generating an RGB feature vector and an optical flow feature vector, concatenating the RGB feature vector and the optical flow feature vector to create a concatenated feature vector, and passing the concatenated feature vector through an editing parameters prediction network to predict the parameter values for the candidate video editing operation.

13. The non-transitory computer-readable storage medium of claim 12, wherein the predicted parameter values for the candidate video editing operation include mean parameter values and standard deviation parameter values.

14. The non-transitory computer-readable storage medium of claim 9, wherein the instructions further cause the processing device to perform operations comprising: receiving a second input including second text sentences, the second text sentences describing a second modification to trim the video sequence by a first amount of time; detecting shot boundaries within the video sequence; determining that an end time of a shot boundary is within the first amount of time; and trimming a second amount of time from the video sequence starting at the end time of the shot boundary, wherein the second amount of time is different from the first amount of time.

15. The non-transitory computer-readable storage medium of claim 9, wherein to generate the modified video sequence by applying the candidate video editing operation with the predicted parameter values for the candidate video editing operation to the video sequence the instructions further cause the processing device to perform operations comprising: adjusting a brightness
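Claims 3 and 11 describe choosing an editing operation by comparing sentence features with reference sentence features using cosine similarity and taking the highest score. The Python sketch below illustrates only that selection step. The reference sentences, the operation names, and the three-dimensional feature vectors are all hypothetical stand-ins; in the claimed method the features would come from a sentence transformer, which is not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical reference sentences, each tied to one editing operation.
# The toy feature vectors are assumptions, not real sentence embeddings.
REFERENCES = [
    ("make the video brighter", "brightness", [0.9, 0.1, 0.0]),
    ("boost the colors", "saturation", [0.1, 0.9, 0.1]),
    ("cut the clip shorter", "trim", [0.0, 0.2, 0.9]),
]


def map_to_operation(sentence_features, references=REFERENCES):
    """Return (operation, reference sentence) with the highest cosine similarity."""
    sentence, operation, _ = max(
        references,
        key=lambda ref: cosine_similarity(sentence_features, ref[2]),
    )
    return operation, sentence


# Toy features for an input message such as "it looks too dark".
query_features = [0.8, 0.2, 0.1]
operation, matched = map_to_operation(query_features)
print(operation, "<-", matched)
```

Because cosine similarity ignores vector length, the comparison depends only on the direction of the feature vectors, which is why it is a common choice for matching sentence embeddings.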

Assignees

Adobe Inc

Inventors

Classifications

G06T11/60
Creating or editing images; Combining images with text · CPC title
G06N3/08
Learning methods · CPC title
G11B27/031
Electronic editing of digitised analogue information signals, e.g. audio or video signals · CPC title
H04N7/002 (Primary)
Special television systems not provided for by H04N7/007 - H04N7/18 (still pictures via a television channel H04N1/00098) · CPC title
G06N3/045 (Primary)
Combinations of networks · CPC title

Patent family

Related publications grouped by family.

View patent family 91027705

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12238451B2 cover?: Embodiments are disclosed for predicting, using neural networks, editing operations for application to a video sequence based on processing conversational messages by a video editing system. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input including a video sequence and text sentences, the text sentences describing a modification to the vi…
Who is the assignee on this patent?: Adobe Inc

Related patents

Variable length phrase predictions

Voice adaptation using synthetic speech processing

Generating segmentation masks for objects in digital videos using pose tracking data

Highlight video generated with adaptable multimodal customization

Content modification in a shared session among multiple head-mounted display devices

Enhancing media content effectiveness using feedback between evaluation and content editing

Attention-driven image manipulation
