Who is the assignee on this patent?

Beijing Zitiao Network Technology Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06T11/60. Mapped technology areas include Physics.

When was this patent published?

Publication date Feb 19, 2026 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Video editing method, device, and medium

US2026051096A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2026051096-A1
Application number	US-202519299718-A
Country	US
Kind code	A1
Filing date	Aug 14, 2025
Priority date	Aug 15, 2024
Publication date	Feb 19, 2026
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure provide a video editing method, apparatus, device, medium and program product. The method includes: inputting an original video into a script generation model that is pre-trained; generating, by the script generation model, a video feature sequence according to video frames in the original video, mapping the video feature sequence to a text feature space of the script generation model, and obtaining a video mapping feature sequence; generating, by the script generation model, a second video script of the original video based on the video mapping feature sequence, wherein the second video script includes a timestamp; adding the second video script to the original video according to the timestamp to obtain a target video.
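The mapping step described in the abstract, where a video feature sequence is carried into the text feature space, can be pictured as a projection applied to each frame's feature vector. The following is a minimal sketch in plain Python under the assumption that the mapping is linear; the publication text shown here does not specify the form of the mapping, and the names `project_sequence`, `weights`, and `bias` are hypothetical, as are the toy numbers.

```python
from typing import List

Vector = List[float]


def project_sequence(video_features: List[Vector],
                     weights: List[Vector],
                     bias: Vector) -> List[Vector]:
    """Map each video feature vector into a (hypothetical) text feature space.

    `weights` has one row per output (text-space) dimension, and each row has
    one entry per input (video-space) dimension. A linear map is an assumption
    made for illustration only.
    """
    mapped = []
    for feature in video_features:
        mapped.append([
            sum(w * x for w, x in zip(row, feature)) + b
            for row, b in zip(weights, bias)
        ])
    return mapped


# Toy data: two frames with 3-dimensional features, projected to 2 dimensions.
video_features = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
bias = [0.5, -1.0]
video_mapping_features = project_sequence(video_features, weights, bias)
```

In this picture the output sequence has the same length as the input sequence (one vector per frame), and only the dimensionality of each vector changes to match what a text model would consume.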

First claim

Opening claim text (preview).

1. A video editing method, comprising: inputting an original video into a script generation model that is pre-trained, wherein the script generation model is trained based on video samples, and first video scripts of the video samples satisfy a preset selection condition; generating, by the script generation model, a video feature sequence according to video frames in the original video, mapping the video feature sequence to a text feature space of the script generation model, and obtaining a video mapping feature sequence; generating, by the script generation model, a second video script of the original video based on the video mapping feature sequence, wherein the second video script comprises a timestamp; adding the second video script to the original video according to the timestamp to obtain a target video.

2. The method according to claim 1, wherein the generating, by the script generation model, a second video script of the original video based on the video mapping feature sequence comprises: inputting prompt information corresponding to the original video into the script generation model; generating, by the script generation model, the second video script of the original video based on the prompt information and the video mapping feature sequence.

3. The method according to claim 2, wherein the script generation model comprises a script generation module, and parameters of the script generation module are updated during training process of the script generation model; the generating, by the script generation model, the second video script of the original video based on the prompt information and the video mapping feature sequence comprises: generating, by the script generation module, the second video script of the original video based on the video mapping feature sequence under constraint of the prompt information, wherein the prompt information comprises script attribute prompt information and/or script content prompt information.

4. The method according to claim 2, wherein a training method of the script generation model comprises: obtaining the video samples, wherein the first video scripts of the video samples satisfy the preset selection condition; determining a script sample according to audio information of video frames in the video samples, and generating a prompt information sample based on attribute information and content information of the script sample; training a script generation model to be trained based on the video frames in the video samples, the prompt information sample, and the script sample, so that the script generation model to be trained learns a mapping relationship among the video frames in the video samples, the prompt information sample, and the script sample.

5. The method according to claim 4, wherein the training a script generation model to be trained based on the video frames in the video samples, the prompt information sample, and the script sample comprises: inputting the video frames in the video samples and corresponding prompt information samples into the script generation model to be trained, and obtaining a predicted script output by the script generation model to be trained; calculating a loss value between the predicted script and the script sample; in response to the loss value not satisfying a model training end condition, adopting a backpropagation method to adjust model parameters of the script generation model to be trained.

6. The method according to claim 1, wherein the script generation model comprises a visual encoder and an adapter, and parameters of the adapter are updated during training of the script generation model; the generating, by the script generation model, a video feature sequence according to video frames in the original video, mapping the video feature sequence to a text feature space of the script generation model, and obtaining a video mapping feature sequence comprises: compressing the video frames in the original video by the visual encoder to obtain a feature vector, and generating the video feature sequence according to the feature vector corresponding to the video frames; mapping the video feature sequence to the text feature space by the adapter to obtain the video mapping feature sequence.

7. The method according to claim 1, wherein the adding the second video script to the original video according to the timestamp to obtain a target video comprises: determining a video frame corresponding to the second video script according to a timestamp corresponding to the second video script, adding the second video script to the video frame corresponding to the second video script, and obtaining the target video.

8. An electronic device, comprising: one or more processors; a storage apparatus, configured to store one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement a video editing method, and the method comprises: inputting an original video into a script generation model that is pre-trained, wherein the script generation model is trained based on video samples, and first video scripts of the video samples satisfy a preset selection condition; generating, by the script generation model, a video feature sequence according to video frames in the original video, mapping the video feature sequence to a text feature space of the script generation model, and obtaining a video mapping feature sequence; generating, by the script generation model, a second video script of the original video based on the video mapping feature sequence, wherein the second video script comprises a timestamp; adding the second video script to the original video according to the timestamp to obtain a target video.

9. The electronic device according to claim 8, wherein the generating, by the script generation model, a second video script of the original video based on the video mapping feature sequence comprises: inputting prompt information corresponding to the original video into the script generation model; generating, by the script generation model, the second video script of the original video based on the prompt information and the video mapping feature sequence.

10. The electronic device according to claim 9, wherein the script generation model comprises a script generation module, and parameters of the script generation module are updated during training process of the script generation model; the generating, by the script generation model, the second video script of the original video based on the prompt information and the video mapping feature sequence comprises: generating, by the script generation module, the second video script of the original video based on the video mapping feature sequence under constraint of the prompt information, wherein the prompt information comprises script attribute prompt information and/or script content prompt information.

11. The electronic device according to claim 9, wherein a training method of the script generation model comprises: obtaining the video samples, wherein the first video scripts of the video samples satisfy the preset selection condition; determining a script sample according to audio information of video frames in the video samples, and generating a prompt information sample based on attribute information and content information of the script sample; training a script generation model to be trained based on the video frames in the video samples, the prompt information sample, and the script sample, so that the script generation model to be trained learns a mapping relationship among the video frames in the video samples, the prompt
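Claim 7 describes locating the video frames that correspond to a script's timestamp and attaching the script to those frames. A rough sketch of that lookup follows, assuming each script segment carries a start and end time in seconds and that the video has a constant frame rate. Neither assumption comes from the claim text, and `ScriptSegment` and `assign_script_to_frames` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScriptSegment:
    start_s: float  # assumed timestamp format: start time in seconds
    end_s: float    # assumed: end time in seconds (exclusive)
    text: str


def assign_script_to_frames(segments: List[ScriptSegment],
                            fps: float,
                            frame_count: int) -> Dict[int, str]:
    """Return a mapping from frame index to the script text shown on it."""
    overlay: Dict[int, str] = {}
    for seg in segments:
        # Convert times to frame indices and clamp them to the video length.
        first = max(0, int(seg.start_s * fps))
        last = min(frame_count, int(seg.end_s * fps))  # exclusive bound
        for idx in range(first, last):
            overlay[idx] = seg.text
    return overlay


# Toy example: a 10-frame clip at 2 frames per second.
segments = [
    ScriptSegment(0.0, 1.5, "Intro"),
    ScriptSegment(2.0, 3.0, "Scene two"),
]
overlay = assign_script_to_frames(segments, fps=2.0, frame_count=10)
```

Frames that fall outside every segment are simply absent from the mapping, so a renderer built on this sketch would draw text only on the frames listed.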

Assignees

Beijing Zitiao Network Technology Co Ltd

Inventors

Classifications

G06T11/60 · Primary
Creating or editing images; Combining images with text · CPC title
G06V10/7715
Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods · CPC title
G06N3/08 · Primary
Learning methods · CPC title
H04N21/84
Generation or processing of descriptive data, e.g. content descriptors {(systems specially adapted for using meta-information in broadcast systems H04H60/73)} · CPC title
H04N21/435
Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream · CPC title

Patent family

Related publications grouped by family.

View patent family 93174912

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2026051096A1 cover?: Embodiments of the present disclosure provide a video editing method, apparatus, device, medium and program product. The method includes: inputting an original video into a script generation model that is pre-trained; generating, by the script generation model, a video feature sequence according to video frames in the original video, mapping the video feature sequence to a text feature space of…