Video editing method, device, and medium

US2026051096A1 · US · A1

Patent metadata
Publication number: US-2026051096-A1
Application number: US-202519299718-A
Country: US
Kind code: A1
Filing date: Aug 14, 2025
Priority date: Aug 15, 2024
Publication date: Feb 19, 2026
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure provide a video editing method, apparatus, device, medium and program product. The method includes: inputting an original video into a script generation model that is pre-trained; generating, by the script generation model, a video feature sequence according to video frames in the original video, mapping the video feature sequence to a text feature space of the script generation model, and obtaining a video mapping feature sequence; generating, by the script generation model, a second video script of the original video based on the video mapping feature sequence, wherein the second video script includes a timestamp; and adding the second video script to the original video according to the timestamp to obtain a target video.
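The final step of the abstract, adding the script to the video according to its timestamp, can be illustrated with a minimal Python sketch. All names here (`ScriptSegment`, `frames_for_segment`, `add_script`), the start/end representation of a timestamp, and the constant frame rate are hypothetical choices made for illustration; the publication does not specify the timestamp format or how the text is attached to frames.

```python
import math
from dataclasses import dataclass


@dataclass
class ScriptSegment:
    """One timestamped piece of the generated script (hypothetical layout)."""
    start_s: float  # time at which the text should appear, in seconds
    end_s: float    # time at which the text should disappear (exclusive)
    text: str


def frames_for_segment(seg, fps, n_frames):
    """Indices of frames whose time i / fps lies in [start_s, end_s)."""
    first = max(0, math.ceil(seg.start_s * fps))
    last = min(n_frames, math.ceil(seg.end_s * fps))
    return range(first, last)


def add_script(n_frames, fps, segments):
    """Return one caption (or None) per frame of the target video."""
    captions = [None] * n_frames
    for seg in segments:
        for i in frames_for_segment(seg, fps, n_frames):
            captions[i] = seg.text
    return captions


segments = [ScriptSegment(0.0, 0.5, "Hello"), ScriptSegment(1.0, 1.5, "World")]
captions = add_script(n_frames=8, fps=4, segments=segments)
```

In this sketch the "target video" is represented only as a per-frame caption list; rendering the text onto pixels or writing a subtitle track would be a separate step.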

First claim

Opening claim text (preview).

1. A video editing method, comprising: inputting an original video into a script generation model that is pre-trained, wherein the script generation model is trained based on video samples, and first video scripts of the video samples satisfy a preset selection condition; generating, by the script generation model, a video feature sequence according to video frames in the original video, mapping the video feature sequence to a text feature space of the script generation model, and obtaining a video mapping feature sequence; generating, by the script generation model, a second video script of the original video based on the video mapping feature sequence, wherein the second video script comprises a timestamp; adding the second video script to the original video according to the timestamp to obtain a target video.

2. The method according to claim 1, wherein the generating, by the script generation model, a second video script of the original video based on the video mapping feature sequence comprises: inputting prompt information corresponding to the original video into the script generation model; generating, by the script generation model, the second video script of the original video based on the prompt information and the video mapping feature sequence.

3. The method according to claim 2, wherein the script generation model comprises a script generation module, and parameters of the script generation module are updated during training process of the script generation model; the generating, by the script generation model, the second video script of the original video based on the prompt information and the video mapping feature sequence comprises: generating, by the script generation module, the second video script of the original video based on the video mapping feature sequence under constraint of the prompt information, wherein the prompt information comprises script attribute prompt information and/or script content prompt information.

4. The method according to claim 2, wherein a training method of the script generation model comprises: obtaining the video samples, wherein the first video scripts of the video samples satisfy the preset selection condition; determining a script sample according to audio information of video frames in the video samples, and generating a prompt information sample based on attribute information and content information of the script sample; training a script generation model to be trained based on the video frames in the video samples, the prompt information sample, and the script sample, so that the script generation model to be trained learns a mapping relationship among the video frames in the video samples, the prompt information sample, and the script sample.

5. The method according to claim 4, wherein the training a script generation model to be trained based on the video frames in the video samples, the prompt information sample, and the script sample comprises: inputting the video frames in the video samples and corresponding prompt information samples into the script generation model to be trained, and obtaining a predicted script output by the script generation model to be trained; calculating a loss value between the predicted script and the script sample; in response to the loss value not satisfying a model training end condition, adopting a backpropagation method to adjust model parameters of the script generation model to be trained.

6. The method according to claim 1, wherein the script generation model comprises a visual encoder and an adapter, and parameters of the adapter are updated during training of the script generation model; the generating, by the script generation model, a video feature sequence according to video frames in the original video, mapping the video feature sequence to a text feature space of the script generation model, and obtaining a video mapping feature sequence comprises: compressing the video frames in the original video by the visual encoder to obtain a feature vector, and generating the video feature sequence according to the feature vector corresponding to the video frames; mapping the video feature sequence to the text feature space by the adapter to obtain the video mapping feature sequence.

7. The method according to claim 1, wherein the adding the second video script to the original video according to the timestamp to obtain a target video comprises: determining a video frame corresponding to the second video script according to a timestamp corresponding to the second video script, adding the second video script to the video frame corresponding to the second video script, and obtaining the target video.

8. An electronic device, comprising: one or more processors; a storage apparatus, configured to store one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement a video editing method, and the method comprises: inputting an original video into a script generation model that is pre-trained, wherein the script generation model is trained based on video samples, and first video scripts of the video samples satisfy a preset selection condition; generating, by the script generation model, a video feature sequence according to video frames in the original video, mapping the video feature sequence to a text feature space of the script generation model, and obtaining a video mapping feature sequence; generating, by the script generation model, a second video script of the original video based on the video mapping feature sequence, wherein the second video script comprises a timestamp; adding the second video script to the original video according to the timestamp to obtain a target video.

9. The electronic device according to claim 8, wherein the generating, by the script generation model, a second video script of the original video based on the video mapping feature sequence comprises: inputting prompt information corresponding to the original video into the script generation model; generating, by the script generation model, the second video script of the original video based on the prompt information and the video mapping feature sequence.

10. The electronic device according to claim 9, wherein the script generation model comprises a script generation module, and parameters of the script generation module are updated during training process of the script generation model; the generating, by the script generation model, the second video script of the original video based on the prompt information and the video mapping feature sequence comprises: generating, by the script generation module, the second video script of the original video based on the video mapping feature sequence under constraint of the prompt information, wherein the prompt information comprises script attribute prompt information and/or script content prompt information.

11. The electronic device according to claim 9, wherein a training method of the script generation model comprises: obtaining the video samples, wherein the first video scripts of the video samples satisfy the preset selection condition; determining a script sample according to audio information of video frames in the video samples, and generating a prompt information sample based on attribute information and content information of the script sample; training a script generation model to be trained based on the video frames in the video samples, the prompt information sample, and the script sample, so that the script generation model to be trained learns a mapping relationship among the video frames in the video samples, the prompt
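The encoder and adapter arrangement of claim 6 and the train-until-end-condition loop of claim 5 can be sketched as a toy Python example. This is an illustration under loud assumptions, not the applicant's implementation: `encode_frame` (mean and contrast of pixel values) stands in for the visual encoder, `adapt` is a plain linear map standing in for the adapter, mean squared error against target vectors stands in for the loss between predicted script and script sample, and the update is hand-written gradient descent on the adapter weights, which is the single-layer case of backpropagation. The claims name none of these choices, and keeping the encoder free of trainable parameters is also an assumption.

```python
def encode_frame(frame):
    """Hypothetical visual encoder: compress a frame (rows of pixel
    intensities) into a 2-value feature vector [mean, max - min]."""
    pixels = [p for row in frame for p in row]
    return [sum(pixels) / len(pixels), max(pixels) - min(pixels)]


def adapt(feature, weights):
    """Hypothetical adapter: linear map into the text feature space.
    `weights` holds one row per output dimension."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def map_video(frames, weights):
    """Frames -> video feature sequence -> video mapping feature sequence."""
    video_feature_sequence = [encode_frame(f) for f in frames]
    return [adapt(v, weights) for v in video_feature_sequence]


def mse(preds, targets):
    """Mean squared error over every component of every vector."""
    errs = [(p - t) ** 2 for pv, tv in zip(preds, targets) for p, t in zip(pv, tv)]
    return sum(errs) / len(errs)


def train_adapter(frames, targets, weights, lr=0.5, end_loss=1e-4, max_steps=1000):
    """Adjust adapter weights in place until the loss meets the end condition."""
    features = [encode_frame(f) for f in frames]  # encoder is not trained here
    n = len(features) * len(weights)              # number of error terms
    for _ in range(max_steps):
        preds = [adapt(v, weights) for v in features]
        if mse(preds, targets) <= end_loss:       # training end condition met
            break
        for j, row in enumerate(weights):
            for k in range(len(row)):
                # Derivative of the MSE with respect to weights[j][k].
                grad = sum(2 * (p[j] - t[j]) * v[k]
                           for p, t, v in zip(preds, targets, features)) / n
                row[k] -= lr * grad
    return weights, mse([adapt(v, weights) for v in features], targets)


frames = [[[0.5, 0.5]], [[0.0, 1.0]]]             # two tiny one-row frames
identity = [[1.0, 0.0], [0.0, 1.0]]
targets = map_video(frames, identity)             # synthetic training targets
trained, final_loss = train_adapter(frames, targets, [[0.0, 0.0], [0.0, 0.0]])
```

A real script generation model would emit text tokens rather than vectors and would use an automatic-differentiation framework; the sketch only shows the control flow of predicting, computing a loss, checking the end condition, and updating parameters.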

Assignees

  • Beijing Zitiao Network Technology Co Ltd

Inventors

Classifications

  • G06T11/60 · Primary

    Creating or editing images; Combining images with text · CPC title

  • Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

  • Generation or processing of descriptive data, e.g. content descriptors {(systems specially adapted for using meta-information in broadcast systems H04H60/73)} · CPC title

  • Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2026051096A1 cover?
Embodiments of the present disclosure provide a video editing method, apparatus, device, medium and program product. The method includes: inputting an original video into a script generation model that is pre-trained; generating, by the script generation model, a video feature sequence according to video frames in the original video, mapping the video feature sequence to a text feature space of…
Who is the assignee on this patent?
Beijing Zitiao Network Technology Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06T11/60. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 19, 2026 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).