Who is the assignee on this patent?

Beijing Baidu Netcom Sci & Tech Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06V20/46. Mapped technology areas include Physics.

When was this patent published?

Publication date: Oct 8, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Video processing method, electronic device and storage medium

US12112539B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12112539-B2
Application number	US-202117450158-A
Country	US
Kind code	B2
Filing date	Oct 6, 2021
Priority date	Nov 27, 2020
Publication date	Oct 8, 2024
Grant date	Oct 8, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A video processing method, an electronic device and a storage medium are provided, and relate to the field of artificial intelligence, and particularly relates to the fields of deep learning, model training, knowledge mapping, video processing and the like. The method includes: acquiring a plurality of first video frames, and performing fine-grained splitting on the plurality of first video frames to obtain a plurality of second video frames; performing feature encoding on the plurality of second video frames according to multi-mode information related to the plurality of second video frames, to obtain feature fusion information for characterizing fusion of the multi-mode information; and performing similarity matching on the plurality of second video frames according to the feature fusion information, and obtaining a target video according to a result of the similarity matching.
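The encoding and matching steps in the abstract can be pictured with a small sketch. This is not the patented method: the patent fuses the outputs of neural network models, whereas the code below swaps in plain concatenation of L2-normalized per-modality vectors and cosine similarity. The function names (`fuse_features`, `cosine_similarity`) and the toy feature values are hypothetical and exist only for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(modalities):
    """Hypothetical stand-in for feature fusion: concatenate the
    L2-normalized vector of each modality in a fixed (sorted) order."""
    fused = []
    for name in sorted(modalities):  # fixed order keeps vectors comparable
        fused.extend(l2_normalize(modalities[name]))
    return fused


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


# Toy per-modality features for three video segments (made-up numbers).
seg_a = {"text": [1.0, 0.0], "audio": [0.5, 0.5], "hue": [0.2, 0.8]}
seg_b = {"text": [0.9, 0.1], "audio": [0.4, 0.6], "hue": [0.3, 0.7]}
seg_c = {"text": [0.0, 1.0], "audio": [1.0, 0.0], "hue": [0.9, 0.1]}

fused_a, fused_b, fused_c = (fuse_features(s) for s in (seg_a, seg_b, seg_c))
sim_ab = cosine_similarity(fused_a, fused_b)  # segments with similar content
sim_ac = cosine_similarity(fused_a, fused_c)  # segments with different content
```

With these toy values, `sim_ab` comes out higher than `sim_ac`, which is the kind of signal a similarity-matching step could use to decide which segments belong together.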

First claim

Opening claim text (preview).

What is claimed is:

1. A video processing method, comprising: acquiring a plurality of first video frames, and performing fine-grained splitting on the plurality of first video frames to obtain a plurality of second video frames; performing feature encoding on the plurality of second video frames according to multi-mode information related to the plurality of second video frames, to obtain feature fusion information for characterizing fusion of the multi-mode information; performing similarity matching on the plurality of second video frames according to the feature fusion information, and obtaining a target video according to a result of the similarity matching; identifying the multi-mode information from the plurality of second video frames according to a pre-trained first neural network model, wherein the identifying the multi-mode information from the plurality of second video frames according to the pre-trained first neural network model comprises: identifying knowledge map information according to a knowledge map extractor in the first neural network model; identifying text information according to a text extractor in the first neural network model; identifying audio information according to an audio extractor in the first neural network model; identifying hue information according to a hue extractor in the first neural network model; identifying object information according to an object extractor in the first neural network model; identifying action information according to an action extractor in the first neural network model; and wherein the multi-mode information comprises: at least one of the knowledge map information, the text information, the audio information, the hue information, the object information, and the action information; distinguishing respective types of information in the multi-mode information according to a second neural network model; identifying time sequence information related to the multi-mode information according to a third neural network model; and fusing output results of the first neural network model, the second neural network model, and the third neural network model to obtain the feature fusion information.

2. The method of claim 1, wherein the acquiring the plurality of first video frames, and performing the fine-grained splitting on the plurality of first video frames to obtain the plurality of second video frames comprises: performing the fine-grained splitting on the plurality of first video frames according to a parameter for characterizing shot and color transformation to obtain the plurality of second video frames.

3. The method of claim 1, wherein the performing the feature encoding on the plurality of second video frames according to the multi-mode information related to the plurality of second video frames, to obtain the feature fusion information for characterizing the fusion of the multi-mode information, comprises: performing feature extraction and feature fusion processing on the plurality of second video frames according to the multi-mode information to obtain the feature fusion information.

4. The method of claim 1, wherein the performing the similarity matching on the plurality of second video frames according to the feature fusion information, and obtaining the target video according to the result of the similarity matching, comprises: scoring similarities of the plurality of second video frames according to the feature fusion information, and taking a result of the scoring as the result of the similarity matching; and in a case that the result of the similarity matching is that adjacent video frames for a same event content are similar, performing video merging on the adjacent video frames until completing merging of the plurality of second video frames according to the adjacent video frames, respectively, and obtaining the target video according to a result of the video merging.

5. An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein, the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform operations of: acquiring a plurality of first video frames, and performing fine-grained splitting on the plurality of first video frames to obtain a plurality of second video frames; performing feature encoding on the plurality of second video frames according to multi-mode information related to the plurality of second video frames, to obtain feature fusion information for characterizing fusion of the multi-mode information; performing similarity matching on the plurality of second video frames according to the feature fusion information, and obtaining a target video according to a result of the similarity matching; identifying the multi-mode information from the plurality of second video frames according to a pre-trained first neural network model, wherein when the instructions are executed by the at least one processor to enable the at least one processor to identify the multi-mode information from the plurality of second video frames according to a pre-trained first neural network model, the instructions are executed by the at least one processor to enable the at least one processor to specifically perform operations of: identifying knowledge map information according to a knowledge map extractor in the first neural network model; identifying text information according to a text extractor in the first neural network model; identifying audio information according to an audio extractor in the first neural network model; identifying hue information according to a hue extractor in the first neural network model; identifying object information according to an object extractor in the first neural network model; identifying action information according to an action extractor in the first neural network model; and wherein the multi-mode information comprises: at least one of the knowledge map information, the text information, the audio information, the hue information, the object information, and the action information; distinguishing respective types of information in the multi-mode information according to a second neural network model; identifying time sequence information related to the multi-mode information according to a third neural network model; and fusing output results of the first neural network model, the second neural network model, and the third neural network model to obtain the feature fusion information.

6. The electronic device of claim 5, wherein when the instructions are executed by the at least one processor to enable the at least one processor to acquire the plurality of first video frames, and perform the fine-grained splitting on the plurality of first video frames to obtain the plurality of second video frames, the instructions are executed by the at least one processor to enable the at least one processor to specifically perform an operation of: performing the fine-grained splitting on the plurality of first video frames according to a parameter for characterizing shot and color transformation to obtain the plurality of second video frames.

7. The electronic device of claim 5, wherein when the instructions are executed by the at least one processor to enable the at least one processor to perform the feature encoding on the plurality of second video frames according to the multi-mode information related to the plurality of second video frames, to obtain the feature fusion information for characterizing the fusion of the multi-mode information, the instructions are executed by the at least one processor to enable the at least one processor to speci
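Claims 2 and 4 describe splitting frames on shot and color changes and then merging adjacent segments that score as similar. The sketch below is one simplified reading, not the claimed implementation: it assumes each frame is summarized by a normalized hue histogram, uses an L1 distance with an arbitrary threshold to cut segments, and merges neighbors in a single pass from precomputed similarity scores. The names `split_on_color_change` and `merge_adjacent`, the thresholds, and the sample data are all hypothetical.

```python
def split_on_color_change(frame_hues, threshold=0.5):
    """Group frame indices into segments, starting a new segment whenever
    the L1 distance between consecutive hue histograms exceeds `threshold`.
    `frame_hues` is a list of normalized histograms (each sums to 1)."""
    if not frame_hues:
        return []
    segments = [[0]]
    for i in range(1, len(frame_hues)):
        # L1 distance between consecutive histograms lies in [0, 2].
        dist = sum(abs(p - q) for p, q in zip(frame_hues[i - 1], frame_hues[i]))
        if dist > threshold:
            segments.append([i])    # abrupt color change: start a new segment
        else:
            segments[-1].append(i)  # gradual change: extend the current one
    return segments


def merge_adjacent(segments, scores, threshold=0.8):
    """Merge neighboring segments in one pass. `scores[i]` is the assumed
    similarity between segments[i] and segments[i + 1]."""
    if not segments:
        return []
    merged = [list(segments[0])]
    for seg, score in zip(segments[1:], scores):
        if score >= threshold:
            merged[-1].extend(seg)    # similar neighbor: same event content
        else:
            merged.append(list(seg))  # dissimilar: keep as a separate segment
    return merged


# Five frames with two-bin hue histograms (made-up numbers).
hues = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0], [0.5, 0.5]]
segments = split_on_color_change(hues)
# Assumed similarity scores between neighboring segments.
merged = merge_adjacent(segments, scores=[0.9, 0.3])
```

Here the split yields three segments (`[0, 1]`, `[2, 3]`, `[4]`), and the first two are merged because their assumed score of 0.9 clears the 0.8 threshold. In a fuller pipeline the scores would come from the fused features rather than being supplied by hand, and merging might repeat until no adjacent pair qualifies.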

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Classifications

G06V20/49
Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes · CPC title
G06V10/806
of extracted features · CPC title
G06N3/045
Combinations of networks · CPC title
G06F18/25
Fusion techniques · CPC title
G06F18/22
Matching criteria, e.g. proximity measures · CPC title

Patent family

Related publications grouped by family.

View patent family 74809546

Related patents

Method for processing audio and video information, electronic device and storage medium

System and method for calibrating moving camera capturing broadcast video

Partitioning videos

Partitioning videos

Video segmentation techniques