Who is the assignee on this patent?

Baidu Online Network Technology (Beijing) Co., Ltd.

What technology area does this patent fall under?

Primary CPC classification G06V20/47. Mapped technology areas include Physics.

When was this patent published?

Publication date Apr 12, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method and apparatus for extracting video preview, device and computer storage medium

US11302103B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11302103-B2
Application number	US-201816181046-A
Country	US
Kind code	B2
Filing date	Nov 5, 2018
Priority date	Nov 28, 2017
Publication date	Apr 12, 2022
Grant date	Apr 12, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure provides a method and apparatus for extracting a video preview, a device and a computer storage medium. The method comprises: inputting a video into a video classification model obtained by pre-training; obtaining weights of respective video frames output by an attention module in the video classification model; extracting continuous N video frames whose total weight value satisfies a preset requirement, as the video preview of the target video, N being a preset positive integer. It is possible to, in the manner provided by the present disclosure, automatically extract continuous video frames from the video as the video preview, without requiring manual clipping, and with manpower costs being reduced.
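The selection step in the abstract (pick the N continuous frames with the highest total weight) can be sketched as a sliding-window sum. This is a minimal illustration, not the patented implementation: the function name `best_window` and the sample weights are hypothetical, and it covers only the "largest total weight" reading of the preset requirement.

```python
from typing import Sequence, Tuple


def best_window(weights: Sequence[float], n: int) -> Tuple[int, float]:
    """Return (start_index, total_weight) of the n consecutive frames
    whose weights sum to the largest value.

    `weights` stands in for the per-frame attention weights; how they are
    produced is outside this sketch.
    """
    if n <= 0 or n > len(weights):
        raise ValueError("n must be between 1 and the number of frames")

    # Total weight of the first window (frames 0 .. n-1).
    window_sum = sum(weights[:n])
    best_start, best_sum = 0, window_sum

    for start in range(1, len(weights) - n + 1):
        # Slide by one frame: add the entering frame, drop the leaving one.
        window_sum += weights[start + n - 1] - weights[start - 1]
        if window_sum > best_sum:
            best_start, best_sum = start, window_sum

    return best_start, best_sum


# Made-up, unnormalized per-frame weights for illustration only.
frame_weights = [1, 2, 8, 9, 7, 1, 1, 3]
start, total = best_window(frame_weights, 3)
print(start, total)  # frames 2..4 form the preview
```

The running sum keeps the scan linear in the number of frames. On ties, the earliest window wins, which is a choice made for this sketch.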

First claim

Opening claim text (preview).

What is claimed is:

1. A method for extracting a video preview, wherein the method comprises: inputting a video into a video classification model obtained by pre-training, wherein the video classification model comprises a convolutional neural network, a time sequence neural network and an attention module; extracting convolutional features of respective frames of the video by the convolutional neural network; outputting time sequence features of the respective frames by the time sequence neural network according to the convolutional features of the respective frames; obtaining weights of the respective frames output by the attention module according to the time sequence features of the respective frames; and extracting continuous N frames of the video whose total weight value satisfies a preset requirement, as the video preview of the target video, N being a preset positive integer.

2. The method according to claim 1, wherein the video classification model further comprises a fully-connected layer.

3. The method according to claim 2, wherein the time sequence neural network comprises: a Short-Term Memory, a Recurrent Neural Network RNN or a Gated Recurrent Unit GRU.

4. The method according to claim 1, wherein the preset requirement comprises: a total weight value is the largest; or larger than or equal to a preset weight threshold.

5. The method according to claim 1, wherein the video classification model is trained by taking a training video whose video class is pre-annotated as input of the video classification model and by taking the corresponding video class as output of the video classification model, to minimize a loss function of a classification result.

6. The method according to claim 5, wherein during the training process of the video classification model, taking the training video as input of the convolutional neural network to output convolutional features of respective frames of the training video; taking the convolutional features of the respective frames of the training video as input of the time sequence neural network, to output time sequence features of the respective frames of the training video; taking the time sequence features of the respective frames of the training video as input of the attention module to output weights of the respective frames of the training video; mapping to a video class at a fully-connected layer according to weights of respective frames and output of the time sequence neural network; using a mapping result to calculate a loss function.

7. The method according to claim 1, wherein the method further comprises: if a target video is located on a page, displaying a video preview of the target video.

8. The method according to claim 7, wherein the locating the target video comprises: locating a video at a target position in a video feed page; or locating a video at a target position in a video aggregation page.

9. The method according to claim 7, wherein the displaying a video preview of the target video comprises: after locating the target video, automatically playing the video preview of the target video; or playing the video preview of the target video after detecting an event that that the user triggers the play of the video preview.

10. The method according to claim 9, wherein during displaying a video preview of the target video, displaying prompt information that the video preview is being played.

11. The method according to claim 7, wherein the method further comprises: playing the target video after detecting an event that the user triggers the play of the target video.

12. A device, wherein the device comprises: one or more processors, a storage for storing one or more programs, the one or more programs, when executed by said one or more processors, enable said one or more processors to implement a method for extracting a video preview, wherein the method comprises: inputting a video into a video classification model obtained by pre-training, wherein the video classification model comprises a convolutional neural network, a time sequence neural network and an attention module; extracting convolutional features of respective frames of the video by the convolutional neural network; outputting time sequence features of the respective frames by the time sequence neural network according to the convolutional features of the respective frames; obtaining weights of the respective frames output by the attention module according to the time sequence features of the respective frames; and extracting continuous N frames of the video whose total weight value satisfies a preset requirement, as the video preview of the target video, N being a preset positive integer.

13. The device according to claim 12, wherein the video classification model further comprises a fully-connected layer.

14. The device according to claim 13, wherein the time sequence neural network comprises: a Short-Term Memory, a Recurrent Neural Network RNN or a Gated Recurrent Unit GRU.

15. The device according to claim 12, wherein the preset requirement comprises: a total weight value is the largest; or larger than or equal to a preset weight threshold.

16. The device according to claim 12, wherein the video classification model is trained by taking a training video whose video class is pre-annotated as input of the video classification model and by taking the corresponding video class as output of the video classification model, to minimize a loss function of a classification result.

17. The device according to claim 16, wherein during the training process of the video classification model, taking the training video as input of the convolutional neural network to output convolutional features of respective frames of the training video; taking the convolutional features of the respective frames of the training video as input of the time sequence neural network, to output time sequence features of the respective frames of the training video; taking the time sequence features of the respective frames of the training video as input of the attention module to output weights of the respective frames of the training video; mapping to a video class at a fully-connected layer according to weights of respective frames and output of the time sequence neural network; using a mapping result to calculate a loss function.

18. The device according to claim 12, wherein the method further comprises: if a target video is located on a page, displaying a video preview of the target video.

19. The device according to claim 18, wherein the locating the target video comprises: locating a video at a target position in a video feed page; or locating a video at a target position in a video aggregation page.

20. A non-transitory computer-readable storage medium including a computer-executable instruction which, when executed by a computer processor, executes a method for extracting a video preview, wherein the method comprises: inputting a video into a video classification model obtained by pre-training, wherein the video classification model comprises a convolutional neural network, a time sequence neural network and an attention module; extracting convolutional features of respective frames of the video by the convolutional neural network; outputting time sequence features of the respective frames by the time sequence neural network according to the convolutional features of the respective frames; obtaining weights of the respective frames output by the attention m
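The claims describe an attention module that outputs one weight per frame, a classification step that combines those weights with the time sequence features, and an alternative selection rule based on a weight threshold. The sketch below illustrates those three pieces under stated assumptions: softmax normalization and weighted-sum pooling are common attention choices assumed here, not details taken from the claim text, and all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw per-frame attention scores into weights that sum to 1.
    (Assumption: the claims do not say how weights are normalized.)"""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> List[float]:
    """Weighted sum of per-frame time sequence features.
    (Assumption: one plausible way to combine weights and features
    before a fully-connected classification layer.)"""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features))
            for d in range(dim)]


def first_window_at_least(weights: Sequence[float], n: int,
                          threshold: float) -> Optional[int]:
    """Return the start index of the first run of n consecutive frames
    whose total weight is >= threshold, or None if no run qualifies."""
    for start in range(len(weights) - n + 1):
        if sum(weights[start:start + n]) >= threshold:
            return start
    return None


# Illustrative values only.
weights = softmax([0.0, 0.0])                       # two equally scored frames
pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], weights)
start = first_window_at_least([0.1, 0.1, 0.4, 0.3, 0.1], n=2, threshold=0.6)
print(weights, pooled, start)
```

Returning the first qualifying window is a design choice for this sketch; the claim wording only requires that the total weight be larger than or equal to the preset threshold.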

Assignees

Baidu Online Network Technology (Beijing) Co., Ltd.

Inventors

Classifications

G06V10/82
using neural networks · CPC title
G06V10/764
using classification, e.g. of video objects · CPC title
G06V20/47 (Primary)
Detecting features for summarising video content · CPC title
G06V10/454 (Primary)
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title
G06F18/241
relating to the classification model, e.g. parametric or non-parametric approaches · CPC title

Patent family

Related publications grouped by family.

View patent family 62033662

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11302103B2 cover?: The present disclosure provides a method and apparatus for extracting a video preview, a device and a computer storage medium. The method comprises: inputting a video into a video classification model obtained by pre-training; obtaining weights of respective video frames output by an attention module in the video classification model; extracting continuous N video frames whose total weight valu…

Related patents

Method and system of determining object positions for image processing using wireless network angle of transmission

Generating structured output predictions using neural networks

Computerized machine learning of interesting video sections