Who is the assignee on this patent?

Beijing Baidu Netcom Sci & Tech Co Ltd

What technology area does this patent fall under?

Primary CPC classification H04N19/14. Mapped technology areas include Electricity.

When was this patent published?

Publication date: Feb 22, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method, device, apparatus for predicting video coding complexity and storage medium

US11259029B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11259029-B2
Application number	US-202016797911-A
Country	US
Kind code	B2
Filing date	Feb 21, 2020
Priority date	May 23, 2019
Publication date	Feb 22, 2022
Grant date	Feb 22, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, device, apparatus for predicting a video coding complexity and a computer-readable storage medium are provided. The method includes: acquiring an attribute feature of a target video; extracting a plurality of first target image frames from the target video; performing a frame difference calculation on the plurality of the first target image frames, to acquire a plurality of first frame difference images; determining a histogram feature for frame difference images of the target video according to a statistical histogram of each first frame difference image; and inputting a plurality of features of the target video into a coding complexity prediction model to acquire a coding complexity prediction value of the target video. Through the above method, the BPP prediction value can be acquired intelligently.
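The frame difference and histogram steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the nested-list grayscale frame representation, the absolute-difference operator, the bin count, and all function names are assumptions made for illustration. The averaging of per-image histogram vectors follows claim 3 of this publication.

```python
from typing import List, Sequence

# Assumed representation: a grayscale frame as rows of 0..255 intensities.
Frame = List[List[int]]


def frame_difference(a: Frame, b: Frame) -> Frame:
    """Absolute per-pixel difference of two same-sized frames (assumed operator)."""
    return [[abs(pa - pb) for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def histogram_vector(image: Frame, bins: int = 256) -> List[float]:
    """Normalized intensity histogram of one frame difference image."""
    counts = [0] * bins
    total = 0
    for row in image:
        for value in row:
            # Map a 0..255 intensity onto one of `bins` equal-width bins.
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def frame_difference_histogram_feature(frames: Sequence[Frame],
                                       bins: int = 256) -> List[float]:
    """Average the histogram vectors of all adjacent-frame difference images."""
    diffs = [frame_difference(a, b) for a, b in zip(frames, frames[1:])]
    if not diffs:
        raise ValueError("need at least two frames")
    vectors = [histogram_vector(d, bins) for d in diffs]
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

The resulting vector would be one of the inputs to the coding complexity prediction model, alongside the attribute feature; a production version would more likely operate on array types than on nested lists.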

First claim

Opening claim text (preview).

What is claimed is:

1. A method for predicting a video coding complexity, comprising: acquiring an attribute feature of a target video; extracting a plurality of first target image frames from the target video; performing a frame difference calculation on the plurality of the first target image frames, to acquire a plurality of first frame difference images; determining a histogram feature for frame difference images of the target video according to a statistical histogram of each first frame difference image; and inputting a plurality of features of the target video into a coding complexity prediction model to acquire a coding complexity prediction value of the target video, wherein the plurality of features of the target video comprise the attribute feature of the target video and the histogram feature for frame difference images of the target video, wherein extracting the plurality of first target image frames from the target video comprises: extracting V image frames from the target video according to a preset frames per second; dividing the V image frames into N segments of image stream according to a playing sequence the V image frames in the target video; and extracting K image frames from each segment of image stream to acquire the plurality of first target image frames; wherein V, N, and K are positive integers, and K is less than V.

2. The method according to claim 1, wherein the performing a frame difference calculation on the plurality of first target image frames, to acquire a plurality of first frame difference images comprises: performing the frame difference calculation on two adjacent image frames of the K image frames in each segment of image stream, to acquire a first frame difference image.

3. The method according to claim 1, wherein the determining a histogram feature for frame difference images of the target video according to a statistical histogram of each first frame difference image comprises: drawing a statistical histogram for each first frame difference image to acquire a plurality of histogram vectors for respective frame difference images; and averaging the plurality of histogram vectors to acquire the histogram feature of the frame difference images.

4. The method according to claim 1, wherein the plurality of features of the target video further comprise a transforming histogram feature, and the method further comprises: performing a discrete cosine transform for each of the first frame difference images to acquire a plurality of spectrograms for respective first frame difference images; drawing a statistical histogram for each spectrogram to acquire a plurality of histogram vectors for respective spectrograms; and averaging the plurality of histogram vectors for the respective spectrograms to acquire the transforming histogram feature.

5. The method according to claim 1, wherein the plurality of features of the target video further comprise an image depth feature and a frame difference depth feature, and the method further comprises: inputting a plurality of second target image frames extracted from the target video into an image deep learning model; acquiring an intermediate layer result of the image deep learning model in response to an input of the plurality of frames of second target image, to acquire the image depth feature; performing a frame difference calculation on the plurality of third target image frames extracted from the target video, to acquire a plurality of second frame difference images; inputting the plurality of second frame difference images into a frame difference deep learning model; and acquiring an intermediate layer result of the frame difference deep learning model in response to an input of the plurality of second frame difference images, to acquire the frame difference depth feature.

6. The method according to claim 5, further comprising: extracting a plurality of sample image frames from a first sample video; and training a temporal segment network by using the plurality of sample image frames as inputs and using a true value of the coding complexity of the first sample video as a target, to acquire the image deep learning model.

7. The method according to claim 5, further comprising: extracting a plurality of sample image frames from a first sample video; performing a frame difference calculation on the plurality of sample image frames, to acquire a plurality of sample frame difference images; and training a temporal segment network by using the plurality of sample frame difference images as inputs and using a true value of the coding complexity of the first sample video as a target, to acquire the frame difference deep learning model.

8. The method according to claim 5, wherein the inputting a plurality of second target image frames extracted from the target video into an image deep learning model comprises: extracting V image frames from the target video according to a preset frames per second, wherein V is a positive integer; dividing the V image frames into N segments of image stream according to a playing sequence of the V image frames in the target video; extracting one image frame from each segment of image stream, to acquire N second target image frames; and inputting the N second target image frames into the image deep learning model.

9. The prediction method according to claim 5, wherein the performing a frame difference calculation on the plurality of third target image frames extracted from the target video, to acquire a plurality of second frame difference images comprises: extracting V image frames from the target video according to a preset frames per second, wherein V is a positive integer; dividing the V image frames into N segments of image stream according to a playing sequence of the V image frames in the target video; and extracting two image frames from each segment of image stream and calculating a frame difference between the two image frames of each segment of image stream, to acquire N second frame difference images.

10. The method according to claim 1, wherein the target video comprises a second sample video; and during training the coding complexity prediction model, inputting a plurality of features of the target video into a coding complexity prediction model to acquire a coding complexity prediction value of the target video comprises: inputting features of a plurality of second sample videos into a multi-layer perception model to acquire a plurality of coding complexity prediction values of the respective second sample videos; and according to a plurality of coding complexity true values and the plurality of coding complexity prediction values, adjusting the multi-layer perception model to acquire the coding complexity prediction model.

11. The method according to claim 10, further comprising: transcoding the second sample video according to a preset coding parameter; and calculating a coding complexity value of the transcoded second sample video to acquire the coding complexity true value.

12. A device for predicting a video coding complexity, comprising: one or more processors; and a storage device configured for storing one or more programs, wherein the one or more programs are executed by the one or more processors to enable the one or more processors to: acquire an attribute feature of a target video; extract a plurality of first target image frames from the target video; perform a frame difference calculation on the plurality of the first target image frames, to acquire a plurality of first frame difference images; determine a histogram feature for frame difference images of the target video according
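The frame sampling in claim 1 (V frames split into N segments, K frames taken from each) and the adjacent-frame pairing in claim 2 can be sketched on frame indices alone. This is an illustrative reading, not the patent's implementation: the claims do not say how segment boundaries are chosen or which K frames are picked, so the near-equal contiguous split and the "first K frames of each segment" rule below are assumptions, as are all function names.

```python
from typing import List, Tuple


def split_into_segments(num_frames: int, num_segments: int) -> List[List[int]]:
    """Split frame indices 0..V-1 into N contiguous segments in playing order."""
    if not 0 < num_segments <= num_frames:
        raise ValueError("need 0 < N <= V")
    # Assumed rule: near-equal segment sizes via integer boundaries.
    bounds = [i * num_frames // num_segments for i in range(num_segments + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(num_segments)]


def sample_first_target_frames(num_frames: int, num_segments: int,
                               per_segment: int) -> List[List[int]]:
    """Pick K frame indices from each segment (assumed: the first K of each)."""
    segments = split_into_segments(num_frames, num_segments)
    if per_segment < 1 or any(len(s) < per_segment for s in segments):
        raise ValueError("K must be at least 1 and fit inside every segment")
    return [segment[:per_segment] for segment in segments]


def adjacent_pairs(sampled: List[List[int]]) -> List[Tuple[int, int]]:
    """Index pairs whose frame difference is taken within each segment."""
    return [(a, b) for segment in sampled for a, b in zip(segment, segment[1:])]
```

With V = 10, N = 3, and K = 2, this selects indices [0, 1], [3, 4], and [6, 7], which yields one frame difference image per segment.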

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Classifications

G06N3/045
Combinations of networks · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/09
Supervised learning · CPC title
H04N19/137
Motion inside a coding unit, e.g. average field, frame or block difference · CPC title
H04N19/625
using discrete cosine transform [DCT] · CPC title
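
The DCT classification above corresponds to the transforming histogram feature of claim 4: each frame difference image is transformed, a histogram is taken of the resulting spectrogram, and the histograms are averaged. The sketch below is one possible reading under stated assumptions: an orthonormal DCT-II evaluated directly, a histogram over coefficient magnitudes, and hypothetical `bins` and `max_value` parameters that the patent text on this page does not specify.

```python
import math
from typing import List, Sequence

Image = List[List[float]]


def dct_1d(values: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D sequence (direct O(n^2) evaluation)."""
    n = len(values)
    out = []
    for k in range(n):
        s = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, v in enumerate(values))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def dct_2d(image: Image) -> Image:
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in image]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def magnitude_histogram(spectrum: Image, bins: int, max_value: float) -> List[float]:
    """Normalized histogram of coefficient magnitudes; larger values go to the last bin."""
    counts = [0] * bins
    total = 0
    for row in spectrum:
        for coeff in row:
            index = min(int(abs(coeff) / max_value * bins), bins - 1)
            counts[index] += 1
            total += 1
    return [c / total for c in counts]


def transform_histogram_feature(diff_images: Sequence[Image], bins: int = 8,
                                max_value: float = 1024.0) -> List[float]:
    """Average the spectrogram histograms over all frame difference images."""
    vectors = [magnitude_histogram(dct_2d(d), bins, max_value) for d in diff_images]
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

For a constant image only the DC coefficient is nonzero, so nearly all of the histogram mass falls in the lowest bin; busier frame differences spread mass toward higher bins.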

Patent family

Related publications grouped by family.

View patent family 67572678

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11259029B2 cover?: A method, device, apparatus for predicting a video coding complexity and a computer-readable storage medium are provided. The method includes: acquiring an attribute feature of a target video; extracting a plurality of first target image frames from the target video; performing a frame difference calculation on the plurality of the first target image frames, to acquire a plurality of first fram…