Video classification system, video classification method, and neural network training system

US12322176B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12322176-B2
Application number  US-202217734384-A
Country             US
Kind code           B2
Filing date         May 2, 2022
Priority date       Nov 17, 2021
Publication date    Jun 3, 2025
Grant date          Jun 3, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A video classification system and method and a neural network training system are provided. The video classification system captures multiple sampled images, encodes first and second images of all sampled images into feature matrices of the sampled images using a convolutional neural network module, and obtains the classification of the videos based on a recurrent neural network module and the feature matrices. The neural network training system captures multiple first sampled images and uses first training samples of all the first sampled images to train the convolutional neural network module and a classification module to obtain multiple parameter values of the convolutional neural network module. The neural network training system trains the recurrent neural network module based on the parameter values and multiple second sampled images.
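
The inference flow summarized in the abstract (sample the video at a time interval, encode each sampled image, stack the per-time-point features, classify from the stack) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `sample_times`, `FeatureWindow`, and `classify_stream` do not come from the patent, the `encode` and `classify` callables are stand-ins for the trained convolutional and recurrent modules, and the fixed window length is an assumption, since the text shown on this page does not say how many past feature vectors are kept.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List

Vector = List[float]
Matrix = List[Vector]


def sample_times(duration_s: float, interval_s: float) -> List[float]:
    """Return the time points 0, interval, 2*interval, ... inside the video."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    times: List[float] = []
    k = 0
    while k * interval_s < duration_s:
        times.append(k * interval_s)
        k += 1
    return times


class FeatureWindow:
    """Keeps the most recent `length` feature vectors in time order.

    Assumption: a fixed-length sliding window; the oldest vector is
    dropped once the window is full.
    """

    def __init__(self, length: int) -> None:
        self._rows: Deque[Vector] = deque(maxlen=length)

    def push(self, feature: Vector) -> Matrix:
        """Append the newest vector and return the current feature matrix."""
        self._rows.append(list(feature))
        return [list(row) for row in self._rows]  # one row per time point


def classify_stream(
    frames: Iterable[object],
    encode: Callable[[object], Vector],   # stand-in for the CNN module
    classify: Callable[[Matrix], object],  # stand-in for the RNN module
    window_length: int,
) -> List[object]:
    """Encode each sampled frame, update the window, and classify it."""
    window = FeatureWindow(window_length)
    results: List[object] = []
    for frame in frames:
        matrix = window.push(encode(frame))
        results.append(classify(matrix))
    return results
```

In this sketch a classification is produced at every sampled time point from whatever the window currently holds; a real system might instead wait until the window is full.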

First claim

Opening claim text (preview).

What is claimed is:

1. A video classification system, comprising: a processor, configured to obtain a video; a convolutional neural network module, having a plurality of trained first parameters; and a recurrent neural network module, having a plurality of trained second parameters, wherein the processor is configured to perform the following steps:
  (a) selecting a present time point according to a time interval, and sampling the video according to the present time point to obtain a sampled image at the present time point;
  (b) adjusting a pixel size of the sampled image at the present time point to obtain a corresponding first image, wherein a pixel size of the first image is a first pixel size, and the first pixel size of the first image is smaller than the pixel size of the sampled image;
  (c) performing image cropping on the sampled image at the present time point to obtain at least one corresponding partial image, and obtaining a corresponding second image based on the at least one partial image, wherein a pixel size of the second image is the first pixel size;
  (d) using the convolutional neural network module to encode the first image and the second image of the sampled image at the present time point into a feature vector corresponding to the sampled image at the present time point;
  (e) sequentially merging the feature vector corresponding to the present time point with a plurality of past feature vectors corresponding to a plurality of past time points into a feature matrix; and
  (f) obtaining a classification of the video based on the recurrent neural network module and the feature matrix.

2. The video classification system according to claim 1, wherein the processor performs image cropping at a corresponding position of the sampled image to obtain a cropped image whose pixel size is the first pixel size, and uses the cropped image as the at least one partial image and the second image.

3. The video classification system according to claim 1, wherein the processor performs image cropping at a plurality of corresponding positions of the sampled image to obtain a plurality of cropped images whose pixel sizes are the first pixel size, and uses the plurality of cropped images as the at least one partial image, and the processor then averages the plurality of cropped images as the second image.

4. The video classification system according to claim 1, wherein the processor uses the convolutional neural network module to encode the first image of the sampled image into a first partial feature vector, the processor uses the convolutional neural network module to encode the second image of the sampled image into a second partial feature vector, and the processor then merges the first partial feature vector with the second partial feature vector to obtain the feature vector corresponding to the present time point.

5. The video classification system according to claim 1, wherein the recurrent neural network module is a long short-term memory network.

6. A video classification method, applicable to a video classification system and performed by a processor, wherein the video classification system comprises: the processor, configured to obtain a video; a convolutional neural network module, having a plurality of trained first parameters; and a recurrent neural network module, having a plurality of trained second parameters, wherein the video classification method comprises the following steps:
  (a) selecting a present time point according to a time interval, and sampling the video according to the present time point to obtain a sampled image at the present time point;
  (b) adjusting a pixel size of the sampled image at the present time point to obtain a corresponding first image, wherein a pixel size of the first image is a first pixel size, and the first pixel size of the first image is smaller than the pixel size of the sampled image;
  (c) performing image cropping on the sampled image at the present time point to obtain at least one corresponding partial image, and obtaining a corresponding second image based on the at least one partial image, wherein a pixel size of the second image is the first pixel size;
  (d) using the convolutional neural network module to encode the first image and the second image of the sampled image at the present time point into a feature vector corresponding to the sampled image at the present time point;
  (e) sequentially merging the feature vector corresponding to the present time point with a plurality of past feature vectors corresponding to a plurality of past time points into a feature matrix; and
  (f) obtaining a classification of the video based on the recurrent neural network module and the feature matrix.

7. The video classification method according to claim 6, wherein step (d) further comprises: performing image cropping at a corresponding position of the sampled image to obtain a cropped image whose pixel size is the first pixel size, and using the cropped image as the at least one partial image and the second image.

8. The video classification method according to claim 6, wherein step (d) further comprises: performing image cropping at a plurality of corresponding positions of the sampled image to obtain a plurality of cropped images whose pixel sizes are the first pixel size, using the plurality of cropped images as the at least one partial image, and then averaging the plurality of cropped images as the second image.

9. The video classification method according to claim 6, wherein step (d) further comprises: using the convolutional neural network module to encode the first image of the sampled image into a first partial feature vector, and using the convolutional neural network module to encode the second image of the sampled image into a second partial feature vector, and then merging the first partial feature vector with the second partial feature vector to obtain the feature vector corresponding to the present time point.

10. The video classification method according to claim 6, wherein the recurrent neural network module is a long short-term memory network.

11. A neural network training system, comprising: a processor, configured to obtain a plurality of videos, a plurality of training images, and a classification corresponding to each of the plurality of videos and each of the plurality of training images; a convolutional neural network module, having a plurality of first parameters; a recurrent neural network module, having a plurality of second parameters; and a classification module, having a plurality of third parameters, wherein the processor is configured to perform the following steps:
  (a) obtaining a plurality of first sampled images from the videos and the training images;
  (b) selecting an unselected image from the plurality of first sampled images as a current image;
  (c) adjusting a pixel size of the current image to obtain a corresponding first image, wherein a pixel size of the first image is a first pixel size, and the first pixel size of the first image is smaller than the pixel size of the current image;
  (d) performing image cropping on the current image to obtain at least one corresponding first partial image, and obtaining a corresponding second image based on the at least one first partial image, wherein a pixel size of the second image is the first pixel size;
  (e) setting the classification corresponding to the first image, the second image, and the current image as a first training sample corresponding to the current image;
  (f) repeating steps (b), (c), (d), and (e) until the plurality of first sampled images are all selected;
  (g) using all the first training samples corresponding to each of the plural
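
The image preparation recited in the claims (a downscaled first image, a second image built from one crop or from the average of several crops, and a feature vector merged from the two encodings) can be illustrated with a small pure-Python sketch. This is only an illustration under assumptions: images are grayscale nested lists, nearest-neighbour resizing stands in for whatever resizing the patent uses, "merging" is taken to mean concatenation, and all function names and the `encode` callable are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # grayscale image, row-major


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Downscale by nearest-neighbour sampling (an assumed resize method)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def crop(image: Image, top: int, left: int, h: int, w: int) -> Image:
    """Cut an h-by-w patch whose top-left corner is (top, left)."""
    if top < 0 or left < 0 or top + h > len(image) or left + w > len(image[0]):
        raise ValueError("crop window falls outside the image")
    return [row[left:left + w] for row in image[top:top + h]]


def average_images(images: Sequence[Image]) -> Image:
    """Pixel-wise mean of equally sized images."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    return [
        [sum(img[r][c] for img in images) / n for c in range(w)]
        for r in range(h)
    ]


def make_first_and_second(
    image: Image, size: Tuple[int, int], positions: Sequence[Tuple[int, int]]
) -> Tuple[Image, Image]:
    """Build the downscaled first image and the crop-based second image."""
    h, w = size
    first = resize_nearest(image, h, w)
    crops = [crop(image, top, left, h, w) for top, left in positions]
    # One position: the crop itself is the second image.
    # Several positions: the crops are averaged into the second image.
    second = crops[0] if len(crops) == 1 else average_images(crops)
    return first, second


def encode_pair(
    first: Image, second: Image, encode: Callable[[Image], List[float]]
) -> List[float]:
    """Encode both images and concatenate the two partial feature vectors."""
    return list(encode(first)) + list(encode(second))
```

Both images end up at the same pixel size, so a single encoder can process the whole-frame view and the full-resolution detail view alike.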

Assignees

Realtek Semiconductor Corp

Inventors

Classifications

  • Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

  • Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes

  • Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

  • structured as a network, e.g. client-server architectures

  • using neural networks

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12322176B2 cover?
A video classification system and method and a neural network training system are provided. The video classification system captures multiple sampled images, encodes first and second images of all sampled images into feature matrices of the sampled images using a convolutional neural network module, and obtains the classification of the videos based on a recurrent neural network module and the …
Who is the assignee on this patent?
Realtek Semiconductor Corp
What technology area does this patent fall under?
Primary CPC classification G06V20/41. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 3, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).