Methods and systems for training convolutional neural networks
US-2022138573-A1 · May 5, 2022 · US
| Field | Value |
|---|---|
| Publication number | US-12322176-B2 |
| Application number | US-202217734384-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 2, 2022 |
| Priority date | Nov 17, 2021 |
| Publication date | Jun 3, 2025 |
| Grant date | Jun 3, 2025 |
Official abstract text for this publication.
A video classification system and method and a neural network training system are provided. The video classification system captures multiple sampled images, encodes first and second images of all sampled images into feature matrices of the sampled images using a convolutional neural network module, and obtains the classification of the videos based on a recurrent neural network module and the feature matrices. The neural network training system captures multiple first sampled images and uses first training samples of all the first sampled images to train the convolutional neural network module and a classification module to obtain multiple parameter values of the convolutional neural network module. The neural network training system trains the recurrent neural network module based on the parameter values and multiple second sampled images.
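As a rough illustration of the data flow the abstract describes, the Python sketch below picks sampling time points at a fixed interval and keeps the most recent per-image feature vectors as the rows of a feature matrix. The names `sample_times` and `FeatureWindow`, the fixed window length, and the plain lists standing in for real convolutional-network outputs are assumptions made for illustration; the publication does not specify them.

```python
from collections import deque
from typing import Deque, List

Vector = List[float]


def sample_times(duration_s: float, interval_s: float) -> List[float]:
    """Return time points 0, interval, 2*interval, ... below the video duration."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    times: List[float] = []
    index = 0
    while index * interval_s < duration_s:
        times.append(index * interval_s)
        index += 1
    return times


class FeatureWindow:
    """Holds the newest feature vectors, oldest first, as a feature matrix.

    The fixed window length is an assumption of this sketch.
    """

    def __init__(self, length: int) -> None:
        if length <= 0:
            raise ValueError("length must be positive")
        self._rows: Deque[Vector] = deque(maxlen=length)

    def push(self, feature: Vector) -> List[Vector]:
        """Append the present feature vector and return the current matrix."""
        self._rows.append(list(feature))
        return [list(row) for row in self._rows]


# Hypothetical usage: one placeholder feature vector per sampled time point.
window = FeatureWindow(length=3)
matrix: List[Vector] = []
for t in sample_times(duration_s=10.0, interval_s=2.5):
    placeholder_feature = [t, t * 2.0]  # stands in for a CNN-encoded vector
    matrix = window.push(placeholder_feature)
```

In a full system the feature matrix returned by `push` would be passed to the recurrent neural network module; that step is left out here because it depends on trained parameters the sketch does not have.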
Opening claim text (preview).
What is claimed is:

1. A video classification system, comprising: a processor, configured to obtain a video; a convolutional neural network module, having a plurality of trained first parameters; and a recurrent neural network module, having a plurality of trained second parameters, wherein the processor is configured to perform the following steps:
(a) selecting a present time point according to a time interval, and sampling the video according to the present time point to obtain a sampled image at the present time point;
(b) adjusting a pixel size of the sampled image at the present time point to obtain a corresponding first image, wherein a pixel size of the first image is a first pixel size, and the first pixel size of the first image is smaller than the pixel size of the sampled image;
(c) performing image cropping on the sampled image at the present time point to obtain at least one corresponding partial image, and obtaining a corresponding second image based on the at least one partial image, wherein a pixel size of the second image is the first pixel size;
(d) using the convolutional neural network module to encode the first image and the second image of the sampled image at the present time point into a feature vector corresponding to the sampled image at the present time point;
(e) sequentially merging the feature vector corresponding to the present time point with a plurality of past feature vectors corresponding to a plurality of past time points into a feature matrix; and
(f) obtaining a classification of the video based on the recurrent neural network module and the feature matrix.

2. The video classification system according to claim 1, wherein the processor performs image cropping at a corresponding position of the sampled image to obtain a cropped image whose pixel size is the first pixel size, and uses the cropped image as the at least one partial image and the second image.

3. The video classification system according to claim 1, wherein the processor performs image cropping at a plurality of corresponding positions of the sampled image to obtain a plurality of cropped images whose pixel sizes are the first pixel size, and uses the plurality of cropped images as the at least one partial image, and the processor then averages the plurality of cropped images as the second image.

4. The video classification system according to claim 1, wherein the processor uses the convolutional neural network module to encode the first image of the sampled image into a first partial feature vector, the processor uses the convolutional neural network module to encode the second image of the sampled image into a second partial feature vector, and the processor then merges the first partial feature vector with the second partial feature vector to obtain the feature vector corresponding to the present time point.

5. The video classification system according to claim 1, wherein the recurrent neural network module is a long short-term memory network.

6. A video classification method, applicable to a video classification system and performed by a processor, wherein the video classification system comprises: the processor, configured to obtain a video; a convolutional neural network module, having a plurality of trained first parameters; and a recurrent neural network module, having a plurality of trained second parameters, wherein the video classification method comprises the following steps:
(a) selecting a present time point according to a time interval, and sampling the video according to the present time point to obtain a sampled image at the present time point;
(b) adjusting a pixel size of the sampled image at the present time point to obtain a corresponding first image, wherein a pixel size of the first image is a first pixel size, and the first pixel size of the first image is smaller than the pixel size of the sampled image;
(c) performing image cropping on the sampled image at the present time point to obtain at least one corresponding partial image, and obtaining a corresponding second image based on the at least one partial image, wherein a pixel size of the second image is the first pixel size;
(d) using the convolutional neural network module to encode the first image and the second image of the sampled image at the present time point into a feature vector corresponding to the sampled image at the present time point;
(e) sequentially merging the feature vector corresponding to the present time point with a plurality of past feature vectors corresponding to a plurality of past time points into a feature matrix; and
(f) obtaining a classification of the video based on the recurrent neural network module and the feature matrix.

7. The video classification method according to claim 6, wherein step (d) further comprises: performing image cropping at a corresponding position of the sampled image to obtain a cropped image whose pixel size is the first pixel size, and using the cropped image as the at least one partial image and the second image.

8. The video classification method according to claim 6, wherein step (d) further comprises: performing image cropping at a plurality of corresponding positions of the sampled image to obtain a plurality of cropped images whose pixel sizes are the first pixel size, using the plurality of cropped images as the at least one partial image, and then averaging the plurality of cropped images as the second image.

9. The video classification method according to claim 6, wherein step (d) further comprises: using the convolutional neural network module to encode the first image of the sampled image into a first partial feature vector, and using the convolutional neural network module to encode the second image of the sampled image into a second partial feature vector, and then merging the first partial feature vector with the second partial feature vector to obtain the feature vector corresponding to the present time point.

10. The video classification method according to claim 6, wherein the recurrent neural network module is a long short-term memory network.

11. A neural network training system, comprising: a processor, configured to obtain a plurality of videos, a plurality of training images, and a classification corresponding to each of the plurality of videos and each of the plurality of training images; a convolutional neural network module, having a plurality of first parameters; a recurrent neural network module, having a plurality of second parameters; and a classification module, having a plurality of third parameters, wherein the processor is configured to perform the following steps:
(a) obtaining a plurality of first sampled images from the videos and the training images;
(b) selecting an unselected image from the plurality of first sampled images as a current image;
(c) adjusting a pixel size of the current image to obtain a corresponding first image, wherein a pixel size of the first image is a first pixel size, and the first pixel size of the first image is smaller than the pixel size of the current image;
(d) performing image cropping on the current image to obtain at least one corresponding first partial image, and obtaining a corresponding second image based on the at least one first partial image, wherein a pixel size of the second image is the first pixel size;
(e) setting the classification corresponding to the first image, the second image, and the current image as a first training sample corresponding to the current image;
(f) repeating steps (b), (c), (d), and (e) until the plurality of first sampled images are all selected;
(g) using all the first training samples corresponding to each of the plural
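The image preparation recited in steps (b) and (c) of claim 1, with the single-crop and averaged-crop variants of claims 2 and 3, can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: images are square-cropped grayscale grids stored as nested lists, the "first pixel size" is a single integer `size`, and nearest-neighbor sampling is used for the downscale. The claims do not name a resizing method, and the function names here are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]


def resize_nearest(image: Image, size: int) -> Image:
    """Downscale to size x size by nearest-neighbor sampling (assumed method)."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]


def crop(image: Image, top: int, left: int, size: int) -> Image:
    """Cut a size x size partial image whose top-left corner is (top, left)."""
    if top < 0 or left < 0 or top + size > len(image) or left + size > len(image[0]):
        raise ValueError("crop window falls outside the image")
    return [row[left:left + size] for row in image[top:top + size]]


def average_images(images: Sequence[Image]) -> Image:
    """Average equally sized images pixel by pixel."""
    if not images:
        raise ValueError("at least one image is required")
    count = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [sum(img[r][c] for img in images) / count for c in range(cols)]
        for r in range(rows)
    ]


def make_image_pair(
    image: Image, positions: Sequence[Tuple[int, int]], size: int
) -> Tuple[Image, Image]:
    """Return (first image, second image), both at the first pixel size.

    One crop position corresponds to the single-crop variant; several
    positions are averaged into one second image.
    """
    first = resize_nearest(image, size)
    second = average_images([crop(image, top, left, size) for top, left in positions])
    return first, second


# Hypothetical 4x4 frame with pixel values 0..15, cropped at two positions.
frame = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
first_image, second_image = make_image_pair(frame, [(0, 0), (2, 2)], size=2)
```

Both returned images share the same pixel size, so a single convolutional neural network module can encode either one, as step (d) requires.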
Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames · CPC title
Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes · CPC title
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
structured as a network, e.g. client-server architectures · CPC title
using neural networks · CPC title