Image processing method, storage medium, and computer device
US-2020380031-A1 · Dec 3, 2020 · US
US11928893B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11928893-B2 |
| Application number | US-202117530428-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 18, 2021 |
| Priority date | Nov 20, 2019 |
| Publication date | Mar 12, 2024 |
| Grant date | Mar 12, 2024 |
Official abstract text for this publication.
An action recognition method includes: obtaining original feature submaps of each of temporal frames on a plurality of convolutional channels by using a multi-channel convolutional layer; calculating, by using each of the temporal frames as a target temporal frame, motion information weights of the target temporal frame on the convolutional channels according to original feature submaps of the target temporal frame and original feature submaps of a next temporal frame, and obtaining motion information feature maps of the target temporal frame on the convolutional channels according to the motion information weights; performing temporal convolution on the motion information feature maps of the target temporal frame to obtain temporal motion feature maps of the target temporal frame; and recognizing an action type of a moving object in image data of the target temporal frame according to the temporal motion feature maps of the target temporal frame on the convolutional channels.
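The weighting and temporal-convolution steps in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The function names `motion_weighted_features` and `temporal_conv` are hypothetical. The weights are taken as a sigmoid of the spatially averaged difference between the next frame and the target frame, the weights are applied by multiplication, the last frame is paired with itself, and the sequence is zero-padded in time. The abstract fixes none of these choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def motion_weighted_features(feats):
    """feats: (T, C, H, W) original feature submaps per frame and channel."""
    # Pair every target frame with its next frame. The last frame has no
    # successor, so it is paired with itself (assumption).
    nxt = np.concatenate([feats[1:], feats[-1:]], axis=0)
    # One difference value per frame and channel (spatial average, assumption).
    diff = (nxt - feats).mean(axis=(2, 3))            # (T, C)
    weights = sigmoid(diff)                           # (T, C)
    # Apply each channel weight to that channel's submap (assumption).
    return weights[:, :, None, None] * feats          # (T, C, H, W)

def temporal_conv(maps, kernels):
    """maps: (T, C, H, W); kernels: (C, 3), one 3-tap temporal kernel per channel."""
    # Zero-pad one frame before and after the sequence (assumption).
    padded = np.pad(maps, ((1, 1), (0, 0), (0, 0), (0, 0)))
    k_prev = kernels[:, 0][:, None, None]
    k_curr = kernels[:, 1][:, None, None]
    k_next = kernels[:, 2][:, None, None]
    # Combine preceding, target, and next frame on the same channel.
    return k_prev * padded[:-2] + k_curr * padded[1:-1] + k_next * padded[2:]

feats = np.random.rand(8, 16, 7, 7)                   # 8 frames, 16 channels
motion_maps = motion_weighted_features(feats)
temporal_maps = temporal_conv(motion_maps, np.random.rand(16, 3))
```

The output keeps the `(T, C, H, W)` shape, so the result could be passed to a downstream classifier frame by frame.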
Opening claim text (preview).
What is claimed is:
1. An action recognition method, performed by a computer device, the method comprising: obtaining image data of video data in a plurality of different temporal frames; obtaining original feature submaps of each of the temporal frames on a plurality of different convolutional channels by using a multi-channel convolutional layer; calculating, by using each of the temporal frames as a target temporal frame, motion information weights of the target temporal frame on the convolutional channels according to the original feature submaps of the target temporal frame on the convolutional channels and the original feature submaps of a next temporal frame adjacent to the target temporal frame on each of the convolutional channels; obtaining motion information feature maps of the target temporal frame on the convolutional channels according to the motion information weights and the original feature submaps of the target temporal frame on the convolutional channels; performing temporal convolution on the motion information feature maps to obtain temporal motion feature maps of the target temporal frame on the convolutional channels; and recognizing an action type of a moving object in image data of the target temporal frame according to the temporal motion feature maps.
2. The method according to claim 1, wherein the calculating motion information weights of the target temporal frame on the convolutional channels according to the original feature submaps of the target temporal frame on the convolutional channels and the original feature submaps of a next temporal frame adjacent to the target temporal frame on the convolutional channels comprises: obtaining difference information between the original feature submaps of the target temporal frame and the original feature submaps of the next temporal frame on the convolutional channels; and mapping the difference information on the convolutional channels into the motion information weights of the target temporal frame on the convolutional channels by using an activation function.
3. The method according to claim 2, wherein the obtaining difference information between the original feature submaps of the target temporal frame and the original feature submaps of the next temporal frame on the convolutional channels comprises: respectively transforming, by using a unit pooling layer, the original feature submaps of the target temporal frame on the convolutional channels into unit feature submaps of the target temporal frame, and the original feature submaps of the next temporal frame on the convolutional channels into unit feature submaps of the temporal frame; respectively performing dimension reduction with a preset scaling factor on the unit feature submaps of the target temporal frame and the unit feature submaps of the next temporal frame to obtain dimension-reduced unit feature submaps of the target temporal frame and dimension-reduced unit feature submaps of the next temporal frame; obtaining dimension-reduced difference information between the dimension-reduced unit feature submaps of the target temporal frame and the dimension-reduced unit feature submaps of the next temporal frame; and performing dimension raising with the preset scaling factor on the dimension-reduced difference information to obtain the difference information between the original feature submaps of the target temporal frame and the original feature submaps of the next temporal frame on the convolutional channels.
4. The method according to claim 1, wherein the performing temporal convolution on the motion information feature maps to obtain temporal motion feature maps of the target temporal frame on the convolutional channels comprises: separately obtaining motion information feature maps of a preceding temporal frame adjacent to the target temporal frame on the convolutional channels and motion information feature maps of the next temporal frame on the convolutional channels; and performing, for each of the convolutional channels, a convolution operation on a motion information feature map of the target temporal frame, a motion information feature map of the preceding temporal frame, and a motion information feature map of the next temporal frame on the same convolutional channel by using a temporal convolution kernel, to obtain the temporal motion feature maps of the target temporal frame on the convolutional channels.
5. The method according to claim 1, wherein the recognizing an action type of a moving object in image data of the target temporal frame according to the temporal motion feature maps comprises: inputting the temporal motion feature maps of the target temporal frame into a residual network layer, to obtain action feature information of the image data of the target temporal frame; and inputting the action feature information into an action classification network layer, to recognize the action type of the moving object in the image data of the target temporal frame.
6. The method according to claim 5, further comprising: using the action feature information as the original feature submaps of the image data of the target temporal frame on the convolutional channels; calculating again, the motion information weights of the target temporal frame on the convolutional channels according to the original feature submaps of the target temporal frame on the convolutional channels and the original feature submaps of the next temporal frame adjacent to the target temporal frame on the convolutional channels.
7. The method according to claim 1, further comprising: determining, after action types of the moving object in the image data of the temporal frames are obtained, an action type corresponding to the video data according to the action types of the temporal frames.
8. The method according to claim 2, further comprising: obtaining a training video sample, the training video sample comprising a plurality of different sample temporal frames and standard action types of a moving object in the sample temporal frames; performing a training process comprising: obtaining original feature submap samples of each of the sample temporal frames on the different convolutional channels by using the multi-channel convolutional layer; obtaining, by using each of the sample temporal frames as a target sample temporal frame, sample difference information between original feature submap samples of the target sample temporal frame and original feature submap samples of a next sample temporal frame on the convolutional channels; mapping the sample difference information on the convolutional channels into motion information weight samples of the target sample temporal frame on the convolutional channels by using the activation function; obtaining motion information feature map samples of the target sample temporal frame on the convolutional channels according to the motion information weight samples and the original feature submap samples of the target sample temporal frame on the convolutional channels; performing temporal convolution on the motion information feature map samples of the target sample temporal frame on the convolutional channels, to obtain temporal motion feature map samples of the target sample temporal frame on the convolutional channels; obtaining a predicted action type of the moving object in the target sample temporal frame according to the temporal motion feature map samples of the target sample temporal frame on the convolutional channels; and adjusting parameters of the multi-channel convolutional layer, the activation function, and a temporal convolution kernel according to a difference between the predicted action type and a standard action type of the target sample temporal frame; an
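The difference computation of claim 3 and the video-level decision of claim 7 could look roughly like the sketch below. It is a reading of the claim language under stated assumptions, not the patentee's code. The names `channel_difference`, `video_action`, `w_reduce`, and `w_raise` are hypothetical. Unit pooling is modeled as global average pooling, dimension reduction and raising are modeled as plain linear maps whose widths differ by a scaling factor `r`, and the video label is chosen by majority vote over the per-frame labels. The claims specify none of these choices.

```python
import numpy as np
from collections import Counter

def channel_difference(target, nxt, w_reduce, w_raise):
    """target, nxt: (C, H, W) submaps of the target and next frame.
    w_reduce: (C // r, C) and w_raise: (C, C // r) are hypothetical
    linear maps standing in for dimension reduction and raising."""
    # Unit pooling: collapse each channel's submap to a single value (assumption).
    u_target = target.mean(axis=(1, 2))               # (C,)
    u_next = nxt.mean(axis=(1, 2))                    # (C,)
    # Dimension reduction by the scaling factor r.
    r_target = w_reduce @ u_target                    # (C // r,)
    r_next = w_reduce @ u_next                        # (C // r,)
    # Difference in the reduced space, then raise back to C channels.
    return w_raise @ (r_next - r_target)              # (C,)

def video_action(frame_labels):
    """Pick the most frequent per-frame action type (majority vote, assumption)."""
    return Counter(frame_labels).most_common(1)[0][0]

C, r = 16, 4
rng = np.random.default_rng(0)
diff = channel_difference(rng.random((C, 7, 7)), rng.random((C, 7, 7)),
                          rng.random((C // r, C)), rng.random((C, C // r)))
weights = 1.0 / (1.0 + np.exp(-diff))                 # activation per claim 2 (sigmoid assumed)
label = video_action(["run", "jump", "run"])
```

Computing the difference in the reduced space keeps the two linear maps small, which is the usual motivation for a bottleneck of this kind.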
Supervised learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Training; Learning · CPC title
Learning methods · CPC title
based on behaviour analysis · CPC title