Action recognition method and apparatus, computer storage medium, and computer device

US11928893B2 · US · B2

Patent metadata
Publication number: US-11928893-B2
Application number: US-202117530428-A
Country: US
Kind code: B2
Filing date: Nov 18, 2021
Priority date: Nov 20, 2019
Publication date: Mar 12, 2024
Grant date: Mar 12, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An action recognition method includes: obtaining original feature submaps of each of temporal frames on a plurality of convolutional channels by using a multi-channel convolutional layer; calculating, by using each of the temporal frames as a target temporal frame, motion information weights of the target temporal frame on the convolutional channels according to original feature submaps of the target temporal frame and original feature submaps of a next temporal frame, and obtaining motion information feature maps of the target temporal frame on the convolutional channels according to the motion information weights; performing temporal convolution on the motion information feature maps of the target temporal frame to obtain temporal motion feature maps of the target temporal frame; and recognizing an action type of a moving object in image data of the target temporal frame according to the temporal motion feature maps of the target temporal frame on the convolutional channels.
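The steps in the abstract can be illustrated with a minimal NumPy sketch. This is not the patented implementation. It assumes the per-channel difference is taken between spatially averaged feature maps, that the activation function is a sigmoid, that the weights are applied by element-wise multiplication, and that the last frame (which has no next frame) is compared with itself. The function names `motion_weights`, `motion_feature_maps`, and `temporal_conv` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def motion_weights(features):
    """Per-frame, per-channel motion weights.

    features: array of shape (T, C, H, W), the original feature submaps
    of T temporal frames on C convolutional channels.
    Returns an array of shape (T, C) with values in (0, 1).
    """
    pooled = features.mean(axis=(2, 3))  # (T, C): one scalar per channel (assumption)
    # Features of the next frame; the last frame reuses itself (assumption),
    # so its difference is zero and its weight is 0.5.
    nxt = np.concatenate([pooled[1:], pooled[-1:]], axis=0)
    return sigmoid(nxt - pooled)


def motion_feature_maps(features):
    """Scale each channel of each frame by its motion weight (assumed multiplicative)."""
    w = motion_weights(features)
    return features * w[:, :, None, None]


def temporal_conv(maps, kernel):
    """Per-channel temporal convolution over (preceding, target, next) frames.

    maps: (T, C, H, W); kernel: (C, 3) with one 3-tap kernel per channel.
    Frames outside the clip are treated as zeros (assumption).
    """
    padded = np.pad(maps, ((1, 1), (0, 0), (0, 0), (0, 0)))
    k_prev, k_cur, k_next = (kernel[:, i][None, :, None, None] for i in range(3))
    return k_prev * padded[:-2] + k_cur * padded[1:-1] + k_next * padded[2:]
```

In this sketch, a channel whose pooled response increases from one frame to the next receives a weight above 0.5 and one whose response decreases receives a weight below 0.5. The temporal kernel then mixes each channel with the same channel in the neighbouring frames. The output of `temporal_conv` would feed whatever classifier recognizes the action type.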

First claim

Opening claim text (preview).

What is claimed is:

1. An action recognition method, performed by a computer device, the method comprising: obtaining image data of video data in a plurality of different temporal frames; obtaining original feature submaps of each of the temporal frames on a plurality of different convolutional channels by using a multi-channel convolutional layer; calculating, by using each of the temporal frames as a target temporal frame, motion information weights of the target temporal frame on the convolutional channels according to the original feature submaps of the target temporal frame on the convolutional channels and the original feature submaps of a next temporal frame adjacent to the target temporal frame on each of the convolutional channels; obtaining motion information feature maps of the target temporal frame on the convolutional channels according to the motion information weights and the original feature submaps of the target temporal frame on the convolutional channels; performing temporal convolution on the motion information feature maps to obtain temporal motion feature maps of the target temporal frame on the convolutional channels; and recognizing an action type of a moving object in image data of the target temporal frame according to the temporal motion feature maps.

2. The method according to claim 1, wherein the calculating motion information weights of the target temporal frame on the convolutional channels according to the original feature submaps of the target temporal frame on the convolutional channels and the original feature submaps of a next temporal frame adjacent to the target temporal frame on the convolutional channels comprises: obtaining difference information between the original feature submaps of the target temporal frame and the original feature submaps of the next temporal frame on the convolutional channels; and mapping the difference information on the convolutional channels into the motion information weights of the target temporal frame on the convolutional channels by using an activation function.

3. The method according to claim 2, wherein the obtaining difference information between the original feature submaps of the target temporal frame and the original feature submaps of the next temporal frame on the convolutional channels comprises: respectively transforming, by using a unit pooling layer, the original feature submaps of the target temporal frame on the convolutional channels into unit feature submaps of the target temporal frame, and the original feature submaps of the next temporal frame on the convolutional channels into unit feature submaps of the temporal frame; respectively performing dimension reduction with a preset scaling factor on the unit feature submaps of the target temporal frame and the unit feature submaps of the next temporal frame to obtain dimension-reduced unit feature submaps of the target temporal frame and dimension-reduced unit feature submaps of the next temporal frame; obtaining dimension-reduced difference information between the dimension-reduced unit feature submaps of the target temporal frame and the dimension-reduced unit feature submaps of the next temporal frame; and performing dimension raising with the preset scaling factor on the dimension-reduced difference information to obtain the difference information between the original feature submaps of the target temporal frame and the original feature submaps of the next temporal frame on the convolutional channels.

4. The method according to claim 1, wherein the performing temporal convolution on the motion information feature maps to obtain temporal motion feature maps of the target temporal frame on the convolutional channels comprises: separately obtaining motion information feature maps of a preceding temporal frame adjacent to the target temporal frame on the convolutional channels and motion information feature maps of the next temporal frame on the convolutional channels; and performing, for each of the convolutional channels, a convolution operation on a motion information feature map of the target temporal frame, a motion information feature map of the preceding temporal frame, and a motion information feature map of the next temporal frame on the same convolutional channel by using a temporal convolution kernel, to obtain the temporal motion feature maps of the target temporal frame on the convolutional channels.

5. The method according to claim 1, wherein the recognizing an action type of a moving object in image data of the target temporal frame according to the temporal motion feature maps comprises: inputting the temporal motion feature maps of the target temporal frame into a residual network layer, to obtain action feature information of the image data of the target temporal frame; and inputting the action feature information into an action classification network layer, to recognize the action type of the moving object in the image data of the target temporal frame.

6. The method according to claim 5, further comprising: using the action feature information as the original feature submaps of the image data of the target temporal frame on the convolutional channels; calculating again, the motion information weights of the target temporal frame on the convolutional channels according to the original feature submaps of the target temporal frame on the convolutional channels and the original feature submaps of the next temporal frame adjacent to the target temporal frame on the convolutional channels.

7. The method according to claim 1, further comprising: determining, after action types of the moving object in the image data of the temporal frames are obtained, an action type corresponding to the video data according to the action types of the temporal frames.

8. The method according to claim 2, further comprising: obtaining a training video sample, the training video sample comprising a plurality of different sample temporal frames and standard action types of a moving object in the sample temporal frames; performing a training process comprising: obtaining original feature submap samples of each of the sample temporal frames on the different convolutional channels by using the multi-channel convolutional layer; obtaining, by using each of the sample temporal frames as a target sample temporal frame, sample difference information between original feature submap samples of the target sample temporal frame and original feature submap samples of a next sample temporal frame on the convolutional channels; mapping the sample difference information on the convolutional channels into motion information weight samples of the target sample temporal frame on the convolutional channels by using the activation function; obtaining motion information feature map samples of the target sample temporal frame on the convolutional channels according to the motion information weight samples and the original feature submap samples of the target sample temporal frame on the convolutional channels; performing temporal convolution on the motion information feature map samples of the target sample temporal frame on the convolutional channels, to obtain temporal motion feature map samples of the target sample temporal frame on the convolutional channels; obtaining a predicted action type of the moving object in the target sample temporal frame according to the temporal motion feature map samples of the target sample temporal frame on the convolutional channels; and adjusting parameters of the multi-channel convolutional layer, the activation function, and a temporal convolution kernel according to a difference between the predicted action type and a standard action type of the target sample temporal frame; an
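The difference computation of claim 3 and the video-level decision of claim 7 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the unit pooling layer is taken to be global average pooling, dimension reduction and raising are modelled as plain matrix multiplications over the channel axis (the hypothetical weight matrices `w_reduce` and `w_raise`), the activation function is assumed to be a sigmoid, and the video-level action type is chosen by majority vote over frames. The claims do not fix any of these choices.

```python
import numpy as np
from collections import Counter


def difference_information(target, nxt, w_reduce, w_raise):
    """Difference information between two adjacent frames, per channel.

    target, nxt: (C, H, W) original feature submaps of the target frame
    and of the next frame.
    w_reduce: (C // r, C) and w_raise: (C, C // r), where r is the preset
    scaling factor (both matrices are hypothetical learned parameters).
    Returns an array of shape (C,).
    """
    # Unit pooling: collapse each H x W submap to one value (assumed average pooling).
    unit_target = target.mean(axis=(1, 2))
    unit_next = nxt.mean(axis=(1, 2))
    # Dimension reduction from C to C // r channels.
    reduced_target = w_reduce @ unit_target
    reduced_next = w_reduce @ unit_next
    # Difference in the reduced space, then dimension raising back to C channels.
    reduced_diff = reduced_next - reduced_target
    return w_raise @ reduced_diff


def motion_information_weights(diff):
    """Map difference information to weights in (0, 1) with a sigmoid (assumption)."""
    return 1.0 / (1.0 + np.exp(-diff))


def video_action_type(frame_action_types):
    """Pick the most frequent frame-level action type (majority vote, an assumption)."""
    return Counter(frame_action_types).most_common(1)[0][0]
```

Computing the difference in the reduced space means the subtraction and the learned mappings operate on C / r values per frame rather than C, which is the usual motivation for a bottleneck of this kind. For example, `video_action_type(["run", "jump", "run"])` selects `"run"`.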


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11928893B2 cover?
An action recognition method includes: obtaining original feature submaps of each of temporal frames on a plurality of convolutional channels by using a multi-channel convolutional layer; calculating, by using each of the temporal frames as a target temporal frame, motion information weights of the target temporal frame on the convolutional channels according to original feature submaps of the …
Who is the assignee on this patent?
Tencent Tech Shenzhen Co Ltd
What technology area does this patent fall under?
Primary CPC classification G08B21/0407. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 12, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).