Video-based human behavior recognition method, apparatus, device and storage medium

US11430265B2 · US · B2

Patent metadata
Publication number: US-11430265-B2
Application number: US-202017022219-A
Country: US
Kind code: B2
Filing date: Sep 16, 2020
Priority date: Jan 10, 2020
Publication date: Aug 30, 2022
Grant date: Aug 30, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present application discloses a video-based human behavior recognition method, apparatus, device and storage medium, and relates to the technical field of human recognitions. The specific implementation scheme lies in: acquiring a human rectangle of each video frame of the video to be recognized, where each human rectangle includes a plurality of human key points, and each of the human key points has a key point feature; constructing a feature matrix according to the human rectangle of the each video frame; convolving the feature matrix with respect to a video frame quantity dimension to obtain a first convolution result and convolving the feature matrix with respect to a key point quantity dimension to obtain a second convolution result; inputting the first convolution result and the second convolution result into a preset classification model to obtain a human behavior category of the video to be recognized.
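The two convolutions named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes a single feature channel and a single human rectangle, so the feature matrix collapses to a grid indexed by [video frame][key point], and it uses a hand-picked difference kernel where a real model would learn its weights. The function names `conv_along_frames` and `conv_along_keypoints` are hypothetical.

```python
from typing import List

# Simplified feature matrix: grid[t][v] is one feature value of key point v
# in video frame t (assumption: one channel, one human rectangle).
Grid = List[List[float]]


def conv_along_frames(x: Grid, kernel: List[float]) -> Grid:
    """Slide a 1-D kernel over the video frame axis (no padding)."""
    k = len(kernel)
    return [
        [sum(kernel[i] * x[t + i][v] for i in range(k)) for v in range(len(x[0]))]
        for t in range(len(x) - k + 1)
    ]


def conv_along_keypoints(x: Grid, kernel: List[float]) -> Grid:
    """Slide a 1-D kernel over the key point axis (no padding)."""
    k = len(kernel)
    return [
        [sum(kernel[i] * row[v + i] for i in range(k)) for v in range(len(row) - k + 1)]
        for row in x
    ]


# Toy input: 4 frames, 3 key points, value = 10 * frame index + key point index.
features: Grid = [[10.0 * t + v for v in range(3)] for t in range(4)]
diff_kernel = [-1.0, 1.0]  # illustrative kernel, not taken from the patent

first_result = conv_along_frames(features, diff_kernel)      # change over time
second_result = conv_along_keypoints(features, diff_kernel)  # change across key points
```

In this toy setup the first result captures how each key point changes between neighbouring frames, and the second captures how neighbouring key points differ within one frame. Both results would then be passed to the preset classification model, which is left out here because the page does not describe it.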

First claim

Opening claim text (preview).

What is claimed is:

1. A video-based human behavior recognition method, comprising: acquiring a video to be recognized, wherein the video to be recognized comprises multiple video frames; acquiring a human rectangle for each video frame of the video to be recognized, wherein each human rectangle comprises a plurality of human key points, and each of the human key points has a key point feature; constructing a feature matrix according to the human rectangle of the each video frame, wherein the feature matrix comprises the key point feature of each of the human key points, a video frame quantity of the video frames in the video to be recognized, a key point quantity of the human key points in each human rectangle, and a human rectangle quantity of human rectangles in each video frame; convolving the feature matrix with respect to a video frame quantity dimension to obtain a first convolution result and convolving the feature matrix with respect to a key point quantity dimension to obtain a second convolution result; and inputting the first convolution result and the second convolution result into a preset classification model to obtain a human behavior category of the video to be recognized.

2. The video-based human behavior recognition method according to claim 1, wherein the acquiring a video to be recognized comprises: acquiring a video to be processed, and performing a frame extraction process on the video to be processed to obtain the multiple video frames so as to obtain the video to be recognized.

3. The video-based human behavior recognition method according to claim 1, wherein the acquiring a human rectangle of each video frame of the video to be recognized comprises: inputting the video to be recognized into a single shot multiBox detector network model to obtain each human rectangle in each video frame; inputting the each human rectangle in the each video frame into a preset recognition model to obtain the human key points in the each human rectangle, wherein the key point quantity of the human key points in the each human rectangle is V, and V=21.

4. The video-based human behavior recognition method according to claim 3, further comprising: obtaining all human key points in a human rectangle by prediction according to the human key points in the human rectangle when it is determined that the human object in the human rectangle is obstructed or the key point quantity of the human key points in the human rectangle is not V.

5. The video-based human behavior recognition method according to claim 4, wherein the obtaining all human key points in a human rectangle by prediction according to the human key points in the human rectangle comprises: determining a human skeleton structure of the human object in the human rectangle according to the human key points in the human rectangle; determining all human key points of the human rectangle according to the human skeleton structure.

6. The video-based human behavior recognition method according to claim 1, wherein the human rectangle quantity in each video frame is M, and M is a positive integer; wherein M human rectangles are top M human rectangles with the highest human rectangle confidences in each video frame; and the method further comprises: acquiring key point confidences of the human key points of each human rectangle in each video frame; performing a weighted summation of the key point confidences of the human key points in each human rectangle to obtain a human rectangle confidence of the each human rectangle.

7. The video-based human behavior recognition method according to claim 1, wherein when a video quantity of the video to be recognized is N and N is a positive integer, the feature matrix further comprises the video quantity.

8. The video-based human behavior recognition method according to claim 1, wherein after the obtaining the human behavior category of the video to be recognized, the method further comprises: when it is determined that the human behavior category is a preset category, issuing an alerting message, wherein the alerting message comprises one or more of the following: voice information, text information, lighting information, and box selection information.

9. The video-based human behavior recognition method according to claim 1, wherein after the obtaining the human behavior category of the video to be recognized, the method further comprises: when it is determined that the human behavior category is a preset category, performing preset processing on the video to be recognized, wherein the preset processing comprises one or more of the following: repeated playback processing, deletion processing, and obstruction processing.

10. A non-transitory computer-readable storage medium, storing thereon computer instructions, wherein the computer instructions are used to cause the computer to execute the method according to claim 1.

11. An electronic device, comprising: at least one processor; and a memory communicatively connected with the at least one processor, wherein: the memory stores thereon instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the at least one processor to: acquire a video to be recognized, wherein the video to be recognized comprises multiple video frames; acquire a human rectangle for each video frame of the video to be recognized, wherein each human rectangle comprises a plurality of human key points, and each of the human key points has a key point feature; construct a feature matrix according to the human rectangle of the each video frame, wherein the feature matrix comprises the key point feature of each of the human key points, a video frame quantity of the video frames in the video to be recognized, a key point quantity of the human key points in each human rectangle, and a human rectangle quantity of human rectangles in each video frame; convolve the feature matrix with respect to a video frame quantity dimension to obtain a first convolution result and convolving the feature matrix with respect to a key point quantity dimension to obtain a second convolution result; and input the first convolution result and the second convolution result into a preset classification model to obtain a human behavior category of the video to be recognized.

12. The electronic device according to claim 11, wherein the instructions cause the at least one processor to: acquire a video to be processed, and perform a frame extraction process on the video to be processed to obtain the multiple video frames so as to obtain the video to be recognized.

13. The electronic device according to claim 11, wherein the instructions cause the at least one processor to: input the video to be recognized into a single shot multiBox detector network model to obtain each human rectangle in each video frame; input the each human rectangle in the each video frame into a preset recognition model to obtain the human key points in the each human rectangle, wherein the key point quantity of the human key points in the each human rectangle is V, and V=21.

14. The electronic device according to claim 13, wherein the instructions cause the at least one processor to: obtain all human key points in a human rectangle by prediction according to the human key points in the human rectangle when it is determined that the human object in the human rectangle is obstructed or the key point quantity of the human key points in the human rectangle is not V.

15. The electronic device according to claim 14, wherein the instructions cause the at least
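The rectangle selection in claim 6 (a weighted sum of key point confidences, then keeping the top M rectangles per frame) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the weights, the confidence values, and the names `rectangle_confidence` and `top_m_rectangles` are hypothetical, and three key points are used for brevity where claim 3 specifies V=21.

```python
from typing import List, Sequence


def rectangle_confidence(keypoint_conf: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the key point confidences of one human rectangle."""
    if len(keypoint_conf) != len(weights):
        raise ValueError("one weight is required per key point")
    return sum(w * c for w, c in zip(weights, keypoint_conf))


def top_m_rectangles(
    frame_conf: Sequence[Sequence[float]], weights: Sequence[float], m: int
) -> List[int]:
    """Return indices of the m rectangles with the highest confidence in one frame."""
    scores = [rectangle_confidence(conf, weights) for conf in frame_conf]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:m]


# Hypothetical per-key-point weights and confidences for three rectangles in one frame.
weights = [0.5, 0.25, 0.25]
frame = [
    [1.0, 0.5, 0.5],  # rectangle 0 -> 0.75
    [0.5, 0.5, 0.5],  # rectangle 1 -> 0.5
    [1.0, 1.0, 0.5],  # rectangle 2 -> 0.875
]
selected = top_m_rectangles(frame, weights, m=2)
```

Here `selected` holds the indices of the two best-scoring rectangles. Fixing M per frame keeps the human rectangle dimension of the feature matrix the same size for every frame.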

Assignees

Inventors

Classifications

  • Recognition of a predetermined image pattern or behaviour pattern indicating theft or intrusion · CPC title

  • Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components · CPC title

  • G06V40/20 · Primary

    Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title

  • Combinations of networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11430265B2 cover?
The present application discloses a video-based human behavior recognition method, apparatus, device and storage medium, and relates to the technical field of human recognitions. The specific implementation scheme lies in: acquiring a human rectangle of each video frame of the video to be recognized, where each human rectangle includes a plurality of human key points, and each of the human key …
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G08B13/19613. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 30, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).