Who is the assignee on this patent?

Beijing Baidu Netcom Sci & Tech Co Ltd

What technology area does this patent fall under?

Primary CPC classification G08B13/19613. Mapped technology areas include Physics.

When was this patent published?

Publication date Aug 30, 2022 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Video-based human behavior recognition method, apparatus, device and storage medium

US11430265B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11430265-B2
Application number	US-202017022219-A
Country	US
Kind code	B2
Filing date	Sep 16, 2020
Priority date	Jan 10, 2020
Publication date	Aug 30, 2022
Grant date	Aug 30, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present application discloses a video-based human behavior recognition method, apparatus, device and storage medium, and relates to the technical field of human recognitions. The specific implementation scheme lies in: acquiring a human rectangle of each video frame of the video to be recognized, where each human rectangle includes a plurality of human key points, and each of the human key points has a key point feature; constructing a feature matrix according to the human rectangle of the each video frame; convolving the feature matrix with respect to a video frame quantity dimension to obtain a first convolution result and convolving the feature matrix with respect to a key point quantity dimension to obtain a second convolution result; inputting the first convolution result and the second convolution result into a preset classification model to obtain a human behavior category of the video to be recognized.
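The two convolutions named in the abstract can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the tensor layout `(C, T, V, M)`, the smoothing kernel, the helper names `convolve_along_axis` and `classify`, and the toy linear classifier standing in for the "preset classification model".

```python
import numpy as np

def convolve_along_axis(x, kernel, axis):
    """Apply a 1-D 'same' convolution along a single axis of x."""
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="same"), axis, x
    )

def classify(first, second, weights):
    """Toy stand-in for a classifier: pool both results, score, take argmax."""
    pooled = np.concatenate([
        first.mean(axis=(1, 2, 3)),   # one value per feature channel
        second.mean(axis=(1, 2, 3)),
    ])
    return int(np.argmax(weights @ pooled))

rng = np.random.default_rng(0)

# Assumed layout: C feature channels, T frames, V key points, M rectangles.
C, T, V, M = 3, 8, 21, 2
features = rng.normal(size=(C, T, V, M))

kernel = np.array([0.25, 0.5, 0.25])  # hypothetical fixed kernel

# First result: convolve over the video frame quantity dimension (time).
first = convolve_along_axis(features, kernel, axis=1)
# Second result: convolve over the key point quantity dimension (space).
second = convolve_along_axis(features, kernel, axis=2)

num_categories = 4  # hypothetical number of behavior categories
weights = rng.normal(size=(num_categories, 2 * C))
category = classify(first, second, weights)
```

In a trained model the kernels and classifier weights would be learned. The sketch only shows that one pass mixes information across frames and the other across key points.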

First claim

Opening claim text (preview).

What is claimed is:

1. A video-based human behavior recognition method, comprising: acquiring a video to be recognized, wherein the video to be recognized comprises multiple video frames; acquiring a human rectangle for each video frame of the video to be recognized, wherein each human rectangle comprises a plurality of human key points, and each of the human key points has a key point feature; constructing a feature matrix according to the human rectangle of the each video frame, wherein the feature matrix comprises the key point feature of each of the human key points, a video frame quantity of the video frames in the video to be recognized, a key point quantity of the human key points in each human rectangle, and a human rectangle quantity of human rectangles in each video frame; convolving the feature matrix with respect to a video frame quantity dimension to obtain a first convolution result and convolving the feature matrix with respect to a key point quantity dimension to obtain a second convolution result; and inputting the first convolution result and the second convolution result into a preset classification model to obtain a human behavior category of the video to be recognized.

2. The video-based human behavior recognition method according to claim 1, wherein the acquiring a video to be recognized comprises: acquiring a video to be processed, and performing a frame extraction process on the video to be processed to obtain the multiple video frames so as to obtain the video to be recognized.

3. The video-based human behavior recognition method according to claim 1, wherein the acquiring a human rectangle of each video frame of the video to be recognized comprises: inputting the video to be recognized into a single shot multiBox detector network model to obtain each human rectangle in each video frame; inputting the each human rectangle in the each video frame into a preset recognition model to obtain the human key points in the each human rectangle, wherein the key point quantity of the human key points in the each human rectangle is V, and V=21.

4. The video-based human behavior recognition method according to claim 3, further comprising: obtaining all human key points in a human rectangle by prediction according to the human key points in the human rectangle when it is determined that the human object in the human rectangle is obstructed or the key point quantity of the human key points in the human rectangle is not V.

5. The video-based human behavior recognition method according to claim 4, wherein the obtaining all human key points in a human rectangle by prediction according to the human key points in the human rectangle comprises: determining a human skeleton structure of the human object in the human rectangle according to the human key points in the human rectangle; determining all human key points of the human rectangle according to the human skeleton structure.

6. The video-based human behavior recognition method according to claim 1, wherein the human rectangle quantity in each video frame is M, and M is a positive integer; wherein M human rectangles are top M human rectangles with the highest human rectangle confidences in each video frame; and the method further comprises: acquiring key point confidences of the human key points of each human rectangle in each video frame; performing a weighted summation of the key point confidences of the human key points in each human rectangle to obtain a human rectangle confidence of the each human rectangle.

7. The video-based human behavior recognition method according to claim 1, wherein when a video quantity of the video to be recognized is N and N is a positive integer, the feature matrix further comprises the video quantity.

8. The video-based human behavior recognition method according to claim 1, wherein after the obtaining the human behavior category of the video to be recognized, the method further comprises: when it is determined that the human behavior category is a preset category, issuing an alerting message, wherein the alerting message comprises one or more of the following: voice information, text information, lighting information, and box selection information.

9. The video-based human behavior recognition method according to claim 1, wherein after the obtaining the human behavior category of the video to be recognized, the method further comprises: when it is determined that the human behavior category is a preset category, performing preset processing on the video to be recognized, wherein the preset processing comprises one or more of the following: repeated playback processing, deletion processing, and obstruction processing.

10. A non-transitory computer-readable storage medium, storing thereon computer instructions, wherein the computer instructions are used to cause the computer to execute the method according to claim 1.

11. An electronic device, comprising: at least one processor; and a memory communicatively connected with the at least one processor, wherein: the memory stores thereon instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the at least one processor to: acquire a video to be recognized, wherein the video to be recognized comprises multiple video frames; acquire a human rectangle for each video frame of the video to be recognized, wherein each human rectangle comprises a plurality of human key points, and each of the human key points has a key point feature; construct a feature matrix according to the human rectangle of the each video frame, wherein the feature matrix comprises the key point feature of each of the human key points, a video frame quantity of the video frames in the video to be recognized, a key point quantity of the human key points in each human rectangle, and a human rectangle quantity of human rectangles in each video frame; convolve the feature matrix with respect to a video frame quantity dimension to obtain a first convolution result and convolving the feature matrix with respect to a key point quantity dimension to obtain a second convolution result; and input the first convolution result and the second convolution result into a preset classification model to obtain a human behavior category of the video to be recognized.

12. The electronic device according to claim 11, wherein the instructions cause the at least one processor to: acquire a video to be processed, and perform a frame extraction process on the video to be processed to obtain the multiple video frames so as to obtain the video to be recognized.

13. The electronic device according to claim 11, wherein the instructions cause the at least one processor to: input the video to be recognized into a single shot multiBox detector network model to obtain each human rectangle in each video frame; input the each human rectangle in the each video frame into a preset recognition model to obtain the human key points in the each human rectangle, wherein the key point quantity of the human key points in the each human rectangle is V, and V=21.

14. The electronic device according to claim 13, wherein the instructions cause the at least one processor to: obtain all human key points in a human rectangle by prediction according to the human key points in the human rectangle when it is determined that the human object in the human rectangle is obstructed or the key point quantity of the human key points in the human rectangle is not V.

15. The electronic device according to claim 14, wherein the instructions cause the at least
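Two of the dependent claims describe preprocessing steps that are easy to picture in code: frame extraction (claim 2) and keeping the top M human rectangles ranked by a weighted sum of key point confidences (claim 6). The sketch below is only one possible reading. The even-spacing sampling strategy, the uniform weights, the three-key-point toy data (the claims use V=21), and the function names `extract_frame_indices` and `top_m_rectangles` are all assumptions for illustration, not details from the patent.

```python
import numpy as np

def extract_frame_indices(total_frames, num_samples):
    """Pick evenly spaced frame indices (an assumed extraction strategy)."""
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    count = min(num_samples, total_frames)
    return np.linspace(0, total_frames - 1, num=count).round().astype(int)

def top_m_rectangles(keypoint_conf, weights, m):
    """Rank rectangles by weighted sum of key point confidences.

    keypoint_conf: array of shape (R, V), one row per detected rectangle.
    weights:       array of shape (V,), one weight per key point.
    Returns (indices of the top-m rectangles, all rectangle confidences).
    """
    rect_conf = keypoint_conf @ weights        # weighted summation per rectangle
    order = np.argsort(rect_conf)[::-1]        # highest confidence first
    return order[:m], rect_conf

# Toy data: 3 detected rectangles, 3 key points each (hypothetical values).
keypoint_conf = np.array([
    [0.9, 0.8, 0.7],
    [0.2, 0.3, 0.1],
    [0.6, 0.5, 0.7],
])
weights = np.full(3, 1.0 / 3.0)                # assumed uniform weights

frame_indices = extract_frame_indices(total_frames=101, num_samples=5)
top_indices, rect_conf = top_m_rectangles(keypoint_conf, weights, m=2)
```

Keeping a fixed M per frame gives the feature matrix a constant rectangle dimension, which is consistent with the claim that the matrix includes the human rectangle quantity.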

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

G08B13/19613 (Primary)
Recognition of a predetermined image pattern or behaviour pattern indicating theft or intrusion · CPC title
G06V10/44
Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components · CPC title
G06V40/20 (Primary)
Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title
G06N3/045
Combinations of networks · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title

Patent family

Related publications grouped by family.

View patent family 70948659

External sources

Related patents

Automated detection and approximation of objects in video

Fakecatcher: detection of synthetic portrait videos using biological signals

Method and system for recognizing user actions with respect to objects

System and method for determining the characteristics of human personality and providing real-time recommendations