Automated detection and approximation of objects in video
US-2021216780-A1 · Jul 15, 2021 · US
US11430265B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11430265-B2 |
| Application number | US-202017022219-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 16, 2020 |
| Priority date | Jan 10, 2020 |
| Publication date | Aug 30, 2022 |
| Grant date | Aug 30, 2022 |
Abstract
The present application discloses a video-based human behavior recognition method, apparatus, device and storage medium, and relates to the technical field of human recognition. The specific implementation scheme is as follows: acquiring a human rectangle for each video frame of the video to be recognized, where each human rectangle includes a plurality of human key points and each of the human key points has a key point feature; constructing a feature matrix from the human rectangles of the video frames; convolving the feature matrix along the video frame quantity dimension to obtain a first convolution result, and convolving it along the key point quantity dimension to obtain a second convolution result; and inputting the first and second convolution results into a preset classification model to obtain the human behavior category of the video to be recognized.
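The core step, convolving one feature matrix along two different dimensions, can be sketched in a few lines of NumPy. This is a minimal, hypothetical illustration, not the patentee's implementation: it assumes the feature matrix is laid out as (N, C, T, V, M), i.e. videos (claim 7), key point feature channels, frames, key points (V=21 per claim 3), and human rectangles per frame, and it swaps in fixed hand-picked kernels and a random linear layer where a real system would use learned convolutions and a trained "preset classification model". The names `conv_along_axis` and `classify`, the kernels, and every size other than V are assumptions.

```python
import numpy as np

# Assumed sizes (hypothetical except V=21, which claim 3 specifies):
N, C, T, V, M = 1, 3, 16, 21, 2  # videos, feature channels, frames, key points, rectangles
NUM_CLASSES = 4                   # hypothetical number of behavior categories


def conv_along_axis(x: np.ndarray, kernel: np.ndarray, axis: int) -> np.ndarray:
    """1-D convolution of every fiber of `x` along `axis`, keeping that axis's length."""
    return np.apply_along_axis(lambda f: np.convolve(f, kernel, mode="same"), axis, x)


def classify(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the preset classification model: one linear layer + argmax."""
    logits = features @ weights.T  # (N, 2*C) @ (2*C, NUM_CLASSES) -> (N, NUM_CLASSES)
    return logits.argmax(axis=1)


rng = np.random.default_rng(0)
# Random values stand in for real key point features, laid out as (N, C, T, V, M).
feature_matrix = rng.standard_normal((N, C, T, V, M))

temporal_kernel = np.array([0.25, 0.5, 0.25])  # hypothetical smoothing kernel over frames
spatial_kernel = np.array([-1.0, 0.0, 1.0])    # hypothetical difference kernel over key points

first = conv_along_axis(feature_matrix, temporal_kernel, axis=2)   # video frame quantity dimension (T)
second = conv_along_axis(feature_matrix, spatial_kernel, axis=3)   # key point quantity dimension (V)

# Average-pool each result over T, V and M, then concatenate the two per-channel summaries.
pooled = np.concatenate([first.mean(axis=(2, 3, 4)), second.mean(axis=(2, 3, 4))], axis=1)
weights = rng.standard_normal((NUM_CLASSES, 2 * C))  # untrained, illustrative weights
predictions = classify(pooled, weights)
print(first.shape, second.shape, predictions)
```

Convolving along T mixes each key point's features across neighboring frames (motion over time), while convolving along V mixes features across key points within a single frame (pose); both results are then fused before classification.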
Claims (preview)
What is claimed is:

1. A video-based human behavior recognition method, comprising: acquiring a video to be recognized, wherein the video to be recognized comprises multiple video frames; acquiring a human rectangle for each video frame of the video to be recognized, wherein each human rectangle comprises a plurality of human key points, and each of the human key points has a key point feature; constructing a feature matrix according to the human rectangle of the each video frame, wherein the feature matrix comprises the key point feature of each of the human key points, a video frame quantity of the video frames in the video to be recognized, a key point quantity of the human key points in each human rectangle, and a human rectangle quantity of human rectangles in each video frame; convolving the feature matrix with respect to a video frame quantity dimension to obtain a first convolution result and convolving the feature matrix with respect to a key point quantity dimension to obtain a second convolution result; and inputting the first convolution result and the second convolution result into a preset classification model to obtain a human behavior category of the video to be recognized.

2. The video-based human behavior recognition method according to claim 1, wherein the acquiring a video to be recognized comprises: acquiring a video to be processed, and performing a frame extraction process on the video to be processed to obtain the multiple video frames so as to obtain the video to be recognized.

3. The video-based human behavior recognition method according to claim 1, wherein the acquiring a human rectangle of each video frame of the video to be recognized comprises: inputting the video to be recognized into a single shot multiBox detector network model to obtain each human rectangle in each video frame; inputting the each human rectangle in the each video frame into a preset recognition model to obtain the human key points in the each human rectangle, wherein the key point quantity of the human key points in the each human rectangle is V, and V=21.

4. The video-based human behavior recognition method according to claim 3, further comprising: obtaining all human key points in a human rectangle by prediction according to the human key points in the human rectangle when it is determined that the human object in the human rectangle is obstructed or the key point quantity of the human key points in the human rectangle is not V.

5. The video-based human behavior recognition method according to claim 4, wherein the obtaining all human key points in a human rectangle by prediction according to the human key points in the human rectangle comprises: determining a human skeleton structure of the human object in the human rectangle according to the human key points in the human rectangle; determining all human key points of the human rectangle according to the human skeleton structure.

6. The video-based human behavior recognition method according to claim 1, wherein the human rectangle quantity in each video frame is M, and M is a positive integer; wherein M human rectangles are top M human rectangles with the highest human rectangle confidences in each video frame; and the method further comprises: acquiring key point confidences of the human key points of each human rectangle in each video frame; performing a weighted summation of the key point confidences of the human key points in each human rectangle to obtain a human rectangle confidence of the each human rectangle.

7. The video-based human behavior recognition method according to claim 1, wherein when a video quantity of the video to be recognized is N and N is a positive integer, the feature matrix further comprises the video quantity.

8. The video-based human behavior recognition method according to claim 1, wherein after the obtaining the human behavior category of the video to be recognized, the method further comprises: when it is determined that the human behavior category is a preset category, issuing an alerting message, wherein the alerting message comprises one or more of the following: voice information, text information, lighting information, and box selection information.

9. The video-based human behavior recognition method according to claim 1, wherein after the obtaining the human behavior category of the video to be recognized, the method further comprises: when it is determined that the human behavior category is a preset category, performing preset processing on the video to be recognized, wherein the preset processing comprises one or more of the following: repeated playback processing, deletion processing, and obstruction processing.

10. A non-transitory computer-readable storage medium, storing thereon computer instructions, wherein the computer instructions are used to cause the computer to execute the method according to claim 1.

11. An electronic device, comprising: at least one processor; and a memory communicatively connected with the at least one processor, wherein: the memory stores thereon instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the at least one processor to: acquire a video to be recognized, wherein the video to be recognized comprises multiple video frames; acquire a human rectangle for each video frame of the video to be recognized, wherein each human rectangle comprises a plurality of human key points, and each of the human key points has a key point feature; construct a feature matrix according to the human rectangle of the each video frame, wherein the feature matrix comprises the key point feature of each of the human key points, a video frame quantity of the video frames in the video to be recognized, a key point quantity of the human key points in each human rectangle, and a human rectangle quantity of human rectangles in each video frame; convolve the feature matrix with respect to a video frame quantity dimension to obtain a first convolution result and convolving the feature matrix with respect to a key point quantity dimension to obtain a second convolution result; and input the first convolution result and the second convolution result into a preset classification model to obtain a human behavior category of the video to be recognized.

12. The electronic device according to claim 11, wherein the instructions cause the at least one processor to: acquire a video to be processed, and perform a frame extraction process on the video to be processed to obtain the multiple video frames so as to obtain the video to be recognized.

13. The electronic device according to claim 11, wherein the instructions cause the at least one processor to: input the video to be recognized into a single shot multiBox detector network model to obtain each human rectangle in each video frame; input the each human rectangle in the each video frame into a preset recognition model to obtain the human key points in the each human rectangle, wherein the key point quantity of the human key points in the each human rectangle is V, and V=21.

14. The electronic device according to claim 13, wherein the instructions cause the at least one processor to: obtain all human key points in a human rectangle by prediction according to the human key points in the human rectangle when it is determined that the human object in the human rectangle is obstructed or the key point quantity of the human key points in the human rectangle is not V.

15. The electronic device according to claim 14, wherein the instructions cause the at least
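Claim 6 keeps, in each frame, the M human rectangles with the highest confidence, where a rectangle's confidence is a weighted sum of its key point confidences. The sketch below is a hypothetical illustration of that selection step only: the `HumanRect` type, its box layout, and the uniform weights are assumptions, since the claims do not specify how the weights are chosen.

```python
from dataclasses import dataclass

V = 21  # key points per human rectangle (claim 3)


@dataclass
class HumanRect:
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2); hypothetical layout
    keypoint_conf: list[float]              # one confidence per key point, length V


def rect_confidence(rect: HumanRect, weights: list[float]) -> float:
    """Human rectangle confidence as a weighted sum of key point confidences."""
    if len(rect.keypoint_conf) != len(weights):
        raise ValueError("expected one weight per key point")
    return sum(w * c for w, c in zip(weights, rect.keypoint_conf))


def top_m_rects(rects: list[HumanRect], weights: list[float], m: int) -> list[HumanRect]:
    """Keep the M rectangles with the highest rectangle confidence in one frame."""
    return sorted(rects, key=lambda r: rect_confidence(r, weights), reverse=True)[:m]


# Hypothetical weights: uniform, summing to 1 so a rectangle confidence stays in [0, 1].
weights = [1.0 / V] * V
frame = [
    HumanRect((0, 0, 50, 120), [0.9] * V),
    HumanRect((60, 0, 110, 120), [0.4] * V),
    HumanRect((120, 0, 170, 120), [0.7] * V),
]
kept = top_m_rects(frame, weights, m=2)
print([round(rect_confidence(r, weights), 2) for r in kept])  # [0.9, 0.7]
```

Ranking by a key-point-weighted score, rather than by the detector's box score alone, presumably favors rectangles whose poses were recovered well. The claims shown here do not say how frames with fewer than M detections are handled; zero-padding the M dimension of the feature matrix would be one possible choice.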
Recognition of a predetermined image pattern or behaviour pattern indicating theft or intrusion · CPC title
Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components · CPC title
Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title
Combinations of networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title