Methods and apparatuses for recognizing dynamic gesture, and control methods and apparatuses using gesture interaction

US11221681B2 · US · B2

Patent metadata
Publication number: US-11221681-B2
Application number: US-201916530190-A
Country: US
Kind code: B2
Filing date: Aug 2, 2019
Priority date: Dec 22, 2017
Publication date: Jan 11, 2022
Grant date: Jan 11, 2022

Abstract

A method for recognizing a dynamic gesture includes: positioning a dynamic gesture in a video stream to be detected to obtain a dynamic gesture box; capturing an image block corresponding to the dynamic gesture box from each of multiple image frames of the video stream; generating a detection sequence based on the captured image block; and performing dynamic gesture recognition according to the detection sequence.
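The abstract's three steps (box, crop, sequence) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patentee's implementation. Frames are plain nested lists of grayscale values. The per-frame static gesture boxes are assumed to come from an upstream hand detector. The dynamic box is formed by taking the union of the static boxes and enlarging it about its centre; claims 7 to 9 say only that the dynamic box is determined from, and may be an enlargement of, the static boxes. The helper names and the default `scale` of 1.5 are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = List[List[int]]          # one grayscale frame, indexed frame[row][col]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel coordinates


def union_box(boxes: Sequence[Box]) -> Box:
    """Smallest box containing every static gesture box (assumed merge rule)."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def enlarge_box(box: Box, scale: float, width: int, height: int) -> Box:
    """Grow a box about its centre by `scale`, then clip it to the frame."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = (x1 - x0) * scale / 2, (y1 - y0) * scale / 2
    return (max(0, math.floor(cx - half_w)), max(0, math.floor(cy - half_h)),
            min(width, math.ceil(cx + half_w)), min(height, math.ceil(cy + half_h)))


def crop(frame: Frame, box: Box) -> Frame:
    """Keep only the pixels inside `box`; everything outside it is discarded."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in frame[y0:y1]]


def build_detection_sequence(frames: Sequence[Frame], static_boxes: Sequence[Box],
                             scale: float = 1.5) -> Tuple[Box, List[Frame]]:
    """Derive one dynamic gesture box (hypothetical helper) and crop it from every frame."""
    height, width = len(frames[0]), len(frames[0][0])
    dynamic_box = enlarge_box(union_box(static_boxes), scale, width, height)
    return dynamic_box, [crop(f, dynamic_box) for f in frames]


# Usage: four 10x10 frames whose pixel value encodes its position (row * 10 + col).
frames = [[[r * 10 + c for c in range(10)] for r in range(10)] for _ in range(4)]
box, seq = build_detection_sequence(frames, [(4, 4, 6, 6), (5, 5, 7, 7)])
print(box)                                     # (3, 3, 8, 8)
print(len(seq), len(seq[0]), len(seq[0][0]))   # 4 5 5
```

Because the enlarged box is built from the union, every static box lies inside the dynamic box, which matches the condition stated in claim 9.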

First claim

The invention claimed is:

1. A method for recognizing a dynamic gesture, comprising: positioning a dynamic gesture in a video stream to be detected to obtain a dynamic gesture box; capturing an image block corresponding to the dynamic gesture box from each of multiple image frames of the video stream, wherein respective parts of the multiple image frames, which are out of the dynamic gesture box, are removed; generating a detection sequence based on the captured image blocks, wherein the detection sequence is a sequence of images different from the multiple image frames of the video stream; and performing dynamic gesture recognition according to the detection sequence, wherein the performing dynamic gesture recognition according to the detection sequence comprises: determining multiple inter-frame image differences in the detection sequence, wherein each of the multiple inter-frame image differences is an image obtained by calculating a difference between pixels at each same position in two adjacent or non-adjacent image frames; generating an image difference sequence based on the multiple inter-frame image differences; and performing the dynamic gesture recognition according to the detection sequence and the image difference sequence, which comprises: inputting the detection sequence into a first dynamic gesture recognition model to obtain a first dynamic gesture category prediction probability output by the first dynamic gesture recognition model; inputting the image difference sequence into a second dynamic gesture recognition model to obtain a second dynamic gesture category prediction probability output by the second dynamic gesture recognition model; and determining a dynamic gesture recognition result according to the first dynamic gesture category prediction probability and the second dynamic gesture category prediction probability.

2. A control method using gesture interaction, comprising: obtaining a video stream; determining a dynamic gesture recognition result of the video stream by the method according to claim 1; and controlling a device to execute an operation corresponding to the dynamic gesture recognition result.

3. The method according to claim 2, wherein the controlling a device to execute an operation corresponding to the dynamic gesture recognition result comprises: obtaining the operation instruction corresponding to the dynamic gesture recognition result according to a predetermined correspondence between the dynamic gesture recognition result and the operation instruction; and controlling the device to execute a corresponding operation according to the operation instruction; or wherein the controlling a device to execute an operation corresponding to the dynamic gesture recognition result comprises: in response to the dynamic gesture recognition result being a predefined dynamic action, controlling a vehicle to execute an operation corresponding to the predefined dynamic action.

4. The method according to claim 3, wherein the controlling the device to execute a corresponding operation according to the operation instruction comprises: controlling a window, a door, or a vehicle-mounted system of a vehicle according to the operation instruction.

5. The method according to claim 3, wherein the predefined dynamic action comprises a dynamic gesture comprising at least one of: single-finger clockwise/counterclockwise rotation, palm left/right swing, two-finger poke, extending the thumb and pinky finger, press-down with the palm downward, lift with the palm upward, fanning to the left/right with the palm, left/right movement with the thumb extended, long slide to the left/right with the palm, changing a fist into a palm with the palm upward, changing a palm into a fist with the palm upward, changing a palm into a fist with the palm downward, changing a fist into a palm with the palm downward, single-finger slide, pinch-in with multiple fingers, single-finger double click, single-finger single click, multi-finger double click, or multi-finger single click; and the operation corresponding to the predefined dynamic action comprises at least one of: volume up/down, song switching, song pause/resume, call answering or initiation, hang-up or call rejection, air conditioning temperature increase or decrease, multi-screen interaction, sunroof opening, sunroof closing, door lock locking, door lock unlocking, drag for navigation, map zoom-out, or map zoom-in.

6. An electronic device, comprising: a memory storing processor-executable instructions; and a processor, configured to execute the stored processor-executable instructions to perform operations of the control method using gesture interaction according to claim 3.

7. The method according to claim 1, wherein the positioning a dynamic gesture in a video stream to be detected to obtain a dynamic gesture box comprises: positioning a static gesture in at least one image frame of the multiple image frames of the video stream to obtain a static gesture box of the at least one image frame; and determining the dynamic gesture box according to the static gesture box of the at least one image frame.

8. The method according to claim 7, wherein the determining the dynamic gesture box according to the static gesture box of the at least one image frame comprises: enlarging the static gesture box of the at least one image frame to obtain the dynamic gesture box.

9. The method according to claim 7, wherein the static gesture box of the at least one image frame of the multiple image frames of the video stream meets the following condition: the static gesture box is located within the dynamic gesture box, or the static gesture box is as same as the dynamic gesture box.

10. The method according to claim 1, wherein before the performing the dynamic gesture recognition according to the detection sequence and the image difference sequence, the method further comprises: establishing the first dynamic gesture recognition model by: collecting one or more sample video streams involving different categories of dynamic gestures; annotating dynamic gesture boxes of the different categories of dynamic gestures; capturing image blocks corresponding to annotation information of the dynamic gesture boxes from multiple image frames of the sample video stream to form an image sequence; and training the first dynamic gesture recognition model by using categories of the dynamic gestures as supervision data and using the image sequence as training data.

11. The method according to claim 10, wherein the training the first dynamic gesture recognition model by using categories of the dynamic gestures as supervision data and using the image sequence as training data comprises: dividing the image sequence into at least one segment; extracting a preset number of image frames from the at least one segment, and stacking the image frames to form image training data; and training the first dynamic gesture recognition model by using the categories of the dynamic gestures as the supervision data and using the image training data.

12. The method according to claim 1, wherein before the performing dynamic gesture recognition according to the detection sequence and the image difference sequence, the method further comprises: establishing the second dynamic gesture recognition model by the following means: collecting one or more sample video streams involving different categories of dynamic gestures; annotating dynamic gesture boxes of the different categories of dynamic gestures; capturing image blocks corresponding to annotation information of the dynamic gesture boxes from multiple image frames of the o…
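The recognition step of claim 1, followed by the instruction lookup of claim 3, can be sketched as follows. This is a hedged illustration, not the patented implementation. The claims do not say how the two category probabilities are combined, so the weighted average here is an assumption, as is the signed `later - earlier` pixel difference. The two models are stand-in callables rather than trained networks. The gesture-to-operation pairings in the hypothetical `OPERATIONS` table are illustrative only, borrowing the vocabulary of claim 5.

```python
from typing import Callable, Dict, List, Sequence

Frame = List[List[int]]                           # grayscale frame, frame[row][col]
Model = Callable[[Sequence[Frame]], List[float]]  # one probability per gesture category


def frame_difference(earlier: Frame, later: Frame) -> Frame:
    """Pixel-wise difference at each same position (assumed signed: later - earlier)."""
    return [[b - a for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(earlier, later)]


def difference_sequence(seq: Sequence[Frame], step: int = 1) -> List[Frame]:
    """step=1 pairs adjacent frames; step>1 pairs non-adjacent frames."""
    return [frame_difference(seq[i], seq[i + step]) for i in range(len(seq) - step)]


def recognize(seq: Sequence[Frame], first_model: Model, second_model: Model,
              categories: Sequence[str], weight: float = 0.5, step: int = 1) -> str:
    """Run both models and fuse their probabilities (weighted average is an assumption)."""
    p1 = first_model(seq)                                # detection sequence stream
    p2 = second_model(difference_sequence(seq, step))    # image difference stream
    fused = [weight * a + (1 - weight) * b for a, b in zip(p1, p2)]
    best = max(range(len(fused)), key=fused.__getitem__)  # index of highest fused score
    return categories[best]


# Hypothetical "predetermined correspondence" between results and operation instructions.
OPERATIONS: Dict[str, str] = {
    "single_finger_clockwise_rotation": "volume_up",
    "single_finger_counterclockwise_rotation": "volume_down",
    "palm_left_swing": "song_switch",
    "two_finger_poke": "song_pause_resume",
}


def operation_for(result: str) -> str:
    """Look up the operation instruction; unmapped results do nothing."""
    return OPERATIONS.get(result, "no_op")


# Usage with stub models standing in for trained networks.
categories = ["single_finger_clockwise_rotation", "palm_left_swing"]
seq = [[[1, 2]], [[3, 5]], [[4, 9]]]
print(difference_sequence(seq))          # [[[2, 3]], [[1, 4]]]
print(difference_sequence(seq, step=2))  # [[[3, 7]]]
result = recognize(seq, lambda s: [0.6, 0.4], lambda d: [0.2, 0.8], categories)
print(result, operation_for(result))     # palm_left_swing song_switch
```

The `step` parameter models the claim's "adjacent or non-adjacent" frames: `step=1` differences neighbouring frames, and larger values skip frames, which yields a shorter difference sequence.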

Classifications (CPC titles)

  • using neural networks

  • Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]

  • using classification, e.g. of video objects

  • Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]

  • Recognition of hand or arm movements, e.g. recognition of deaf sign language (static hand signs G06V40/113)

Frequently asked questions

What does patent US11221681B2 cover?
A method for recognizing dynamic gestures in a video stream. The gesture is localized to a dynamic gesture box; that box region is cropped from multiple frames to form a detection sequence; inter-frame image differences of the sequence form an image difference sequence; and two recognition models, one per sequence, output category probabilities that are combined into the recognition result. Dependent claims add gesture-based device control, such as controlling a vehicle's windows, doors, or vehicle-mounted system.
Who is the assignee on this patent?
Beijing Sensetime Tech Development Co Ltd, Shanghai Sensetime Intelligent Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F3/017. Mapped technology areas include Physics.
When was this patent published?
Publication date: January 11, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).