Devices and methods for single or multi-user gesture detection using computer vision

US12424029B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12424029-B2
Application number: US-202217846770-A
Country: US
Kind code: B2
Filing date: Jun 22, 2022
Priority date: Jun 22, 2022
Publication date: Sep 23, 2025
Grant date: Sep 23, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and devices are described for computer vision-based gesture detection. From a frame of image data, extracted locations of keypoints of a detected hand are obtained. The extracted locations are normalized to obtain normalized features. The normalized features are processed using a trained decision tree ensemble to generate a probability of a valid gesture for the detected hand. The generated probability is compared with a defined decision threshold to generate a binary classification to classify the detected hand as a valid gesture or invalid gesture.
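The last two steps of the abstract (ensemble probability, then thresholding) can be sketched roughly as follows. This is only an illustration under stated assumptions: the trees are stand-in callables rather than trained models, the averaging rule is taken from the dependent claim that describes the probability as an average of per-tree binary scores, and the names `gesture_probability`, `classify_gesture`, the toy stumps, and the 0.5 threshold are hypothetical and do not come from the patent.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a trained decision tree: maps normalized
# features to a binary classification score (0.0 or 1.0).
Tree = Callable[[Sequence[float]], float]


def gesture_probability(trees: List[Tree], features: Sequence[float]) -> float:
    """Combine per-tree outputs by averaging (one combination rule the claims mention)."""
    scores = [tree(features) for tree in trees]
    return sum(scores) / len(scores)


def classify_gesture(trees: List[Tree], features: Sequence[float],
                     threshold: float = 0.5) -> bool:
    """Return True for a valid gesture. The 0.5 default and the >= comparison are assumptions."""
    return gesture_probability(trees, features) >= threshold


# Toy decision stumps, invented purely for illustration.
toy_trees: List[Tree] = [
    lambda f: 1.0 if f[0] > 0.2 else 0.0,
    lambda f: 1.0 if f[1] < 0.5 else 0.0,
    lambda f: 1.0 if f[0] + f[1] > 0.6 else 0.0,
]

if __name__ == "__main__":
    print(classify_gesture(toy_trees, [0.5, 0.3]))
    print(classify_gesture(toy_trees, [0.1, 0.3]))
```

Raising the threshold trades missed gestures for fewer false detections; the abstract only says the threshold is "defined", so its value here is arbitrary.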

First claim

Opening claim text (preview).

The invention claimed is:
1. A device comprising: at least one processing unit coupled to a memory storing computer-executable instructions thereon, wherein the instructions, when executed by the at least one processing unit, cause the device to: display a frame of image data; obtain extracted locations of keypoints of a detected hand in the frame of image data; normalize the extracted locations to obtain normalized features; process the normalized features using a trained decision tree ensemble that comprises a plurality of trained decision trees, wherein each trained decision tree in the trained decision tree ensemble processes the normalized features and outputs a respective predicted output, and the predicted outputs outputted by the plurality of trained decision trees are combined to generate the probability of the valid gesture for the detect hand; compare the generated probability with a defined decision threshold to generate a binary classification to classify the detected hand as a valid gesture or invalid gesture; and after classifying the detected hand as the valid gesture, process the valid gesture as a selection input to select content in the frame of image data.
2. The device of claim 1, wherein the extracted locations are extracted locations of five fingers of the detected hand.
3. The device of claim 1, wherein the instructions cause the device to normalize the extracted locations by: fitting the extracted locations to an ellipse; determining a transformation to transform the ellipse to a unit circle; and applying the determined transformation to the extracted locations, to obtain the normalized features.
4. The device of claim 1, wherein each trained decision tree is trained to process a set of normalized features as input and generate as output a predicted binary classification score, and wherein the probability of the valid gesture for the detected hand is an average of the binary classification scores generated by the plurality of trained decision trees.
5. The device of claim 1, wherein the instructions cause the device to perform the obtaining, normalizing, processing and comparing to classify two or more detected hands as each performing a valid gesture that is a pointing gesture, and wherein the instructions further cause the device to: for a pair of detected hands assigned to a fingertip pair, obtain a detected location of a fingertip of each detected hand and a detected location of a wrist of at least one detected hand; compute a respective at least one hand direction for the at least one detected hand using a vector from the detected location of the wrist to the detected location of the fingertip; determine a user-specific orientation based on the at least one computed hand direction; and output an oriented selection region defined in the frame of image data, wherein the oriented selection region is defined using the user-specific orientation and the detected locations of the fingertips of the pair of detected hands.
6. The device of claim 5, wherein a first hand direction is computed for a first hand in the pair of detected hands and a second hand direction is computed for a second hand in the pair of detected hands, and wherein the user-specific orientation is determined based on an average of the first and the second hand directions.
7. The device of claim 5, wherein the oriented selection region is defined as a selection rectangle that is aligned with the user-specific orientation and that has opposite corners defined by the detected locations of the fingertips of the pair of detected hands.
8. The device of claim 1, wherein the instructions further cause the device to: in response to generation of the binary classification to classify the detected hand as a valid gesture that is a pointing gesture, further classify whether the valid gesture is in a touching state or a hovering state by: synchronizing the frame of image data with a frame of depth data; extracting a patch of depth data in a region about a detected fingertip of the pointing gesture; computing a spread in depth values in the extracted patch of depth data; and comparing the computed spread with defined depth threshold to generate a touch state classification classifying the valid gesture as the touching state or the hovering state.
9. The device of claim 8, wherein the frame of image data is synchronized with the frame of depth data using a circular buffer, wherein the circular buffer has a length equal to a known frame offset between received image data and received depth data.
10. The device of claim 1, wherein there is a plurality of detected hands in the frame of image data, and wherein the obtaining, normalizing, processing and comparing are performed to classify two or more of the detected hands as performing a valid gesture that is a pointing gesture, and wherein the instructions further cause the device to: pair up at least two of the two or more detected hands as a fingertip pair; and define, using detected locations of fingertips of the fingertip pair, a selection region in the frame of image data.
11. The device of claim 10, wherein there are at least four detected hands performing a valid gesture that is a pointing gesture, wherein there are at least two fingertip pairs, and wherein a respective selection region is defined for each of the at least two fingertip pairs.
12. The device of claim 1, wherein the instructions further cause the device to: in response to generation of the binary classification to classify the detected hand as a valid gesture, define a selection region in the frame of image data based on the valid gesture; and perform text recognition on the defined selection region in the frame of image data.
13. A method comprising: displaying a frame of image data; obtaining extracted locations of keypoints of a detected hand in the frame of image data; normalizing the extracted locations to obtain normalized features; processing the normalized features using a trained decision tree ensemble that comprises a plurality of trained decision trees, wherein each trained decision tree in the trained decision tree ensemble processes the normalized features and outputs a respective predicted output, and the predicted outputs outputted by the plurality of trained decision trees are combined to generate the probability of the valid gesture for the detect hand; comparing the generated probability with a defined decision threshold to generate a binary classification to classify the detected hand as a valid gesture or invalid gesture; and after classifying the detected hand as the valid gesture, processing the valid gesture as a selection input to select content in the frame of image data.
14. The method of claim 13, wherein normalizing the extracted locations comprises: fitting the extracted locations to an ellipse; determining a transformation to transform the ellipse to a unit circle; and applying the determined transformation to the extracted locations, to obtain the normalized features.
15. The method of claim 13, wherein each trained decision tree is trained to process a set of normalized features as input and generate as output a predicted binary classification score, and wherein the probability of the valid gesture for the detected hand is an average of the binary classification scores generated by the plurality of trained decision trees.
16. The method of claim 13, further comprising performing the obtaining, normalizing, processing and comparing to classify two or more detected hands as each performing a valid gesture that is a
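The ellipse-based normalization in the claims (fit the keypoints to an ellipse, map that ellipse to a unit circle, apply the same transformation to the keypoints) might look something like the sketch below. The claim text shown here does not say how the ellipse is fitted, so this example assumes a covariance ellipse: its centre is the mean of the points and its axes are the principal directions scaled by one standard deviation. The function name `normalize_keypoints` and the sample coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def normalize_keypoints(points: List[Point]) -> List[Point]:
    """Map the covariance ellipse of the points onto a unit circle (assumed fitting method)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    # Second moments about the centre.
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n

    # Orientation of the principal axis of the symmetric 2x2 covariance.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)

    # Variances along the two principal axes give the semi-axis lengths.
    var_a = c * c * sxx + 2.0 * c * s * sxy + s * s * syy
    var_b = s * s * sxx - 2.0 * c * s * sxy + c * c * syy
    a = math.sqrt(max(var_a, 1e-12))  # guard against degenerate (collinear) input
    b = math.sqrt(max(var_b, 1e-12))

    normalized = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        # Translate to the centre, rotate onto the ellipse axes, scale each axis to 1.
        normalized.append(((c * dx + s * dy) / a, (-s * dx + c * dy) / b))
    return normalized


if __name__ == "__main__":
    fingertips = [(10.0, 40.0), (25.0, 15.0), (40.0, 5.0), (55.0, 12.0), (68.0, 30.0)]
    for u, v in normalize_keypoints(fingertips):
        print(f"{u:+.3f} {v:+.3f}")
```

With this choice, the same hand shape seen at a different position or scale in the frame yields the same features, which is plausibly the point of normalizing before classification.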

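The touch/hover check and the circular-buffer synchronization described in the claims could be sketched along these lines. Several details are assumptions rather than anything stated in the claims: the depth stream is taken to lag the image stream by the known offset, "spread" is measured as maximum minus minimum depth in the patch, and a small spread is interpreted as touching (fingertip and surface at nearly the same depth). The names `FrameSynchronizer`, `depth_patch`, and `classify_touch_state`, plus the patch size and threshold values, are hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple


class FrameSynchronizer:
    """Delay image frames by a known offset so they line up with depth frames."""

    def __init__(self, frame_offset: int) -> None:
        if frame_offset < 1:
            raise ValueError("frame_offset must be at least 1")
        # Circular buffer whose length equals the known frame offset.
        self._buffer: deque = deque(maxlen=frame_offset)

    def push(self, image_frame, depth_frame) -> Optional[Tuple[object, object]]:
        """Return an (image, depth) pair once the buffer is full, else None."""
        paired = None
        if len(self._buffer) == self._buffer.maxlen:
            paired = (self._buffer[0], depth_frame)  # oldest image matches this depth
        self._buffer.append(image_frame)  # evicts the oldest image when full
        return paired


def depth_patch(depth: List[List[float]], row: int, col: int,
                half_size: int = 2) -> List[float]:
    """Collect depth values in a square region around (row, col), clipped to the frame."""
    r0, r1 = max(0, row - half_size), min(len(depth), row + half_size + 1)
    c0, c1 = max(0, col - half_size), min(len(depth[0]), col + half_size + 1)
    return [depth[r][c] for r in range(r0, r1) for c in range(c0, c1)]


def classify_touch_state(depth: List[List[float]], fingertip: Tuple[int, int],
                         depth_threshold: float, half_size: int = 2) -> str:
    """Compare the depth spread around the fingertip with a threshold."""
    patch = depth_patch(depth, fingertip[0], fingertip[1], half_size)
    spread = max(patch) - min(patch)  # assumed spread measure
    # Assumed direction: small spread means the finger rests on the surface.
    return "touching" if spread <= depth_threshold else "hovering"


if __name__ == "__main__":
    flat = [[1000.0] * 5 for _ in range(5)]
    raised = [row[:] for row in flat]
    raised[2][2] = 950.0  # fingertip 50 units closer to the camera than the surface
    print(classify_touch_state(flat, (2, 2), depth_threshold=15.0))
    print(classify_touch_state(raised, (2, 2), depth_threshold=15.0))
```

In use, each incoming image and depth frame would be pushed into the synchronizer together, and the touch state would be computed only on the pairs it returns.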
Assignees

Inventors

Classifications

  • based on user interactions · CPC title

  • using feature-based methods · CPC title

  • using feature-based methods, e.g. the tracking of corners or segments · CPC title

  • Human being; Person · CPC title

  • Three-dimensional [3D] objects · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12424029B2 cover?
Methods and devices are described for computer vision-based gesture detection. From a frame of image data, extracted locations of keypoints of a detected hand are obtained. The extracted locations are normalized to obtain normalized features. The normalized features are processed using a trained decision tree ensemble to generate a probability of a valid gesture for the detected hand. The gener…
Who is the assignee on this patent?
Verdie Yannick, Yang Zi Hao, Sridhar Deepak, and 2 more
What technology area does this patent fall under?
Primary CPC classification G06V40/28. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 23, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).