Methods and systems for hand gesture-based control of a device
US-2022291755-A1 · Sep 15, 2022 · US
US12510972B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12510972-B2 |
| Application number | US-202218068118-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 19, 2022 |
| Priority date | Dec 19, 2022 |
| Publication date | Dec 30, 2025 |
| Grant date | Dec 30, 2025 |
Official abstract text for this publication.
A computer-implemented system and method relate to gesture recognition. A machine learning model includes a first subnetwork, a second subnetwork, and a third subnetwork. The first subnetwork generates feature data based on sensor data, which includes a gesture. The feature data is divided into a set of patches. The second subnetwork selects a target patch of feature data from among the set of patches. The third subnetwork generates gesture data based on the target patch of feature data. The gesture data identifies the gesture of the sensor data. Command data is generated based on the gesture data. A device is controlled based on the command data.
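The data flow in the abstract (features, patches, target patch, gesture, command) can be illustrated with a minimal sketch. This is not the patented model: simple arithmetic rules stand in for the three trained subnetworks, and the function names, the 0.5 threshold, the gesture labels, and the command names are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional

Patch = List[List[float]]

# Hypothetical mapping from gesture data to command data.
COMMANDS = {"swipe_up": "volume_up", "swipe_down": "volume_down"}


def extract_features(sensor_frame: Patch) -> Patch:
    """Stand-in for the first subnetwork; here it just copies the frame."""
    return [list(row) for row in sensor_frame]


def divide_into_patches(features: Patch, patch_size: int) -> List[Patch]:
    """Split a 2-D feature map into non-overlapping square patches."""
    patches = []
    for top in range(0, len(features), patch_size):
        for left in range(0, len(features[0]), patch_size):
            rows = features[top:top + patch_size]
            patches.append([row[left:left + patch_size] for row in rows])
    return patches


def patch_mean(patch: Patch) -> float:
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


def select_target_patch(patches: List[Patch],
                        threshold: float = 0.5) -> Optional[Patch]:
    """Stand-in for the second subnetwork: a binary subject / non-subject
    decision per patch (assumed rule: mean activation above a threshold)."""
    subject = [p for p in patches if patch_mean(p) > threshold]
    return max(subject, key=patch_mean) if subject else None


def classify_gesture(patch: Patch) -> str:
    """Stand-in for the third subnetwork: a toy rule on the target patch."""
    return "swipe_up" if sum(patch[0]) >= sum(patch[-1]) else "swipe_down"


def recognize_and_command(sensor_frame: Patch,
                          patch_size: int = 2) -> Optional[str]:
    """Run the full flow; only the target patch reaches the classifier."""
    features = extract_features(sensor_frame)
    patches = divide_into_patches(features, patch_size)
    target = select_target_patch(patches)
    if target is None:
        return None  # no subject patch, so no command is issued
    return COMMANDS[classify_gesture(target)]
```

The point of the structure is that the gesture classifier runs only on the selected patch, so patches judged to be non-subject never reach the third stage.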
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for gesture recognition comprising: receiving sensor data from a sensor, the sensor data including a gesture; generating, via a first subnetwork, feature data upon receiving the sensor data; dividing the feature data into a set of patches; selecting, via a second subnetwork, a target patch of feature data from among the set of patches; generating, via a third subnetwork, gesture data based on the target patch of feature data, the gesture data being indicative of the gesture of the sensor data; generating command data based on the gesture data; and controlling an actuator based on the command data, wherein, a machine learning model comprises the first subnetwork, the second subnetwork, and the third subnetwork; the first subnetwork includes a first set of artificial neural network layers, the first set of artificial neural network layers being convolutional neural network (CNN) layers; the second subnetwork includes a second set of artificial neural network layers, the second set of artificial neural network layers being CNN layers, recurrent neural network (RNN) layers, or transformer neural network; and the third subnetwork includes a third set of artificial neural network layers, the third set of artificial neural network layers including CNN layers, RNN layers, or a transformer.

2. The computer-implemented method of claim 1, wherein: the second subnetwork includes a binary classifier, the binary classifier classifying each patch of feature data as being indicative of (i) subject data or (ii) non-subject data; the target patch is selected upon being classified as the subject data; and the third subnetwork is not applied to other patches from the set of patches in which each of the other patches is classified as the non-subject data.

3. The computer-implemented method of claim 2, wherein the subject data refers to the feature data that includes a gesturing part of the gesturer.

4. The computer-implemented method of claim 1, wherein: the third subnetwork includes a classifier that classifies the target patch of feature data into a gesture class; and the third subnetwork generates the gesture data indicative of the gesture class that identifies the gesture of the sensor data.

5. The computer-implemented method of claim 4, further comprising: generating an embedding vector upon classifying the target patch of feature data, the embedding vector being indicative of the gesture class; generating a gesture prediction vector based on the embedding vector; and generating the gesture data based on the gesture prediction vector.

6. The computer-implemented method of claim 1, wherein the sensor data includes (a) digital image data, (b) digital video data, (c) digital image data and depth data, or (d) digital video data and depth data.

7. A system for gesture recognition, the system comprising: a processor; and a non-transitory computer readable medium in data communication with the processor, the non-transitory computer readable medium having computer readable data including instructions stored thereon that when executed by the processor is configured to cause the processor to perform a method that comprises: receiving sensor data from a sensor, the sensor data including a gesture; generating, via a first subnetwork, feature data upon receiving the sensor data; dividing the feature data into a set of patches; selecting, via a second subnetwork, a target patch of feature data from among the set of patches; generating, via a third subnetwork, gesture data by classifying the feature data of the target patch, the gesture data being indicative of the gesture of the sensor data; generating command data based on the gesture data; and controlling an actuator based on the command data, wherein, a machine learning model comprises the first subnetwork, the second subnetwork, and the third subnetwork; the first subnetwork includes a first set of artificial neural network layers, the first set of artificial neural network layers being convolutional neural network (CNN) layers; the second subnetwork includes a second set of artificial neural network layers, the second set of artificial neural network layers being CNN layers, recurrent neural network (RNN) layers, or a transformer; and the third subnetwork includes a third set of artificial neural network layers, the third set of artificial neural network layers including CNN layers, RNN layers, or a transformer.

8. The system of claim 7, wherein: the second subnetwork includes a binary classifier, the binary classifier classifying each patch of feature data as being indicative of (i) subject data or (ii) non-subject data; the target patch is selected upon being classified as the subject data; and the third subnetwork is not applied to other patches from the set of patches in which each of the other patches is classified as the non-subject data.

9. The system of claim 8, wherein the subject data refers to the feature data that includes a gesturing part of the gesturer.

10. The system of claim 7, wherein: the third subnetwork includes a classifier that classifies the target patch of feature data into a gesture class; and the third subnetwork generates the gesture data indicative of the gesture class that identifies the gesture.

11. The system of claim 10, further comprising: generating an embedding vector upon classifying the target patch of feature data, the embedding vector being indicative of the gesture class; generating a gesture prediction vector based on the embedding vector; and generating the gesture data based on the gesture prediction vector.

12. The system of claim 7, wherein the sensor data includes (a) digital image data, (b) digital video data, (c) digital image data and depth data, or (d) digital video data and depth data.

13. A non-transitory computer readable medium having computer readable data including instructions stored thereon that when executed by a processor is configured to cause the processor to perform a method that comprises: receiving sensor data from a sensor, the sensor data including a gesture; generating, via a first subnetwork, feature data upon receiving the sensor data; dividing the feature data into a set of patches; selecting, via a second subnetwork, a target patch of feature data from among the set of patches; generating, via a third subnetwork, gesture data by classifying the feature data of the target patch, the gesture data being indicative of the gesture of the sensor data; generating command data based on the gesture data; and controlling an actuator based on the command data, wherein, a machine learning model comprises the first subnetwork, the second subnetwork, and the third subnetwork; the first subnetwork includes a first set of artificial neural network layers, the first set of artificial neural network layers being convolutional neural network (CNN) layers; the second subnetwork includes a second set of artificial neural network layers, the second set of artificial neural network layers being CNN layers, recurrent neural network (RNN) layers, or a transformer neural network; and the third subnetwork includes a third set of artificial neural network layers, the third set of artificial neural network layers including CNN layers, RNN layers, or a transformer neural network.

14. The non-transitory computer readable medium of claim 13, wherein: the second subnetwork includes a binary classifier, the binary classifier classifying each patch of feature data as being indicativ
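Claims 5 and 11 recite a chain from embedding vector to gesture prediction vector to gesture data without fixing the math. One possible reading, offered only as a sketch, is a linear projection of the embedding to per-class scores followed by a softmax. The softmax choice, the weight matrix, the class labels, and the dictionary output format below are all assumptions, not details from the claims.

```python
import math
from typing import Dict, List, Sequence, Union

# Hypothetical gesture classes; the claims do not name any.
GESTURE_CLASSES = ["open_palm", "fist", "point"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_vector(embedding: Sequence[float],
                      weights: Sequence[Sequence[float]]) -> List[float]:
    """Project the embedding to one score per class, then normalize.

    `weights` has one row per gesture class (assumed, untrained values).
    """
    logits = [sum(w * e for w, e in zip(row, embedding)) for row in weights]
    return softmax(logits)


def gesture_data(prediction: Sequence[float]) -> Dict[str, Union[str, float]]:
    """Turn the prediction vector into gesture data: top class and its score."""
    best = max(range(len(prediction)), key=prediction.__getitem__)
    return {"gesture": GESTURE_CLASSES[best], "confidence": prediction[best]}
```

With an embedding of `[1.0, 0.0]` and weight rows `[0.0, 1.0]`, `[2.0, 0.0]`, `[1.0, 1.0]`, the second class receives the largest score, so the resulting gesture data names `fist`.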
using classification, e.g. of video objects · CPC title
using neural networks · CPC title
Recognition of hand or arm movements, e.g. recognition of deaf sign language (static hand signs G06V40/113) · CPC title
Gesture based interaction, e.g. based on a set of recognized hand gestures (interaction based on gestures traced on a digitiser G06F3/04883) · CPC title
Convolutional networks [CNN, ConvNet] · CPC title