System and method for long-distance recognition and personalization of gestures

US12510972B2 · US · B2

Patent metadata
Publication number: US-12510972-B2
Application number: US-202218068118-A
Country: US
Kind code: B2
Filing date: Dec 19, 2022
Priority date: Dec 19, 2022
Publication date: Dec 30, 2025
Grant date: Dec 30, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented system and method relate to gesture recognition. A machine learning model includes a first subnetwork, a second subnetwork, and a third subnetwork. The first subnetwork generates feature data based on sensor data, which includes a gesture. The feature data is divided into a set of patches. The second subnetwork selects a target patch of feature data from among the set of patches. The third subnetwork generates gesture data based on the target patch of feature data. The gesture data identifies the gesture of the sensor data. Command data is generated based on the gesture data. A device is controlled based on the command data.
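The data flow in the abstract (sensor data, feature data, patches, target patch, gesture data, command data, actuator) can be sketched in a few lines of Python. This is a minimal, stdlib-only sketch under loud assumptions. Each "subnetwork" is a toy stand-in rather than a trained model: a single 2-D convolution with ReLU for the first, a mean-activation score for the second, and a linear layer plus argmax for the third. The function names `conv2d_valid`, `split_into_patches`, `select_target_patch`, `classify_patch`, and `recognize_and_control`, the gesture labels, and the `COMMANDS` table are all hypothetical and do not come from the patent.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]

# Hypothetical gesture -> command table (illustrative, not from the patent).
COMMANDS: Dict[str, str] = {"swipe_left": "PREVIOUS", "swipe_right": "NEXT"}


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """First-subnetwork stand-in: one 'valid' 2-D cross-correlation
    (the operation CNN layers compute) followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            max(0.0, sum(image[i + a][j + b] * kernel[a][b]
                         for a in range(kh) for b in range(kw)))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def split_into_patches(features: Matrix, size: int) -> List[Matrix]:
    """Divide the feature map into non-overlapping size x size patches in
    row-major order; rows/columns that do not fill a whole patch are dropped."""
    return [
        [row[left:left + size] for row in features[top:top + size]]
        for top in range(0, len(features) - size + 1, size)
        for left in range(0, len(features[0]) - size + 1, size)
    ]


def select_target_patch(patches: List[Matrix]) -> int:
    """Second-subnetwork stand-in: index of the patch with the highest mean activation."""
    def mean(p: Matrix) -> float:
        return sum(map(sum, p)) / (len(p) * len(p[0]))
    return max(range(len(patches)), key=lambda k: mean(patches[k]))


def classify_patch(patch: Matrix, weights: Dict[str, List[float]]) -> str:
    """Third-subnetwork stand-in: linear score per gesture over the flattened patch, then argmax."""
    flat = [v for row in patch for v in row]
    scores = {g: sum(w * x for w, x in zip(ws, flat)) for g, ws in weights.items()}
    return max(scores, key=scores.get)


def recognize_and_control(image: Matrix, kernel: Matrix, patch_size: int,
                          weights: Dict[str, List[float]],
                          actuator: Callable[[str], None]) -> str:
    features = conv2d_valid(image, kernel)             # feature data
    patches = split_into_patches(features, patch_size)  # set of patches
    target = patches[select_target_patch(patches)]      # target patch
    gesture = classify_patch(target, weights)           # gesture data
    command = COMMANDS[gesture]                         # command data
    actuator(command)                                   # control the device
    return command


if __name__ == "__main__":
    # 5x5 frame that is bright only in its lower-right corner (a toy "hand").
    frame = [[1.0 if r >= 3 and c >= 3 else 0.0 for c in range(5)] for r in range(5)]
    avg = [[0.25, 0.25], [0.25, 0.25]]
    w = {"swipe_left": [1.0, 0.0, 0.0, 0.0], "swipe_right": [0.0, 0.0, 0.0, 1.0]}
    print(recognize_and_control(frame, avg, 2, w, actuator=lambda cmd: None))  # NEXT
```

In the demo, the 2x2 averaging kernel turns the 5x5 frame into a 4x4 feature map, which splits into four 2x2 patches. Only the lower-right patch has nonzero activations, so it becomes the target patch, and the toy weights classify it as `swipe_right`. The point of the patch step is that the third subnetwork only ever sees the selected region, not the whole frame.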

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for gesture recognition comprising: receiving sensor data from a sensor, the sensor data including a gesture; generating, via a first subnetwork, feature data upon receiving the sensor data; dividing the feature data into a set of patches; selecting, via a second subnetwork, a target patch of feature data from among the set of patches; generating, via a third subnetwork, gesture data based on the target patch of feature data, the gesture data being indicative of the gesture of the sensor data; generating command data based on the gesture data; and controlling an actuator based on the command data, wherein, a machine learning model comprises the first subnetwork, the second subnetwork, and the third subnetwork; the first subnetwork includes a first set of artificial neural network layers, the first set of artificial neural network layers being convolutional neural network (CNN) layers; the second subnetwork includes a second set of artificial neural network layers, the second set of artificial neural network layers being CNN layers, recurrent neural network (RNN) layers, or transformer neural network; and the third subnetwork includes a third set of artificial neural network layers, the third set of artificial neural network layers including CNN layers, RNN layers, or a transformer.

2. The computer-implemented method of claim 1, wherein: the second subnetwork includes a binary classifier, the binary classifier classifying each patch of feature data as being indicative of (i) subject data or (ii) non-subject data; the target patch is selected upon being classified as the subject data; and the third subnetwork is not applied to other patches from the set of patches in which each of the other patches is classified as the non-subject data.

3. The computer-implemented method of claim 2, wherein the subject data refers to the feature data that includes a gesturing part of the gesturer.

4. The computer-implemented method of claim 1, wherein: the third subnetwork includes a classifier that classifies the target patch of feature data into a gesture class; and the third subnetwork generates the gesture data indicative of the gesture class that identifies the gesture of the sensor data.

5. The computer-implemented method of claim 4, further comprising: generating an embedding vector upon classifying the target patch of feature data, the embedding vector being indicative of the gesture class; generating a gesture prediction vector based on the embedding vector; and generating the gesture data based on the gesture prediction vector.

6. The computer-implemented method of claim 1, wherein the sensor data includes (a) digital image data, (b) digital video data, (c) digital image data and depth data, or (d) digital video data and depth data.

7. A system for gesture recognition, the system comprising: a processor; and a non-transitory computer readable medium in data communication with the processor, the non-transitory computer readable medium having computer readable data including instructions stored thereon that when executed by the processor is configured to cause the processor to perform a method that comprises: receiving sensor data from a sensor, the sensor data including a gesture; generating, via a first subnetwork, feature data upon receiving the sensor data; dividing the feature data into a set of patches; selecting, via a second subnetwork, a target patch of feature data from among the set of patches; generating, via a third subnetwork, gesture data by classifying the feature data of the target patch, the gesture data being indicative of the gesture of the sensor data; generating command data based on the gesture data; and controlling an actuator based on the command data, wherein, a machine learning model comprises the first subnetwork, the second subnetwork, and the third subnetwork; the first subnetwork includes a first set of artificial neural network layers, the first set of artificial neural network layers being convolutional neural network (CNN) layers; the second subnetwork includes a second set of artificial neural network layers, the second set of artificial neural network layers being CNN layers, recurrent neural network (RNN) layers, or a transformer; and the third subnetwork includes a third set of artificial neural network layers, the third set of artificial neural network layers including CNN layers, RNN layers, or a transformer.

8. The system of claim 7, wherein: the second subnetwork includes a binary classifier, the binary classifier classifying each patch of feature data as being indicative of (i) subject data or (ii) non-subject data; the target patch is selected upon being classified as the subject data; and the third subnetwork is not applied to other patches from the set of patches in which each of the other patches is classified as the non-subject data.

9. The system of claim 8, wherein the subject data refers to the feature data that includes a gesturing part of the gesturer.

10. The system of claim 7, wherein: the third subnetwork includes a classifier that classifies the target patch of feature data into a gesture class; and the third subnetwork generates the gesture data indicative of the gesture class that identifies the gesture.

11. The system of claim 10, further comprising: generating an embedding vector upon classifying the target patch of feature data, the embedding vector being indicative of the gesture class; generating a gesture prediction vector based on the embedding vector; and generating the gesture data based on the gesture prediction vector.

12. The system of claim 7, wherein the sensor data includes (a) digital image data, (b) digital video data, (c) digital image data and depth data, or (d) digital video data and depth data.

13. A non-transitory computer readable medium having computer readable data including instructions stored thereon that when executed by a processor is configured to cause the processor to perform a method that comprises: receiving sensor data from a sensor, the sensor data including a gesture; generating, via a first subnetwork, feature data upon receiving the sensor data; dividing the feature data into a set of patches; selecting, via a second subnetwork, a target patch of feature data from among the set of patches; generating, via a third subnetwork, gesture data by classifying the feature data of the target patch, the gesture data being indicative of the gesture of the sensor data; generating command data based on the gesture data; and controlling an actuator based on the command data, wherein, a machine learning model comprises the first subnetwork, the second subnetwork, and the third subnetwork; the first subnetwork includes a first set of artificial neural network layers, the first set of artificial neural network layers being convolutional neural network (CNN) layers; the second subnetwork includes a second set of artificial neural network layers, the second set of artificial neural network layers being CNN layers, recurrent neural network (RNN) layers, or a transformer neural network; and the third subnetwork includes a third set of artificial neural network layers, the third set of artificial neural network layers including CNN layers, RNN layers, or a transformer neural network.

14. The non-transitory computer readable medium of claim 13, wherein: the second subnetwork includes a binary classifier, the binary classifier classifying each patch of feature data as being indicativ…
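Claims 2, 4 and 5 add two refinements: a binary subject/non-subject classifier decides which patch reaches the third subnetwork, and classification passes through an embedding vector and then a gesture prediction vector. The Python sketch below illustrates that flow under stated assumptions. Patches are already flattened into vectors, the gate is a logistic-regression stand-in, the embedding is a single tanh projection, and the prediction vector is a softmax over class scores. All function names, weights, thresholds, and class labels are hypothetical. When several patches pass the gate, the sketch picks the most confident one as the target patch; that is one possible policy, not one the claims specify.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def subject_probability(patch: Vector, gate_w: Vector, gate_b: float) -> float:
    """Hypothetical binary classifier: P(patch contains the gesturing part)."""
    return 1.0 / (1.0 + math.exp(-(dot(gate_w, patch) + gate_b)))


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def third_subnetwork(patch: Vector, proj: List[Vector], head: List[Vector]) -> Vector:
    """Patch -> embedding vector -> gesture prediction vector (class probabilities)."""
    embedding = [math.tanh(dot(row, patch)) for row in proj]
    return softmax([dot(row, embedding) for row in head])


def recognize(patches: List[Vector], gate_w: Vector, gate_b: float,
              proj: List[Vector], head: List[Vector], classes: List[str],
              threshold: float = 0.5) -> Optional[Tuple[str, Vector]]:
    probs = [subject_probability(p, gate_w, gate_b) for p in patches]
    subject = [i for i, p in enumerate(probs) if p >= threshold]
    if not subject:
        return None  # no subject patch, so the third subnetwork never runs
    target = max(subject, key=lambda i: probs[i])
    # Only the target patch is classified; non-subject patches are skipped.
    prediction = third_subnetwork(patches[target], proj, head)
    best = max(range(len(classes)), key=lambda k: prediction[k])
    return classes[best], prediction  # gesture class plus prediction vector
```

For example, with a four-element gate weight vector of ones and a bias of -2, an all-zero background patch scores about 0.12 and is skipped, while an all-ones patch scores about 0.88 and is the only patch passed to `third_subnetwork`. Skipping non-subject patches is what lets the more expensive classifier run on a small fraction of the frame.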

Assignees

Robert Bosch GmbH

Inventors

Classifications

  • using classification, e.g. of video objects · CPC title

  • G06V10/82 (primary)

    using neural networks · CPC title

  • G06V40/28 (primary)

    Recognition of hand or arm movements, e.g. recognition of deaf sign language (static hand signs G06V40/113) · CPC title

  • G06F3/017 (primary)

    Gesture based interaction, e.g. based on a set of recognized hand gestures (interaction based on gestures traced on a digitiser G06F3/04883) · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12510972B2 cover?
A computer-implemented system and method relate to gesture recognition. A machine learning model includes a first subnetwork, a second subnetwork, and a third subnetwork. The first subnetwork generates feature data based on sensor data, which includes a gesture. The feature data is divided into a set of patches. The second subnetwork selects a target patch of feature data from among the set of …
Who is the assignee on this patent?
Robert Bosch GmbH
What technology area does this patent fall under?
Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?
Publication date: December 30, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).