Few-shot gesture recognition method

US12205407B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12205407-B2
Application number  US-202217940266-A
Country             US
Kind code           B2
Filing date         Sep 8, 2022
Priority date       Mar 28, 2022
Publication date    Jan 21, 2025
Grant date          Jan 21, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed is a few-shot gesture recognition method. The method comprises the following steps: customizing, by a user, gesture categories, and acquiring few samples for each gesture category; inputting the acquired samples into a trained few-shot learning model, extracting a feature vector corresponding to each sample, and synthesizing feature vectors belonging to the same gesture to obtain an average feature vector corresponding to each gesture as a prototype vector; acquiring a corresponding sample for a target gesture implemented by the user, and inputting the sample into the few-shot learning model to obtain a feature vector of the target gesture as a query vector; and calculating similarities between the query vector and prototype vectors of different gestures, and selecting a gesture category corresponding to the prototype vector with the highest similarity as a prediction category of the target gesture.
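
The prototype and query steps in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the patented implementation: the function names `build_prototypes` and `predict` are hypothetical, the two-dimensional feature vectors are made-up stand-ins for the output of the trained few-shot learning model, and similarity is assumed to be the negative L1 or L2 distance, following the distance classifier named in the claims.

```python
import numpy as np

def build_prototypes(features, labels):
    """Average the feature vectors of each gesture category into one prototype."""
    labels = np.asarray(labels)
    classes = np.unique(labels)  # sorted category names
    prototypes = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, prototypes

def predict(query, classes, prototypes, order=2):
    """Pick the category whose prototype is most similar to the query vector.

    Assumption: similarity is the negative L1 (order=1) or L2 (order=2) distance.
    """
    distances = np.linalg.norm(prototypes - query, ord=order, axis=1)
    return classes[np.argmin(distances)], -distances

# Toy support set: two samples per user-defined gesture (made-up feature vectors).
support = np.array([[0.9, 0.1], [1.1, -0.1], [0.0, 1.0], [0.2, 0.8]])
support_labels = ["swipe", "swipe", "circle", "circle"]

classes, prototypes = build_prototypes(support, support_labels)
predicted, similarities = predict(np.array([0.8, 0.2]), classes, prototypes)
print(predicted)  # swipe
```

With this toy data the query lies closer to the averaged "swipe" vector than to the "circle" one, so "swipe" is returned. Adding a new gesture only requires averaging a few new feature vectors; nothing in this step retrains a model.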

First claim

Opening claim text (preview).

The invention claimed is:

1. A few-shot gesture recognition method, comprising the following steps:
    customizing, by a user, gesture categories and acquiring a plurality of samples for each gesture category;
    inputting the acquired samples into a trained few-shot learning model, extracting a feature vector corresponding to each sample, and synthesizing feature vectors belonging to the same gesture to obtain an average feature vector corresponding to each gesture as a prototype vector;
    acquiring a corresponding sample for a target gesture implemented by the user and inputting the sample into the few-shot learning model to obtain a feature vector of the target gesture as a query vector; and
    calculating the similarities between the query vector and prototype vectors of different gestures and selecting a gesture category corresponding to the prototype vector with the highest similarity as a prediction category of the target gesture;
    wherein the few-shot learning model comprises a feature extractor and a distance classifier, the prototype vector and the query vector are obtained by using the feature extractor, and the similarity between the query vector and the prototype vector is obtained by calculation using the distance classifier;
    wherein the few-shot learning model is trained according to the following steps:
    pre-training a basic classification model by using a first training dataset, wherein the first training dataset reflects a corresponding relation between the samples acquired when the user implements the gestures and the gesture categories, and the basic classification model comprises a first feature extraction module and a multilayer perceptron classifier; and
    training the few-shot learning model by using a second training dataset and freezing the pre-trained basic classification model in the training process;
    wherein the few-shot learning model is constructed according to the following steps:
    adding an encoder, an adaptive network and a conversion layer on a pre-trained basic classification model, wherein the encoder is used for coding input data to obtain a coded vector; the self-adaptive network takes the coded vector as an input and outputs a parameter vector representing linear transformation; the conversion layer performs linear transformation on a convolutional layer result in the basic classification model based on the parameter vector; and
    replacing the multilayer perceptron classifier of the basic classification model with the distance classifier to construct the few-shot learning model.

2. The method according to claim 1, wherein in the process of training the few-shot learning model, a small number of samples of partial categories are randomly selected from the second training dataset, and different data of the same category are randomly selected to be combined into a task; an average value of feature vectors obtained after the small number of samples pass through the few-shot learning model is taken as a prototype vector, a feature vector obtained after other data pass through the few-shot learning model is taken as a query vector, a category prediction result is further obtained after the prototype vector and the query vector pass through the distance classifier, and a training loss is calculated with a real label of the query vector.

3. The method according to claim 1, wherein the distance classifier is an L1 distance classifier or an L2 distance classifier.

4. The method according to claim 1, wherein the first training dataset and the second training dataset are obtained according to the following steps:
    controlling a built-in loudspeaker of an intelligent device to emit a specific frequency sound wave signal modulated according to a certain modulation mode and controlling a built-in microphone of the intelligent device to receive an echo signal at a certain sampling frequency;
    implementing, by the user, a user predefined gesture in a first azimuth angle with the intelligent device at any speed and any size in an area near the intelligent device, and acquiring a first dataset;
    implementing, by the user, a gesture in a second azimuth angle formed with the intelligent device, and acquiring a second dataset, wherein the first azimuth angle is different from the second azimuth angle, and the first dataset and the second dataset each comprise a plurality of acquired one-dimensional sound signal samples; and
    preprocessing the first dataset and the second dataset to convert a one-dimensional sound signal sequence into a two-dimensional time-frequency spectrogram, and further constructing into the first training dataset and the second training dataset, wherein the first training dataset and the second training dataset reflect a corresponding relation between the two-dimensional time-frequency spectrograms and the gesture categories, and the two-dimensional time-frequency spectrograms reflects time-frequency characteristics between a gesture starting frame and a gesture ending frame.

5. The method according to claim 4, wherein in case of a user implementing a plurality of gestures in succession, each gesture is detected to extract a two-dimensional time-frequency spectrogram corresponding to each gesture according to the following steps:
    scanning each frequency bin from a low frequency to a high frequency for each frame in the time-frequency spectrogram, determining the frame as an active frame when an energy of the consecutive frequency bins more than a set threshold is higher than a set energy threshold, and further finding the gesture starting frame and the gesture ending frame to extract two-dimensional time-frequency spectrograms corresponding to different gestures.

6. The method according to claim 1, wherein the encoder is used for performing spectrogram convolution on the input data, then performing dimension reduction on a feature map obtained after the convolution, and further obtaining the coded vector.

7. The method according to claim 1, wherein the basic classification model comprises a plurality of residual blocks, each residual block comprises a plurality of convolutional layers, the coded vector output by the encoder is transmitted to the adaptive network, and the parameter vector output by the adaptive network comprises a stretching factor and a translation factor and is provided to the conversion layer.

8. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the steps of the method according to claim 1 are implemented.

9. A computer device, comprising a memory and a processor, wherein a computer program capable of operating on the processor is stored on the memory, and when the processor executes the computer program, the steps of the method according to claim 1 are implemented.
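
The encoder, adaptive network and conversion layer recited in claims 1, 6 and 7 can be read as a scale-and-shift applied to a convolutional feature map. The sketch below is one possible reading under stated assumptions, not the patented implementation: the function names are hypothetical, the encoder is replaced by a plain average over time, the adaptive network is a single linear layer with random weights, and the stretching and translation factors are assumed to be applied per channel.

```python
import numpy as np

def encode(spectrogram):
    """Stand-in encoder: reduce a (freq_bins, frames) spectrogram to a coded vector.

    Assumption: a plain average over time replaces the convolution and
    dimension reduction described in claim 6.
    """
    return spectrogram.mean(axis=1)

def adaptive_network(coded, weights, bias):
    """Map the coded vector to a stretching factor and a translation factor.

    Assumption: a single linear layer whose output is split in half,
    giving one factor pair per channel.
    """
    params = weights @ coded + bias
    channels = params.shape[0] // 2
    return params[:channels], params[channels:]

def conversion_layer(feature_map, stretch, shift):
    """Apply a per-channel linear transformation to a (channels, H, W) feature map."""
    return stretch[:, None, None] * feature_map + shift[:, None, None]

rng = np.random.default_rng(0)
freq_bins, frames, channels = 8, 16, 4
spectrogram = rng.random((freq_bins, frames))          # made-up input data
feature_map = rng.standard_normal((channels, 5, 5))    # stand-in for a frozen convolutional layer result
weights = rng.standard_normal((2 * channels, freq_bins))
bias = np.zeros(2 * channels)

stretch, shift = adaptive_network(encode(spectrogram), weights, bias)
modulated = conversion_layer(feature_map, stretch, shift)
print(modulated.shape)  # (4, 5, 5)
```

In this reading the pre-trained convolutional weights stay untouched, consistent with the freezing step in claim 1; only the small networks that produce the factors would be trained on the second dataset.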

Assignees

Univ Shenzhen

Classifications

  • using classification, e.g. of video objects · CPC title

  • Image or video pattern matching; Proximity measures in feature spaces · CPC title

  • Combinations of networks · CPC title

  • Learning methods · CPC title

  • G06V40/20 · Primary

    Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12205407B2 cover?
Disclosed is a few-shot gesture recognition method. The method comprises the following steps: customizing, by a user, gesture categories, and acquiring few samples for each gesture category; inputting the acquired samples into a trained few-shot learning model, extracting a feature vector corresponding to each sample, and synthesizing feature vectors belonging to the same gesture to obtain an a…
Who is the assignee on this patent?
Univ Shenzhen
What technology area does this patent fall under?
Primary CPC classification G06V40/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 21, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).