System and method for deep learning based hand gesture recognition in first person view

US10429944B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10429944-B2
Application number  US-201816020245-A
Country             US
Kind code           B2
Filing date         Jun 27, 2018
Priority date       Oct 7, 2017
Publication date    Oct 1, 2019
Grant date          Oct 1, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This disclosure relates generally to hand-gesture recognition, and more particularly to system and method for detecting interaction of 3D dynamic hand gestures with frugal AR devices. In one embodiment, a method for hand-gesture recognition includes receiving frames of a media stream of a scene captured from a FPV of a user using RGB sensor communicably coupled to a wearable AR device. The media stream includes RGB image data associated with the frames of the scene. The scene comprises a dynamic hand gesture performed by the user. Temporal information associated with the dynamic hand gesture is estimated from the RGB image data by using a deep learning model. The estimated temporal information is associated with hand poses of the user and comprises key-points identified on user's hand in the frames. Based on said temporal information, the dynamic hand gesture is classified into predefined gesture classes by using multi-layered LSTM classification network.
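The abstract describes per-frame hand key-points being passed as a time sequence to an LSTM classifier. As a rough illustration only, the sketch below lays out that sequence as plain Python lists, using the count given in claim 1 (twenty-one key-points: four per finger plus one near the wrist) and the 3D coordinates mentioned in claim 3. The finger names, the key-point ordering, and every function name here are assumptions made for this example; the page does not specify them.

```python
# Hypothetical labels and ordering; the patent page does not name individual key-points.
FINGERS = ("thumb", "index", "middle", "ring", "little")
POINTS_PER_FINGER = 4   # claim 1: four key-points per finger
COORDS_PER_POINT = 3    # claim 3: a sequence of 3D coordinates
NUM_KEYPOINTS = len(FINGERS) * POINTS_PER_FINGER + 1  # plus one key-point near the wrist


def keypoint_names():
    """Return 21 labels: one wrist point followed by four points per finger."""
    names = ["wrist"]
    for finger in FINGERS:
        for joint in range(POINTS_PER_FINGER):
            names.append(f"{finger}_{joint}")
    return names


def frame_to_features(keypoints):
    """Flatten one frame's (x, y, z) key-points into a single feature vector."""
    if len(keypoints) != NUM_KEYPOINTS:
        raise ValueError(f"expected {NUM_KEYPOINTS} key-points, got {len(keypoints)}")
    return [coord for point in keypoints for coord in point]


def gesture_to_sequence(frames):
    """Turn a list of frames into a (time steps x 63) input for a sequence classifier."""
    return [frame_to_features(frame) for frame in frames]


# Usage: a dummy 5-frame gesture with every key-point at the origin.
dummy_frame = [(0.0, 0.0, 0.0)] * NUM_KEYPOINTS
sequence = gesture_to_sequence([dummy_frame] * 5)
print(len(keypoint_names()), len(sequence), len(sequence[0]))  # 21 5 63
```

Each time step carries 21 × 3 = 63 values, so a gesture of T frames becomes a T × 63 sequence. How the key-points are actually estimated from RGB frames is not covered by this sketch.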

First claim

Opening claim text (preview).

What is claimed is:

1. A processor-implemented method for hand-gesture recognition, the method comprising: receiving, via one or more hardware processors, a plurality of frames of a media stream of a scene captured from a first person view (FPV) of a user using at least one RGB sensor communicably coupled to a wearable Augmented reality (AR) device, the media stream comprising RGB image data associated with the plurality of frames of the scene, the scene comprising a dynamic hand gesture performed by the user; estimating, via the one or more hardware processors, a temporal information associated with the dynamic hand gesture from the RGB image data by using a deep learning model, the estimated temporal information being associated with hand poses of the user and comprising a plurality of key-points identified on user's hand in the plurality of frames, wherein the plurality of key-points comprises twenty one hand key-points, and wherein each key-point of the twenty one key points comprises four key points per finger and one key-point close to wrist of the user's hand, and wherein estimating the temporal information associated with the dynamic hand gesture comprises: estimating, a plurality of network-implicit 3D articulation priors using the deep learning model, the plurality of network-implicit 3D articulation priors comprising a plurality of key-points determined from a plurality of training sample RGB images of user's hand; and detecting, based on the plurality of network-implicit 3D articulation priors, the plurality of key-points on the user's hand in the plurality of frames; and classifying, by using a multi-layered Long Short Term memory (LSTM) classification network, the dynamic hand gesture into at least one predefined gesture class based on the temporal information associated with the plurality of key points, via the one or more hardware processors.

2. The method of claim 1, further comprising downscaling the plurality of frames upon capturing the media stream.

3. The method of claim 1, wherein the multi-layered LSTM classification network comprises: a first layer comprising a LSTM layer consisting of a plurality of LSTM cells to learn long-term dependencies and patterns in a 3D coordinates sequence of the plurality of key-points detected on the user's hand; a second layer comprising a flattening layer that makes the temporal data one-dimensional; and a third layer comprising a fully connected layer with output scores corresponding to each of the dynamic hand gestures, the output scores indicative of posterior probability corresponding to the each of the dynamic hand gestures for classification in the at least one predefined gesture class.

4. The method of claim 3, further comprising testing the LSTM classification network for classifying the dynamic hand gesture from amongst the plurality of dynamic hand gestures, wherein testing the LSTM classification network comprises: interpreting, by using a softmax activation function, output scores as unnormalized log probabilities and squashing the output scores to be between 0 and 1 using the following equation:

$$\sigma(s)_j = \frac{e^{s_j}}{\sum_{k=0}^{K-1} e^{s_k}}$$

where, $K$ denotes number of classes, $s$ is a $K \times 1$ vector of scores, an input to softmax function, and $j$ is an index varying from 0 to $K-1$, and $\sigma(s)$ is $K \times 1$ output vector denoting the posterior probabilities associated with each of the plurality of dynamic hand gestures.

5. The method of claim 3, further comprising training the LSTM classification network, wherein training the LSTM classification network comprises: computing cross-entropy loss $L_i$ of $i$th training sample of the plurality of training sample RGB images by using following equation:

$$L_i = -h_j \log(\sigma(s)_j)$$

where $h$ is a $1 \times K$ vector denoting one-hot label of input comprising the plurality of training sample RGB images; and computing a mean of $L_i$ over the plurality of training sample images and propagating back in the LSTM classification network to fine tune the LSTM classification network in the training.

6. The method of claim 1, wherein upon classifying the 3D dynamic hand gesture into the at least one predefined gesture class, communicating the classified at least one predefined gesture class to a at least one of a device embodying the at least one RGB sensor and the wearable AR device, and enabling the device to trigger a pre-defined task.

7. A system for hand-gesture recognition, the system comprising: one or more memories; and one or more hardware processors, the one or more memories coupled to the one or more hardware processors, wherein the one or more hardware processors are capable of executing programmed instructions stored in the one or more memories to: receive a plurality of frames of a media stream of a scene captured from a first person view (FPV) of a user using at least one RGB sensor communicably coupled to a wearable AR device, the media stream comprising RGB image data associated with the plurality of frames of the scene, the scene comprising a dynamic hand gesture performed by the user; estimate a temporal information associated with the dynamic hand gesture from the RGB image data by using a deep learning model, the estimated temporal information being associated with hand poses of the user and comprising a plurality of key-points identified on user's hand in the plurality of frames; wherein the plurality of key-points comprises twenty one hand key-points, and wherein each key-point of the twenty one key points comprises four key points per finger and one key-point close to wrist of the user's hand, and wherein estimating the temporal information associated with the dynamic hand gesture comprises: estimating, a plurality of network-implicit 3D articulation priors using the deep learning model, the plurality of network-implicit 3D articulation priors comprising a plurality of key-points determined from a plurality of training sample RGB images of user's hand; and detecting, based on the plurality of network-implicit 3D articulation priors, the plurality of key-points on the user's hand in the plurality of frames; and classify, by using a multi-layered LSTM classification network, the dynamic hand gesture into at least one predefined gesture class based on the temporal information associated with the plurality of key points.

8. The system of claim 7, wherein the one or more hardware processors are further configured by the instructions to downscale the plurality of frames upon capturing the media stream.

9. The system of claim 8, wherein the multi-layered LSTM classification network comprises: a first layer comprising a LSTM layer consisting of a plurality of LSTM cells to learn long-term dependencies and patterns
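For readers who want to see the arithmetic in claims 4 and 5, here is a minimal sketch of the softmax and cross-entropy steps in plain Python. It assumes the loss expression in claim 5 is summed over the class index j, so that with a one-hot label only the true class contributes. Subtracting the maximum score before exponentiating is a common numerical safeguard added here, not something the claims mention. The function names are illustrative.

```python
import math


def softmax(scores):
    """Map K raw output scores to K posterior probabilities that sum to 1."""
    peak = max(scores)  # shifting by the max leaves the result unchanged but avoids overflow
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, label):
    """Loss for one sample: minus the log-probability assigned to the true class.

    Assumes a one-hot label h, so -sum_j h_j * log(sigma(s)_j) reduces to one term.
    """
    return -math.log(softmax(scores)[label])


def mean_loss(batch_scores, labels):
    """Average the per-sample losses, as described for training in claim 5."""
    losses = [cross_entropy(s, y) for s, y in zip(batch_scores, labels)]
    return sum(losses) / len(losses)


# Usage: two equally scored classes each get probability 0.5.
print(softmax([0.0, 0.0]))  # [0.5, 0.5]
print(mean_loss([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]], [0, 1]))
```

The mean loss is the quantity that would be propagated back through the network during training. The backward pass itself is outside the scope of this sketch.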

Assignees

Inventors

Classifications

  • G06V10/82 · Primary

    using neural networks · CPC title

  • G06F3/017 · Primary

    Gesture based interaction, e.g. based on a set of recognized hand gestures (interaction based on gestures traced on a digitiser G06F3/04883) · CPC title

  • Classification techniques · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • Arrangements for interaction with the human body, e.g. for user immersion in virtual reality (blind teaching G09B21/00) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10429944B2 cover?
This disclosure relates generally to hand-gesture recognition, and more particularly to system and method for detecting interaction of 3D dynamic hand gestures with frugal AR devices. In one embodiment, a method for hand-gesture recognition includes receiving frames of a media stream of a scene captured from a FPV of a user using RGB sensor communicably coupled to a wearable AR device. The medi…
Who is the assignee on this patent?
Tata Consultancy Services Ltd
What technology area does this patent fall under?
Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 1, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).