Acoustic system and method based gesture detection using spiking neural networks

US11960654B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11960654-B2
Application number: US-202218065713-A
Country: US
Kind code: B2
Filing date: Dec 14, 2022
Priority date: Apr 9, 2022
Publication date: Apr 16, 2024
Grant date: Apr 16, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Conventional gesture detection approaches demand large memory and computation power to run efficiently, thus limiting their use in power and memory constrained edge devices. Present application/disclosure provides a Spiking Neural Network based system which is a robust low power edge compatible ultrasound-based gesture detection system. The system uses a plurality of speakers and microphones that mimics a Multi Input Multi Output (MIMO) setup thus providing requisite diversity to effectively address fading. The system also makes use of distinctive Channel Impulse Response (CIR) estimated by imposing sparsity prior for robust gesture detection. A multi-layer Convolutional Neural Network (CNN) has been trained on these distinctive CIR images and the trained CNN model is converted into an equivalent Spiking Neural Network (SNN) via an ANN (Artificial Neural Network)-to-SNN conversion mechanism. The SNN is further configured to detect/classify gestures performed by user(s).
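
The sparsity-constrained CIR estimation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the patent's estimator. The solver (iterative soft thresholding for an L1-penalised least-squares fit), the Barker-13 probe sequence, the real-valued noise-free signals, and every name used here (`estimate_cir`, `lam`, `num_taps`, and so on) are assumptions chosen only to keep the example small and easy to check.

```python
# Hypothetical sketch: sparse channel impulse response (CIR) estimation.
# Model assumed here: rx = tx convolved with h, where h has few non-zero taps.
# Solver assumed here: ISTA on 0.5*||rx - tx*h||^2 + lam*||h||_1.

# Barker-13 is an assumption, used because its autocorrelation is well behaved.
BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def convolve(x, h):
    """Full linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def correlate(x, r, num_taps):
    """Adjoint of convolve with respect to h: g[j] = sum_i x[i] * r[i + j]."""
    return [sum(xi * r[i + j] for i, xi in enumerate(x)) for j in range(num_taps)]


def soft_threshold(v, t):
    """Shrink v toward zero by t; this is what enforces sparsity."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def estimate_cir(tx, rx, num_taps, lam=0.1, iters=500):
    """Estimate a sparse tap vector h from a known probe and a received signal."""
    # (sum |tx|)^2 upper-bounds the largest eigenvalue of the convolution
    # operator's Gram matrix, so this step size is safe (if conservative).
    step = 1.0 / (sum(abs(v) for v in tx) ** 2)
    h = [0.0] * num_taps
    for _ in range(iters):
        residual = [r - p for r, p in zip(rx, convolve(tx, h))]
        grad = correlate(tx, residual, num_taps)
        h = [soft_threshold(hj + step * gj, step * lam) for hj, gj in zip(h, grad)]
    return h


# Toy channel with two echoes (values invented for the demonstration).
true_h = [0.0, 0.9, 0.0, 0.0, -0.4, 0.0, 0.0, 0.0]
received = convolve(BARKER13, true_h)
estimate = estimate_cir(BARKER13, received, num_taps=len(true_h))
print([round(v, 3) for v in estimate])
```

Stacking one such tap vector per time frame, row by row, would give a two-dimensional array comparable to the CIR images the abstract describes. A real system would work with complex baseband samples and noisy measurements, which this sketch leaves out.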

First claim

Opening claim text (preview).

What is claimed is:

1. A processor implemented method, comprising: transmitting, via a plurality of speakers, a plurality of modulated signals to a user; receiving, via a plurality of microphones, a plurality of reflected signals from the user, in response to the plurality of transmitted modulated signals; processing, via a Channel Impulse Response (CIR) estimator, the plurality of transmitted modulated signals and the plurality of reflected signals using a sparsity prior serving as a constraint to obtain a plurality of CIR images; and recognizing, via a Spiking Neural Network (SNN), a gesture performed by the user based on the plurality of CIR images, wherein the Spiking Neural Network is obtained by: training a Convolutional Neural Network (CNN) using training data comprising a plurality of CIR images corresponding to one or more users to obtain a trained CNN; quantizing the trained CNN to obtain a quantized CNN; and converting the quantized CNN to the SNN, wherein the quantized CNN is converted to the SNN by performing an approximate matching of a corresponding output of an CNN neuron comprised in the CNN to a firing rate of a spiking neuron comprised in the SNN.

2. The processor implemented method of claim 1, wherein the step of transmitting, via a plurality of speakers, the plurality of modulated signals to the user is preceded by: performing a logical operation on two pseudo random sequences obtained from a generator polynomial, to obtain a plurality of spreading sequence codes, wherein each of the two pseudo random sequences has a length of predefined symbols; interpolating the plurality of spreading sequence codes to obtain a plurality of interpolated sequences; filtering the plurality of interpolated sequences to obtain a plurality of filtered sequences; appending the plurality of filtered sequences with zeros to obtain a plurality of padded signals; and modulating the plurality of padded signals to obtain the plurality of modulated signals.

3. The processor implemented method of claim 2, wherein the steps of filtering the plurality of interpolated sequences, appending the plurality of filtered sequences, and modulating the plurality of padded signals are performed such that each of the plurality of modulated signals obtained for transmission ranges between a first pre-defined acoustic transmission band and a second pre-defined acoustic transmission band.

4. The processor implemented method of claim 1, wherein the step of receiving, via the plurality of microphones, the plurality of reflected signals from the user, in response to the plurality of transmitted modulated signals comprises: receiving, at the plurality of microphones, a plurality of signals based on the plurality of transmitted modulated signals; applying, at the plurality of microphones, a quadrature demodulation to the plurality of received signals to obtain a plurality of demodulated signals; and filtering, at the plurality of microphones, the plurality of demodulated signals to obtain the plurality of reflected signals.

5. The processor implemented method of claim 1, wherein the step of processing, via the Channel Image Response (CIR) estimator, the plurality of transmitted modulated signals and the plurality of reflected signals using the sparsity prior serving as the constraint to obtain a plurality of CIR images comprises: estimating a plurality of CIR coefficients based on the plurality of transmitted modulated signals, and the plurality of reflected signals using the sparsity prior serving as the constraint; and concatenating the plurality of CIR coefficients to obtain the plurality of CIR images.

6. The processor implemented method of claim 1, wherein the step of recognizing, via a Spiking Neural Network (SNN), a gesture performed by the user based on the plurality of CIR images comprises: converting the plurality of CIR images into a spike domain; extracting, one or more features of the spike-domain using one or more spiking neurons comprised in the SNN; and recognizing the gesture performed by the user from the extracted one or more features by using the SNN.

7. A system, comprising: a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: transmit, via a plurality of speakers, a plurality of modulated signals to a user; receive, via a plurality of microphones, a plurality of reflected signals from the user, in response to the plurality of transmitted modulated signals; process, via a Channel Impulse Response (CIR) estimator, the plurality of transmitted modulated signals and the plurality of reflected signals using a sparsity prior serving as a constraint to obtain a plurality of CIR images; and recognize, via a Spiking Neural Network (SNN), a gesture performed by the user based on the plurality of CIR images, wherein the Spiking Neural Network is obtained by: training a Convolutional Neural Network (CNN) using training data comprising a plurality of CIR images corresponding to one or more users to obtain a trained CNN; quantizing the trained CNN to obtain a quantized CNN; and converting the quantized CNN to the SNN, wherein the quantized CNN is converted to the SNN by performing an approximate matching of a corresponding output of an CNN neuron comprised in the CNN to a firing rate of a spiking neuron comprised in the SNN.

8. The system of claim 7, wherein prior to transmitting, via the plurality of speakers, the plurality of modulated signals to the user, the one or more hardware processors are configured to: perform a logical operation on two pseudo random sequences obtained from a generator polynomial, to obtain a plurality of spreading sequence codes, wherein each of the two pseudo random sequences has a length of predefined symbols; interpolate the plurality of spreading sequence codes to obtain a plurality of interpolated sequences; filter the plurality of interpolated sequences to obtain a plurality of filtered sequences; append the plurality of filtered sequences with zeros to obtain a plurality of padded signals; and modulate the plurality of padded signals to obtain the plurality of modulated signals.

9. The system of claim 8, wherein the plurality of interpolated sequences, the plurality of filtered sequences are appended, and the plurality of padded signals are modulated such that each of the plurality of modulated signals obtained for transmission ranges between a first pre-defined acoustic transmission band and a second pre-defined acoustic transmission band.

10. The system of claim 7, wherein the plurality of reflected signals is obtained from the user, in response to the plurality of transmitted modulated signals by: receiving, at the plurality of microphones, a plurality of signals based on the plurality of transmitted modulated signals; applying, at the plurality of microphones, a quadrature demodulation to the plurality of received signals to obtain a plurality of demodulated signals; and filtering, at the plurality of microphones, the plurality of demodulated signals to obtain the plurality of reflected signals.

11. The system of claim 7, wherein the plurality of transmitted modulated signals and the plurality of reflected signals are processed via the CIR estimator using the sparsity prior serving as the constraint to obtain the plurality of CIR images by: estimating a plurality of CIR coefficients based on the plurality of transmitted modulated signals, and the plurality of reflected signals using the sparsity prior serving as the con
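
The last limitation of claims 1 and 7, approximately matching a CNN neuron's output to a spiking neuron's firing rate, can be illustrated with a small Python sketch. The claims do not specify a neuron model, so the integrate-and-fire neuron with reset by subtraction, the constant input current, the unit threshold, and the names `relu` and `if_firing_rate` are all assumptions made for illustration.

```python
# Hypothetical sketch: rate matching between a ReLU unit and a spiking neuron.
# Assumed neuron model: integrate-and-fire, reset by subtraction, threshold 1.


def relu(x):
    """Output of a conventional CNN neuron with ReLU activation."""
    return max(0.0, x)


def if_firing_rate(input_current, threshold=1.0, steps=1000):
    """Fraction of time steps in which the spiking neuron fires.

    The neuron integrates a constant input each step and emits a spike
    whenever its membrane potential reaches the threshold. Subtracting the
    threshold (instead of resetting to zero) keeps the leftover charge, so
    the long-run firing rate tracks the input closely.
    """
    potential = 0.0
    spikes = 0
    for _ in range(steps):
        potential += input_current
        if potential >= threshold:
            spikes += 1
            potential -= threshold
    return spikes / steps


# Compare the two neuron types over a few pre-activation values.
for a in (-0.5, 0.0, 0.25, 0.6, 0.9, 1.7):
    print(f"input={a:+.2f}  relu={relu(a):.3f}  firing_rate={if_firing_rate(a):.3f}")
```

For inputs between 0 and the threshold, the firing rate approximates the ReLU output to within about one spike over the simulation window. Negative inputs never fire, matching ReLU's zero output. Inputs above the threshold saturate at one spike per step, which is why conversion schemes of this kind typically rescale weights or thresholds so that activations stay within the representable range.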

Assignees

Tata Consultancy Services Ltd

Inventors

Classifications

  • G06F3/017 · Primary

    Gesture based interaction, e.g. based on a set of recognized hand gestures (interaction based on gestures traced on a digitiser G06F3/04883) · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs · CPC title

  • G06V10/82 · Primary

    using neural networks · CPC title

  • Details of non-pulse systems {(short-range imaging G01S7/52017)} · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11960654B2 cover?
It covers an ultrasound-based gesture detection system built on a Spiking Neural Network (SNN) and aimed at power- and memory-constrained edge devices. A plurality of speakers and microphones, arranged to mimic a Multi Input Multi Output (MIMO) setup, transmit modulated signals and receive their reflections. Channel Impulse Response (CIR) images are estimated from those signals under a sparsity prior, a Convolutional Neural Network (CNN) is trained on the CIR images, and the trained CNN is converted into an equivalent SNN that detects and classifies the user's gestures.
Who is the assignee on this patent?
Tata Consultancy Services Ltd
What technology area does this patent fall under?
Primary CPC classification G06F3/017. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 16, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).