Multi-sensor based user interface
US-2017060254-A1 · Mar 2, 2017 · US
US10528147B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10528147-B2 |
| Application number | US-201715640327-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 30, 2017 |
| Priority date | Mar 6, 2017 |
| Publication date | Jan 7, 2020 |
| Grant date | Jan 7, 2020 |
Official abstract text for this publication.
An ultrasonic gesture recognition system is provided that recognizes gestures based on analysis of return signals of an ultrasonic pulse that is reflected from a gesture. The system transmits an ultrasonic chirp and samples a microphone array at sample intervals to collect a return signal for each microphone. The system then applies a beamforming technique to frequency domain representations of the return signals to generate an acoustic image with a beamformed return signal for multiple directions. The system then generates a feature image from the acoustic images to identify, for example, distance or depth from the microphone array to the gesture for each direction. The system then submits the feature image to a deep learning system to classify the gesture.
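The front end described in the abstract (chirp, per-microphone sampling, frequency-domain beamforming, per-direction feature values) can be sketched roughly as follows. This is a minimal illustration and not the patented implementation. It uses plain delay-and-sum beamforming on a one-dimensional array with a simulated far-field echo, and omits matched filtering and any noise covariance weighting. The sampling rate, chirp parameters, microphone positions, and all function names are assumptions made for the example.

```python
import cmath
import math

C = 343.0        # speed of sound in air (m/s), assumed
FS = 192_000.0   # sampling rate (Hz), assumed
F0 = 40_000.0    # chirp centre frequency (Hz), assumed
K = 4.0e7        # chirp sweep rate (Hz/s), assumed
SIGMA = 50e-6    # width of the Gaussian chirp envelope (s), assumed
N = 256          # samples collected per chirp, assumed
MIC_X = [-0.006, -0.002, 0.002, 0.006]  # linear array positions (m), assumed


def mic_delay(x, angle_deg):
    """Arrival-time offset at position x for a plane wave from angle_deg."""
    return -x * math.sin(math.radians(angle_deg)) / C


def simulate_returns(distance, angle_deg):
    """Synthetic echo: one time-domain return signal per microphone."""
    t0 = 2.0 * distance / C  # round-trip travel time
    signals = []
    for x in MIC_X:
        arrival = t0 + mic_delay(x, angle_deg)
        row = []
        for n in range(N):
            u = n / FS - arrival
            row.append(math.exp(-u * u / (2 * SIGMA ** 2))
                       * math.cos(2 * math.pi * (F0 * u + 0.5 * K * u * u)))
        signals.append(row)
    return signals


def positive_spectrum(x):
    """Naive DFT restricted to the positive-frequency bins 1..N/2-1."""
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(1, N // 2)]


def beamform(spectra, angle_deg):
    """Delay-and-sum: undo each microphone's delay by a phase shift, then average."""
    beam = []
    for i in range(len(spectra[0])):
        f = (i + 1) * FS / N  # frequency of bin i + 1
        acc = 0j
        for x, spec in zip(MIC_X, spectra):
            acc += spec[i] * cmath.exp(2j * math.pi * f * mic_delay(x, angle_deg))
        beam.append(acc / len(MIC_X))
    return beam


def envelope(beam):
    """Inverse DFT over positive bins gives the analytic signal; abs is its envelope."""
    return [abs(2.0 / N * sum(b * cmath.exp(2j * math.pi * (i + 1) * n / N)
                              for i, b in enumerate(beam)))
            for n in range(N)]


def feature_images(signals, angles):
    """Return per-direction depth (m) and intensity values (one image row each)."""
    spectra = [positive_spectrum(s) for s in signals]
    depth, intensity = [], []
    for a in angles:
        env = envelope(beamform(spectra, a))
        peak = max(range(N), key=env.__getitem__)
        depth.append(C * (peak / FS) / 2.0)  # halve the round trip
        intensity.append(sum(e * e for e in env))
    return depth, intensity


ANGLES = [-60, -40, -20, 0, 20, 40, 60]  # candidate directions (degrees)
signals = simulate_returns(distance=0.10, angle_deg=20)
depth, intensity = feature_images(signals, ANGLES)
best = max(range(len(ANGLES)), key=intensity.__getitem__)
print(ANGLES[best], round(depth[best], 3))
```

A real system would use a two-dimensional array and a grid of directions so that the depth and intensity values form an image rather than a single row. The naive DFT is used here only to keep the sketch free of dependencies.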
Opening claim text (preview).
We claim:
1. A method performed by a computing device for recognizing a gesture, the method comprising: transmitting via a transmitter an ultrasonic chirp to receive time domain return signals from the gesture; for each of a plurality of receivers, receiving the time domain return signal by sampling that receiver at sample intervals; and converting the time domain return signal to a frequency domain return signal; generating an acoustic image with a beamformed frequency return signal for each of a plurality of directions, the generating of the acoustic image including performing a beamforming of the frequency domain return signals to generate a beamformed frequency domain return signal for each direction; generating a feature image for a feature with a feature value for each direction from the acoustic image; and submitting the feature image to a classifier to classify the gesture, wherein the classifier is a convolutional neural network (“CNN”) including multiple convolution layers, a fully connected layer, a long short-term memory (“LSTM”) layer, a softmax layer, and a mean pooling layer.
2. The method of claim 1 wherein multiple chirps are transmitted in sequence and a sequence of feature images is generated with one feature image for each chirp and wherein the submitting includes submitting the sequence of feature images to the classifier to classify a dynamic gesture.
3. The method of claim 1 wherein multiple sequences of feature images are generated for multiple features and the classifier includes a convolution layer for each feature, and wherein the fully connected layer is fully connected to the convolution layers.
4. The method of claim 1 wherein the feature is selected from a group consisting of depth and intensity.
5. The method of claim 1 wherein the beamforming uses a noise covariance matrix for the receivers that was generated prior to the transmitting of the chirp.
6. The method of claim 1 wherein the generating of the acoustic image includes performing a match filtering of the beamformed frequency domain return signals and the transmitted chirp.
7. A computing system for recognizing a gesture, the computing system comprising: one or more computer-readable storage mediums storing computer-executable instructions of: a component that accesses, for each of a plurality of ultrasonic pulses, time domain return signals that are reflections from the gesture of that transmitted ultrasonic pulse, each time domain return signal having been received by a receiver; a beamforming component that, for each ultrasonic pulse and each direction, performs beamforming to generate, from frequency domain return signals corresponding to the time domain return signals, a beamformed frequency domain return signal for that pulse and direction; a feature extraction component that, for each ultrasonic pulse, extracts from the beamformed frequency domain return signals a feature value for a feature for each direction; and a classifier component that receives the feature values for each ultrasonic pulse, recognizes the gesture based on the feature values, and outputs an indication of the recognized gesture, wherein the classifier component implements a convolutional neural network (“CNN”) including multiple convolution layers, a fully connected layer, a long short-term memory (“LSTM”) layer, a softmax layer, and a mean pooling layer; and one or more processors for executing the computer-executable instructions stored in the one or more computer-readable storage mediums.
8. The computing system of claim 7 wherein the feature extraction component extracts feature values for multiple features and the CNN includes a convolution layer for each feature, and wherein the fully connected layer is fully connected to the convolution layers.
9. The computing system of claim 7 wherein the frequency of the ultrasonic pulse varies.
10. A method performed by a computing system for recognizing a gesture, the method comprising: transmitting an ultrasonic pulse; receiving, at each of a plurality of receivers, a return signal from the ultrasonic pulse; performing beamforming based on the return signals to generate a beamformed return signal for each of a plurality of directions; generating a feature value for each beamformed return signal; and applying a classifier to the feature values to classify the gesture, wherein the classifier component implements a convolutional neural network (“CNN”) including multiple convolution layers, a fully connected layer, a long short-term memory (“LSTM”) layer, a softmax layer, and a mean pooling layer.
11. The method of claim 10 wherein the classifier is performed by a deep learning system.
12. A deep learning system for recognizing a gesture from feature images generated from return signals of ultrasonic pulses reflected from the gesture, the deep learning system comprising: a first convolution layer that inputs feature images in sequence and outputs generated first features for each feature image, the first convolution layer including a first convolution sublayer, a first rectified linear unit (“ReLU”) sublayer, and a first max pooling sublayer; a second convolution layer that inputs the first features in sequence and outputs generated second features for each feature image, the second convolution layer including a second convolution sublayer, a second ReLU sublayer, and a second max pooling layer; a fully connected layer that inputs the second features in sequence and outputs generated third features for each feature image; a long short-term memory layer that inputs the third features in sequence and outputs fourth features for each feature image; a softmax layer that inputs the fourth features in sequence and outputs probabilities of classifications for the sequence of feature images; and a max pooling layer that inputs probabilities of the classification and outputs an indication of the classification for the sequence of feature images.
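The last two stages recited in the claims (a softmax layer followed by a pooling layer over the sequence of feature images) can be illustrated with a small sketch. It assumes the earlier layers have already produced one score vector per feature image. The gesture labels, the scores, and the function names are hypothetical. Both mean pooling (as recited in claims 1, 7, and 10) and max pooling (as recited in claim 12) are shown, because the two can select different classes for the same sequence.

```python
import math


def softmax(scores):
    """Convert one score vector into class probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool(prob_seq, mode="mean"):
    """Pool per-image probabilities over the sequence, one value per class."""
    n_classes = len(prob_seq[0])
    columns = [[p[c] for p in prob_seq] for c in range(n_classes)]
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


def classify(score_seq, labels, mode="mean"):
    """Softmax each image's scores, pool over the sequence, pick the top class."""
    pooled = pool([softmax(s) for s in score_seq], mode)
    return labels[max(range(len(labels)), key=pooled.__getitem__)]


# Hypothetical labels and per-image scores for a four-chirp sequence.
LABELS = ["swipe", "tap", "none"]
SCORES = [
    [2.0, 0.5, 0.1],
    [1.5, 1.4, 0.0],
    [0.2, 2.5, 0.1],
    [2.2, 0.3, 0.4],
]

mean_label = classify(SCORES, LABELS, mode="mean")
max_label = classify(SCORES, LABELS, mode="max")
print(mean_label, max_label)
```

In this made-up sequence, one image strongly favors "tap" while the other three lean toward "swipe", so max pooling follows the single confident image and mean pooling follows the majority.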
Gesture based interaction, e.g. based on a set of recognized hand gestures (interaction based on gestures traced on a digitiser G06F3/04883) · CPC title
Extracting wanted echo signals {(Doppler systems G01S15/50)} · CPC title
Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title
using neural networks · CPC title
using classification, e.g. of video objects · CPC title