Detecting keywords in audio using a spiking neural network

US10403266B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10403266-B2
Application number	US-201715786803-A
Country	US
Kind code	B2
Filing date	Oct 18, 2017
Priority date	Oct 18, 2017
Publication date	Sep 3, 2019
Grant date	Sep 3, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An example apparatus for detecting keywords in audio includes an audio receiver to receive audio comprising a keyword to be detected. The apparatus also includes a spike transducer to convert the audio into a plurality of spikes. The apparatus further includes a spiking neural network to receive one or more of the spikes and generate a spike corresponding to a detected keyword.
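
The three stages named in the abstract (audio receiver, spike transducer, spiking neural network) can be pictured with a toy sketch. Everything below is an illustrative assumption rather than the patented design: the threshold-crossing transducer, the leaky integrate-and-fire output neuron, the parameter values, and the names `transduce`, `LIFNeuron`, and `detect` are all hypothetical.

```python
from dataclasses import dataclass


def transduce(samples, threshold=0.5):
    """Toy spike transducer: emit a spike (1) wherever |sample| exceeds threshold."""
    return [1 if abs(s) > threshold else 0 for s in samples]


@dataclass
class LIFNeuron:
    """Leaky integrate-and-fire neuron (an assumed neuron model)."""
    weight: float = 0.6
    leak: float = 0.9
    threshold: float = 1.0
    potential: float = 0.0

    def step(self, spike_in):
        # Decay the membrane potential, then integrate the weighted input spike.
        self.potential = self.potential * self.leak + self.weight * spike_in
        if self.potential >= self.threshold:
            self.potential = 0.0  # reset after firing
            return 1
        return 0


def detect(samples):
    """Return the time steps at which the single output neuron fires."""
    neuron = LIFNeuron()
    spikes = transduce(samples)
    return [t for t, s in enumerate(spikes) if neuron.step(s)]


# Made-up audio amplitudes standing in for received audio.
audio = [0.1, 0.8, 0.9, 0.2, 0.1, 0.7, 0.1, 0.9, 0.8]
print(detect(audio))
```

In this sketch the output neuron fires only when input spikes arrive close together in time, which is one simple way to picture a network that stays quiet until a pattern of interest appears. A real keyword detector would use many neurons and trained weights.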

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus for detecting keywords in audio, comprising: an audio receiver to receive audio comprising a keyword to be detected; a spike transducer to transduce the audio into a plurality of spikes, the spike transducer to convert the audio into the plurality of spikes using a sample window width based on a duration of a key-phrase that comprises a duration that is based on a longest key phrase to be detected; and a spiking neural network to receive one or more of the spikes and generate a spike corresponding to a detected keyword.
2. The apparatus of claim 1, comprising a feature generator to generate a plurality of features based on the audio, the features to be converted into spikes by the spike transducer.
3. The apparatus of claim 2, wherein the features comprise audio parameters.
4. The apparatus of claim 2, wherein a sliding sample window step size of the feature generator is a function of a feature step size.
5. The apparatus of claim 1, wherein the spiking neural network is to enter an idle mode in response to generating the spike.
6. The apparatus of claim 1, wherein the spiking neural network is trained using training spikes generated from training audio samples.
7. The apparatus of claim 1, wherein the spiking neural network comprises a sparsely active network.
8. The apparatus of claim 1, wherein the spiking neural network comprises an output layer comprising a number of neurons based on a number of trained keywords to be detected.
9. The apparatus of claim 1, wherein the spike corresponding to the detected keyword is output using acoustic scoring and a decision state machine, wherein the acoustic scoring comprises mapping generated feature vectors to senomes using a spiking neural network.
10. A method for detecting keywords in audio, comprising: receiving, via a processor, audio comprising a keyword to be detected; transducing, via the processor, the audio into a plurality of spikes, wherein transducing the audio into spikes comprises generating a plurality of features based on the audio, and transducing the features into the spikes, wherein generating the plurality of features comprises computing mel-frequency cepstral coefficients based on a predetermined sliding sample window size and sliding sample window step size and concatenating the mel-frequency cepstral coefficients based on a duration of a keyword that is based on a longest key phrase to be detected; sending, to a spiking neural network, one or more of the spikes; and receiving, from the spiking neural network, a spike corresponding to a detected keyword.
11. The method of claim 10, wherein generating the plurality of features comprises computing linear predictive coding (LPC) features.
12. The method of claim 10, comprising training the spiking neural network, wherein training the spiking neural network comprises: receiving, via the processor, audio comprising keywords to be trained to be detected; converting, via the processor, the audio into training spikes; and training, via the processor, the spiking neural network using the training spikes.
13. The method of claim 10, comprising sending the detected keyword to an application.
14. The method of claim 10, wherein the spiking neural network comprises a sparsely activated network that is activated in response to receiving the one or more spikes from the processor.
15. The method of claim 10, comprising activating an idle mode in response to generating the spike corresponding to the detected keyword.
16. The method of claim 10, wherein transducing the audio into the plurality of spikes comprises generating a matrix of features over a predetermined number of frames and transducing the matrix of features into the plurality of spikes.
17. The method of claim 10, wherein transducing the audio into the plurality of spikes comprises flattening the features into an ordered set of features based on intensity.
18. At least one non-transitory computer readable medium for detecting keywords in audio having instructions stored therein that, in response to being executed on a computing device, cause the computing device to: receive audio comprising a keyword to be detected; transduce the audio into a plurality of spikes, converting the audio into the plurality of spikes using a sample window width based on a duration of a key-phrase that comprises a duration that is based on a longest key phrase to be detected; and generate a spike corresponding to a detected keyword based on one or more of the plurality of spikes.
19. The at least one non-transitory computer readable medium of claim 18, comprising instructions to convert the audio into a plurality of features, and transduce the features into the plurality of spikes.
20. The at least one non-transitory computer readable medium of claim 18, comprising instructions to enter an idle mode in response to generating the spike corresponding to the detected keyword.
21. The at least one non-transitory computer readable medium of claim 18, comprising instructions to generate a matrix of features over a predetermined number of frames, flatten the features into an ordered set of features based on intensity, and transduce the ordered set of features into the plurality of spikes.
22. The at least one non-transitory computer readable medium of claim 18, comprising instructions to train a spiking neural network to translate a feature to senones using an acoustic model based scoring.
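
Claims 10, 16, and 17 describe computing features over a sliding window, concatenating them across a span sized to the longest key phrase, and flattening them into an ordered set based on intensity. A rough sketch of that arithmetic follows. The frame-count formula, the descending-intensity ordering, the rank-to-spike-time mapping, and the helper names `frames_for_phrase` and `rank_order_spikes` are assumptions for illustration; the claims do not specify them. The MFCC computation itself is omitted, and the feature values and durations are made up.

```python
def frames_for_phrase(phrase_ms, window_ms, step_ms):
    """Number of sliding-window frames that fit in a phrase of phrase_ms.

    Assumes the usual framing formula; the claims only say the span is
    based on the longest key phrase to be detected.
    """
    if phrase_ms < window_ms:
        return 0
    return (phrase_ms - window_ms) // step_ms + 1


def rank_order_spikes(feature_matrix):
    """Flatten a frames x coefficients matrix and order entries by intensity.

    Returns (spike_time, flat_index) pairs: the strongest feature spikes
    first (time 0), the next strongest at time 1, and so on. This mapping
    is one assumed reading of "ordered set of features based on intensity".
    """
    flat = [value for frame in feature_matrix for value in frame]
    order = sorted(range(len(flat)), key=lambda i: flat[i], reverse=True)
    return [(time, index) for time, index in enumerate(order)]


# Hypothetical numbers: 1000 ms longest key phrase, 25 ms window, 10 ms step.
n_frames = frames_for_phrase(1000, 25, 10)
print(n_frames)

# Made-up 2-frame x 3-coefficient feature matrix standing in for MFCCs.
features = [[0.2, 0.9, 0.4],
            [0.7, 0.1, 0.5]]
print(rank_order_spikes(features))
```

Encoding intensity as spike order means each feature contributes at most one spike per window, which fits the sparsely active network described in claims 7 and 14, though the patent text shown here does not state that this is the reason for the ordering.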

Assignees

Intel Corp

Inventors

Classifications

G06N3/049
Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs · CPC title
G06N3/084
Backpropagation, e.g. using gradient descent · CPC title
G10L2015/0635
updating or merging of old and new templates; Mean values; Weighting · CPC title
G10L25/24
the extracted parameters being the cepstrum · CPC title
G10L15/063
Training · CPC title

Patent family

Related publications grouped by family.

View patent family 66096064

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10403266B2 cover?: An example apparatus for detecting keywords in audio includes an audio receiver to receive audio comprising a keyword to be detected. The apparatus also includes a spike transducer to convert the audio into a plurality of spikes. The apparatus further includes a spiking neural network to receive one or more of the spikes and generate a spike corresponding to a detected keyword.
Who is the assignee on this patent?: Intel Corp
What technology area does this patent fall under?: Primary CPC classification G10L15/02. Mapped technology areas include Physics.
When was this patent published?: Publication date Sep 3, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).