Detecting keywords in audio using a spiking neural network

US10403266B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10403266-B2
Application number  US-201715786803-A
Country             US
Kind code           B2
Filing date         Oct 18, 2017
Priority date       Oct 18, 2017
Publication date    Sep 3, 2019
Grant date          Sep 3, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An example apparatus for detecting keywords in audio includes an audio receiver to receive audio comprising a keyword to be detected. The apparatus also includes a spike transducer to convert the audio into a plurality of spikes. The apparatus further includes a spiking neural network to receive one or more of the spikes and generate a spike corresponding to a detected keyword.
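The three stages named in the abstract (audio receiver, spike transducer, spiking neural network) can be illustrated with a minimal sketch. It is not the patented implementation. The latency coding in `transduce`, the leaky integrate-and-fire model in `detect`, and every name, weight, and threshold below are assumptions chosen for illustration; the abstract does not specify any of them.

```python
from typing import List, Optional, Tuple

Spike = Tuple[int, int]  # (time step, input channel)


def transduce(features: List[float], n_steps: int = 10) -> List[Spike]:
    """Latency-code features: larger values spike earlier (assumed scheme)."""
    lo, hi = min(features), max(features)
    span = (hi - lo) or 1.0                      # avoid division by zero
    spikes = []
    for channel, value in enumerate(features):
        strength = (value - lo) / span           # normalise to 0..1
        t = round((1.0 - strength) * (n_steps - 1))
        spikes.append((t, channel))
    return sorted(spikes)                        # earliest spikes first


def detect(spikes: List[Spike], weights: List[List[float]],
           threshold: float = 1.0, leak: float = 0.9) -> Optional[Tuple[int, int]]:
    """One layer of leaky integrate-and-fire neurons, one neuron per keyword.

    Returns (time step, keyword index) of the first output spike, or None
    if no neuron reaches the threshold.
    """
    potentials = [0.0] * len(weights)
    last_t = 0
    for t, channel in spikes:
        decay = leak ** (t - last_t)             # leak since the previous spike
        last_t = t
        for k in range(len(potentials)):
            potentials[k] = potentials[k] * decay + weights[k][channel]
        best = max(range(len(potentials)), key=potentials.__getitem__)
        if potentials[best] >= threshold:
            return t, best                       # output spike for keyword `best`
    return None


# Hypothetical hand-set weights: keyword 0 listens to channels 0 and 2,
# keyword 1 listens to channels 1 and 3.
WEIGHTS = [[0.6, 0.0, 0.6, 0.0],
           [0.0, 0.6, 0.0, 0.6]]

print(detect(transduce([0.9, 0.1, 0.8, 0.2]), WEIGHTS))  # (1, 0)
```

In this sketch the network only does work when an input spike arrives, which loosely mirrors the sparsely active behaviour described in the claims. A trained network would learn the weights rather than use hand-set values.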

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus for detecting keywords in audio, comprising: an audio receiver to receive audio comprising a keyword to be detected; a spike transducer to transduce the audio into a plurality of spikes, the spike transducer to convert the audio into the plurality of spikes using a sample window width based on a duration of a key-phrase that comprises a duration that is based on a longest key phrase to be detected; and a spiking neural network to receive one or more of the spikes and generate a spike corresponding to a detected keyword.
2. The apparatus of claim 1, comprising a feature generator to generate a plurality of features based on the audio, the features to be converted into spikes by the spike transducer.
3. The apparatus of claim 2, wherein the features comprise audio parameters.
4. The apparatus of claim 2, wherein a sliding sample window step size of the feature generator is a function of a feature step size.
5. The apparatus of claim 1, wherein the spiking neural network is to enter an idle mode in response to generating the spike.
6. The apparatus of claim 1, wherein the spiking neural network is trained using training spikes generated from training audio samples.
7. The apparatus of claim 1, wherein the spiking neural network comprises a sparsely active network.
8. The apparatus of claim 1, wherein the spiking neural network comprises an output layer comprising a number of neurons based on a number of trained keywords to be detected.
9. The apparatus of claim 1, wherein the spike corresponding to the detected keyword is output using acoustic scoring and a decision state machine, wherein the acoustic scoring comprises mapping generated feature vectors to senones using a spiking neural network.
10. A method for detecting keywords in audio, comprising: receiving, via a processor, audio comprising a keyword to be detected; transducing, via the processor, the audio into a plurality of spikes, wherein transducing the audio into spikes comprises generating a plurality of features based on the audio, and transducing the features into the spikes, wherein generating the plurality of features comprises computing mel-frequency cepstral coefficients based on a predetermined sliding sample window size and sliding sample window step size and concatenating the mel-frequency cepstral coefficients based on a duration of a keyword that is based on a longest key phrase to be detected; sending, to a spiking neural network, one or more of the spikes; and receiving, from the spiking neural network, a spike corresponding to a detected keyword.
11. The method of claim 10, wherein generating the plurality of features comprises computing linear predictive coding (LPC) features.
12. The method of claim 10, comprising training the spiking neural network, wherein training the spiking neural network comprises: receiving, via the processor, audio comprising keywords to be trained to be detected; converting, via the processor, the audio into training spikes; and training, via the processor, the spiking neural network using the training spikes.
13. The method of claim 10, comprising sending the detected keyword to an application.
14. The method of claim 10, wherein the spiking neural network comprises a sparsely activated network that is activated in response to receiving the one or more spikes from the processor.
15. The method of claim 10, comprising activating an idle mode in response to generating the spike corresponding to the detected keyword.
16. The method of claim 10, wherein transducing the audio into the plurality of spikes comprises generating a matrix of features over a predetermined number of frames and transducing the matrix of features into the plurality of spikes.
17. The method of claim 10, wherein transducing the audio into the plurality of spikes comprises flattening the features into an ordered set of features based on intensity.
18. At least one non-transitory computer readable medium for detecting keywords in audio having instructions stored therein that, in response to being executed on a computing device, cause the computing device to: receive audio comprising a keyword to be detected; transduce the audio into a plurality of spikes, converting the audio into the plurality of spikes using a sample window width based on a duration of a key-phrase that comprises a duration that is based on a longest key phrase to be detected; and generate a spike corresponding to a detected keyword based on one or more of the plurality of spikes.
19. The at least one non-transitory computer readable medium of claim 18, comprising instructions to convert the audio into a plurality of features, and transduce the features into the plurality of spikes.
20. The at least one non-transitory computer readable medium of claim 18, comprising instructions to enter an idle mode in response to generating the spike corresponding to the detected keyword.
21. The at least one non-transitory computer readable medium of claim 18, comprising instructions to generate a matrix of features over a predetermined number of frames, flatten the features into an ordered set of features based on intensity, and transduce the ordered set of features into the plurality of spikes.
22. The at least one non-transitory computer readable medium of claim 18, comprising instructions to train a spiking neural network to translate a feature to senones using an acoustic model based scoring.
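
Several claims describe how features become spikes: a window sized to the longest key phrase (claims 1, 10, 18), a matrix of features over a predetermined number of frames (claims 16, 21), and flattening into an ordered set based on intensity (claims 17, 21). The sketch below is one possible reading, not the claimed method itself. The 25 ms window and 10 ms step defaults, the use of absolute value as "intensity", the rank-as-spike-time encoding, and all function names are assumptions made for illustration.

```python
from typing import List, Tuple


def frames_for_phrase(longest_phrase_ms: int, window_ms: int = 25,
                      step_ms: int = 10) -> int:
    """Number of sliding-window frames needed to span the longest key phrase."""
    if longest_phrase_ms <= window_ms:
        return 1
    extra = longest_phrase_ms - window_ms
    return 1 + -(-extra // step_ms)              # ceiling division on integers


def flatten_by_intensity(
        matrix: List[List[float]]) -> List[Tuple[int, int, float]]:
    """Flatten a frames x coefficients matrix into (frame, coeff, value)
    triples ordered from strongest to weakest magnitude (assumed ordering)."""
    cells = [(f, c, v)
             for f, row in enumerate(matrix)
             for c, v in enumerate(row)]
    return sorted(cells, key=lambda cell: -abs(cell[2]))  # stable for ties


def to_spikes(ordered: List[Tuple[int, int, float]]) -> List[Tuple[int, int, int]]:
    """Rank-order coding: the n-th strongest feature spikes at time step n."""
    return [(rank, f, c) for rank, (f, c, _) in enumerate(ordered)]


# A 1-second key phrase with the assumed 25 ms window and 10 ms step.
print(frames_for_phrase(1000))                   # 99

# Toy 2-frame x 2-coefficient feature matrix (values are made up).
features = [[0.2, -1.5],
            [0.9, 0.1]]
print(to_spikes(flatten_by_intensity(features)))
# [(0, 0, 1), (1, 1, 0), (2, 0, 0), (3, 1, 1)]
```

In a fuller pipeline the matrix rows would hold mel-frequency cepstral coefficients (claim 10) or LPC features (claim 11) computed from real audio; the toy values here only show the ordering step.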

Assignees

Intel Corp

Inventors

Classifications

  • Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

  • Backpropagation, e.g. using gradient descent

  • updating or merging of old and new templates; Mean values; Weighting

  • the extracted parameters being the cepstrum

  • Training

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10403266B2 cover?
An example apparatus for detecting keywords in audio includes an audio receiver to receive audio comprising a keyword to be detected. The apparatus also includes a spike transducer to convert the audio into a plurality of spikes. The apparatus further includes a spiking neural network to receive one or more of the spikes and generate a spike corresponding to a detected keyword.
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G10L15/02. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 3, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).