Sectioned memory networks for online word-spotting in continuous speech

US9570069B2 · US · B2

Patent metadata
Publication number: US-9570069-B2
Application number: US-201414481372-A
Country: US
Kind code: B2
Filing date: Sep 9, 2014
Priority date: Sep 9, 2014
Publication date: Feb 14, 2017
Grant date: Feb 14, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and computer program products to detect a keyword in speech, by generating, from a sequence of spectral feature vectors generated from the speech, a plurality of blocked feature vector sequences, and analyzing, by a neural network, each of the plurality of blocked feature vector sequences to detect the presence of the keyword in the speech.
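
The blocking step in the abstract can be illustrated with a minimal sketch. It assumes the "predefined count" of overlap means the number of feature vectors shared by adjacent blocks, and that trailing vectors that do not fill a whole block are dropped; the patent text shown here does not settle either point. The names block_features, block_size, and overlap are hypothetical and do not come from the patent.

```python
from typing import List, Sequence

FeatureVector = Sequence[float]


def block_features(
    features: Sequence[FeatureVector],
    block_size: int,
    overlap: int,
) -> List[List[FeatureVector]]:
    """Split a feature-vector sequence into fixed-size, overlapping blocks.

    Assumption: adjacent blocks share exactly `overlap` feature vectors,
    so each block starts `block_size - overlap` vectors after the last.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if not 0 <= overlap < block_size:
        raise ValueError("overlap must satisfy 0 <= overlap < block_size")

    step = block_size - overlap
    blocks: List[List[FeatureVector]] = []
    # Assumption: a trailing partial block is dropped rather than padded.
    for start in range(0, len(features) - block_size + 1, step):
        blocks.append(list(features[start:start + block_size]))
    return blocks


if __name__ == "__main__":
    # Ten one-dimensional "spectral" vectors stand in for real features.
    frames = [[float(i)] for i in range(10)]
    for block in block_features(frames, block_size=4, overlap=2):
        print(block)
```

Each resulting block would then be passed to the neural network in turn. A real system might pad the final partial block instead of dropping it; that choice is left open here.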

First claim

Opening claim text (preview).

What is claimed is:
1. A method to detect a keyword in speech, comprising: generating, from a sequence of spectral feature vectors generated from the speech, a plurality of blocked feature vector sequences, wherein each of the plurality of blocked feature vector sequences comprises a respective subset of the plurality of feature vectors, wherein each of the plurality of blocked feature vector sequences overlap adjacent blocked feature vector sequences by a predefined count of feature vector sequences; and analyzing, by a neural network executing on one or more computer processors, each of the plurality of blocked feature vector sequences to detect the presence of the keyword in the speech.
2. The method of claim 1, further comprising: prior to generating the blocked feature vector sequences: receiving a speech signal comprising the speech; and performing a feature computation on the speech signal to generate the sequence of spectral feature vectors.
3. The method of claim 1, wherein an output of the neural network comprises a sequence of labels, wherein each of the sequence of labels indicates whether the keyword is present in a corresponding blocked feature vector sequence, the method further comprising: smoothing the sequence of labels to determine whether the keyword is in the speech.
4. The method of claim 1, wherein the neural network comprises a plurality of blocks, wherein each block of the neural network is configured to process a respective block of the plurality of blocked feature vector sequences.
5. The method of claim 1, wherein a count of the subset of the plurality of feature vectors in each block of the plurality of blocked feature vector sequences is defined during a training phase of the neural network, the method further comprising: modifying the count based on a feature in the spectral feature vector, wherein the modified count is configured to increase a likelihood that the neural network will correctly detect the presence of the keyword in the speech.
6. The method of claim 1, wherein the neural network generates an output comprising an indication of the presence of keywords, and not phonemes, in the speech, wherein the method detects the keyword in the speech without using a hidden Markov model.
7. The method of claim 1, further comprising: upon detecting the presence of the keyword in the speech, returning an indication that the keyword is present in the speech.
8. A computer program product to detect a keyword in speech, the computer program product comprising: a non-transitory computer-readable storage medium having computer-readable program code embodied therewith, the computer-readable program code executable by a processor to perform an operation comprising: generating, from a sequence of spectral feature vectors of the speech, a plurality of blocked feature vector sequences, wherein each of the plurality of blocked feature vector sequences comprises a respective subset of the plurality of feature vectors, wherein each of the plurality of blocked feature vector sequences overlap adjacent blocked feature vector sequences by a predefined count of feature vector sequences; and generating, by a neural network, the plurality of blocked feature vector sequences to detect the presence of the keyword in the speech.
9. The computer program product of claim 8, the operation further comprising prior to generating the blocked feature vector sequences: receiving a speech signal comprising the speech; and performing a feature computation on the speech signal to generate the sequence of spectral feature vectors.
10. The computer program product of claim 8, wherein an output of the neural network comprises a sequence of labels, wherein each of the sequence of labels indicates whether the keyword is present in a corresponding blocked feature vector sequence, the operation further comprising: smoothing the sequence of labels to determine whether the keyword is in the speech.
11. The computer program product of claim 8, wherein the neural network comprises a plurality of blocks, wherein each block of the neural network is configured to process a respective block of the plurality of blocked feature vector sequences.
12. The computer program product of claim 8, wherein a count of the subset of the plurality of feature vectors in each block of the plurality of blocked feature vector sequences is defined during a training phase of the neural network, the operation further comprising: modifying the count based on a feature in the spectral feature vector, wherein the modified count is configured to increase a likelihood that the neural network will correctly detect the presence of the keyword in the speech.
13. The computer program product of claim 8, wherein the neural network generates an output comprising an indication of the presence of keywords, and not phonemes, in the speech, wherein the method detects the keyword in the speech without using a hidden Markov model.
14. A system, comprising: a computer processor; and a memory containing a program, which when executed by the computer processor, performs an operation to detect a keyword in speech, the operation comprising: generating, from a sequence of spectral feature vectors generated from the speech, a plurality of blocked feature vector sequences, wherein each of the plurality of blocked feature vector sequences comprises a respective subset of the plurality of feature vectors, wherein each of the plurality of blocked feature vector sequences overlap adjacent blocked feature vector sequences by a predefined count of feature vector sequences; and analyzing, by a neural network, each of the plurality of blocked feature vector sequences to detect the presence of the keyword in the speech.
15. The system of claim 14, the operation further comprising: prior to generating the blocked feature vector sequences: receiving a speech signal comprising the speech; and performing a feature computation on the speech signal to generate the sequence of spectral feature vectors.
16. The system of claim 14, wherein an output of the neural network comprises a sequence of labels, wherein each of the sequence of labels indicates whether the keyword is present in a corresponding blocked feature vector sequence, the operation further comprising: smoothing the sequence of labels to determine whether the keyword is in the speech.
17. The system of claim 14, wherein the neural network comprises a plurality of blocks, wherein each block of the neural network is configured to process a respective block of the plurality of blocked feature vector sequences.
18. The system of claim 17, wherein a count of the subset of the plurality of feature vectors in each block of the plurality of blocked feature vector sequences is defined during a training phase of the neural network, wherein the neural network generates an output comprising an indication of the presence of keywords, and not phonemes, in the speech, wherein the method detects the keyword in the speech without using a hidden Markov model, the operation further comprising: modifying the count based on a feature in the spectral feature vector, wherein the modified count is configured to increase a likelihood that the neural network will correctly detect the presence of the keyword in the speech.
19. A method to detect a keyword in speech, comprising: generating, from a sequence of spectral feature vectors generated from the speech, a plurality of blocked feature vector sequences; generating, by a neu
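
Claims 3, 10, and 16 recite smoothing the network's per-block label sequence before deciding whether the keyword occurred, but the text shown here does not say which smoothing method is used. The sketch below assumes a simple majority vote over a sliding window, followed by a minimum run length of positive labels. The functions smooth_labels and keyword_detected and the parameters window and min_run are hypothetical illustrations, not terms from the patent.

```python
from typing import List, Sequence


def smooth_labels(labels: Sequence[int], window: int = 3) -> List[int]:
    """Majority-vote smoothing over a centered sliding window.

    `labels` holds one 0/1 label per blocked feature vector sequence
    (1 = keyword present in that block). Assumption: majority voting is
    only one of many possible smoothing schemes.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")

    half = window // 2
    smoothed: List[int] = []
    for i in range(len(labels)):
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        votes = labels[lo:hi]
        # Strict majority; ties at the sequence edges resolve to 0.
        smoothed.append(1 if 2 * sum(votes) > len(votes) else 0)
    return smoothed


def keyword_detected(
    labels: Sequence[int], window: int = 3, min_run: int = 2
) -> bool:
    """Report a detection once `min_run` consecutive smoothed labels are 1."""
    run = 0
    for label in smooth_labels(labels, window):
        run = run + 1 if label else 0
        if run >= min_run:
            return True
    return False


if __name__ == "__main__":
    raw = [0, 1, 0, 0, 1, 1, 1, 0]  # isolated 1 at index 1 is likely noise
    print(smooth_labels(raw))
    print(keyword_detected(raw))
```

The isolated positive label is suppressed while the sustained run survives, which is the usual motivation for smoothing block-level decisions before reporting a keyword.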

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9570069B2 cover?
Systems, methods, and computer program products to detect a keyword in speech, by generating, from a sequence of spectral feature vectors generated from the speech, a plurality of blocked feature vector sequences, and analyzing, by a neural network, each of the plurality of blocked feature vector sequences to detect the presence of the keyword in the speech.
Who is the assignee on this patent?
Disney Entpr Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 14, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).