What technology area does this patent fall under?

Primary CPC classification: G10L15/16 (speech recognition using artificial neural networks). Mapped technology areas include Physics.

When was this patent published?

Publication date: July 23, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list one related publication on this page (either cited in our corpus or sharing the same primary CPC).

Learning front-end speech recognition parameters within neural network training

US10360901B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10360901-B2
Application number	US-201414561811-A
Country	US
Kind code	B2
Filing date	Dec 5, 2014
Priority date	Dec 6, 2013
Publication date	Jul 23, 2019
Grant date	Jul 23, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection: read this to see what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for learning front-end speech recognition parameters as part of training a neural network classifier include obtaining an input speech signal, and applying front-end speech recognition parameters to extract features from the input speech signal. The extracted features may be fed through a neural network to obtain an output classification for the input speech signal, and an error measure may be computed for the output classification through comparison of the output classification with a known target classification. Back propagation may be applied to adjust one or more of the front-end parameters as one or more layers of the neural network, based on the error measure.
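The abstract's core idea is to treat the front-end filter bank as a trainable layer, so that the error measure against the known target is back-propagated into the front-end weights as well as the classifier. It can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation. The following are all choices made for this example: the toy two-class spectra, the dense 3-filter weight matrix, `log1p` compression, a single softmax layer as the classifier, the learning rate, and clipping filter weights at zero. The names `forward`, `loss_of` and `train_step` are hypothetical.

```python
import math
import random


def forward(spectrum, fb, clf):
    """Filter-bank layer -> log compression -> linear softmax classifier."""
    # Front-end layer: each filter output is a weighted sum of power-spectrum bins.
    fb_out = [sum(w * p for w, p in zip(row, spectrum)) for row in fb]
    # Assumed compression: log(1 + v) stays finite and has a bounded slope at v = 0.
    feats = [math.log1p(v) for v in fb_out]
    logits = [sum(w * f for w, f in zip(row, feats)) for row in clf]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return fb_out, feats, [e / total for e in exps]


def loss_of(spectrum, target, fb, clf):
    """Cross-entropy error measure against the known target class."""
    return -math.log(forward(spectrum, fb, clf)[2][target])


def train_step(spectrum, target, fb, clf, lr=0.05):
    """One SGD step that updates both the classifier and the filter-bank weights."""
    fb_out, feats, probs = forward(spectrum, fb, clf)
    # Gradient of softmax + cross-entropy with respect to the logits.
    d_logits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    # Back-propagate into the features before the classifier weights change.
    d_feats = [sum(d_logits[k] * clf[k][j] for k in range(len(clf)))
               for j in range(len(feats))]
    for k, row in enumerate(clf):
        for j in range(len(row)):
            row[j] -= lr * d_logits[k] * feats[j]
    # Chain rule through log1p: d/dv log(1 + v) = 1 / (1 + v).
    d_fb_out = [d / (1.0 + v) for d, v in zip(d_feats, fb_out)]
    for i, row in enumerate(fb):
        for b in range(len(row)):
            # Gradient step on a front-end weight, clipped at zero (an assumption)
            # so that filter outputs stay non-negative.
            row[b] = max(0.0, row[b] - lr * d_fb_out[i] * spectrum[b])


# Toy data: class 0 has low-frequency energy, class 1 high-frequency energy.
data = [([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0], 0),
        ([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0], 1)]
rng = random.Random(0)
fb = [[rng.uniform(0.1, 1.0) for _ in range(8)] for _ in range(3)]    # 3 filters x 8 bins
clf = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(2)]  # 2 classes x 3 features
fb_init = [row[:] for row in fb]  # copy kept only to show the front-end weights move

loss_before = sum(loss_of(s, t, fb, clf) for s, t in data) / len(data)
for _ in range(200):
    for s, t in data:
        train_step(s, t, fb, clf)
loss_after = sum(loss_of(s, t, fb, clf) for s, t in data) / len(data)
print(f"mean loss {loss_before:.3f} -> {loss_after:.3f}")
```

The key point is that `train_step` uses the same error signal to update `clf` and `fb`. The filter bank is therefore learned jointly with the classifier rather than fixed in advance.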

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: training an acoustic model of a speech recognition engine using input speech to identify features from the input speech and to classify the input speech based on identified features, wherein training the acoustic model comprises: training a neural network of the speech recognition engine using the input speech to recognize features from the input speech, wherein training the neural network to recognize the features comprises training a filter bank of the neural network using the input speech, wherein the filter bank of the neural network is inside the neural network, and wherein the filter bank is configured to take a speech frame power spectrum as input; and jointly with training the filter bank of the neural network, training a classifier of the neural network using the input speech to classify the input speech based on features generated using output of the filter bank, wherein jointly training the filter bank of the neural network and the classifier of the neural network comprises: processing a frame of the input speech to produce a power spectrum for the frame of the input speech; and passing the power spectrum through the filter bank to create filter bank output.

2. The method of claim 1, wherein: the neural network comprises at least two layers; and the filter bank is at least a part of one of the at least two layers.

3. The method of claim 2, wherein the filter bank comprises a plurality of weights and the plurality of weights form at least a part of a layer of the at least two layers of the neural network, and wherein each weight of the plurality of weights operates on a subset of frequency components of the power spectrum, and wherein jointly training the filter bank of the neural network and the classifier of the neural network comprises: generating a set of features of the frame based on the filter bank output; determining, using the classifier of the neural network, a classification for the frame based on the set of features; computing an error measure by comparing the classification to a target classification for the frame; and training the filter bank and the classifier based on the error measure, wherein training the filter bank comprises adjusting the plurality of weights of the neural network based on the error measure.

4. The method of claim 3, wherein adjusting the plurality of weights comprises adjusting the plurality of weights using back propagation of the error measure between the classification and the target classification.

5. The method of claim 1, wherein: processing the frame of the input speech to produce a power spectrum for the frame comprises producing a normalized power spectrum for the frame; and jointly training the filter bank and the classifier of the neural network using the input speech comprises jointly training the filter bank and the classifier of the neural network based at least in part on the normalized power spectrum.

6. The method of claim 5, wherein processing the frame of the input speech to produce the normalized power spectrum for the frame comprises: processing the frame to produce a power spectrum for the frame; performing a non-linear transformation on the power spectrum to produce a non-linear power spectrum; normalizing the non-linear power spectrum to produce a normalized non-linear power spectrum; and performing a second transformation on the normalized non-linear power spectrum to produce the normalized power spectrum.

7. The method of claim 1, wherein training the neural network of the speech recognition engine using the input speech to recognize features included in the input speech comprises training the neural network to recognize static features and dynamic features.

8. The method of claim 7, wherein: training the neural network to recognize static features comprises training the neural network to recognize mel features; and training the neural network to recognize dynamic features comprises training the neural network to recognize delta features and/or double-delta features.

9. The method of claim 1, wherein training the filter bank of the neural network using the input speech comprises training a filter bank using frequency pooling.

10. The method of claim 9, wherein training the filter bank using frequency pooling comprises training a filter bank that includes a plurality of filters, wherein each filter is associated with a frequency band centered at a center frequency identified by vocal-tract-length-normalization filters.

11. The method of claim 1, wherein training the filter bank of the neural network using the input speech comprises training a filter bank having a plurality of filters initialized as Gaussian filters.

12. The method of claim 1, wherein training the filter bank of the neural network using the input speech comprises training a filter bank having a plurality of filters initialized as mel filters.

13. The method of claim 1, wherein: the method further comprises: applying vocal tract length normalization to the power spectrum to produce a warped power spectrum; and jointly training the filter bank and the classifier of the neural network using the input speech comprises jointly training the filter bank and the classifier of the neural network based at least in part on the warped power spectrum.

14. The method of claim 1, wherein training the classifier of the neural network to classify the input speech based on features comprises, for each set of localized features identified by the classifier, concatenating an i-vector with the set of localized features.

15. The method of claim 14, wherein training the classifier of the neural network to classify the input speech based on features comprises, for each time-frequency patch, concatenating an i-vector with the time-frequency patch.

16. The method of claim 1, wherein: the neural network comprises a first layer, a second layer, and at least one third layer; the first layer comprises the filter bank; the second layer is arranged as a deep neural network; the at least one third layer is arranged as a convolutional neural network; and training the classifier of the neural network to classify the input speech based on features comprises training the deep neural network using the input speech and at least one i-vector.

17. The method of claim 1, wherein jointly training the filter bank and the classifier of the neural network of the speech recognition engine comprises jointly training a filter bank and classifier of a deep neural network of the speech recognition engine.

18. The method of claim 17, wherein jointly training a filter bank and classifier of a deep neural network comprises jointly training a filter bank and classifier of a convolutional neural network of the speech recognition engine.

19. At least one non-transitory computer-readable storage medium having encoded thereon executable instructions that, when executed by at least one processor, cause the at least one processor to carry out a method, the method comprising: training an acoustic model of a speech recognition engine using input speech to identify features from the input speech and to classify the input speech based on identified features, wherein training the acoustic model comprises: training a neural network of the speech recognition engine using the input speech to recognize features from the input speech, wherein training the neural network to recognize the features comprises training a filter bank of the neural network using the input speech
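Claims 1, 3, 6, 11 and 12 describe the inputs to the trainable filter bank. These are a per-frame power spectrum, an optional normalization (a non-linear transform, a normalization step, then a second transform), and filters initialized as mel or Gaussian filters. The following sketch computes those inputs and starting weights in plain Python.

None of the following comes from the patent. They are assumptions made for this example:
- a Hamming window and a naive DFT;
- the common 2595·log10(1 + f/700) mel formula;
- a Gaussian width of a quarter of each mel band;
- reading claim 6 as log, then per-bin mean removal across frames, then exp.

All function names are hypothetical.

```python
import cmath
import math


def power_spectrum(frame, n_fft):
    """|DFT|^2 of a Hamming-windowed, zero-padded frame for bins 0..n_fft//2."""
    n = len(frame)
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
           for i, x in enumerate(frame)]
    win += [0.0] * (n_fft - n)
    # Naive O(n_fft^2) DFT keeps the sketch dependency-free; use an FFT in practice.
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n_fft)
                    for i, x in enumerate(win))) ** 2
            for k in range(n_fft // 2 + 1)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def init_filter_bank(n_filters, n_fft, sample_rate, shape="mel"):
    """Initial weights: triangular mel filters, or Gaussian bumps centred at the
    same mel-spaced frequencies. Each row is one filter and is (effectively)
    non-zero only on a band of FFT bins."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [b * sample_rate / n_fft for b in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        if shape == "gaussian":
            sigma = (hi - lo) / 4.0  # assumed width: a quarter of the band
            row = [math.exp(-0.5 * ((f - mid) / sigma) ** 2) for f in bin_hz]
        else:
            row = [(f - lo) / (mid - lo) if lo < f <= mid
                   else (hi - f) / (hi - mid) if mid < f < hi
                   else 0.0 for f in bin_hz]
        bank.append(row)
    return bank


def normalize_spectra(spectra, floor=1e-10):
    """Assumed reading of claim 6: log (non-linear transform), per-bin mean
    removal over frames (normalization), then exp (second transform)."""
    logs = [[math.log(p + floor) for p in s] for s in spectra]
    means = [sum(col) / len(logs) for col in zip(*logs)]
    return [[math.exp(v - mu) for v, mu in zip(row, means)] for row in logs]


def apply_filter_bank(bank, spectrum):
    """Filter-bank output: one weighted sum of spectrum bins per filter."""
    return [sum(w * p for w, p in zip(row, spectrum)) for row in bank]


# Usage: three 200-sample frames of a 1 kHz tone at 16 kHz with rising amplitude.
sr, n_fft = 16000, 256
frames = [[(t + 1) * math.sin(2 * math.pi * 1000 * i / sr) for i in range(200)]
          for t in range(3)]
spectra = [power_spectrum(f, n_fft) for f in frames]
bank = init_filter_bank(20, n_fft, sr, shape="mel")
fbank = [apply_filter_bank(bank, s) for s in normalize_spectra(spectra)]
print(len(spectra[0]), len(fbank[0]))
```

In a joint-training setup like the one in the abstract, `init_filter_bank` would only supply starting values. The rows would then be updated by back-propagation like any other layer weights.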
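Claims 7, 8 and 14 mention several feature types:
- static (mel) features;
- delta and double-delta (dynamic) features;
- an i-vector concatenated with localized features.

The claims frame these as things the network is trained to recognize or combine. The sketch below shows only the conventional fixed computation, for comparison.

It makes the following assumptions:
- Deltas use the common regression formula d_t = Σ_n n·(c_{t+n} − c_{t−n}) / (2·Σ_n n²), with a ±2-frame window and edge frames repeated.
- Double-deltas are deltas of the delta stream.
- The i-vector is a given per-utterance vector. Extracting one is out of scope, and appending it to every frame simplifies the claims' "localized features".

`deltas` and `stack_features` are hypothetical names.

```python
def deltas(frames, window=2):
    """Regression deltas: d_t = sum_n n*(c[t+n] - c[t-n]) / (2*sum_n n^2).

    Frames beyond either edge are replaced by the first/last frame.
    """
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                acc += n * (frames[min(t + n, last)][d] - frames[max(t - n, 0)][d])
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(static, ivector, window=2):
    """Per frame: static + delta + double-delta features, then the i-vector appended."""
    d1 = deltas(static, window)
    d2 = deltas(d1, window)  # double-delta = delta of the delta stream
    return [s + a + b + list(ivector) for s, a, b in zip(static, d1, d2)]


static = [[float(t)] for t in range(10)]  # toy 1-D "mel" feature rising linearly
ivec = [0.5, -0.5]                         # placeholder i-vector (hypothetical values)
feats = stack_features(static, ivec)
print(feats[4])  # → [4.0, 1.0, 0.0, 0.5, -0.5]
```

For a linearly rising feature, the delta in the interior frames is the slope (1.0) and the double-delta is 0.0. The repeated edge frames shrink the deltas at both ends.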

Assignees

Nuance Communications Inc

Classifications

G10L25/18: the extracted parameters being spectral information of each sub-band
G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
G10L15/16 (primary): using artificial neural networks
G10L15/063: Training

Patent family

Related publications grouped by family.

Patent family ID: 53271806
