User interface customization based on speaker characteristics
US-2015154002-A1 · Jun 4, 2015 · US
US10360901B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10360901-B2 |
| Application number | US-201414561811-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 5, 2014 |
| Priority date | Dec 6, 2013 |
| Publication date | Jul 23, 2019 |
| Grant date | Jul 23, 2019 |
Official abstract text for this publication.
Techniques for learning front-end speech recognition parameters as part of training a neural network classifier include obtaining an input speech signal, and applying front-end speech recognition parameters to extract features from the input speech signal. The extracted features may be fed through a neural network to obtain an output classification for the input speech signal, and an error measure may be computed for the output classification through comparison of the output classification with a known target classification. Back propagation may be applied to adjust one or more of the front-end parameters as one or more layers of the neural network, based on the error measure.
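The training loop the abstract describes can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the toy data, the layer sizes, the `log1p` compression, the single linear softmax classifier, the non-negativity clip on the filter weights, and all names (`W_fb`, `W_cls`, `train_step`) are hypothetical choices made for the example. It shows only the core idea: the filter bank is treated as a layer of the network, so the classification error is back-propagated into the filter bank weights as well as the classifier weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_filters, n_classes = 16, 4, 3  # hypothetical toy sizes

# Hypothetical toy data: positive "power spectra" and target frame labels.
X = rng.random((64, n_bins)) + 0.1
y = rng.integers(0, n_classes, size=64)

# First layer: filter bank weights. Second layer: linear softmax classifier.
W_fb = rng.random((n_bins, n_filters)) * 0.1 + 0.05
W_cls = rng.normal(0.0, 0.1, (n_filters, n_classes))
W_fb_init = W_fb.copy()


def forward(X, W_fb, W_cls):
    fb_out = X @ W_fb              # pass power spectrum through the filter bank
    feats = np.log1p(fb_out)       # compressive non-linearity (an assumption)
    logits = feats @ W_cls
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return fb_out, feats, probs


def cross_entropy(probs, y):
    return -np.mean(np.log(probs[np.arange(len(y)), y]))


def train_step(X, y, W_fb, W_cls, lr=0.05):
    """One joint update of classifier and filter bank; returns the loss before the update."""
    fb_out, feats, probs = forward(X, W_fb, W_cls)
    # Error measure: gradient of cross-entropy w.r.t. the logits.
    d_logits = probs.copy()
    d_logits[np.arange(len(y)), y] -= 1.0
    d_logits /= len(y)
    g_cls = feats.T @ d_logits              # classifier gradient
    d_feats = d_logits @ W_cls.T            # back-propagate into the features
    d_fb = d_feats / (1.0 + fb_out)         # derivative of log1p
    g_fb = X.T @ d_fb                       # filter bank gradient
    W_cls -= lr * g_cls
    W_fb -= lr * g_fb
    np.maximum(W_fb, 0.0, out=W_fb)         # keep filter weights non-negative
    return cross_entropy(probs, y)


losses = [train_step(X, y, W_fb, W_cls) for _ in range(200)]
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Because `g_fb` is computed from the same error signal as `g_cls`, the filter shapes drift away from their initial values during training instead of staying fixed as in a conventional hand-designed front end.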
Opening claim text (preview).
What is claimed is:
1. A method comprising: training an acoustic model of a speech recognition engine using input speech to identify features from the input speech and to classify the input speech based on identified features, wherein training the acoustic model comprises: training a neural network of the speech recognition engine using the input speech to recognize features from the input speech, wherein training the neural network to recognize the features comprises training a filter bank of the neural network using the input speech, wherein the filter bank of the neural network is inside the neural network, and wherein the filter bank is configured to take a speech frame power spectrum as input; and jointly with training the filter bank of the neural network, training a classifier of the neural network using the input speech to classify the input speech based on features generated using output of the filter bank, wherein jointly training the filter bank of the neural network and the classifier of the neural network comprises: processing a frame of the input speech to produce a power spectrum for the frame of the input speech; and passing the power spectrum through the filter bank to create filter bank output.
2. The method of claim 1, wherein: the neural network comprises at least two layers; and the filter bank is at least a part of one of the at least two layers.
3. The method of claim 2, wherein the filter bank comprises a plurality of weights and the plurality of weights form at least a part of a layer of the at least two layers of the neural network, and wherein each weight of the plurality of weights operates on a subset of frequency components of the power spectrum, and wherein jointly training the filter bank of the neural network and the classifier of the neural network comprises: generating a set of features of the frame based on the filter bank output; determining, using the classifier of the neural network, a classification for the frame based on the set of features; computing an error measure by comparing the classification to a target classification for the frame; and training the filter bank and the classifier based on the error measure, wherein training the filter bank comprises adjusting the plurality of weights of the neural network based on the error measure.
4. The method of claim 3, wherein adjusting the plurality of weights comprises adjusting the plurality of weights using back propagation of the error measure between the classification and the target classification.
5. The method of claim 1, wherein: processing the frame of the input speech to produce a power spectrum for the frame comprises producing a normalized power spectrum for the frame; and jointly training the filter bank and the classifier of the neural network using the input speech comprises jointly training the filter bank and the classifier of the neural network based at least in part on the normalized power spectrum.
6. The method of claim 5, wherein processing the frame of the input speech to produce the normalized power spectrum for the frame comprises: processing the frame to produce a power spectrum for the frame; performing a non-linear transformation on the power spectrum to produce a non-linear power spectrum; normalizing the non-linear power spectrum to produce a normalized non-linear power spectrum; and performing a second transformation on the normalized non-linear power spectrum to produce the normalized power spectrum.
7. The method of claim 1, wherein training the neural network of the speech recognition engine using the input speech to recognize features included in the input speech comprises training the neural network to recognize static features and dynamic features.
8. The method of claim 7, wherein: training the neural network to recognize static features comprises training the neural network to recognize mel features; and training the neural network to recognize dynamic features comprises training the neural network to recognize delta features and/or double-delta features.
9. The method of claim 1, wherein training the filter bank of the neural network using the input speech comprises training a filter bank using frequency pooling.
10. The method of claim 9, wherein training the filter bank using frequency pooling comprises training a filter bank that includes a plurality of filters, wherein each filter is associated with a frequency band centered at a center frequency identified by vocal-tract-length-normalization filters.
11. The method of claim 1, wherein training the filter bank of the neural network using the input speech comprises training a filter bank having a plurality of filters initialized as Gaussian filters.
12. The method of claim 1, wherein training the filter bank of the neural network using the input speech comprises training a filter bank having a plurality of filters initialized as mel filters.
13. The method of claim 1, wherein: the method further comprises: applying vocal tract length normalization to the power spectrum to produce a warped power spectrum; and jointly training the filter bank and the classifier of the neural network using the input speech comprises jointly training the filter bank and the classifier of the neural network based at least in part on the warped power spectrum.
14. The method of claim 1, wherein training the classifier of the neural network to classify the input speech based on features comprises, for each set of localized features identified by the classifier, concatenating an i-vector with the set of localized features.
15. The method of claim 14, wherein training the classifier of the neural network to classify the input speech based on features comprises, for each time-frequency patch, concatenating an i-vector with the time-frequency patch.
16. The method of claim 1, wherein: the neural network comprises a first layer, a second layer, and at least one third layer; the first layer comprises the filter bank; the second layer is arranged as a deep neural network; the at least one third layer is arranged as a convolutional neural network; and training the classifier of the neural network to classify the input speech based on features comprises training the deep neural network using the input speech and at least one i-vector.
17. The method of claim 1, wherein jointly training the filter bank and the classifier of the neural network of the speech recognition engine comprises jointly training a filter bank and classifier of a deep neural network of the speech recognition engine.
18. The method of claim 17, wherein jointly training a filter bank and classifier of a deep neural network comprises jointly training a filter bank and classifier of a convolutional neural network of the speech recognition engine.
19. At least one non-transitory computer-readable storage medium having encoded thereon executable instructions that, when executed by at least one processor, cause the at least one processor to carry out a method, the method comprising: training an acoustic model of a speech recognition engine using input speech to identify features from the input speech and to classify the input speech based on identified features, wherein training the acoustic model comprises: training a neural network of the speech recognition engine using the input speech to recognize features from the input speech, wherein training the neural network to recognize the features comprises training a filter bank of the neural network using the input speech
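
The preprocessing recited in claims 5 and 6 and the Gaussian initialization of claim 11 can be illustrated together. The claims do not name the transformations, so the sketch below assumes a logarithm as the non-linear transformation, per-bin mean subtraction across frames as the normalization, and an exponential as the second transformation. The Hann window, FFT size, filter centers, filter width, and every function name are likewise hypothetical choices for illustration only.

```python
import numpy as np


def power_spectrum(frame, n_fft=64):
    """Power spectrum of one windowed speech frame."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.fft.rfft(windowed, n=n_fft)
    return np.abs(spectrum) ** 2            # shape: (n_fft // 2 + 1,)


def normalized_power_spectrum(frames, n_fft=64, eps=1e-10):
    """Claim 6 steps, with log / mean subtraction / exp as assumed transforms."""
    P = np.stack([power_spectrum(f, n_fft) for f in frames])
    log_P = np.log(P + eps)                                  # non-linear transformation
    log_P_norm = log_P - log_P.mean(axis=0, keepdims=True)   # normalization
    return np.exp(log_P_norm)                                # second transformation


def gaussian_filter_bank(n_bins, n_filters, width=2.0):
    """Initial filter weights: one Gaussian bump per filter (claim 11).

    Evenly spaced centers and a shared width are assumptions.
    """
    centers = np.linspace(0, n_bins - 1, n_filters)
    bins = np.arange(n_bins)[:, None]
    return np.exp(-0.5 * ((bins - centers[None, :]) / width) ** 2)


rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 64))          # hypothetical stand-in for speech frames

P_norm = normalized_power_spectrum(frames)
W_fb = gaussian_filter_bank(P_norm.shape[1], n_filters=8)
fb_out = P_norm @ W_fb                      # filter bank output, one row per frame
print(P_norm.shape, W_fb.shape, fb_out.shape)
```

In a joint-training setup, `W_fb` would serve only as the starting point for the trainable first layer; the weights would then be adjusted by back propagation together with the classifier.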
the extracted parameters being spectral information of each sub-band · CPC title
Feature extraction for speech recognition; Selection of recognition unit · CPC title
using artificial neural networks · CPC title
Training · CPC title