Systems and methods for speech recognition in unseen and noisy channel conditions

US11217228B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11217228-B2
Application numberUS-201716085262-A
CountryUS
Kind codeB2
Filing dateMar 22, 2017
Priority dateMar 22, 2016
Publication dateJan 4, 2022
Grant dateJan 4, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for speech recognition are provided. In some aspects, the method comprises receiving, using an input, an audio signal. The method further comprises splitting the audio signal into auditory test segments. The method further comprises extracting, from each of the auditory test segments, a set of acoustic features. The method further comprises applying the set of acoustic features to a deep neural network to produce a hypothesis for the corresponding auditory test segment. The method further comprises selectively performing one or more of: indirect adaptation of the deep neural network and direct adaptation of the deep neural network.

First claim

Opening claim text (preview).

The invention claimed is: 1. A method for speech recognition comprising: receiving, using an input, an audio signal; splitting the audio signal into auditory test segments; extracting, from each of the auditory test segments, a set of acoustic features; applying the set of acoustic features to a deep neural network to produce a hypothesis for the corresponding auditory test segment; and selectively performing one or more of: indirect adaptation of the deep neural network and direct adaptation of the deep neural network. 2. The method of claim 1 , wherein performing indirect adaptation of the deep neural network comprises: extracting, from each of the auditory test segments, two distinct sets of acoustic features; and applying the two distinct sets of acoustic features to the deep neural network simultaneously. 3. The method of claim 2 , further comprising: performing a feature-space transformation on each of the two distinct sets of acoustic features prior to applying the two distinct sets of acoustic features to the deep neural network simultaneously. 4. The method of claim 3 , wherein the feature-space transformation is a feature space maximum likelihood linear regression transformation. 5. The method of claim 1 , wherein the set of acoustic features comprises a set of feature vectors, each of the set of feature vectors comprising quantitative measures of acoustic properties of the corresponding auditory test segment. 6. The method of claim 5 , wherein the quantitative measures of acoustic properties comprise at least one of gammatone filterbank energies, normalized modulation coefficients, mel-filterbank energies, and mel-frequency cepstral coefficients. 7. The method of claim 1 , wherein the deep neural network is pre-trained using transcribed audio signals. 8. The method of claim 7 , further comprising: applying the set of acoustic features to a deep autoencoder to produce (i) a set of deep autoencoder bottleneck features, and (ii) a set of recovered acoustic features based on an inverse operation by the deep autoencoder on the set of deep autoencoder bottleneck features. 9. The method of claim 8 , wherein the set of deep autoencoder bottleneck features is used to extract an entropy-based confidence measure for the corresponding auditory test segment. 10. The method of claim 9 , wherein performing direct adaptation of the deep neural network comprises: selecting, using the entropy-based confidence measure, the auditory test segments having percentile cumulative entropies below a threshold percentile cumulative entropy; and retraining the deep neural network using the selected auditory test segments. 11. The method of claim 8 , wherein the deep autoencoder is pre-trained with transcribed audio signals using mean squared error backpropagation. 12. The method of claim 1 , wherein the deep neural network is one of a convolutional neural network, a time-convolutional neural network, and a time-frequency convolutional neural network. 13. A speech recognition system comprising: an input configured to receive an audio signal; a processor and a memory having instructions executable by the processor, causing the processor to: receive, using the input, the audio signal; split the audio signal into auditory test segments; extract, from each of the auditory test segments, a set of acoustic features; apply the set of acoustic features to a deep neural network to produce a hypothesis for the corresponding auditory test segment; selectively perform one or more of: indirect adaptation of the deep neural network and direct adaptation of the deep neural network; and an output configured to transmit the hypothesis. 14. The speech recognition system of claim 13 , wherein the deep neural network is pre-trained using transcribed audio signals. 15. The speech recognition system of claim 14 , wherein, when performing indirect adaptation of the deep neural network, the processor is configured to: extract, from each of the auditory test segments, two distinct sets of acoustic features; and apply the two distinct sets of acoustic features to the deep neural network simultaneously. 16. The speech recognition system of claim 15 , wherein the processor is further configured to: perform a feature-space transformation on each of the two distinct sets of acoustic features prior to applying the two distinct sets of acoustic features to the deep neural network simultaneously. 17. The speech recognition system of claim 16 , wherein the feature-space transformation is a feature space maximum likelihood linear regression transformation. 18. The speech recognition system of claim 14 , wherein the processor is further configured to: apply the set of acoustic features to a deep autoencoder to produce (i) a set of deep autoencoder bottleneck features, and (ii) a set of recovered acoustic features based on an inverse operation by the deep autoencoder on the set of deep autoencoder bottleneck features. 19. The speech recognition system of claim 18 , wherein the set of deep autoencoder bottleneck features is used by the processor to extract an entropy-based confidence measure for the corresponding auditory test segment. 20. The speech recognition system of claim 19 , wherein performing direct adaptation of the deep neural network comprises: selecting, using the entropy-based confidence measure, the auditory test segments with percentile cumulative entropies below a threshold percentile cumulative entropy; and retraining the deep neural network using the selected auditory test segments.

Assignees

Inventors

Classifications

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • the extracted parameters being the cepstrum · CPC title

  • Feature extraction for speech recognition; Selection of recognition unit · CPC title

  • G10L15/16Primary

    using artificial neural networks · CPC title

  • G10L15/075Primary

    supervised, i.e. under machine guidance · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11217228B2 cover?
Systems and methods for speech recognition are provided. In some aspects, the method comprises receiving, using an input, an audio signal. The method further comprises splitting the audio signal into auditory test segments. The method further comprises extracting, from each of the auditory test segments, a set of acoustic features. The method further comprises applying the set of acoustic featu…
Who is the assignee on this patent?
Stanford Res Inst Int
What technology area does this patent fall under?
Primary CPC classification G10L15/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jan 04 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).