Characterizing, selecting and adapting audio and acoustic training data for automatic speech recognition systems

US9922664B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9922664-B2
Application number  US-201615082349-A
Country             US
Kind code           B2
Filing date         Mar 28, 2016
Priority date       Mar 28, 2016
Publication date    Mar 20, 2018
Grant date          Mar 20, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system for and method of characterizing a target application acoustic domain analyzes one or more speech data samples from the target application acoustic domain to determine one or more target acoustic characteristics, including a CODEC type and bit-rate associated with the speech data samples. The determined target acoustic characteristics may also include other aspects of the target speech data samples such as sampling frequency, active bandwidth, noise level, reverberation level, clipping level, and speaking rate. The determined target acoustic characteristics are stored in a memory as a target acoustic data profile. The data profile may be used to select and/or modify one or more out of domain speech samples based on the one or more target acoustic characteristics.
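To make the idea of a stored "target acoustic data profile" concrete, the following is a minimal sketch in Python, assuming audio is available as a list of floats in the range [-1, 1]. The `AcousticProfile` class, the function names, and both estimators are hypothetical illustrations; they are not the techniques disclosed in the patent. The sampling frequency is simply passed in here, whereas the abstract describes determining it from the speech data.

```python
from dataclasses import dataclass, asdict
import math


@dataclass
class AcousticProfile:
    """Hypothetical container for a few characteristics named in the abstract."""
    sampling_frequency_hz: int  # assumed known here (e.g. from a file header)
    clipping_level: float       # fraction of samples at or near full scale
    noise_level_db: float       # rough noise-floor proxy, dB re full scale


def clipping_level(samples, full_scale=1.0, margin=0.999):
    """Fraction of samples whose magnitude reaches margin * full_scale."""
    if not samples:
        return 0.0
    threshold = margin * full_scale
    clipped = sum(1 for s in samples if abs(s) >= threshold)
    return clipped / len(samples)


def noise_level_db(samples, frame_len=160, quantile=0.1):
    """Noise-floor proxy: mean-square energy of a low-quantile frame, in dB."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    frames = [samples[i:i + frame_len] for i in starts]
    if not frames:
        raise ValueError("need at least one full frame")
    energies = sorted(sum(x * x for x in f) / frame_len for f in frames)
    floor = energies[int(quantile * (len(energies) - 1))]
    # Guard against log10(0) for digitally silent frames.
    return 10.0 * math.log10(max(floor, 1e-12))


def characterize(samples, sampling_frequency_hz):
    """Analyze observed samples only (no reference signal) and build a profile."""
    return AcousticProfile(
        sampling_frequency_hz=sampling_frequency_hz,
        clipping_level=clipping_level(samples),
        noise_level_db=noise_level_db(samples),
    )
```

Calling `asdict(characterize(samples, 8000))` yields a plain dictionary that could be stored and later compared against a profile computed from out-of-domain data. CODEC type, bit-rate, active bandwidth, reverberation level, and speaking rate are omitted from this sketch.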

First claim

Opening claim text (preview).

What is claimed is:

1. A system for determining audio and acoustic characteristics of an Application Target Domain (ATD), comprising: a processor; and a memory with computer code instructions stored thereon, the memory operatively coupled to the processor such that the computer code instructions cause the processor to: determine a distribution of at least one audio characteristic and acoustic characteristic associated with one or more speech data samples from the target ATD by analyzing each of the one or more speech samples from the target ATD using only observed speech data samples without using a reference signal, the at least one audio characteristic and acoustic characteristic comprising: (a) CODEC type; (b) bit-rate associated with the one or more speech data samples; (c) sampling frequency associated with the speech data samples; (d) active bandwidth associated with the speech data samples; (e) noise level associated with the speech data samples; (f) reverberation level associated with the speech data samples; (g) clipping level associated with the speech data samples; (f) speaking rate associated with the speech data samples; and store in the memory, as a target data profile, the one or more target audio and acoustic characteristics; extract a feature set from the one or more speech data samples; one or both of: reduce the feature dimension of the feature set with a Classification and Regression Tree (CART) based feature extraction analysis to produce a final feature set; or train a Deep Neural Network (DNN) classifier with the final feature set or a previously-determined feature set; and one or both of: apply the trained DNN to perform a CODEC type classification of one or more of the one or more speech data samples to produce a CODEC type associated with the one or more speech data samples; or apply the trained DNN to perform a bit-rate classification of one or more of the one or more speech data samples and bit-rate associated with the one or more speech data samples.

2. The system of claim 1, wherein the computer code instructions further cause the processor to use the target data profile to improve the accuracy of automatic speech recognition operating on the speech data samples from the ATD when Out-Of-Domain (OOD) speech data samples from any domain other than the ATD is used to train or adapt the automatic speech recognition.

3. The system of claim 1, wherein the computer code instructions further cause the processor to pre-process the speech data samples, prior to determining the one or more target audio and acoustic characteristics, to perform one or more of (i) selection of a target language associated with the speech data samples and (ii) remove any of the speech data samples that do not represent recognizable speech.

4. The system of claim 1, wherein the feature set includes one or more of (i) Linear Prediction Coding (LPC) coefficients, (ii) line spectral frequencies, (iii) Mel-Frequency Cepstrum (MFC) coefficients, (iv) velocity features, (v) acceleration features, (vi) Hilbert Transform-based features, (vii) statistics associated with one or more of the LPC coefficients, line spectral frequencies, MFC coefficients, velocity features, acceleration features, and Hilbert Transform-based features, and (xi) long-term spectral deviation from an Average Speech Spectrum (LTASS).

5. The system of claim 1, wherein the DNN classifier includes a plurality of nodes connected between an input layer and an output layer, each connection between the nodes being scaled by a coefficient, the nodes being modeled with a non-linear activation function.

6. The system of claim 1, wherein the computer code instructions further cause the processor to: analyze one or more Out-Of-Domain (OOD) speech data samples to determine an OOD data profile associated with the OOD speech data samples; compare the target data profile to the OOD data profile; and based on the comparing, select one or more of the OOD speech data samples as being similar to the speech data samples from the ATD.

7. The system of claim 6, wherein the Out-Of-Domain (OOD) speech data samples being similar to the speech data samples from the ATD requires, for each audio and acoustic characteristic of the target and OOD data profiles, a difference between (i) a value of the audio and acoustic characteristic associated with the ATD speech data samples and (ii) a value of the audio and acoustic characteristic associated with the OOD speech data samples being within a predetermined range.

8. A system for determining audio and acoustic characteristics of an Application Target Domain (ATD), comprising: a processor; and a memory with computer code instructions stored thereon, the memory operatively coupled to the processor such that the computer code instructions cause the processor to: determine a distribution of at least one audio characteristic and acoustic characteristic associated with one or more speech data samples from the target ATD by analyzing each of the one or more speech samples from the target ATD according to a non-intrusive technique, the at least one audio characteristic and acoustic characteristic comprising: (a) CODEC type; (b) bit-rate associated with the one or more speech data samples; (c) sampling frequency associated with the speech data samples; (d) active bandwidth associated with the speech data samples; (e) noise level associated with the speech data samples; (f) reverberation level associated with the speech data samples; (g) clipping level associated with the speech data samples; (f) speaking rate associated with the speech data samples; and store in the memory, as a target data profile, the one or more target audio and acoustic characteristics; the system further including a speech corruption toolkit configured to modify one or more Out-Of-Domain (OOD) speech data samples based on the one or more audio and acoustic characteristics of the ATD speech data samples in a manner that reduces a mismatch between the OOD speech data samples and the ATD speech data samples, the speech corruption toolkit being configured to implement one or more of: (i) a speech channel simulator configured to modify the OOD speech samples based on one or both of the determined sampling frequency and the determined reverberation level; (ii) a noise channel simulator configured to modify the OOD speech samples based on the determined noise level; (iii) a microphone simulator configured to modify the OOD speech samples based on the determined active bandwidth; (iv) an amplifier simulator configured to modify the OOD speech samples based on the determined clipping level; and (v) a transmission channel simulator configured to modify the OOD speech samples based on one or both of the determined CODEC type and bit-rate associated with the one or more speech data samples.

9. A method of characterizing a target application acoustic domain (ATD), comprising: by a processor operatively coupled to a memory: determining a distribution of at least one audio characteristic and acoustic characteristic associated with one or more speech data samples from the target ATD by analyzing each of the one or more speech samples from the target ATD according to a non-intrusive technique, the at least one audio characteristic and acoustic characteristic comprising one or more of: (a) CODEC type; (b) bit-rate associated with the one or more speech data samples; (c) sampling frequency associated with the speech data samples; (d) active bandwidth associated with the speech data samples; (e) noise level associated with the speech data samples; (f) reverberation level associated with the speech data samples; (g) clipping level associated wi…
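Claims 6 to 8 describe two uses of the stored profile: selecting OOD samples whose characteristics fall within a predetermined range of the target, and modifying OOD samples to reduce the mismatch. The sketch below illustrates both in Python under stated assumptions: profiles are plain dictionaries of numeric characteristics, the "predetermined range" is read as a symmetric per-characteristic tolerance, and the noise and amplifier stages are reduced to white Gaussian noise and hard clipping. All function names and dictionary keys are hypothetical, and none of this is taken from the patent's description.

```python
import math
import random


def is_similar(target, ood, tolerances):
    """Claim-7-style test: every listed characteristic differs by at most its tolerance.

    Only numeric characteristics are handled; categorical ones such as
    CODEC type would need an equality check instead.
    """
    return all(abs(target[k] - ood[k]) <= tol for k, tol in tolerances.items())


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise so the result has roughly the requested SNR in dB."""
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def hard_clip(samples, limit):
    """Limit each sample to [-limit, +limit], imitating an overdriven amplifier."""
    return [max(-limit, min(limit, s)) for s in samples]
```

For example, with `target = {"noise_level_db": -35.0}`, `ood = {"noise_level_db": -60.0}` and `tolerances = {"noise_level_db": 6.0}`, `is_similar` returns `False`, so that OOD sample would be a candidate for modification (for instance via `add_noise`) rather than direct selection.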

Assignees

Nuance Communications Inc

Inventors

Classifications

  • Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title

  • Creating reference templates; Clustering · CPC title

  • Speech coding using phonetic or linguistical decoding of the source; Reconstruction using text-to-speech synthesis · CPC title

  • G10L25/30 · Primary

    using neural networks · CPC title

  • Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9922664B2 cover?
A system for and method of characterizing a target application acoustic domain analyzes one or more speech data samples from the target application acoustic domain to determine one or more target acoustic characteristics, including a CODEC type and bit-rate associated with the speech data samples. The determined target acoustic characteristics may also include other aspects of the target speech…
Who is the assignee on this patent?
Nuance Communications Inc
What technology area does this patent fall under?
Primary CPC classification G10L25/30. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 20, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).