US9922664B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9922664-B2 |
| Application number | US-201615082349-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 28, 2016 |
| Priority date | Mar 28, 2016 |
| Publication date | Mar 20, 2018 |
| Grant date | Mar 20, 2018 |
Official abstract text for this publication.
A system for and method of characterizing a target application acoustic domain analyzes one or more speech data samples from the target application acoustic domain to determine one or more target acoustic characteristics, including a CODEC type and bit-rate associated with the speech data samples. The determined target acoustic characteristics may also include other aspects of the target speech data samples such as sampling frequency, active bandwidth, noise level, reverberation level, clipping level, and speaking rate. The determined target acoustic characteristics are stored in a memory as a target acoustic data profile. The data profile may be used to select and/or modify one or more out of domain speech samples based on the one or more target acoustic characteristics.
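The abstract's idea of a stored target acoustic data profile, used to select out-of-domain samples, can be illustrated with a minimal sketch. It is not the patented implementation. The class name `AcousticProfile`, its field names and units, the tolerance values, and the sample codec labels are all hypothetical; the source names the characteristics but specifies none of these details.

```python
from dataclasses import dataclass

# Hypothetical profile record: one field per characteristic named in the abstract.
# Units are assumptions made for this sketch only.
@dataclass(frozen=True)
class AcousticProfile:
    codec: str              # CODEC type label, e.g. "amr-nb" (illustrative)
    bit_rate_kbps: float
    sampling_hz: float
    bandwidth_hz: float     # active bandwidth
    noise_db: float         # noise level, assumed here to be an SNR in dB
    reverb_s: float         # reverberation level, assumed to be a decay time
    clipping_pct: float     # share of clipped samples, in percent
    speaking_rate: float    # assumed to be syllables per second

# Hypothetical per-characteristic ranges; the source does not give values.
TOLERANCES = {
    "bit_rate_kbps": 1.0,
    "sampling_hz": 0.0,
    "bandwidth_hz": 500.0,
    "noise_db": 5.0,
    "reverb_s": 0.2,
    "clipping_pct": 1.0,
    "speaking_rate": 1.0,
}

def is_similar(target, candidate, tolerances=TOLERANCES):
    """True when the CODEC matches and every numeric gap is within its range."""
    if target.codec != candidate.codec:
        return False
    return all(
        abs(getattr(target, name) - getattr(candidate, name)) <= limit
        for name, limit in tolerances.items()
    )

def select_similar(target, candidates, tolerances=TOLERANCES):
    """Keep only the out-of-domain profiles that resemble the target profile."""
    return [c for c in candidates if is_similar(target, c, tolerances)]

if __name__ == "__main__":
    target = AcousticProfile("amr-nb", 12.2, 8000, 3400, 15.0, 0.3, 0.5, 4.0)
    pool = [
        AcousticProfile("amr-nb", 12.2, 8000, 3300, 17.0, 0.35, 0.4, 4.3),
        AcousticProfile("opus", 24.0, 16000, 7000, 30.0, 0.1, 0.0, 4.0),
    ]
    print(select_similar(target, pool))
```

Treating the CODEC type as an exact match and the numeric characteristics as ranges is one possible design choice; a real system might compare distributions rather than single values.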
Opening claim text (preview).
What is claimed is:
1. A system for determining audio and acoustic characteristics of an Application Target Domain (ATD), comprising: a processor; and a memory with computer code instructions stored thereon, the memory operatively coupled to the processor such that the computer code instructions cause the processor to: determine a distribution of at least one audio characteristic and acoustic characteristic associated with one or more speech data samples from the target ATD by analyzing each of the one or more speech samples from the target ATD using only observed speech data samples without using a reference signal, the at least one audio characteristic and acoustic characteristic comprising: (a) CODEC type; (b) bit-rate associated with the one or more speech data samples; (c) sampling frequency associated with the speech data samples; (d) active bandwidth associated with the speech data samples; (e) noise level associated with the speech data samples; (f) reverberation level associated with the speech data samples; (g) clipping level associated with the speech data samples; (f) speaking rate associated with the speech data samples; and store in the memory, as a target data profile, the one or more target audio and acoustic characteristics; extract a feature set from the one or more speech data samples; one or both of: reduce the feature dimension of the feature set with a Classification and Regression Tree (CART) based feature extraction analysis to produce a final feature set; or train a Deep Neural Network (DNN) classifier with the final feature set or a previously-determined feature set; and one or both of: apply the trained DNN to perform a CODEC type classification of one or more of the one or more speech data samples to produce a CODEC type associated with the one or more speech data samples; or apply the trained DNN to perform a bit-rate classification of one or more of the one or more speech data samples and bit-rate associated with the one or more speech data samples.
2. The system of claim 1, wherein the computer code instructions further cause the processor to use the target data profile to improve the accuracy of automatic speech recognition operating on the speech data samples from the ATD when Out-Of-Domain (OOD) speech data samples from any domain other than the ATD is used to train or adapt the automatic speech recognition.
3. The system of claim 1, wherein the computer code instructions further cause the processor to pre-process the speech data samples, prior to determining the one or more target audio and acoustic characteristics, to perform one or more of (i) selection of a target language associated with the speech data samples and (ii) remove any of the speech data samples that do not represent recognizable speech.
4. The system of claim 1, wherein the feature set includes one or more of (i) Linear Prediction Coding (LPC) coefficients, (ii) line spectral frequencies, (iii) Mel-Frequency Cepstrum (MFC) coefficients, (iv) velocity features, (v) acceleration features, (vi) Hilbert Transform-based features, (vii) statistics associated with one or more of the LPC coefficients, line spectral frequencies, MFC coefficients, velocity features, acceleration features, and Hilbert Transform-based features, and (xi) long-term spectral deviation from an Average Speech Spectrum (LTASS).
5. The system of claim 1, wherein the DNN classifier includes a plurality of nodes connected between an input layer and an output layer, each connection between the nodes being scaled by a coefficient, the nodes being modeled with a non-linear activation function.
6. The system of claim 1, wherein the computer code instructions further cause the processor to: analyze one or more Out-Of-Domain (OOD) speech data samples to determine an OOD data profile associated with the OOD speech data samples; compare the target data profile to the OOD data profile; and based on the comparing, select one or more of the OOD speech data samples as being similar to the speech data samples from the ATD.
7. The system of claim 6, wherein the Out-Of-Domain (OOD) speech data samples being similar to the speech data samples from the ATD requires, for each audio and acoustic characteristic of the target and OOD data profiles, a difference between (i) a value of the audio and acoustic characteristic associated with the ATD speech data samples and (ii) a value of the audio and acoustic characteristic associated with the OOD speech data samples being within a predetermined range.
8. A system for determining audio and acoustic characteristics of an Application Target Domain (ATD), comprising: a processor; and a memory with computer code instructions stored thereon, the memory operatively coupled to the processor such that the computer code instructions cause the processor to: determine a distribution of at least one audio characteristic and acoustic characteristic associated with one or more speech data samples from the target ATD by analyzing each of the one or more speech samples from the target ATD according to a non-intrusive technique, the at least one audio characteristic and acoustic characteristic comprising: (a) CODEC type; (b) bit-rate associated with the one or more speech data samples; (c) sampling frequency associated with the speech data samples; (d) active bandwidth associated with the speech data samples; (e) noise level associated with the speech data samples; (f) reverberation level associated with the speech data samples; (g) clipping level associated with the speech data samples; (f) speaking rate associated with the speech data samples; and store in the memory, as a target data profile, the one or more target audio and acoustic characteristics; the system further including a speech corruption toolkit configured to modify one or more Out-Of-Domain (OOD) speech data samples based on the one or more audio and acoustic characteristics of the ATD speech data samples in a manner that reduces a mismatch between the OOD speech data samples and the ATD speech data samples, the speech corruption toolkit being configured to implement one or more of: (i) a speech channel simulator configured to modify the OOD speech samples based on one or both of the determined sampling frequency and the determined reverberation level; (ii) a noise channel simulator configured to modify the OOD speech samples based on the determined noise level; (iii) a microphone simulator configured to modify the OOD speech samples based on the determined active bandwidth; (iv) an amplifier simulator configured to modify the OOD speech samples based on the determined clipping level; and (v) a transmission channel simulator configured to modify the OOD speech samples based on one or both of the determined CODEC type and bit-rate associated with the one or more speech data samples.
9. A method of characterizing a target application acoustic domain (ATD), comprising: by a processor operatively coupled to a memory: determining a distribution of at least one audio characteristic and acoustic characteristic associated with one or more speech data samples from the target ATD by analyzing each of the one or more speech samples from the target ATD according to a non-intrusive technique, the at least one audio characteristic and acoustic characteristic comprising one or more of: (a) CODEC type; (b) bit-rate associated with the one or more speech data samples; (c) sampling frequency associated with the speech data samples; (d) active bandwidth associated with the speech data samples; (e) noise level associated with the speech data samples; (f) reverberation level associated with the speech data samples; (g) clipping level associated wi
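Two of the simulators listed in claim 8, the noise channel simulator and the amplifier simulator, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the toolkit the claims describe: the function names `add_noise` and `clip_to_fraction` are hypothetical, the noise level is assumed to be a signal-to-noise ratio in dB with white Gaussian noise, and the clipping level is assumed to be the fraction of samples that reach the clipping threshold.

```python
import math
import random

def add_noise(samples, snr_db, seed=0):
    """Add white Gaussian noise so the result has roughly the given SNR (assumed model)."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]

def clip_to_fraction(samples, clipped_fraction):
    """Hard-clip so that about clipped_fraction of the samples sit at the threshold."""
    if not samples or clipped_fraction <= 0:
        return list(samples)
    magnitudes = sorted(abs(s) for s in samples)
    # Threshold is the magnitude below which (1 - clipped_fraction) of samples fall.
    index = min(len(magnitudes) - 1, int((1 - clipped_fraction) * len(magnitudes)))
    threshold = magnitudes[index]
    return [max(-threshold, min(threshold, s)) for s in samples]

if __name__ == "__main__":
    # A clean 440 Hz tone at 8 kHz stands in for an out-of-domain speech sample.
    clean = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
    degraded = clip_to_fraction(add_noise(clean, snr_db=20.0), clipped_fraction=0.05)
    print(len(degraded), max(abs(x) for x in degraded))
```

In this sketch the target values (`snr_db`, `clipped_fraction`) would come from the stored target data profile, so the modified out-of-domain samples move closer to the measured conditions of the target domain.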
Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title
Creating reference templates; Clustering · CPC title
Speech coding using phonetic or linguistical decoding of the source; Reconstruction using text-to-speech synthesis · CPC title
using neural networks · CPC title
Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients · CPC title