Enhanced audio frame loss concealment
US-2015371641-A1 · Dec 24, 2015 · US
US9786300B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9786300-B2 |
| Application number | US-201113195338-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 1, 2011 |
| Priority date | Feb 28, 2006 |
| Publication date | Oct 10, 2017 |
| Grant date | Oct 10, 2017 |
Official abstract text for this publication.
A non-intrusive speech quality estimation technique is based on statistical or probability models such as Gaussian Mixture Models (“GMMs”). Perceptual features are extracted from the received speech signal and assessed by an artificial reference model formed using statistical models. The models characterize the statistical behavior of speech features. Consistency measures between the input speech features and the models are calculated to form indicators of speech quality. The consistency values are mapped to a speech quality score using a mapping optimized using machine learning algorithms, such as Multivariate Adaptive Regression Splines (“MARS”). The technique provides competitive or better quality estimates relative to known techniques while having lower computational complexity.
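The consistency step described in the abstract can be illustrated with a minimal sketch. It assumes that the consistency measure is the average per-frame log-likelihood of the extracted features under a diagonal-covariance GMM; the abstract does not fix the exact formula, so this choice is an assumption. The names `log_gaussian_diag`, `gmm_log_likelihood`, `consistency`, and the toy `reference_gmm` are all hypothetical and are not taken from the patent.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of x under a GMM, using log-sum-exp for stability."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def consistency(frames, gmm):
    """Assumed consistency measure: mean per-frame log-likelihood."""
    weights, means, variances = gmm
    total = sum(gmm_log_likelihood(f, weights, means, variances) for f in frames)
    return total / len(frames)


# Hypothetical two-component reference model over 2-D features.
reference_gmm = (
    [0.5, 0.5],                  # mixture weights
    [[0.0, 0.0], [3.0, 3.0]],    # component means
    [[1.0, 1.0], [1.0, 1.0]],    # component variances (diagonal)
)

# Toy feature vectors: the first set sits near the model, the second far from it.
clean_like = [[0.1, -0.2], [2.9, 3.1], [0.3, 0.0]]
degraded_like = [[8.0, -6.0], [-7.0, 9.0], [10.0, 10.0]]

clean_score = consistency(clean_like, reference_gmm)
degraded_score = consistency(degraded_like, reference_gmm)
```

In this sketch, features that resemble the reference model yield a higher consistency value than features that lie far from it, which is the sense in which the value can act as a quality indicator.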
Opening claim text (preview).
What is claimed is:

1. A single-ended speech quality measurement method comprising the steps of: for each frame of a plurality of frames containing a speech signal that has been processed by network equipment, transmitted on a communications link, or both: extracting perceptual features; and classifying the frame based on the perceptual features into a class selected from a set of classes including voiced and unvoiced; and for the frames of each class: assessing the perceptual features with a statistical model of that class to generate an indicator of speech quality, the statistical model of that class being part of a reference model which includes at least one statistical model for each class of the set of classes, the reference model generated prior to extracting the perceptual features to form indicators of speech quality, including assessing at least some unvoiced frames; and employing the indicators of speech quality from different classes to produce an estimate of subjective speech quality score without reference to a corresponding speech signal that has not been processed by network equipment, transmitted on a communications link, or both.
2. The method of claim 1 including the further step of separately modeling a probability distribution of the features for each frame class and different classes of speech signals with statistical models.
3. The method of claim 2 wherein the classes include inactive.
4. The method of claim 2 including the further step of calculating a consistency measure indicative of speech quality for each class separately with a plurality of statistical models.
5. The method of claim 4 including the further step of employing the consistency measures to obtain an estimate of subjective scores.
6. The method of claim 5 including the further step of mapping the consistency measures to a speech quality score using a mapping comprising Multivariate Adaptive Regression Splines.
7. The method of claim 1 wherein the perceptual features are assessed with Gaussian Mixture Models to form indicators of speech quality.
8. Apparatus operable to provide a single-end speech quality measurement, comprising: a feature extraction module which extracts, frame-by-frame, perceptual features from a received speech signal that has been processed by network equipment, transmitted on a communications link, or both; a time segmentation module which classifies each frame based on the perceptual features into a class selected from a set of classes including voiced and unvoiced; a statistical reference model generated prior to extraction of the perceptual features, the reference model including at least one statistical model for each class of the set of classes; a consistency calculation module which, for the frames of each class, operates in response to output from the feature extraction module to assess the perceptual features with a statistical model of that class to form indicators of subjective speech quality without reference to a corresponding speech signal that has not been processed by network equipment, transmitted on a communications link, or both, including assessing at least some unvoiced frames; and a scoring module which employs the indicators of speech quality from different classes to produce a speech quality score without reference to a corresponding speech signal that has not been processed by network equipment, transmitted on a communications link, or both.
9. The apparatus of claim 8 wherein the consistency calculation module is further operable to separately model a probability distribution of the features for each class and different classes of speech signals with the statistical models.
10. The apparatus of claim 9 wherein the classes include inactive.
11. The apparatus of claim 9 wherein the consistency calculation module is further operable to calculate a consistency measure indicative of speech quality for each class separately with a plurality of Gaussian Mixture Models.
12. The apparatus of claim 11 further including a mapping module operable to employ the consistency measures to obtain an estimate of subjective scores.
13. The apparatus of claim 12 wherein the mapping module employs a mapping optimized using Multivariate Adaptive Regression Splines.
14. The apparatus of claim 8 wherein the statistical reference model includes Gaussian Mixture Models.
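The flow recited in claims 1 and 6 (classify each frame, form one indicator per class, then map the indicators to a score) might be sketched as follows. Everything specific here is an assumption made for illustration: the energy and zero-crossing thresholds in `classify_frame`, the placeholder `score_frame` callable standing in for the class-specific statistical models, the hinge-function coefficients in `terms`, and the 1 to 5 score range. The mapping only mimics the additive hinge-basis form that Multivariate Adaptive Regression Splines produce; it is not a fitted MARS model.

```python
from collections import defaultdict


def classify_frame(energy, zero_crossing_rate,
                   energy_floor=0.01, zcr_threshold=0.25):
    """Hypothetical rule-based labelling; thresholds are illustrative only."""
    if energy < energy_floor:
        return "inactive"
    return "unvoiced" if zero_crossing_rate > zcr_threshold else "voiced"


def per_class_indicators(frames, score_frame):
    """Average a per-frame score separately within each frame class.

    frames: iterable of (energy, zero_crossing_rate, features) tuples.
    score_frame: callable(label, features) -> float, a stand-in for
    assessing features with the statistical model of that class.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for energy, zcr, features in frames:
        label = classify_frame(energy, zcr)
        totals[label] += score_frame(label, features)
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


def map_to_score(indicators, intercept, terms, lo=1.0, hi=5.0):
    """MARS-style mapping: intercept + sum of coef * max(0, sign * (x - knot))."""
    score = intercept
    for label, knot, sign, coef in terms:
        x = indicators.get(label, knot)  # a missing class contributes nothing
        score += coef * max(0.0, sign * (x - knot))
    return min(hi, max(lo, score))  # clip to the assumed score range


# Placeholder per-frame score: negative squared norm of the features.
def toy_score(label, features):
    return -sum(f * f for f in features)


frames = [
    (0.50, 0.10, [0.1, 0.2]),    # voiced
    (0.40, 0.40, [1.0, 0.0]),    # unvoiced
    (0.001, 0.05, [0.0, 0.0]),   # inactive
    (0.60, 0.05, [0.3, 0.0]),    # voiced
]

indicators = per_class_indicators(frames, toy_score)

# Made-up hinge terms: (class label, knot, sign, coefficient).
terms = [("voiced", -2.0, 1.0, 1.0), ("unvoiced", -2.0, 1.0, 0.5)]
estimate = map_to_score(indicators, intercept=1.0, terms=terms)
```

Keeping the indicators in a per-class dictionary mirrors the claim language, in which indicators from different classes (including unvoiced frames) are combined only at the final scoring step.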
for evaluating synthetic or decoded voice signals · CPC title