Single-sided speech quality measurement

US9786300B2 · US · B2

Patent metadata
Publication number: US-9786300-B2
Application number: US-201113195338-A
Country: US
Kind code: B2
Filing date: Aug 1, 2011
Priority date: Feb 28, 2006
Publication date: Oct 10, 2017
Grant date: Oct 10, 2017

Abstract

A non-intrusive speech quality estimation technique is based on statistical or probability models such as Gaussian Mixture Models (“GMMs”). Perceptual features are extracted from the received speech signal and assessed by an artificial reference model formed using statistical models. The models characterize the statistical behavior of speech features. Consistency measures between the input speech features and the models are calculated to form indicators of speech quality. The consistency values are mapped to a speech quality score using a mapping optimized using machine learning algorithms, such as Multivariate Adaptive Regression Splines (“MARS”). The technique provides competitive or better quality estimates relative to known techniques while having lower computational complexity.
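The two stages described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes a diagonal-covariance GMM as the reference model, uses the average per-frame log-likelihood as the consistency measure, and uses a small MARS-style sum of hinge functions in place of a trained mapping. The names REFERENCE_GMM, consistency and mars_map, the two-dimensional features, the GMM parameters, the knot at -4.0 and all coefficients are hypothetical; a real system would train the GMM on features of clean speech and fit the mapping against subjective scores.

```python
import math
from typing import List, Sequence, Tuple

# One mixture component: (weight, mean vector, variance vector); diagonal covariance assumed.
Component = Tuple[float, Sequence[float], Sequence[float]]


def gmm_log_likelihood(x: Sequence[float], gmm: List[Component]) -> float:
    """Log-density of feature vector x under a diagonal-covariance GMM."""
    log_terms = []
    for weight, mean, var in gmm:
        log_pdf = 0.0
        for xi, mu, v in zip(x, mean, var):
            log_pdf += -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
        log_terms.append(math.log(weight) + log_pdf)
    # Log-sum-exp over components for numerical stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def consistency(frames: List[Sequence[float]], gmm: List[Component]) -> float:
    """Average per-frame log-likelihood (frames must be non-empty).

    Higher values mean the received features look more like the reference model.
    """
    return sum(gmm_log_likelihood(f, gmm) for f in frames) / len(frames)


def hinge(z: float) -> float:
    return max(0.0, z)


def mars_map(c: float) -> float:
    """Hypothetical MARS-style mapping: intercept plus hinge terms, clipped to a 1-5 MOS range."""
    # The knot at c = -4.0 and the coefficients are illustrative, not fitted values.
    score = 3.0 + 0.8 * hinge(c + 4.0) - 0.5 * hinge(-4.0 - c)
    return min(5.0, max(1.0, score))


# Hypothetical two-component reference model over 2-D perceptual features.
REFERENCE_GMM: List[Component] = [
    (0.6, [0.0, 0.0], [1.0, 1.0]),
    (0.4, [3.0, 1.0], [0.5, 2.0]),
]

if __name__ == "__main__":
    clean_like = [[0.1, -0.2], [2.9, 1.1], [0.0, 0.3]]
    degraded = [[6.0, -4.0], [-5.0, 5.0], [7.0, 7.0]]
    for name, frames in [("clean-like", clean_like), ("degraded", degraded)]:
        c = consistency(frames, REFERENCE_GMM)
        print(name, round(c, 2), round(mars_map(c), 2))
```

With these coefficients the mapping is nondecreasing in the consistency value, so frames that fit the reference model better never receive a lower score; a fitted MARS model is not constrained this way.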

Claims

What is claimed is:
1. A single-ended speech quality measurement method comprising the steps of: for each frame of a plurality of frames containing a speech signal that has been processed by network equipment, transmitted on a communications link, or both: extracting perceptual features; and classifying the frame based on the perceptual features into a class selected from a set of classes including voiced and unvoiced; and for the frames of each class: assessing the perceptual features with a statistical model of that class to generate an indicator of speech quality, the statistical model of that class being part of a reference model which includes at least one statistical model for each class of the set of classes, the reference model generated prior to extracting the perceptual features to form indicators of speech quality, including assessing at least some unvoiced frames; and employing the indicators of speech quality from different classes to produce an estimate of subjective speech quality score without reference to a corresponding speech signal that has not been processed by network equipment, transmitted on a communications link, or both.
2. The method of claim 1 including the further step of separately modeling a probability distribution of the features for each frame class and different classes of speech signals with statistical models.
3. The method of claim 2 wherein the classes include inactive.
4. The method of claim 2 including the further step of calculating a consistency measure indicative of speech quality for each class separately with a plurality of statistical models.
5. The method of claim 4 including the further step of employing the consistency measures to obtain an estimate of subjective scores.
6. The method of claim 5 including the further step of mapping the consistency measures to a speech quality score using a mapping comprising Multivariate Adaptive Regression Splines.
7. The method of claim 1 wherein the perceptual features are assessed with Gaussian Mixture Models to form indicators of speech quality.
8. Apparatus operable to provide a single-end speech quality measurement, comprising: a feature extraction module which extracts, frame-by-frame, perceptual features from a received speech signal that has been processed by network equipment, transmitted on a communications link, or both; a time segmentation module which classifies each frame based on the perceptual features into a class selected from a set of classes including voiced and unvoiced; a statistical reference model generated prior to extraction of the perceptual features, the reference model including at least one statistical model for each class of the set of classes; a consistency calculation module which, for the frames of each class, operates in response to output from the feature extraction module to assess the perceptual features with a statistical model of that class to form indicators of subjective speech quality without reference to a corresponding speech signal that has not been processed by network equipment, transmitted on a communications link, or both, including assessing at least some unvoiced frames; and a scoring module which employs the indicators of speech quality from different classes to produce a speech quality score without reference to a corresponding speech signal that has not been processed by network equipment, transmitted on a communications link, or both.
9. The apparatus of claim 8 wherein the consistency calculation module is further operable to separately model a probability distribution of the features for each class and different classes of speech signals with the statistical models.
10. The apparatus of claim 9 wherein the classes include inactive.
11. The apparatus of claim 9 wherein the consistency calculation module is further operable to calculate a consistency measure indicative of speech quality for each class separately with a plurality of Gaussian Mixture Models.
12. The apparatus of claim 11 further including a mapping module operable to employ the consistency measures to obtain an estimate of subjective scores.
13. The apparatus of claim 12 wherein the mapping module employs a mapping optimized using Multivariate Adaptive Regression Splines.
14. The apparatus of claim 8 wherein the statistical reference model includes Gaussian Mixture Models.
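The per-class structure of claims 1 and 8 (classify each frame, assess each class against its own model, then combine the per-class indicators into one score) can be sketched in Python as below. Every specific here is an assumption for illustration, not taken from the patent: the two features per frame (a log energy in dB and a voicing degree), the thresholds ENERGY_FLOOR_DB and VOICING_THRESHOLD, the single diagonal Gaussian per class standing in for a per-class GMM, and the linear combination in score standing in for the MARS mapping of claims 6 and 13. All names and numbers are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical frame features: (log_energy_db, voicing_degree in [0, 1]).
Frame = Tuple[float, float]
# Per-class reference model: diagonal Gaussian as (mean vector, variance vector).
# A single component stands in for a per-class GMM to keep the sketch short.
ClassModel = Tuple[Sequence[float], Sequence[float]]

ENERGY_FLOOR_DB = -50.0  # hypothetical threshold below which a frame is "inactive"
VOICING_THRESHOLD = 0.5  # hypothetical voiced/unvoiced split


def classify(frame: Frame) -> str:
    """Assign a frame to inactive, voiced or unvoiced from its (hypothetical) features."""
    energy, voicing = frame
    if energy < ENERGY_FLOOR_DB:
        return "inactive"
    return "voiced" if voicing >= VOICING_THRESHOLD else "unvoiced"


def log_gauss(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of x under a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def per_class_consistency(frames: List[Frame], models: Dict[str, ClassModel]) -> Dict[str, float]:
    """Average log-likelihood of each class's frames under that class's own model."""
    groups: Dict[str, List[Frame]] = {}
    for f in frames:
        groups.setdefault(classify(f), []).append(f)
    return {
        cls: sum(log_gauss(f, *models[cls]) for f in fs) / len(fs)
        for cls, fs in groups.items()
    }


def score(indicators: Dict[str, float], weights: Dict[str, float], bias: float) -> float:
    """Hypothetical linear combination of per-class indicators, clipped to a 1-5 MOS range."""
    raw = bias + sum(weights[c] * v for c, v in indicators.items())
    return min(5.0, max(1.0, raw))


# Hypothetical per-class reference models over (log_energy_db, voicing_degree).
MODELS: Dict[str, ClassModel] = {
    "voiced": ([-20.0, 0.8], [25.0, 0.02]),
    "unvoiced": ([-30.0, 0.2], [25.0, 0.02]),
    "inactive": ([-60.0, 0.1], [16.0, 0.01]),
}
WEIGHTS = {"voiced": 0.3, "unvoiced": 0.2, "inactive": 0.1}
BIAS = 4.0

if __name__ == "__main__":
    clean_like = [(-21.0, 0.82), (-19.0, 0.78), (-31.0, 0.22), (-62.0, 0.1)]
    degraded = [(-5.0, 0.6), (-35.0, 0.9), (-45.0, 0.45), (-55.0, 0.4)]
    for name, frames in [("clean-like", clean_like), ("degraded", degraded)]:
        ind = per_class_consistency(frames, MODELS)
        print(name, {k: round(v, 2) for k, v in ind.items()}, round(score(ind, WEIGHTS, BIAS), 2))
```

Grouping frames by class before scoring means, for example, that unvoiced frames are judged against an unvoiced model rather than against statistics dominated by voiced speech.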

Classifications

  • G10L25/69 (primary)

    CPC title: for evaluating synthetic or decoded voice signals

Frequently asked questions

Who is the assignee on this patent?
Chan Wai-Yip, Falk Tiago H, Xu Qingfeng, and 1 more
What technology area does this patent fall under?
Primary CPC classification G10L25/69. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 10, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).