System and method for automated voice quality testing
US-2019349473-A1 · Nov 14, 2019 · US
US9876901B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9876901-B1 |
| Application number | US-201615261635-A |
| Country | US |
| Kind code | B1 |
| Filing date | Sep 9, 2016 |
| Priority date | Sep 9, 2016 |
| Publication date | Jan 23, 2018 |
| Grant date | Jan 23, 2018 |
Official abstract text for this publication.
Aspects of the disclosure simulate a conversation over a real-time communication system, and a reference script shared between the communication devices serves as a basis for comparison against the received speech-recognized conversation. A method of evaluating call quality of a real-time communication system that includes at least two communication devices is disclosed that includes receiving a reference script, the reference script containing linguistic contents of an audio signal being sent to one of the communication devices; generating an evaluation transcript by applying speech recognition to the audio signal being received; comparing the reference script with the evaluation transcript; and generating a call quality metric of the real-time communication system based on the comparison. The call quality metric may also include a communication delay which may be evaluated by determining a duration of a speaking turn; determining a duration of a listening turn from the audio signal being received by one of the communication devices; and estimating a communication delay of the audio signal based on the duration of the speaking turn and the listening turn.
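The delay estimate described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give a formula, so the sketch assumes the delay is the listening-turn duration minus the speaking-turn duration, with the listening turn measured from a start time (for example, receipt of a synchronization signal) to the detected end of received speech. The names `TurnTiming`, `detect_speech_end`, and `estimate_delay` are hypothetical, and the simple energy threshold only stands in for a real voice activity detection process.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TurnTiming:
    """Timing for one turn, in seconds (hypothetical structure)."""
    speaking_duration: float  # known or measured length of the speech sent
    listen_start: float       # when the listening turn began (e.g. sync signal)
    speech_end: float         # when received speech was detected to end


def detect_speech_end(frame_energies: Sequence[float],
                      frame_seconds: float,
                      threshold: float) -> Optional[float]:
    """Return the end time of the last frame whose energy exceeds threshold.

    Assumption: a fixed energy threshold is used as a stand-in for a
    voice activity detection process. Returns None if no frame is active.
    """
    last_active = None
    for index, energy in enumerate(frame_energies):
        if energy > threshold:
            last_active = index
    if last_active is None:
        return None
    return (last_active + 1) * frame_seconds


def estimate_delay(turn: TurnTiming) -> float:
    """Estimate delay as listening duration minus speaking duration.

    Assumption: this difference is one plausible reading of "based on the
    duration of the speaking turn and the listening turn". Negative values
    are clamped to zero as a design choice of this sketch.
    """
    listening_duration = turn.speech_end - turn.listen_start
    return max(0.0, listening_duration - turn.speaking_duration)
```

For example, a 2.0 s speaking turn whose listening turn starts at 10.0 s and ends at 12.25 s yields an estimated delay of 0.25 s under these assumptions.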
Opening claim text (preview).
The invention claimed is:
1. A computer-implemented method to evaluate audio quality of a real-time communication system that includes at least two communication devices, comprising: receiving a reference script, the reference script containing linguistic contents in text form of an audio signal sent to at least one of the communication devices, wherein the reference script excludes audio; receiving the audio signal; generating an evaluation transcript in text form by applying speech recognition to the audio signal; comparing the reference script with the evaluation transcript; generating a call quality metric of the real-time communication system based on the comparison; determining a duration of a speaking turn; determining a duration of a listening turn from the audio signal being received by one of the communication devices; and estimating a communication delay of the audio signal based on the duration of the speaking turn and the listening turn.
2. The computer-implemented method of claim 1, said determining a duration of the speaking turn including inputting a duration value.
3. The computer-implemented method of claim 1, said determining a duration of the speaking turn including: applying a voice activity detection process to the audio signal being sent from one of the communication devices.
4. The computer-implemented method of claim 1, said determining a duration of the listening turn including: activating a voice activity detection process based on the duration of the speaking turn; and determining when the received speech in the listening turn ends using the voice activity detection process, wherein determining the duration of the listening turn is based on determining when the listening turn ends.
5. The computer-implemented method of claim 4, said determining a duration of the listening turn further including: receiving a synchronization signal indicating a start of the received speech, wherein determining the duration of the listening turn is based on receipt timing of the synchronization signal and on when the listening turn ends.
6. A computer-implemented method of evaluating audio quality of a real-time communication system using a first communication and a second communication device communicating via a real-time communication system, the first and second communication devices implementing a method, comprising: receiving, at the first and second communication devices, a turn-based reference script containing linguistic contents of turn-based speech including respective speaking turns for the first and second communication devices; generating, at the first and second communication devices during each of their respective speaking turns, audible speech; receiving audio signals, at the second and first communication devices during each of their respective listening turns, the audio signals including the produced speech from respectively corresponding speaking turns of the first and second communication devices; generating, at the first and second communication devices during each of their respective listening turns, evaluation transcripts by applying speech recognition to the produced speech in the received audio signal; comparing, at the first and second communication devices, the turn-based reference script with the respective evaluation transcript for each of the respectively corresponding listening turns; and generating an overall call quality metric of the real-time communication system based on the comparisons.
7. The computer-implemented method of claim 6, further comprising: receiving, at the first and second communication devices, expected durations of the speech in the audio signal for each of the speaking turns of the respectively corresponding second and first communication devices; enabling, at the first and second communication devices during their respective listening turns, voice activity detection according to the expected duration of the speech in the audio signal; detecting, at the first and second communication devices during their respective listening turns, when the speech being received for each listening turn ends using the voice activity detection; and determining, at the first and second communication devices during their respective listening turns, communication delays for each listening turn based on the expected durations and the determination of when the speech being received for each listening turn ends.
8. The computer-implemented method of claim 7, further comprising: receiving, at the first and second communication devices during their respective listening turns, a synchronization signal for each listening turn indicating a start of the received speech, wherein the step of determining the communication delays determines the communication delays based on receipt timings of the synchronization signals and the determination of when the speech being received for each listening turn ends.
9. The computer-implemented method according to claim 6, wherein the audio signal containing speech is generated by a text-to-speech process based on the turn-based reference script.
10. The computer-implemented method according to claim 6, wherein the audio signal containing speech is generated by playing back a speech recording of each speaking turn and the evaluation transcript is generated using speech recognition.
11. The computer-implemented method according to claim 6, said generating an overall call quality metric of the real-time communication system based on the comparisons including: for each listening turn (i) of the first and second communication device: aligning the evaluation transcript of the listening turn with a corresponding speaking turn of the turn-based reference script, and determining a number of corrections C_i, deletions D_i, insertions I_i, and substitutions S_i for each listening turn (i) by comparing the aligned evaluation transcript of the listening turn with the corresponding speaking turn of the turn-based reference script, individually summing C_i, D_i, I_i, and substitutions S_i over all the listening turns (i) to respectively calculate C, D, I and S; and calculating an overall word error rate (WER) based on WER = (D + I + S) / (S + D + C).
12. An apparatus to evaluate audio quality of a real-time communication system that includes at least two communication devices, comprising: a processor and a non-transitory storage device storing instructions that are operable, when executed by the processor, to cause the processor to perform operations including: receiving a reference script, the reference script containing linguistic contents in text form, of an audio signal sent to at least one of the communication devices, wherein the reference script excludes audio; receiving the audio signal; generating an evaluation transcript in text form by applying speech recognition to the audio signal; comparing the reference script with the evaluation transcript; generating a call quality metric of the real-time communication system based on the comparison; determining a duration of a speaking turn; determining a duration of a listening turn from the audio signal being received by one of the communication devices; and estimating a communication delay of the audio signal based on the duration of the speaking turn and the listening turn.
13. The apparatus of claim 12, said determining a duration of the speaking turn including inputting a duration value.
14. The apparatus of claim 12, said determining a duration of the speaking turn including: applying a voice activity detection process to the audio signal being sent from one
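The overall word error rate recited in claim 11 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patentee's implementation: it uses a standard word-level edit-distance alignment, treats C as the count of words that match after alignment, and normalizes text by lowercasing and splitting on whitespace. The function names `align_counts` and `overall_wer` are hypothetical.

```python
from typing import List, Tuple


def align_counts(reference: str, hypothesis: str) -> Tuple[int, int, int, int]:
    """Return (C, D, I, S) for one turn via word-level edit-distance alignment."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        cost[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        cost[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            diagonal = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diagonal, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back through the table to classify each aligned position.
    correct = deleted = inserted = substituted = 0
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            if ref[i - 1] == hyp[j - 1]:
                correct += 1
            else:
                substituted += 1
            i -= 1
            j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            deleted += 1
            i -= 1
        else:
            inserted += 1
            j -= 1
    return correct, deleted, inserted, substituted


def overall_wer(turns: List[Tuple[str, str]]) -> float:
    """Sum per-turn counts and apply WER = (D + I + S) / (S + D + C).

    `turns` holds (reference script turn, evaluation transcript) pairs.
    """
    C = D = I = S = 0
    for reference, hypothesis in turns:
        c, d, i, s = align_counts(reference, hypothesis)
        C, D, I, S = C + c, D + d, I + i, S + s
    if S + D + C == 0:
        raise ValueError("reference script contains no words")
    return (D + I + S) / (S + D + C)
```

As a usage example, the turns `("the quick brown fox", "the quick brown box")` and `("hello how are you today", "hello how you today please")` give C = 7, D = 1, I = 1, S = 1, so the overall WER is 3/9.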
Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title
for the arrangements providing the connection (test connection, test call, call simulation) · CPC title
using speech recognition · CPC title
Quality of speech transmission monitoring · CPC title