Conversational call quality evaluator

US9876901B1 · US · B1

Patent metadata
Field	Value
Publication number	US-9876901-B1
Application number	US-201615261635-A
Country	US
Kind code	B1
Filing date	Sep 9, 2016
Priority date	Sep 9, 2016
Publication date	Jan 23, 2018
Grant date	Jan 23, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Aspects of the disclosure simulate a conversation over a real-time communication system and a reference script shared between the communication devices serves as a basis for comparison against the received speech-recognized conversation. A method of evaluating call quality of a real-time communication system that includes at least two communication devices is disclosed that includes receiving a reference script, the reference script containing linguistic contents of an audio signal being sent to one of the communication devices; generating an evaluation transcript by applying speech recognition to the audio signal being received; comparing the reference script with the evaluation transcript; and generating a call quality metric of the real-time communication system based on the comparison. The call quality metric may also include a communication delay which may be evaluated by determining a duration of a speaking turn; determining a duration of a listening turn from the audio signal being received by one of the communication devices; and estimating a communication delay of the audio signal based on the duration of the speaking turn and the listening turn.
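
The delay estimate described at the end of the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the energy-threshold voice activity detector, the function names, the frame length, and the threshold are all hypothetical. It also assumes that the received recording starts at the moment the listening turn begins, and that the delay is the listening-turn duration minus the speaking-turn duration, which is only one plausible reading of the abstract.

```python
from typing import Sequence


def listening_turn_seconds(frame_energies: Sequence[float],
                           frame_seconds: float,
                           threshold: float) -> float:
    """Return the time at which the last active frame ends.

    A frame counts as speech when its energy exceeds `threshold`
    (a deliberately simple stand-in for a real voice activity detector).
    Returns 0.0 when no frame is active.
    """
    end = 0.0
    for index, energy in enumerate(frame_energies):
        if energy > threshold:
            end = (index + 1) * frame_seconds
    return end


def estimate_delay_seconds(speaking_seconds: float,
                           frame_energies: Sequence[float],
                           frame_seconds: float = 0.02,
                           threshold: float = 0.01) -> float:
    """Estimate the delay as listening-turn duration minus speaking-turn duration.

    Assumption: frame 0 of `frame_energies` coincides with the start of the
    listening turn, so any extra time before the speech ends is delay.
    """
    listening_seconds = listening_turn_seconds(frame_energies, frame_seconds, threshold)
    return listening_seconds - speaking_seconds


# Hypothetical received signal: 10 silent frames, 50 speech frames, 5 silent frames.
energies = [0.0] * 10 + [0.5] * 50 + [0.0] * 5
delay = estimate_delay_seconds(speaking_seconds=1.0, frame_energies=energies)
```

With 20 ms frames the speech in this made-up signal ends about 1.2 s into the listening turn, so a 1.0 s speaking turn gives a delay of roughly 0.2 s.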

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method to evaluate audio quality of a real-time communication system that includes at least two communication devices, comprising: receiving a reference script, the reference script containing linguistic contents in text form of an audio signal sent to at least one of the communication devices, wherein the reference script excludes audio; receiving the audio signal; generating an evaluation transcript in text form by applying speech recognition to the audio signal; comparing the reference script with the evaluation transcript; generating a call quality metric of the real-time communication system based on the comparison; determining a duration of a speaking turn; determining a duration of a listening turn from the audio signal being received by one of the communication devices; and estimating a communication delay of the audio signal based on the duration of the speaking turn and the listening turn.

2. The computer-implemented method of claim 1, said determining a duration of the speaking turn including inputting a duration value.

3. The computer-implemented method of claim 1, said determining a duration of the speaking turn including: applying a voice activity detection process to the audio signal being sent from one of the communication devices.

4. The computer-implemented method of claim 1, said determining a duration of the listening turn including: activating a voice activity detection process based on the duration of the speaking turn; and determining when the received speech in the listening turn ends using the voice activity detection process, wherein determining the duration of the listening turn is based on determining when the listening turn ends.

5. The computer-implemented method of claim 4, said determining a duration of the listening turn further including: receiving a synchronization signal indicating a start of the received speech, wherein determining the duration of the listening turn is based on receipt timing of the synchronization signal and on when the listening turn ends.

6. A computer-implemented method of evaluating audio quality of a real-time communication system using a first communication and a second communication device communicating via a real-time communication system, the first and second communication devices implementing a method, comprising: receiving, at the first and second communication devices, a turn-based reference script containing linguistic contents of turn-based speech including respective speaking turns for the first and second communication devices; generating, at the first and second communication devices during each of their respective speaking turns, audible speech; receiving audio signals, at the second and first communication devices during each of their respective listening turns, the audio signals including the produced speech from respectively corresponding speaking turns of the first and second communication devices; generating, at the first and second communication devices during each of their respective listening turns, evaluation transcripts by applying speech recognition to the produced speech in the received audio signal; comparing, at the first and second communication devices, the turn-based reference script with the respective evaluation transcript for each of the respectively corresponding listening turns; and generating an overall call quality metric of the real-time communication system based on the comparisons.

7. The computer-implemented method of claim 6, further comprising: receiving, at the first and second communication devices, expected durations of the speech in the audio signal for each of the speaking turns of the respectively corresponding second and first communication devices; enabling, at the first and second communication devices during their respective listening turns, voice activity detection according to the expected duration of the speech in the audio signal; detecting, at the first and second communication devices during their respective listening turns, when the speech being received for each listening turn ends using the voice activity detection; and determining, at the first and second communication devices during their respective listening turns, communication delays for each listening turn based on the expected durations and the determination of when the speech being received for each listening turn ends.

8. The computer-implemented method of claim 7, further comprising: receiving, at the first and second communication devices during their respective listening turns, a synchronization signal for each listening turn indicating a start of the received speech, wherein the step of determining the communication delays determines the communication delays based on receipt timings of the synchronization signals and the determination of when the speech being received for each listening turn ends.

9. The computer-implemented method according to claim 6, wherein the audio signal containing speech is generated by a text-to-speech process based on the turn-based reference script.

10. The computer-implemented method according to claim 6, wherein the audio signal containing speech is generated by playing back a speech recording of each speaking turn and the evaluation transcript is generated using speech recognition.

11. The computer-implemented method according to claim 6, said generating an overall call quality metric of the real-time communication system based on the comparisons including: for each listening turn (i) of the first and second communication device: aligning the evaluation transcript of the listening turn with a corresponding speaking turn of the turn-based reference script, and determining a number of corrections C_i, deletions D_i, insertions, I_i, and substitutions S_i for each listening turn (i) by comparing the aligned evaluation transcript of the listening turn with the corresponding speaking turn of the turn-based reference script, individually summing C_i, D_i, I_i, and substitutions S_i over all the listening turns (i) to respectively calculate and C, D, I and S; and calculating an overall word error rate (WER) based on WER = (D + I + S) / (S + D + C).

12. An apparatus to evaluate audio quality of a real-time communication system that includes at least two communication devices, comprising: a processor and a non-transitory storage device storing instructions that are operable, when executed by the processor, to cause the processor to perform operations including: receiving a reference script, the reference script containing linguistic contents in text form, of an audio signal sent to at least one of the communication devices, wherein the reference script excludes audio; receiving the audio signal; generating an evaluation transcript in text form by applying speech recognition to the audio signal; comparing the reference script with the evaluation transcript; generating a call quality metric of the real-time communication system based on the comparison; determining a duration of a speaking turn; determining a duration of a listening turn from the audio signal being received by one of the communication devices; and estimating a communication delay of the audio signal based on the duration of the speaking turn and the listening turn.

13. The apparatus of claim 12, said determining a duration of the speaking turn including inputting a duration value.

14. The apparatus of claim 12, said determining a duration of the speaking turn including: applying a voice activity detection process to the audio signal being sent from one …
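
The word error rate in claim 11 can be made concrete with a short sketch. This is an illustration under stated assumptions, not the patent's implementation: the alignment uses a standard minimum-edit-distance (Levenshtein) alignment over words, tokenization is a lowercase whitespace split, the count C_i is taken to mean words that match after alignment, and the function names `align_counts` and `overall_wer` are hypothetical.

```python
from typing import List, Sequence, Tuple


def align_counts(reference: List[str], hypothesis: List[str]) -> Tuple[int, int, int, int]:
    """Align two word lists and return (C, D, I, S) for one listening turn."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = minimum edits needed to turn reference[:i] into hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = cost[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            cost[i][j] = min(diagonal, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the end of the table to count each kind of edit.
    correct = deletions = insertions = substitutions = 0
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])):
            if reference[i - 1] == hypothesis[j - 1]:
                correct += 1
            else:
                substitutions += 1
            i -= 1
            j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            deletions += 1  # reference word missing from the transcript
            i -= 1
        else:
            insertions += 1  # extra word in the transcript
            j -= 1
    return correct, deletions, insertions, substitutions


def overall_wer(turns: Sequence[Tuple[str, str]]) -> float:
    """Sum per-turn counts and apply WER = (D + I + S) / (S + D + C)."""
    C = D = I = S = 0
    for reference_text, transcript_text in turns:
        c, d, ins, s = align_counts(reference_text.lower().split(),
                                    transcript_text.lower().split())
        C, D, I, S = C + c, D + d, I + ins, S + s
    if S + D + C == 0:
        raise ValueError("reference script contains no words")
    return (D + I + S) / (S + D + C)


# Hypothetical two-turn conversation: (reference script, evaluation transcript).
turns = [
    ("hello how are you", "hello how are you"),
    ("i am fine thank you", "i am fine thanks"),
]
wer = overall_wer(turns)
```

The denominator S + D + C equals the number of words in the reference script, so insertions can push the rate above 1. In the made-up conversation above, the second turn contributes one substitution and one deletion against nine reference words in total, giving a rate of 2/9.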

Assignees

Google Inc

Inventors

Classifications

G10L25/78
Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title
G10L15/26
Speech to text systems (G10L15/08 takes precedence) · CPC title
H04M3/323 (Primary)
for the arrangements providing the connection (test connection, test call, call simulation) · CPC title
H04M2201/40
using speech recognition · CPC title
H04M3/2236 (Primary)
Quality of speech transmission monitoring · CPC title

Patent family

Related publications grouped by family.

View patent family 60956932

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9876901B1 cover?: Aspects of the disclosure simulate a conversation over a real-time communication system and a reference script shared between the communication devices serves as a basis for comparison against the received speech-recognized conversation. A method of evaluating call quality of a real-time communication system that includes at least two communication devices is disclosed that includes receiving a…
Who is the assignee on this patent?: Google Inc
What technology area does this patent fall under?: Primary CPC classification H04M3/323. Mapped technology areas include Electricity.
When was this patent published?: Publication date Jan 23, 2018 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).