Voice communication method and apparatus and method and apparatus for operating jitter buffer
US-2015030017-A1 · Jan 29, 2015 · US
US9525845B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9525845-B2 |
| Application number | US-201314426134-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 27, 2013 |
| Priority date | Sep 27, 2012 |
| Publication date | Dec 20, 2016 |
| Grant date | Dec 20, 2016 |
Official abstract text for this publication.
Embodiments of client device and method for audio or video conferencing are described. An embodiment includes an offset detecting unit, a configuring unit, an estimator and an output unit. The offset detecting unit detects an offset of speech input to the client device. The configuring unit determines a voice latency from the client device to every far end. The estimator estimates a time when a user at the far end perceives the offset based on the voice latency. The output unit outputs a perceivable signal indicating that a user at the far end perceives the offset based on the time estimated for the far end. The perceivable signal is helpful to avoid collision between parties.
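The estimation step in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible model, in which the far-end user perceives the offset at the detected offset time plus the one-way voice latency; the abstract does not specify the formula. The names `FarEnd`, `estimate_perceive_times`, and `slowest_far_end` are hypothetical and do not come from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FarEnd:
    """One remote party (illustrative type, not defined in the patent)."""
    name: str
    voice_latency_s: float  # assumed one-way voice latency, client -> far end


def estimate_perceive_times(offset_time_s: float,
                            far_ends: list[FarEnd]) -> dict[str, float]:
    """Estimate, per far end, when its user hears the end of the local speech.

    Assumption: perceive time = local offset time + one-way voice latency.
    """
    return {fe.name: offset_time_s + fe.voice_latency_s for fe in far_ends}


def slowest_far_end(far_ends: list[FarEnd]) -> FarEnd:
    """Return the far end with the largest voice latency."""
    return max(far_ends, key=lambda fe: fe.voice_latency_s)


if __name__ == "__main__":
    parties = [FarEnd("alice", 0.15), FarEnd("bob", 0.40)]
    # The local talker stopped speaking at t = 10.0 s on the local clock.
    for name, t in estimate_perceive_times(10.0, parties).items():
        print(f"{name}: signal 'far end heard you' at t = {t:.2f} s")
```

A client built this way would schedule the perceivable signal for each far end at the returned time, or only for the far end returned by `slowest_far_end` if a single indication is preferred.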
Opening claim text (preview).
We claim:
1. A client device for use in an audio or video conference system, comprising: an offset detecting unit configured to detect an offset of speech input to the client device; a configuring unit configured to, for each of at least one far end, determine a first voice latency from the client device to the far end; an estimator configured to, for each of the at least one far end, estimate a time when a user at the far end perceives the offset, based on the first voice latency; and an output unit configured to, for each of the at least one far end, output a first perceivable signal indicating that a user at the far end perceives the offset based on the time estimated for the far end; wherein the output unit is configured to output one of subtle reverb and noticeable noise field not audible to other parties during a period after the offset detecting unit detects the offset and before the output unit outputs the first perceivable signal.
2. The client device according to claim 1, wherein the at least one far end comprises only one far end having the largest first voice latency among all the far ends involving a conference with the client device.
3. The client device according to claim 1, wherein the configuring unit is further configured to determine the first voice latency at least based on a transmission delay from the client device to the far end.
4. The client device according to claim 3, wherein the configuring unit is further configured to acquire a network delay from the client device to the far end as the transmission delay.
5. The client device according to claim 1, wherein the configuring unit is further configured to determine a network delay of a route from the client device to the at least one far end, further comprising a jitter monitor configured to acquire jitter range of the network delay, and the output unit is further configured to present the network delay of the route and the jitter range.
6. The client device according to claim 1, further comprising a jitter buffer tuner configured to, in response to a user input, adjust the jitter buffer delay of a jitter buffer on a route from the client device to the at least one far end.
7. The client device according to claim 6, further comprising a transmitting unit configured, in response to the adjusting, to transmit to the far end of the corresponding route an indication that the jitter buffer delay of the jitter buffer has been changed.
8. The client device according to claim 3, wherein the output unit is further configured to, for each of the at least one far end, output a second perceivable signal in response to elapsing of a time interval after outputting the first perceivable signal, and wherein the configuring unit is further configured to determine the time interval as not less than a second voice latency from the far end to the client device.
9. The client device according to claim 1, further comprising: a receiving unit configured to receive data frames; and a voice activity detector configured to detect voice activity in the data frames directly output from the receiving unit, wherein the output unit is further configured to output a third perceivable signal indicating that there is incoming speech from a far end.
10. The client device according to claim 9, wherein the voice activity detector is further configured to detect voice activity from local audio input, and the output unit is further configured to output a fourth perceivable signal indicating that there is a collision if both voice activities are detected from the data frames and the local audio input at the same time.
11. A client device for use in an audio or video conference system, comprising: a receiving unit configured to receive data frames; a voice activity detector configured to detect voice activity in the data frames directly output from the receiving unit; and an output unit configured to output a perceivable signal indicating that there is incoming speech from a far end, wherein the voice activity detector is further configured to detect voice activity from local audio input, and the output unit is further configured to output another perceivable signal indicating that there is a collision if both voice activities are detected from the data frames and the local audio input at the same time.
12. A method of audio or video conferencing for use in a client device, comprising: a configuring step of, for each of at least one far end, determining a first voice latency from the client device to the far end; a detecting step of detecting an offset of speech input to the client device; an estimating step of, for each of the at least one far end, estimating a time when a user at the far end perceives the offset, based on the first voice latency; an outputting step of, for each of the at least one far end, outputting a first perceivable signal indicating that a user at the far end perceives the offset based on the time estimated for the far end; and outputting one of subtle reverb and noticeable noise field not audible to other parties during a period after detecting the offset and before outputting the first perceivable signal.
13. The method according to claim 12, wherein the configuring step further comprises determining the first voice latency at least based on a transmission delay from the client device to the far end.
14. The method according to claim 12, further comprising: determining a network delay of a route from the client device to the at least one far end, acquiring jitter range of the network delay, and presenting the network delay of the route and the jitter range.
15. The method according to claim 12, further comprising, in response to a user input, adjusting the jitter buffer delay of a jitter buffer on a route from the client device to the at least one far end.
16. The method according to claim 15, further comprising, in response to the adjusting, transmitting to the far end of the corresponding route an indication that the jitter buffer delay of the jitter buffer has been changed, wherein the indication further comprises the adjusted jitter buffer delay of the jitter buffer.
17. The method according to claim 13, further comprising: for each of the at least one far end, outputting a second perceivable signal in response to elapsing of a time interval after outputting the first perceivable signal, and wherein the time interval is set as not less than a second voice latency from the far end to the client device.
18. The method according to claim 12, further comprising: a receiving step of receiving data frames; and a voice activity detecting step of detecting voice activity in the data frames received through the receiving step, wherein the outputting step further comprises outputting a third perceivable signal indicating that there is incoming speech from a far end, detecting voice activity from local audio input, and outputting a fourth perceivable signal indicating that there is a collision if both voice activities are detected from the data frames and the local audio input at the same time.
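The incoming-speech and collision signals recited in the claims can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the claims do not say how voice activity is detected, so a toy energy threshold stands in for a real detector, and frames are assumed to arrive as already-decoded lists of float samples. `Signal`, `has_voice`, `signals_for`, and the threshold value are all hypothetical.

```python
from enum import Enum


class Signal(Enum):
    """Perceivable signals to present to the local user (illustrative names)."""
    INCOMING_SPEECH = "incoming_speech"  # voice detected in received frames
    COLLISION = "collision"              # local and remote voice at the same time


def has_voice(samples, threshold=1e-3):
    """Toy voice activity check: mean squared amplitude above a threshold.

    Assumption: a real client would use a proper voice activity detector.
    """
    if not samples:
        return False
    energy = sum(s * s for s in samples) / len(samples)
    return energy > threshold


def signals_for(remote_frame, local_frame, threshold=1e-3):
    """Return the set of signals to output for one pair of concurrent frames."""
    remote_active = has_voice(remote_frame, threshold)
    local_active = has_voice(local_frame, threshold)
    out = set()
    if remote_active:
        out.add(Signal.INCOMING_SPEECH)
    if remote_active and local_active:
        out.add(Signal.COLLISION)
    return out


if __name__ == "__main__":
    speech = [0.2, -0.3, 0.25, -0.1]
    silence = [0.0, 0.0001, -0.0001, 0.0]
    print(signals_for(speech, silence))  # remote party talking, local user silent
    print(signals_for(speech, speech))   # both talking in the same frame interval
```

Running the detector on received frames as they arrive, rather than on played-out audio, is what lets the local user see incoming speech early; the sketch models only the decision logic, not where in the receive path the frames are tapped.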
Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals (selecting H04Q) · CPC title
Conference systems · CPC title
using the instant speaker's algorithm (speech detection per se G10L25/78) · CPC title
Delay circuits; Timers · CPC title
Displays · CPC title