Controlling A Jitter Buffer
US-2015350099-A1 · Dec 3, 2015 · US
US11632318B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11632318-B2 |
| Application number | US-202016988571-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 7, 2020 |
| Priority date | Apr 16, 2014 |
| Publication date | Apr 18, 2023 |
| Grant date | Apr 18, 2023 |
Abstract
Some implementations involve analyzing audio packets received during a time interval that corresponds with a conversation analysis segment to determine network jitter dynamics data and conversational interactivity data. The network jitter dynamics data may provide an indication of jitter in a network that relays the audio data packets. The conversational interactivity data may provide an indication of interactivity between participants of a conversation represented by the audio data. A jitter buffer size may be controlled according to the network jitter dynamics data and the conversational interactivity data. The time interval may include a plurality of talkspurts.
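The abstract's "conversational interactivity data" can be made concrete with the talk states that the claims of this patent define: single-talk, double-talk, mutual silent time, and talkspurts (segments of speech between mutual silent times). The following is a minimal Python sketch under stated assumptions. It assumes two participants and per-frame boolean voice-activity flags that some detector has already produced. The names `analyze_segment`, `talk_fraction` and `InteractivityStats` are hypothetical. The patent does not specify a detector, a frame format, or these helpers.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class InteractivityStats:
    """Hypothetical per-segment summary of talk states."""
    single_talk_frames: int     # exactly one participant speaking
    double_talk_frames: int     # both participants speaking
    mutual_silence_frames: int  # nobody speaking
    talkspurts: int             # speech runs bounded by mutual silence

    @property
    def total_frames(self) -> int:
        return self.single_talk_frames + self.double_talk_frames + self.mutual_silence_frames


def analyze_segment(local_vad: Sequence[bool], remote_vad: Sequence[bool]) -> InteractivityStats:
    """Classify each frame of a conversation analysis segment.

    Assumption: local_vad[i] and remote_vad[i] are voice-activity flags for
    the same frame i, one sequence per participant.
    """
    if len(local_vad) != len(remote_vad):
        raise ValueError("VAD sequences must cover the same frames")

    single = double = silent = spurts = 0
    in_speech = False
    for a, b in zip(local_vad, remote_vad):
        if a and b:
            double += 1
        elif a or b:
            single += 1
        else:
            silent += 1
        speaking = a or b
        if speaking and not in_speech:
            # Speech resumes after mutual silence (or at segment start): new talkspurt.
            spurts += 1
        in_speech = speaking
    return InteractivityStats(single, double, silent, spurts)


def talk_fraction(vad: Sequence[bool]) -> float:
    """Share of frames in which a single participant is talking (0.0 if empty)."""
    return sum(vad) / len(vad) if vad else 0.0
```

For example, `analyze_segment([True, True, False, False, False, True, True, False], [False, True, True, False, False, False, False, False])` reports 4 single-talk frames, 1 double-talk frame, 3 mutual-silence frames and 2 talkspurts. `talk_fraction` looks at only one participant's flags, which mirrors the claims' focus on whether a single conversational participant is talking.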
Claims (excerpt)
We claim:

1. A method, comprising: receiving audio data packets; extracting audio data frames from the audio data packets, the audio data frames corresponding to a time interval within a conversation analysis segment that includes a plurality of talkspurts, wherein a talkspurt is a segment of speech between mutual silent times of a conversation and wherein a mutual silent time is a time during which no conversational participant is speaking; analyzing the audio data frames to determine network jitter dynamics data and conversational interactivity data, wherein the network jitter dynamics data provides an indication of jitter in a network that relays the audio data packets, and wherein determining the conversational interactivity data comprises analyzing the conversational activity of only a single conversational participant to determine whether the single conversational participant is talking or not talking, wherein analyzing the audio data frames involves determining percentile ranges of packet delay times according to order statistics of packet delay variation, the percentile ranges of packet delay times including shortest packet delay times, median packet delay times and longest packet delay times, and wherein determining the network jitter dynamics data involves determining an inter-percentile range of packet delay corresponding to a difference between one of the longest packet delay times and one of the median packet delay times; and controlling a jitter buffer size by selecting one of a plurality of jitter buffer control modes in response to both the network jitter dynamics data and the conversational interactivity data.

2. The method of claim 1, wherein analyzing the audio data frames to determine the network jitter dynamics data involves determining at least one of packet delay variation (PDV) or inter-arrival time (IAT) variation based, at least in part, on actual packet arrival times, wherein determining PDV involves comparing expected packet arrival times with the actual packet arrival times.

3. The method of claim 1, wherein analyzing the audio data frames to determine the conversational interactivity data involves one or more of determining single-talk times during which only a single conversational participant is speaking, determining double-talk times during which two or more conversational participants are speaking, and determining mutual silent times during which no conversational participant is speaking.

4. The method of claim 1, wherein controlling the jitter buffer size involves setting the jitter buffer to a relatively smaller size when the single conversational participant is talking and setting the jitter buffer to a relatively larger size when the single conversational participant is not talking.

5. The method of claim 1, wherein controlling the jitter buffer size involves setting a jitter buffer to a relatively larger size when the network jitter dynamics data indicates more than a threshold amount of network jitter.

6. The method of claim 1, wherein controlling the jitter buffer size involves setting a jitter buffer for the single conversational participant to a relatively larger size when the network jitter dynamics data indicates more than a threshold amount of network jitter or when the conversational interactivity data indicates less than a threshold amount of conversational participation by the single conversational participant.

7. The method of claim 1, wherein controlling the jitter buffer size involves setting a jitter buffer to a relatively smaller size when the network jitter dynamics data indicates less than a threshold amount of network jitter or when the conversational interactivity data indicates at least a threshold amount of conversational interactivity.

8. The method of claim 1, wherein controlling the jitter buffer size involves setting a jitter buffer for the single conversational participant to a relatively smaller size when the network jitter dynamics data indicates less than a threshold amount of network jitter or when the conversational interactivity data indicates at least a threshold amount of conversational participation by the single conversational participant.

9. The method of claim 1, wherein controlling the jitter buffer size involves setting a jitter buffer size according to one of at least three jitter buffer control modes.

10. The method of claim 9, wherein the jitter buffer control modes include a peak mode, a low-loss mode and a normal mode and each jitter buffer control mode corresponds to a jitter buffer size and a range of jitter buffer sizes.

11. The method of claim 9, wherein one of the jitter buffer control modes corresponds to network jitter dynamics data indicating at least a threshold amount of network jitter and conversational interactivity data indicating at least a threshold amount of conversational interactivity.

12. The method of claim 9, wherein one of the jitter buffer control modes corresponds to network jitter dynamics data indicating at least a threshold amount of network jitter and conversational interactivity data indicating less than a threshold amount of conversational interactivity.

13. The method of claim 9, wherein one of the jitter buffer control modes corresponds to network jitter dynamics data indicating less than a threshold amount of network jitter and conversational interactivity data indicating at least a threshold amount of conversational interactivity.

14. The method of claim 9, wherein one of the jitter buffer control modes corresponds to network jitter dynamics data indicating less than a threshold amount of network jitter and conversational interactivity data indicating less than a threshold amount of conversational interactivity.

15. An apparatus comprising one or more processing devices configured to: receive audio data packets; extract audio data frames from the audio data packets, the audio data frames corresponding to a time interval within a conversation analysis segment that includes a plurality of talkspurts, wherein a talkspurt is a segment of speech between mutual silent times of a conversation and wherein a mutual silent time is a time during which no conversational participant is speaking; analyze the audio data frames to determine network jitter dynamics data and conversational interactivity data, wherein analyzing the audio data frames involves determining percentile ranges of packet delay times according to order statistics of packet delay variation, the percentile ranges of packet delay times including shortest packet delay times, median packet delay times and longest packet delay times, wherein the network jitter dynamics data provides an indication of jitter in a network that relays the audio data packets, wherein determining the network jitter dynamics data involves determining an inter-percentile range of packet delay corresponding to a difference between one of the longest packet delay times and one of the median packet delay times and wherein determining the conversational interactivity data comprises analyzing the conversational activity of only a single conversational participant to determine whether the single conversational participant is talking or not talking; and control a jitter buffer size by selecting one of a plurality of jitter buffer control modes in response to both the network jitter dynamics data and the conversational interactivity data.

16. A non-transitory medium having software stored thereon, the software including instructions which, when executed by one or more processing devices, cause the one or more processing devices to perform a method, comprising: r…
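The jitter side of the claims can be read as a short pipeline:
- compute PDV by comparing expected and actual arrival times (claim 2);
- take order statistics of the delays and form an inter-percentile range (IPR) between a high percentile and the median (claim 1);
- select one of several control modes from a jitter threshold and an interactivity threshold (claims 9 to 14).

The Python sketch below is one possible reading, not the patented implementation. Several choices in it are assumptions:
- a fixed 20 ms packetization interval for expected arrivals;
- nearest-rank percentiles, with the 95th percentile standing in for "longest" delays;
- the threshold values and per-mode buffer targets;
- above all, which of the four jitter/interactivity combinations selects the peak, low-loss or normal mode, since the claims list the combinations but do not pair them with named modes.

All function names are hypothetical.

```python
import math
from enum import Enum
from typing import Sequence


def packet_delay_variation(seq_numbers: Sequence[int],
                           arrival_times_ms: Sequence[float],
                           frame_ms: float = 20.0) -> list[float]:
    """PDV per packet: actual arrival minus expected arrival.

    Assumption: packet n is expected at the first packet's arrival time
    plus (n - n0) * frame_ms, i.e. a constant packetization interval.
    """
    n0, t0 = seq_numbers[0], arrival_times_ms[0]
    return [t - (t0 + (n - n0) * frame_ms) for n, t in zip(seq_numbers, arrival_times_ms)]


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile taken from the sorted values (order statistics)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100.0))
    return ordered[rank - 1]


def inter_percentile_range(pdv_ms: Sequence[float], high_p: float = 95.0) -> float:
    """High-percentile delay minus median delay; high_p = 95 is an assumed choice."""
    return percentile(pdv_ms, high_p) - percentile(pdv_ms, 50.0)


class Mode(Enum):
    NORMAL = "normal"
    LOW_LOSS = "low-loss"
    PEAK = "peak"


# Hypothetical target buffer sizes in ms; the claims only say that each mode
# corresponds to a buffer size and a range of sizes.
MODE_TARGET_MS = {Mode.NORMAL: 60.0, Mode.PEAK: 80.0, Mode.LOW_LOSS: 120.0}


def select_mode(ipr_ms: float, talk_fraction: float,
                jitter_threshold_ms: float = 30.0,
                interactivity_threshold: float = 0.3) -> Mode:
    """Pick a control mode from jitter and a single participant's talk fraction.

    Illustrative mapping only (an assumption, not taken from the claims).
    """
    high_jitter = ipr_ms >= jitter_threshold_ms
    interactive = talk_fraction >= interactivity_threshold
    if not high_jitter:
        return Mode.NORMAL    # low jitter: smaller buffer, in the spirit of claim 7
    if interactive:
        return Mode.PEAK      # high jitter but active talker: moderate growth
    return Mode.LOW_LOSS      # high jitter, little participation: larger buffer (claims 5, 6)
```

As a usage example, ten packets sent every 20 ms and arriving at 0, 20, 45, 60, 80, 130, 120, 140, 160 and 185 ms give PDV values of 0, 0, 5, 0, 0, 30, 0, 0, 0 and 5 ms. The median is 0 ms and the nearest-rank 95th percentile is 30 ms, so the IPR is 30 ms. With a talk fraction of 0.5, `select_mode` then returns `Mode.PEAK` under the assumed thresholds. Measuring spread from the median rather than from the minimum keeps the metric focused on late packets, which is what a jitter buffer must absorb.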
CPC classifications:
- Network arrangements, protocols or services for supporting real-time applications in data packet communication (real-time or near real-time messaging, e.g. instant messaging [IM] H04L51/04; selective video distribution H04N21/00)
- Synchronising arrangements {(for television systems H04N5/04; bit-synchronisation H04L7/00)}
- Threshold monitoring
- Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10)
- Discriminating between voiced and unvoiced parts of speech signals (G10L25/90 takes precedence)