Jitter buffer control based on monitoring of delay jitter and conversational dynamics

US11632318B2 · US · B2

Patent metadata
Publication number: US-11632318-B2
Application number: US-202016988571-A
Country: US
Kind code: B2
Filing date: Aug 7, 2020
Priority date: Apr 16, 2014
Publication date: Apr 18, 2023
Grant date: Apr 18, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Some implementations involve analyzing audio packets received during a time interval that corresponds with a conversation analysis segment to determine network jitter dynamics data and conversational interactivity data. The network jitter dynamics data may provide an indication of jitter in a network that relays the audio data packets. The conversational interactivity data may provide an indication of interactivity between participants of a conversation represented by the audio data. A jitter buffer size may be controlled according to the network jitter dynamics data and the conversational interactivity data. The time interval may include a plurality of talkspurts.
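As a rough illustration of the "network jitter dynamics data" mentioned in the abstract, the sketch below follows the wording of claims 1 and 2 under First claim: packet delay variation (PDV) is taken as actual minus expected arrival time, and jitter is summarized as an inter-percentile range (a long delay minus the median delay). The function names, the 20 ms frame interval, the evenly spaced expected-arrival model, and the choice of the 95th percentile are all assumptions made for the example; the patent text shown here does not specify them.

```python
import math


def packet_delay_variation(arrival_ms, frame_interval_ms=20.0):
    """Return PDV per packet: actual arrival time minus expected arrival time.

    Assumption: packets are listed in sequence order and expected arrivals
    are evenly spaced, starting from the first packet's arrival time.
    """
    first = arrival_ms[0]
    return [t - (first + i * frame_interval_ms) for i, t in enumerate(arrival_ms)]


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list, for integer 0 < p <= 100."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def inter_percentile_range(pdv, high=95, mid=50):
    """Difference between a 'longest' delay percentile and the median delay."""
    ordered = sorted(pdv)  # order statistics of packet delay variation
    return percentile(ordered, high) - percentile(ordered, mid)


# Ten packets sent 20 ms apart; packet 2 is 5 ms late, packet 5 is 30 ms late.
arrivals = [0, 20, 45, 60, 80, 130, 120, 140, 160, 180]
pdv = packet_delay_variation(arrivals)
jitter_ms = inter_percentile_range(pdv)
print(pdv, jitter_ms)
```

Using the gap between a high percentile and the median, rather than the mean or variance, keeps the estimate focused on how far late packets stray from typical delay, which is what a jitter buffer has to absorb.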

First claim

Opening claim text (preview).

We claim:

1. A method, comprising: receiving audio data packets; extracting audio data frames from the audio data packets, the audio data frames corresponding to a time interval within a conversation analysis segment that includes a plurality of talkspurts, wherein a talkspurt is a segment of speech between mutual silent times of a conversation and wherein a mutual silent time is a time during which no conversational participant is speaking; analyzing the audio data frames to determine network jitter dynamics data and conversational interactivity data, wherein the network jitter dynamics data provides an indication of jitter in a network that relays the audio data packets, and wherein determining the conversational interactivity data comprises analyzing the conversational activity of only a single conversational participant to determine whether the single conversational participant is talking or not talking, wherein analyzing the audio data frames involves determining percentile ranges of packet delay times according to order statistics of packet delay variation, the percentile ranges of packet delay times including shortest packet delay times, median packet delay times and longest packet delay times, and wherein determining the network jitter dynamics data involves determining an inter-percentile range of packet delay corresponding to a difference between one of the longest packet delay times and one of the median packet delay times; and controlling a jitter buffer size by selecting one of a plurality of jitter buffer control modes in response to both the network jitter dynamics data and the conversational interactivity data.

2. The method of claim 1, wherein analyzing the audio data frames to determine the network jitter dynamics data involves determining at least one of packet delay variation (PDV) or inter-arrival time (IAT) variation based, at least in part, on actual packet arrival times, wherein determining PDV involves comparing expected packet arrival times with the actual packet arrival times.

3. The method of claim 1, wherein analyzing the audio data frames to determine the conversational interactivity data involves one or more of determining single-talk times during which only a single conversational participant is speaking, determining double-talk times during which two or more conversational participants are speaking, and determining mutual silent times during which no conversational participant is speaking.

4. The method of claim 1, wherein controlling the jitter buffer size involves setting the jitter buffer to a relatively smaller size when the single conversational participant is talking and setting the jitter buffer to a relatively larger size when the single conversational participant is not talking.

5. The method of claim 1, wherein controlling the jitter buffer size involves setting a jitter buffer to a relatively larger size when the network jitter dynamics data indicates more than a threshold amount of network jitter.

6. The method of claim 1, wherein controlling the jitter buffer size involves setting a jitter buffer for the single conversational participant to a relatively larger size when the network jitter dynamics data indicates more than a threshold amount of network jitter or when the conversational interactivity data indicates less than a threshold amount of conversational participation by the single conversational participant.

7. The method of claim 1, wherein controlling the jitter buffer size involves setting a jitter buffer to a relatively smaller size when the network jitter dynamics data indicates less than a threshold amount of network jitter or when the conversational interactivity data indicates at least a threshold amount of conversational interactivity.

8. The method of claim 1, wherein controlling the jitter buffer size involves setting a jitter buffer for the single conversational participant to a relatively smaller size when the network jitter dynamics data indicates less than a threshold amount of network jitter or when the conversational interactivity data indicates at least a threshold amount of conversational participation by the single conversational participant.

9. The method of claim 1, wherein controlling the jitter buffer size involves setting a jitter buffer size according to one of at least three jitter buffer control modes.

10. The method of claim 9, wherein the jitter buffer control modes include a peak mode, a low-loss mode and a normal mode and each jitter buffer control mode corresponds to a jitter buffer size and a range of jitter buffer sizes.

11. The method of claim 9, wherein one of the jitter buffer control modes corresponds to network jitter dynamics data indicating at least a threshold amount of network jitter and conversational interactivity data indicating at least a threshold amount of conversational interactivity.

12. The method of claim 9, wherein one of the jitter buffer control modes corresponds to network jitter dynamics data indicating at least a threshold amount of network jitter and conversational interactivity data indicating less than a threshold amount of conversational interactivity.

13. The method of claim 9, wherein one of the jitter buffer control modes corresponds to network jitter dynamics data indicating less than a threshold amount of network jitter and conversational interactivity data indicating at least a threshold amount of conversational interactivity.

14. The method of claim 9, wherein one of the jitter buffer control modes corresponds to network jitter dynamics data indicating less than a threshold amount of network jitter and conversational interactivity data indicating less than a threshold amount of conversational interactivity.

15. An apparatus comprising one or more processing devices configured to: receive audio data packets; extract audio data frames from the audio data packets, the audio data frames corresponding to a time interval within a conversation analysis segment that includes a plurality of talkspurts, wherein a talkspurt is a segment of speech between mutual silent times of a conversation and wherein a mutual silent time is a time during which no conversational participant is speaking; analyze the audio data frames to determine network jitter dynamics data and conversational interactivity data, wherein analyzing the audio data frames involves determining percentile ranges of packet delay times according to order statistics of packet delay variation, the percentile ranges of packet delay times including shortest packet delay times, median packet delay times and longest packet delay times, wherein the network jitter dynamics data provides an indication of jitter in a network that relays the audio data packets, wherein determining the network jitter dynamics data involves determining an inter-percentile range of packet delay corresponding to a difference between one of the longest packet delay times and one of the median packet delay times and wherein determining the conversational interactivity data comprises analyzing the conversational activity of only a single conversational participant to determine whether the single conversational participant is talking or not talking; and control a jitter buffer size by selecting one of a plurality of jitter buffer control modes in response to both the network jitter dynamics data and the conversational interactivity data.

16. A non-transitory medium having software stored thereon, the software including instructions which, when executed by one or more processing devices, cause the one or more processing devices to perform a method, comprising: r
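The mode selection described in claims 9 to 14 can be pictured as a two-threshold decision over a jitter estimate and an interactivity estimate. The sketch below is only one possible reading. The claims name peak, low-loss and normal modes but do not say which combination of jitter and interactivity maps to which mode, so the mapping here is an assumption, as are the threshold values, the target sizes in milliseconds, and the use of a talking-frame ratio as the interactivity measure.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    PEAK = "peak"
    LOW_LOSS = "low-loss"


# Hypothetical target buffer sizes per mode, in milliseconds (not from the patent).
TARGET_MS = {Mode.NORMAL: 40, Mode.PEAK: 80, Mode.LOW_LOSS: 160}


def talk_ratio(talking_flags):
    """Fraction of frames in which the single monitored participant is talking."""
    if not talking_flags:
        return 0.0
    return sum(talking_flags) / len(talking_flags)


def select_mode(jitter_ms, interactivity,
                jitter_threshold_ms=30.0, interactivity_threshold=0.3):
    """Pick a jitter buffer control mode from both inputs (assumed mapping)."""
    high_jitter = jitter_ms >= jitter_threshold_ms
    interactive = interactivity >= interactivity_threshold
    if high_jitter and not interactive:
        # Participant is mostly listening: extra latency is cheap, so favor low loss.
        return Mode.LOW_LOSS
    if high_jitter and interactive:
        # Jitter is high but the conversation is lively: grow the buffer moderately.
        return Mode.PEAK
    # Low jitter, regardless of interactivity: keep the buffer small.
    return Mode.NORMAL


mode = select_mode(jitter_ms=30.0, interactivity=talk_ratio([True, True, False, False]))
print(mode.value, TARGET_MS[mode])
```

This mapping is at least consistent with claims 4 and 5, which call for a larger buffer when jitter exceeds a threshold or when the participant is not talking, and a smaller one otherwise.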

Assignees

Inventors

Classifications

  • Network arrangements, protocols or services for supporting real-time applications in data packet communication (real-time or near real-time messaging, e.g. instant messaging [IM] H04L51/04; selective video distribution H04N21/00) · CPC title

  • Synchronising arrangements {(for television systems H04N5/04; bit-synchronisation H04L7/00)} · CPC title

  • Threshold monitoring · CPC title

  • Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title

  • Discriminating between voiced and unvoiced parts of speech signals (G10L25/90 takes precedence) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11632318B2 cover?
Some implementations involve analyzing audio packets received during a time interval that corresponds with a conversation analysis segment to determine network jitter dynamics data and conversational interactivity data. The network jitter dynamics data may provide an indication of jitter in a network that relays the audio data packets. The conversational interactivity data may provide an indica…
Who is the assignee on this patent?
Dolby Laboratories Licensing Corp
What technology area does this patent fall under?
Primary CPC classification H04L65/80. Mapped technology areas include Electricity.
When was this patent published?
Publication date Apr 18, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).