Peer-aware ranking of voice streams

US9331887B2 · US · B2

Patent metadata
Publication number: US-9331887-B2
Application number: US-27793206-A
Country: US
Kind code: B2
Filing date: Mar 29, 2006
Priority date: Mar 29, 2006
Publication date: May 3, 2016
Grant date: May 3, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A peer-aware voice stream ranking method that makes decisions based on information about participants of a voice conference over a network. Whether to send a participant's own audio packet out on the network is based both on information about the participant's own voice packet and voice packets that the participant receives from other clients. A Voice Activity Score (VAS) is computed for each frame of a particular voice stream. The VAS includes a voiceness component, indicating the likelihood that the audio frame contains speech or voice, and an energy level component indicating the ratio of current frame energy to the long-term average of energy for a current speaker. Using the VAS from the participants, the method also ranks the client's voice stream as compared to other clients' voice streams in the voice conference. If there are higher-ranking participants, the client's voice stream is not sent.
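The per-frame scoring and send decision described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: the `Frame` type, the weighted-sum combination controlled by `weight`, the exponential running average with rate `beta`, and the rule that ties still allow sending are all hypothetical choices. The abstract only says that a voiceness component and a normalized energy component are combined, and that a client does not send when a higher-ranking participant exists. The voiceness value is assumed to come from some separate speech detector.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Frame:
    voiceness: float  # assumed speech likelihood in [0, 1] from a separate detector
    energy: float     # frame energy, e.g. sum of squared samples


def update_speaker_average(avg_energy: float, frame_energy: float,
                           beta: float = 0.01) -> float:
    """Long-term average of the current speaker's frame energy.

    An exponential running average with rate `beta` is an assumption;
    the abstract only says "long-term average".
    """
    return (1.0 - beta) * avg_energy + beta * frame_energy


def voice_activity_score(frame: Frame, speaker_avg_energy: float,
                         weight: float = 0.5) -> float:
    """Combine voiceness with normalized energy (weighted sum is an assumption)."""
    # Energy component: current frame energy relative to the current speaker's
    # long-term average; the max() guards against division by zero.
    energy_ratio = frame.energy / max(speaker_avg_energy, 1e-12)
    return weight * frame.voiceness + (1.0 - weight) * energy_ratio


def speaker_rank(own_vas: float, peer_vas: Sequence[float]) -> int:
    """1-based rank: 1 plus the number of peers with a strictly higher VAS."""
    return 1 + sum(1 for v in peer_vas if v > own_vas)


def should_send(own_vas: float, peer_vas: Sequence[float]) -> bool:
    """Send only when no peer outranks this client (ties are allowed to send)."""
    return speaker_rank(own_vas, peer_vas) == 1
```

With `speaker_avg_energy=1.0`, a frame with voiceness 0.8 and energy 2.0 scores 0.5 × 0.8 + 0.5 × 2.0 = 1.4. A client with that score would send if its peers score 0.3 and 0.9, but would stay silent if one peer scores 2.0.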

First claim

Opening claim text (preview).

What is claimed is:

1. A method for performing silence suppression in a computer network voice conference having a plurality of participants, comprising: obtaining voice activity scores computed on an ongoing basis from a combination of a computed energy of one or more sequential frames of a local participant's audio signal and a voiceness score computed from the corresponding sequential frames; providing each computed voice activity score of the local participant's audio signal to one or more remote participants; obtaining voice activity scores from one or more of the remote participants, each voice activity score being computed on an ongoing basis from a computed energy of one or more sequential frames of audio signals of the remote participants; and wherein any of the participants evaluate each voice activity score obtained from any other participants relative to their own voice activity scores to make ongoing independent decisions as to whether to transmit one or more sequential audio signal frames to each of the other participants.

2. The method of claim 1, wherein there are at least two remote participants.

3. The method of claim 1, wherein the voiceness scores indicate a likelihood that particular audio frames contain speech.

4. The method of claim 1, wherein when making the decision as to whether to transmit, each participant compares its current voice activity score with a delayed version of the voice activity scores of the remainder of the participants.

5. The method of claim 4, further comprising computing a speaker ranking of the local participant relative to the remainder of the participants based on the voice activity scores of the local and remote participants.

6. The method of claim 5, wherein the decisions as to whether to transmit include how much bandwidth to allot to each participant based on each participant's speaker ranking.

7. The method of claim 5, further comprising computing a preliminary voice activity score of the local participant by combining a voiceness score, which indicates the likelihood that the local participant's audio signal contains speech or voice, and an energy score, which determines an amount of energy contained in the voice.

8. The method of claim 7, further comprising filtering the preliminary voice activity score to avoid situations where the speaker ranking changes frequently.

9. The method of claim 7, further comprising normalizing the energy score by dividing a current energy of an audio frame of the local participant's audio signal by a long-term average energy for one of the participants who is a current speaker.

10. The method of claim 1, further comprising designating a participant having a voice activity score that exceeds the voice activity score of a current speaker by more than a barge-in margin as a next current speaker.

11. The method of claim 1, further comprising using an audio bridge on the computer network to compute a variable threshold based on the voice activity score of the local participant and the voice activity scores of the remainder of the participants.

12. The method of claim 11, further comprising: comparing the first voice activity score to the variable threshold; and having the local participant make a decision based on the comparison.

13. A computer-readable storage device having stored thereon computer-executable instructions for performing silence suppression on a client device connected to a network in a voice conference having a plurality of participants, comprising: computing a local voice activity score on an ongoing basis from a combination of a computed energy of one or more sequential frames of a local participant's audio signal and a voiceness score computed from the corresponding sequential frames; providing the local voice activity score to a plurality of remote participants; obtaining voice activity scores of the remote participants; and having any of the participants make an independent decision based on the local participant's voice activity score and the voice activity scores of the remainder of the participants whether to send that participant's own audio signal out on the network.

14. The computer-readable storage device of claim 13, further comprising: computing Mel-Frequency Cepstral Coefficients for each audio frame of the participant's audio signal; obtaining energy of a current audio frame from the Mel-Frequency Cepstral Coefficients computation; computing a running average of energy for a current speaker in the voice conference; and normalizing the current audio frame energy by dividing the energy of the current audio frame by the running average of energy for the current speaker to obtain an energy score.

15. The computer-readable storage device of claim 14, further comprising: obtaining a current voice stream ranking of the voice conference; combining a voiceness score and the energy score to obtain a preliminary voice activity score; filtering the preliminary voice activity score to temporally smooth the preliminary voice activity score to avoid spurious changes in the voice stream ranking.

16. The computer-readable storage device of claim 13, further comprising: obtaining a voice activity score of a current speaker in the voice conference; defining a barge-in threshold as the current speaker's voice activity score plus a barge-in margin; comparing each participant's voice activity score with the barge-in threshold; and designating the participant having a voice activity score that is higher than the barge-in threshold as a new current speaker.

17. The computer-readable storage device of claim 13, wherein when making the decision as to whether to send the participant's own audio signal out on the network, each participant compares its current voice activity score with a delayed version of the voice activity scores of the remainder of the participants.

18. A computer-implemented process for ranking multiple voice streams in a voice conference over a computer network, comprising using a computer to perform process actions for: computing a client voice activity score based on a client's voice stream, wherein the client is in communication with the network; wherein the voice activity score is computed on an ongoing basis from a combination of a computed energy of one or more sequential frames of the client's voice stream and a voiceness score computed from the corresponding sequential frames; computing a variable threshold using an audio bridge in communication with the network using the client voice activity score and voice activity scores of the remote participants in the voice conference; using the client to compare the client voice activity score to the variable threshold; and causing the client and any of the remote participants to make an independent decision on whether to transmit their voice stream based on the comparison.

19. The computer-implemented process as set forth in claim 18, wherein computing the variable threshold is based at least in part on a load on the audio bridge, such that if the audio bridge has a high load, then the variable threshold is set higher.

20. The computer-implemented process as set forth in claim 18, further comprising designating a participant having a voice activity score that exceeds the voice activity score of a current speaker by more than a barge-in margin as a next current speaker.
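Three mechanisms from the dependent claims, temporal smoothing (claims 8 and 15), barge-in (claims 10, 16 and 20) and the bridge-computed variable threshold (claims 11, 18 and 19), can be illustrated with a second hypothetical Python sketch. Only the barge-in rule (current speaker's score plus a margin) is fixed by the claims. The exponential-moving-average filter and its `alpha`, the rule that picks the highest scorer when several participants exceed the barge-in threshold, and the load-to-threshold mapping through a hypothetical `max_streams` parameter are all assumptions. The mapping was chosen only so that a higher bridge load never lowers the threshold, in the spirit of claim 19.

```python
from typing import Dict, Optional, Sequence


def smooth_vas(prev_smoothed: float, preliminary: float, alpha: float = 0.2) -> float:
    """Temporally smooth a preliminary VAS so the ranking does not flip often.

    Exponential moving average is an assumed filter; smaller alpha = smoother.
    """
    return (1.0 - alpha) * prev_smoothed + alpha * preliminary


def barge_in(current_speaker_vas: float, vas_by_participant: Dict[str, float],
             margin: float) -> Optional[str]:
    """Return the participant who barges in, or None if nobody does.

    Barge-in threshold = current speaker's VAS + barge-in margin (claim 16).
    Choosing the highest scorer among several candidates is an assumption.
    """
    threshold = current_speaker_vas + margin
    candidates = {p: v for p, v in vas_by_participant.items() if v > threshold}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def variable_threshold(all_vas: Sequence[float], bridge_load: float,
                       max_streams: int = 4) -> float:
    """Hypothetical bridge-side threshold from all participants' VAS values.

    The bridge keeps about max_streams * (1 - load) top-ranked streams (at
    least one); the threshold is the VAS of the lowest-ranked stream kept.
    Higher load keeps fewer streams, so the threshold never goes down.
    """
    if not all_vas:
        raise ValueError("need at least one voice activity score")
    load = min(max(bridge_load, 0.0), 1.0)  # clamp load to [0, 1]
    allowed = max(1, int(max_streams * (1.0 - load)))
    ranked = sorted(all_vas, reverse=True)
    return ranked[min(allowed, len(ranked)) - 1]
```

Following claim 12, a client would transmit when its own score is at least the threshold returned by `variable_threshold`. For scores `[0.9, 0.1, 0.5, 0.7, 0.3]`, the sketch yields thresholds of 0.3, 0.7 and 0.9 at bridge loads of 0.0, 0.5 and 1.0.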

Assignees

Inventors

Classifications

  • H04L12/66 (primary)

    Arrangements for connecting between networks having differing types of switching systems, e.g. gateways · CPC title

  • with floor control · CPC title

  • Arrangements for multi-party communication, e.g. for conferences (data switching systems for conference H04L12/18; arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities H04M3/56; television conferencing systems H04N7/15) · CPC title

  • Electricity · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9331887B2 cover?
A peer-aware voice stream ranking method that makes decisions based on information about participants of a voice conference over a network. Whether to send a participant's own audio packet out on the network is based both on information about the participant's own voice packet and voice packets that the participant receives from other clients. A Voice Activity Score (VAS) is computed for each f…
Who is the assignee on this patent?
No assignee is shown on this page; the names listed (He Li-Wei, Florencio Dinei A, Xu Xun, and 1 more) are most likely the inventors.
What technology area does this patent fall under?
Primary CPC classification H04L12/66. Mapped technology areas include Electricity.
When was this patent published?
Published May 3, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).