Talker collisions in an auditory scene

US9502047B2 · US · B2

Patent metadata
Field · Value
Publication number · US-9502047-B2
Application number · US-201314373336-A
Country · US
Kind code · B2
Filing date · Mar 21, 2013
Priority date · Mar 23, 2012
Publication date · Nov 22, 2016
Grant date · Nov 22, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

From a plurality of received voice signals, a signal interval in which there is a talker collision between at least a first and a second voice signal is detected. A processor receives a positive detection result and processes, in response to this, at least one of the voice signals with the aim of making it perceptually distinguishable. A mixer mixes the voice signals to supply an output signal, wherein the processed signal(s) replaces the corresponding received signals. In example embodiments, signal content is shifted away from the talker collision in frequency or in time. The invention may be useful in a conferencing system.
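The detect, process, and mix steps in the abstract can be illustrated with a minimal frame-level sketch. It is not the patented implementation: each voice signal is reduced to one level value per frame, the function names (`detect_collisions`, `shift_away`, `mix`) and the `threshold` parameter are hypothetical, and the only processing shown is a time shift of the second signal into the nearest frame where both signals are quiet.

```python
def detect_collisions(a, b, threshold):
    """Return indices of frames where both signals exceed the threshold."""
    return [i for i, (x, y) in enumerate(zip(a, b))
            if x > threshold and y > threshold]


def shift_away(a, b, collisions, threshold):
    """Move colliding frames of b to the nearest frame where a and b are quiet.

    Assumption for illustration: a frame's content is a single level value,
    and moving it simply overwrites the (quiet) target frame.
    """
    out = list(b)
    for i in collisions:
        free = [j for j in range(len(out))
                if a[j] <= threshold and out[j] <= threshold]
        if not free:
            continue  # no quiet frame available; leave this collision as is
        # Smallest shift distance wins; ties go to the earlier frame.
        j = min(free, key=lambda k: (abs(k - i), k))
        out[j], out[i] = out[i], 0.0
    return out


def mix(a, b):
    """Mix on the common time base by summing frame by frame."""
    return [x + y for x, y in zip(a, b)]


talker_a = [0.0, 0.8, 0.9, 0.0, 0.0]
talker_b = [0.0, 0.0, 0.7, 0.0, 0.0]

hits = detect_collisions(talker_a, talker_b, threshold=0.1)
processed_b = shift_away(talker_a, talker_b, hits, threshold=0.1)
output = mix(talker_a, processed_b)
```

In this example the only collision is in frame 2, so the content of `talker_b` moves one frame later and the processed signal replaces the received one in the mix. A frequency shift would follow the same pattern with bands in place of frames.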

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of mixing voice signals while mitigating talker collisions between the voice signals, the method comprising: receiving two or more voice signals with a common time base; detecting a signal interval in which there is a talker collision between at least a first and a second voice signal out of said voice signals, wherein said detecting comprises: deriving a frequency-variable energy content indicator for each of the voice signals; and based on the energy content indicator, applying a detection condition including having comparable energy content in the first and the second voice signal in a talker collision location being a frequency sub-range in a signal interval; processing, in case of a positive detection result, the first voice signal of said voice signals with the aim of making it perceptually distinguishable, wherein the processing is restricted to time segments where it is needed; and mixing the at least one processed voice signal with the remaining voice signals in accordance with the common time base to obtain an output signal.

2. The method of claim 1, wherein the processing includes time-shifting the signal content of the detected signal interval of the first voice signal in relation to the common time base.

3. The method of claim 2, wherein the time-shifting includes applying a succession of positive and negative time stretching with respect to the common time base to the first voice signal.

4. The method of claim 2, wherein the time-shifting includes attenuating the signal content of the detected signal interval and copying the signal content of the detected signal interval to an adjacent signal interval.

5. The method of claim 1, wherein the processing includes frequency-shifting the signal content of the talker collision location.

6. The method of claim 5, wherein the frequency-shifting includes a gradual onset and/or gradual release.

7. The method of claim 2, wherein the processing affects only a frequency sub-range of the signal content in the detected signal interval.

8. The method of claim 2, further comprising, prior to shifting: segmenting a portion of the first voice signal into phonemes; and adjusting the detected signal interval to cover complete phonemes only.

9. The method of claim 1, wherein the processing includes time-shifting or frequency-shifting the signal content in time segments having a duration of the order of 0.1 s.

10. The method of claim 1, wherein the detection condition further includes having energy content above a predefined threshold both in the first and the second voice signal in the talker collision location.

11. The method of claim 1, wherein the voice signals are partitioned into time-frequency tiles, each associated with a value of the energy content indicator and being the basic detection unit.

12. The method of claim 1, further comprising electing the voice signal with least energy content in the detected signal interval as the first signal, wherein the processing includes time-shifting or frequency-shifting the signal content of the detected signal interval and affects the first signal.

13. The method of claim 1, wherein: the detection further includes finding at least one target location being a combination of a frequency sub-range and signal interval, which target location is close to the talker collision location and in which the detection condition fails; and the processing includes time-shifting or frequency-shifting the signal content of the first signal into said target location.

14. The method of claim 13, wherein: the detection further comprises finding at least two target locations and, for each target location, deriving a metric indicating the shift distance with respect to the talker collision location; and the processing includes time-shifting or frequency-shifting the signal content of the first signal into that target location for which the metric is minimal.

15. The method of claim 14, wherein: a first target location corresponds to a pure positive time shift or pure frequency shift and a second target location corresponds to a pure negative time-shift or pure frequency-shift, respectively; and that target location for which the shift amount is minimal is selected.

16. The method of claim 1, further comprising processing a strict subset of the voice signals by applying an effect in the group comprising: harmonic excitation; an oscillating effect; tremolo; vibrato; chorus; flanging; and phasing.

17. The method of claim 1, implemented in a live conferencing system.

18. A computer-readable medium storing computer-readable instructions for performing the method of any of the preceding claims.

19. A device for mixing voice signals, comprising: an interface for receiving one or more voice signals with a common time base; a collision detector for detecting a signal interval in which there is a talker collision between at least a first and a second voice signal out of said voice signals, wherein the collision detector is configured to: derive a frequency-variable energy content indicator for each of the voice signals; and based on the energy content indicator, apply a detection condition including having comparable energy content in the first and the second voice signal in a talker collision location being a frequency sub-range in a signal interval; a processor for receiving a detection result from the collision detector and processing, in response to a positive detection result, at least one of the voice signals with the aim of making it perceptually distinguishable, wherein the processor is configured to restrict said processing to time segments where the processing is needed; and a mixer for parsing the at least one processed voice signal and the remaining voice signals with respect to the common time base and mixing these signals accordingly to supply an output signal.
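The detection and target-selection language in claims 1 and 10 to 15 can be made concrete with a small sketch over time-frequency tiles. Everything here is an assumption made for illustration and is not taken from the patent: energies are given as plain nested lists indexed `[frame][band]`, "comparable" is read as a ratio of at most `max_ratio`, the shift metric is the tile distance, and a target is accepted only if the other talker is quiet there (a stricter test than the claim's "detection condition fails"). The function names are hypothetical.

```python
def collision_tiles(e1, e2, min_energy=0.05, max_ratio=2.0):
    """Return (frame, band) tiles where both energies are above min_energy
    and within a factor of max_ratio of each other."""
    hits = []
    for t, (row1, row2) in enumerate(zip(e1, e2)):
        for f, (x, y) in enumerate(zip(row1, row2)):
            both_active = x > min_energy and y > min_energy
            if both_active and max(x, y) <= max_ratio * min(x, y):
                hits.append((t, f))
    return hits


def elect_first(e1, e2, tile):
    """Return 0 or 1: the index of the signal with less energy in the tile,
    which is the one to be shifted."""
    t, f = tile
    return 0 if e1[t][f] <= e2[t][f] else 1


def nearest_target(other, tile, min_energy=0.05):
    """Pick the closest tile reachable by a pure time shift or a pure
    frequency shift in which the other talker is quiet.

    Returns None when no such tile exists.
    """
    t0, f0 = tile
    candidates = [
        (abs(t - t0) + abs(f - f0), t, f)  # shift distance in tiles
        for t, row in enumerate(other)
        for f, energy in enumerate(row)
        if (t, f) != tile and (t == t0 or f == f0) and energy <= min_energy
    ]
    if not candidates:
        return None
    _, t, f = min(candidates)  # minimal metric; ties go to the earlier tile
    return (t, f)


# Three frames by two bands of made-up energy values.
energy_1 = [[0.9, 0.1], [0.8, 0.3], [0.0, 0.0]]
energy_2 = [[0.0, 0.0], [0.6, 0.0], [0.0, 0.0]]

tiles = collision_tiles(energy_1, energy_2)
mover = elect_first(energy_1, energy_2, tiles[0])
other = energy_1 if mover == 1 else energy_2
target = nearest_target(other, tiles[0])
```

With these values the single collision is at frame 1, band 0. The second signal has less energy there, so it is the one elected for shifting, and the nearest quiet tile is one frame later in the same band, which corresponds to a pure positive time shift.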

Assignees

Dolby Laboratories Licensing Corp

Inventors

Classifications

  • G10L21/003 · Primary

    Changing voice quality, e.g. pitch or formants · CPC title

  • Processing in the frequency domain · CPC title

  • audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants (echo suppression in two-way loud-speaking telephone systems H04M9/02; sound field processing per se H04S7/30) · CPC title

  • Voice signal separating · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9502047B2 cover?
From a plurality of received voice signals, a signal interval in which there is a talker collision between at least a first and a second voice signal is detected. A processor receives a positive detection result and processes, in response to this, at least one of the voice signals with the aim of making it perceptually distinguishable. A mixer mixes the voice signals to supply an output signal,…
Who is the assignee on this patent?
Dolby Laboratories Licensing Corp
What technology area does this patent fall under?
Primary CPC classification G10L21/003. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 22, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).