Automatic discovery and localization of speaker locations in surround sound systems

US2020366994A1 · US · A1

Patent metadata
Publication number: US-2020366994-A1
Application number: US-202016987197-A
Country: US
Kind code: A1
Filing date: Aug 6, 2020
Priority date: Sep 29, 2016
Publication date: Nov 19, 2020
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are described for a method of simultaneously localizing a set of speakers and microphones, having only the times of arrival between each of the speakers and microphones. An autodiscovery process uses an external input to set: a global translation (3 continuous parameters), a global rotation (3 continuous parameters), and discrete symmetries, i.e., an exchange of any axis pairs and/or reversal of any axis. Different time of arrival acquisition techniques may be used, such as ultrasonic sweeps or generic multitrack audio content. The autodiscovery algorithm is based on minimizing a certain cost function, and the process allows for latencies in the recordings, possibly linked to the latencies in the emission.
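The abstract's point that translation, rotation and axis reversals must be fixed by an external input can be checked numerically. The sketch below assumes a simple additive TOA model: distance divided by an assumed speed of sound of 343 m/s, plus one playback latency per speaker and one recording latency per microphone. The helper names `predicted_toa`, `cost`, `rigid_transform` and `canonicalize` are hypothetical and are not taken from the patent. The sketch shows that a rotated, mirrored and shifted copy of a layout leaves the TOA residual unchanged, then applies one possible reference convention to pick a single representative. As a simplification, only rotation about the vertical axis is handled.

```python
import math

C = 343.0  # assumed speed of sound (m/s)

def predicted_toa(speakers, mics, play_lat, rec_lat):
    """Assumed model: t[i][j] = |speaker_i - mic_j| / C + play_lat[i] + rec_lat[j]."""
    return [[math.dist(s, m) / C + play_lat[i] + rec_lat[j]
             for j, m in enumerate(mics)]
            for i, s in enumerate(speakers)]

def cost(measured, speakers, mics, play_lat, rec_lat):
    """Sum of squared TOA residuals (s^2), the quantity a minimizer would drive down."""
    pred = predicted_toa(speakers, mics, play_lat, rec_lat)
    return sum((m - p) ** 2
               for mrow, prow in zip(measured, pred)
               for m, p in zip(mrow, prow))

def rigid_transform(points, angle, shift, flip_x=False):
    """Rotate about the vertical axis, optionally reverse the x axis, then translate."""
    ca, sa = math.cos(angle), math.sin(angle)
    out = []
    for x, y, z in points:
        xr, yr = ca * x - sa * y, sa * x + ca * y
        if flip_x:
            xr = -xr
        out.append((xr + shift[0], yr + shift[1], z + shift[2]))
    return out

def canonicalize(points):
    """Hypothetical reference convention: point 0 at the origin, point 1 on the +x
    axis, point 2 at y >= 0. Handles horizontal rotation only (2-D plus height)."""
    x0, y0, z0 = points[0]
    shifted = [(x - x0, y - y0, z - z0) for x, y, z in points]
    ang = math.atan2(shifted[1][1], shifted[1][0])
    ca, sa = math.cos(-ang), math.sin(-ang)
    rotated = [(ca * x - sa * y, sa * x + ca * y, z) for x, y, z in shifted]
    if rotated[2][1] < 0:  # resolve the remaining mirror ambiguity
        rotated = [(x, -y, z) for x, y, z in rotated]
    return rotated

speakers = [(0.0, 0.0, 1.0), (3.0, 0.0, 1.0), (3.0, 4.0, 1.0), (0.0, 4.0, 1.0)]
mics = [(x, y, z + 0.05) for x, y, z in speakers]  # one mic on top of each cabinet
play = [0.010, 0.012, 0.011, 0.009]  # hypothetical playback latencies (s)
rec = [0.002, 0.002, 0.003, 0.001]   # hypothetical recording latencies (s)
measured = predicted_toa(speakers, mics, play, rec)

# Rotate by 30 degrees, mirror and shift the whole layout: TOAs do not change.
spk2 = rigid_transform(speakers, math.radians(30), (1.0, -2.0, 0.5), flip_x=True)
mic2 = rigid_transform(mics, math.radians(30), (1.0, -2.0, 0.5), flip_x=True)
print(cost(measured, speakers, mics, play, rec))      # 0.0
print(cost(measured, spk2, mic2, play, rec) < 1e-20)  # True: only round-off remains

# A reference convention selects one representative of the equivalent layouts.
a, b = canonicalize(speakers), canonicalize(spk2)
print(max(abs(u - v) for p, q in zip(a, b) for u, v in zip(p, q)) < 1e-9)  # True
```

Because the cost only sees distances, a minimizer may converge to any of these equivalent layouts; the reference convention (here an arbitrary choice of speakers 0, 1 and 2) is what turns the result into absolute positions. In this additive model, adding a constant to every playback latency and subtracting it from every recording latency also leaves the TOAs unchanged.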

First claim

Opening claim text (preview).

1. A method for localizing speakers in a listening environment having a plurality of speakers and microphones, comprising: receiving one or more respective times of arrival (TOA) for each speaker of the plurality of speakers to each microphone of the plurality of microphones to generate multiple TOA candidates, wherein each microphone is proximate a single respective speaker; receiving configuration parameters of the listening environment; minimizing a cost function using each of the one or more respective TOA values of each speaker to estimate a position and latency of a respective speaker and microphone; iterating the cost function minimization over each TOA candidate of the multiple TOA candidates; and using the configuration parameters and minimized cost function to provide speaker location information to one or more post-processing or audio rendering components.

2. The method of claim 1 wherein each microphone is placed inside, on top of, or attached to a speaker cabinet of the single respective speaker, and further wherein the received TOA include multiple TOA candidates for at least one of the speakers to at least one of the microphones.

3. The method of claim 1, comprising: estimating an impulse response (IR) of the listening environment based on a reference audio sequence played back by one or more of the speakers and a recording of the reference audio sequence obtained from one or more of the microphones; and using the IR to search for direct sound candidate peaks, wherein the multiple TOA candidates correspond to respective candidate peaks identified in the search, wherein the speaker location information provided to one or more post-processing or audio rendering components is based on a selection among the TOA candidates for which a residual of the minimizing step is below a certain threshold value.

4. The method of claim 1, comprising: estimating an impulse response (IR) of the listening environment by one of: cross-correlating a known reference audio sequence to a recording of the sequence obtained from the microphones to derive a pseudo-impulse response, or deconvolving a calibration audio sequence and a recording of the calibration audio sequence obtained from the microphones; using the IR to search for direct sound candidate peaks by evaluating a reference peak and using noise levels around the reference peak, wherein the multiple TOA candidates correspond to respective candidate peaks identified in the search; and performing a multiple peak evaluation by selecting an initial TOA matrix, evaluating the initial matrix with residuals of the minimizing step, and changing TOA matrix elements until the residuals are below a defined threshold value.

5. The method of claim 4, wherein using the IR to search for direct sound candidate peaks includes: searching for alternative peaks at least in a portion of the IR located before the reference peak.

6. The method of claim 1 wherein the latency comprises a playback latency for at least one speaker.

7. The method of claim 1, wherein the latency comprises a recording latency for at least one microphone.

8. The method of claim 1, wherein the configuration parameters comprise at least one of: the number of speakers and microphones, a size of the listening environment; bounds on the playback and recording latencies; a specification of two-dimensional or three-dimensional speaker location; constraints on speaker and microphone relative positioning; constraints on speaker and microphone relative latencies; and references to disambiguate rotation, translation and axes inversion symmetries.

9. The method of claim 1 further comprising providing a seed layout to the cost function, the seed layout specifying the correct number of speakers and microphones in defined initial positions relative to a defined speaker layout standard.

10. The method of claim 9 further comprising transforming the estimated location information into a canonical format based on a configuration of the speakers in the listening environment.

11. The method of claim 1 wherein the speakers in the listening environment are placed in a surround-sound configuration having a plurality of front, rear and surround speakers and one or more low frequency effect speakers, and wherein at least some speakers are height speakers providing playback of height cues present in an input audio signal comprising immersive audio content.

12. The method of claim 1 wherein obtaining the one or more respective TOA values may be performed using at least one of: a room calibration audio sequence emitted sequentially by each of the speakers and recorded simultaneously by the microphones; a calibration audio sequence band-limited to the close ultrasonic range, such as 18 to 24 kHz; an arbitrary multichannel audio sequence; and a specifically defined multichannel audio sequence, to recover a room impulse response from a multichannel audio sequence.

13. The method of claim 12 further comprising using the estimated speaker location information to modify a rendering process transmitting speaker feeds to each speaker, and wherein the listening environment comprises one of a large venue playing cinema content, or a home theater, and wherein at least some of the speakers comprise wireless speakers coupled to a renderer executing the rendering process over a wireless data network.

14. The method of claim 1 further comprising: estimating an impulse response (IR) of the listening environment by one of: cross-correlating a known reference audio sequence to a recording of the sequence obtained from the microphones to derive a pseudo-impulse response, or deconvolving a calibration audio sequence and a recorded audio program; and estimating one or more best TOA candidates from at least one of the estimated IR or pseudo-IR using an iterative peak-searching algorithm.

15. The method of claim 1 further comprising: using residual values of the minimizing step to provide an estimate of the internal coherence of the original TOA values; and generating an error estimate to allow for iterating over the cost function minimization process to improve the estimated location.

16. The method of claim 1 wherein the TOA values are formatted into a matrix of dimension n by n, where n is the number of the speakers and co-located microphones.

17. The method of claim 1 wherein the step of receiving the TOA values for each speaker to each microphone using the multiple TOA candidates comprises: deconvolving a calibration audio sequence sent to each speaker to obtain a room impulse response (IR); using the IR to search for direct sound candidate peaks by evaluating a reference peak and using noise levels around the reference peak; and performing a multiple peak evaluation by selecting an initial TOA matrix, evaluating the initial matrix with residuals of the minimizing step, and changing TOA matrix elements until the residuals are below a defined threshold value.

18. The method of claim 17 wherein the minimizing step is performed using a nonlinear minimization algorithm using an Interior Point Optimize software library in an executable software program.

19. The method of claim 17 further comprising explicitly providing explicit first derivatives (Jacobian) and second derivatives (Hessian) of the cost functions and constraints with respect to unknowns of the cost function.

20. A system for determining locations of a plurality of speakers in a room, comprising: a microphone placed proximate each
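Claims 4, 5 and 14 describe deriving a pseudo-impulse response by cross-correlating a known reference sequence with its recording, then searching for direct-sound candidate peaks, including earlier peaks located before the strongest (reference) peak. Below is a minimal pure-Python sketch under stated assumptions: a random ±1 sequence stands in for the calibration signal, the noise level is taken as the median magnitude in a window around the reference peak, and a candidate must be a local maximum exceeding `k` times that level. The function names, the window size `win`, the factor `k` and the thresholding rule are illustrative assumptions, not the patent's criteria.

```python
import random
import statistics

def cross_correlate(ref, rec, max_lag):
    """Pseudo-impulse response: r[k] = sum_i ref[i] * rec[i + k] for 0 <= k < max_lag."""
    return [sum(a * b for a, b in zip(ref, rec[k:])) for k in range(max_lag)]

def direct_sound_candidates(ir, win=80, k=8.0):
    """Return (reference_lag, candidate_lags).

    The reference peak is the largest |ir| sample. The noise level is the median
    |ir| in a window around it (peak excluded). Any local maximum *before* the
    reference peak exceeding k times that level is kept as an earlier
    direct-sound candidate. win and k are illustrative choices.
    """
    mag = [abs(v) for v in ir]
    ref = max(range(len(mag)), key=mag.__getitem__)
    lo, hi = max(0, ref - win), min(len(mag), ref + win + 1)
    noise = statistics.median(mag[lo:ref] + mag[ref + 1:hi])
    earlier = [i for i in range(1, ref)
               if mag[i] > k * noise and mag[i] >= mag[i - 1] and mag[i] >= mag[i + 1]]
    return ref, earlier + [ref]

# Synthetic case: a weak direct path (lag 100) followed by a stronger reflection (lag 140).
random.seed(0)
N, L = 2000, 300
ref_seq = [random.choice((-1.0, 1.0)) for _ in range(N)]  # stand-in calibration sequence
rec = [random.gauss(0.0, 0.1) for _ in range(N + L)]      # microphone noise
for n, v in enumerate(ref_seq):
    rec[n + 100] += 0.4 * v  # attenuated direct sound
    rec[n + 140] += 1.0 * v  # stronger early reflection

ir = cross_correlate(ref_seq, rec, L)
ref_lag, cands = direct_sound_candidates(ir)
print(ref_lag, cands)  # reference is the reflection at 140; 100 appears as an earlier candidate
```

Each returned lag becomes a TOA candidate after dividing by the sample rate (lag 100 at an assumed 48 kHz is about 2.08 ms, roughly 0.71 m at 343 m/s). Following claims 1 and 4, a candidate that leaves a large residual after the cost function minimization would then be swapped for another one.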

Assignees

Dolby Laboratories Licensing Corp, Dolby Int Ab

Inventors

Classifications

  • Spatial or constructional arrangements of loudspeakers · CPC title

  • Electronic adaptation of stereophonic audio signals to reverberation of the listening space (H04S7/301 takes precedence) · CPC title

  • H04R5/04 (primary)

    Circuit arrangements, {e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments (combinations of amplifiers H03F3/68; stereophonic systems H04S)} · CPC title

  • for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407) · CPC title

  • Application of parametric coding in stereophonic audio systems · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020366994A1 cover?
Embodiments are described for a method of simultaneously localizing a set of speakers and microphones, having only the times of arrival between each of the speakers and microphones. An autodiscovery process uses an external input to set: a global translation (3 continuous parameters), a global rotation (3 continuous parameters), and discrete symmetries, i.e., an exchange of any axis pairs and/o…
Who is the assignee on this patent?
Dolby Laboratories Licensing Corp, Dolby Int Ab
What technology area does this patent fall under?
Primary CPC classification H04R5/04. Mapped technology areas include Electricity.
When was this patent published?
Published Nov 19, 2020 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).