Automatic discovery and localization of speaker locations in surround sound systems

US11425503B2 · US · B2

Patent metadata
Publication number: US-11425503-B2
Application number: US-202016987197-A
Country: US
Kind code: B2
Filing date: Aug 6, 2020
Priority date: Sep 29, 2016
Publication date: Aug 23, 2022
Grant date: Aug 23, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are described for a method of simultaneously localizing a set of speakers and microphones, having only the times of arrival between each of the speakers and microphones. An autodiscovery process uses an external input to set: a global translation (3 continuous parameters), a global rotation (3 continuous parameters), and discrete symmetries, i.e., an exchange of any axis pairs and/or reversal of any axis. Different time of arrival acquisition techniques may be used, such as ultrasonic sweeps or generic multitrack audio content. The autodiscovery algorithm is based on minimizing a certain cost function, and the process allows for latencies in the recordings, possibly linked to the latencies in the emission.
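Why only times of arrival are enough, and why the abstract has to fix translation, rotation and axis symmetries externally, can be seen in a small model. The following is a minimal sketch, not the patent's method. Per the claims, the patent minimizes a cost function over positions and latencies with a nonlinear optimizer. This sketch instead swaps in a closed-form distance step plus classical multidimensional scaling, under these assumptions: a 2D layout, each microphone exactly co-located with its speaker, a TOA model toa[i, j] = |p_i - p_j| / c + play[i] + rec[j], and c = 343 m/s. Under that model, T_ij + T_ji - T_ii - T_jj cancels every latency and leaves twice the flight time. The layout is then recovered only up to a global translation, rotation and reflection. The hypothetical `canonicalize` plays the role of the external input by pinning speakers 0, 1 and 2. All function names are illustrative.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def distances_from_toa(toa: np.ndarray, c: float = SPEED_OF_SOUND) -> np.ndarray:
    """Latency-free distance matrix from an n x n TOA matrix (seconds).

    Assumed model: toa[i, j] = |p_i - p_j| / c + play[i] + rec[j], with
    microphone j co-located with speaker j, so toa[i, i] = play[i] + rec[i].
    """
    diag = np.diag(toa)
    # T_ij + T_ji - T_ii - T_jj = 2 |p_i - p_j| / c: every latency cancels.
    d = 0.5 * c * (toa + toa.T - diag[:, None] - diag[None, :])
    np.fill_diagonal(d, 0.0)
    return np.clip(d, 0.0, None)


def classical_mds(d: np.ndarray, dim: int = 2) -> np.ndarray:
    """Positions (n x dim) whose pairwise distances best match `d`,
    determined only up to translation, rotation and reflection."""
    n = d.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centring @ (d ** 2) @ centring
    vals, vecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def residual(p: np.ndarray, d: np.ndarray) -> float:
    """RMS mismatch (metres) between fitted and measured distances."""
    fitted = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
    return float(np.sqrt(np.mean((fitted - d) ** 2)))


def canonicalize(p: np.ndarray) -> np.ndarray:
    """Remove the 2D symmetries: speaker 0 at the origin, speaker 1 on the
    +x axis, speaker 2 at y >= 0 (hypothetical choice of references)."""
    q = p - p[0]
    angle = np.arctan2(q[1, 1], q[1, 0])
    cs, sn = np.cos(-angle), np.sin(-angle)
    rot = np.array([[cs, -sn], [sn, cs]])
    q = q @ rot.T  # rotate every row by -angle
    if q[2, 1] < 0:
        q[:, 1] = -q[:, 1]  # undo a mirror image
    return q
```

With exact TOAs the residual is essentially zero. With noisy or wrongly picked TOAs it grows, which gives a simple internal-coherence check in the spirit of the residual-based selection described in the claims.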

First claim

Opening claim text (preview).

The invention claimed is:
1. A method for localizing speakers in a listening environment having a plurality of speakers and microphones, comprising: receiving one or more respective times of arrival (TOA) values for each speaker of the plurality of speakers to each microphone of the plurality of microphones to generate multiple TOA candidates, wherein each microphone is proximate a single respective speaker; receiving configuration parameters of the listening environment; minimizing a cost function using the configuration parameters and each of the one or more respective TOA values of each speaker to estimate a position and latency of a respective speaker and microphone; iterating the cost function minimization over each TOA candidate of the multiple TOA candidates to provide estimates of the position and latency of each of the plurality of speakers and microphones; and providing speaker location information determined from the estimates of the position and latency of each of the plurality of speakers and microphones to one or more post-processing or audio rendering components.
2. The method of claim 1 wherein each microphone is placed inside, on top of, or attached to a speaker cabinet of the single respective speaker, and further wherein the received TOA include multiple TOA candidates for at least one of the speakers to at least one of the microphones.
3. The method of claim 1, comprising: estimating an impulse response (IR) of the listening environment based on a reference audio sequence played back by one or more of the speakers and a recording of the reference audio sequence obtained from one or more of the microphones; and using the IR to search for direct sound candidate peaks, wherein the multiple TOA candidates correspond to respective candidate peaks identified in the search, wherein the speaker location information provided to one or more post-processing or audio rendering components is based on a selection among the TOA candidates for which a residual of the minimizing step is below a certain threshold value.
4. The method of claim 1, comprising: estimating an impulse response (IR) of the listening environment by one of: cross-correlating a known reference audio sequence to a recording of the sequence obtained from the microphones to derive a pseudo-impulse response, or deconvolving a calibration audio sequence and a recording of the calibration audio sequence obtained from the microphones; using the IR to search for direct sound candidate peaks by evaluating a reference peak and using noise levels around the reference peak, wherein the multiple TOA candidates correspond to respective candidate peaks identified in the search; and performing a multiple peak evaluation by selecting an initial TOA matrix, evaluating the initial matrix with residuals of the minimizing step, and changing TOA matrix elements until the residuals are below a defined threshold value.
5. The method of claim 4, wherein using the IR to search for direct sound candidate peaks includes: searching for alternative peaks at least in a portion of the IR located before the reference peak.
6. The method of claim 1 wherein the latency comprises a playback latency for at least one speaker.
7. The method of claim 1, wherein the latency comprises a recording latency for at least one microphone.
8. The method of claim 1, wherein the configuration parameters comprise at least one of: the number of speakers and microphones; a size of the listening environment; bounds on the playback and recording latencies; a specification of two-dimensional or three-dimensional speaker location; constraints on speaker and microphone relative positioning; constraints on speaker and microphone relative latencies; and references to disambiguate rotation, translation and axes inversion symmetries.
9. The method of claim 1 further comprising providing a seed layout to the cost function, the seed layout specifying the correct number of speakers and microphones in defined initial positions relative to a defined speaker layout standard.
10. The method of claim 9 further comprising transforming the estimated location information into a canonical format based on a configuration of the speakers in the listening environment.
11. The method of claim 1 wherein the speakers in the listening environment are placed in a surround-sound configuration having a plurality of front, rear and surround speakers and one or more low frequency effect speakers, and wherein at least some speakers are height speakers providing playback of height cues present in an input audio signal comprising immersive audio content.
12. The method of claim 1 wherein obtaining the one or more respective TOA values may be performed using at least one of: a room calibration audio sequence emitted sequentially by each of the speakers and recorded simultaneously by the microphones; a calibration audio sequence band-limited to the close ultrasonic range, such as 18 to 24 kHz; an arbitrary multichannel audio sequence; and a specifically defined multichannel audio sequence, to recover a room impulse response from a multichannel audio sequence.
13. The method of claim 12 further comprising using the estimated speaker location information to modify a rendering process transmitting speaker feeds to each speaker, and wherein the listening environment comprises one of a large venue playing cinema content, or a home theater, and wherein at least some of the speakers comprise wireless speakers coupled to a renderer executing the rendering process over a wireless data network.
14. The method of claim 1 further comprising: estimating an impulse response (IR) of the listening environment by one of: cross-correlating a known reference audio sequence to a recording of the sequence obtained from the microphones to derive a pseudo-impulse response, or deconvolving a calibration audio sequence and a recorded audio program; and estimating one or more best TOA candidates from at least one of the estimated IR or pseudo-IR using an iterative peak-searching algorithm.
15. The method of claim 1 further comprising: using residual values of the minimizing step to provide an estimate of the internal coherence of the original TOA values; and generating an error estimate to allow for iterating over the cost function minimization process to improve the estimated location.
16. The method of claim 1 wherein the TOA values are formatted into a matrix of dimension n by n, where n is the number of the speakers and co-located microphones.
17. The method of claim 1 wherein the step of receiving the TOA values for each speaker to each microphone using the multiple TOA candidates comprises: deconvolving a calibration audio sequence sent to each speaker to obtain a room impulse response (IR); using the IR to search for direct sound candidate peaks by evaluating a reference peak and using noise levels around the reference peak; and performing a multiple peak evaluation by selecting an initial TOA matrix, evaluating the initial matrix with residuals of the minimizing step, and changing TOA matrix elements until the residuals are below a defined threshold value.
18. The method of claim 17 wherein the minimizing step is performed using a nonlinear minimization algorithm using an Interior Point Optimize software library in an executable software program.
19. The method of claim 17 further comprising explicitly providing first derivatives (Jacobian) and second derivatives (Hessian) of the cost functions and constraints with respect to unkn
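The claims describe deriving a pseudo-impulse response by cross-correlating a known reference sequence with the microphone recording. They then search it for direct-sound candidate peaks, including peaks earlier than the strongest (reference) peak. The reason to look earlier is that the strongest arrival can be a reflection when the direct path is partly blocked, while the direct sound, travelling the shortest path, arrives first. Below is a minimal NumPy sketch of that idea, not the patent's implementation. The noise level is assumed to be the median magnitude of the whole pseudo-IR. A candidate must exceed it by a hypothetical `snr_db` margin, and at most `max_candidates` peaks are kept. The function names and default values are illustrative.

```python
import numpy as np


def pseudo_ir(reference: np.ndarray, recording: np.ndarray) -> np.ndarray:
    """Pseudo-impulse response: cross-correlation of the recording with the
    known reference. Element k corresponds to a delay of k samples."""
    full = np.correlate(recording, reference, mode="full")
    # Drop negative lags; index len(reference) - 1 of `full` is lag 0.
    return full[len(reference) - 1:]


def toa_candidates(ir: np.ndarray, fs: float, max_candidates: int = 3,
                   snr_db: float = 20.0) -> np.ndarray:
    """Up to `max_candidates` direct-sound TOA candidates in seconds,
    earliest first. Thresholds are illustrative assumptions."""
    env = np.abs(ir)
    ref = int(np.argmax(env))  # reference peak: strongest arrival
    noise = float(np.median(env))  # robust noise-level estimate
    thr = noise * 10.0 ** (snr_db / 20.0)  # minimum candidate magnitude

    # Local maxima above the threshold in the part of the IR before the
    # reference peak, where an attenuated direct sound would appear.
    seg = env[:ref]
    early = np.array([], dtype=int)
    if seg.size >= 3:
        mid = seg[1:-1]
        is_peak = (mid > seg[:-2]) & (mid >= seg[2:]) & (mid >= thr)
        early = np.nonzero(is_peak)[0] + 1

    cands = np.append(early, ref)
    # Keep the strongest candidates, then report them in time order.
    strongest = cands[np.argsort(env[cands])[::-1][:max_candidates]]
    return np.sort(strongest) / fs
```

Each returned candidate would then feed one entry of the TOA matrix. As the claims describe, the choice among candidates is made by comparing the residuals of the localization step.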

Classifications

  • Spatial or constructional arrangements of loudspeakers · CPC title

  • Automatic calibration of stereophonic sound system, e.g. with test microphone · CPC title

  • Positioning of individual sound objects, e.g. moving airplane, within a sound field (H04S2420/13 takes precedence) · CPC title

  • using ultrasonic, sonic or infrasonic waves · CPC title

  • Spatial or constructional arrangements of microphones, e.g. in dummy heads · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Dolby Laboratories Licensing Corp, Dolby Int Ab
What technology area does this patent fall under?
Primary CPC classification H04R5/04. Mapped technology areas include Electricity.
When was this patent published?
Published Aug 23, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).