Estimating room acoustic properties using microphone arrays

US10616706B1 · US · B1

Patent metadata

Publication number: US-10616706-B1
Application number: US-201916544202-A
Country: US
Kind code: B1
Filing date: Aug 19, 2019
Priority date: Nov 5, 2018
Publication date: Apr 7, 2020
Grant date: Apr 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An audio analysis system receives a first recording of a speech signal from an origin audio assembly and a second recording of at least a portion of the speech signal from a receiving audio assembly. The speech signal originates from a speaking user of the origin audio assembly and the second recording is recorded by a receiving audio assembly operated by a different user. Both the origin audio assembly and the receiving audio assembly are located within a room. The audio analysis system selects one or more audio frames in the first recording and one or more audio frames in the second recording that both occur over the same time period. The audio analysis system determines a transfer function for the room based in part on the selected one or more audio frames in the first recording and the selected one or more audio frames in the second recording.
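The abstract does not say how the transfer function is computed from the paired frames, though claim 5 names deconvolution. One minimal sketch is a regularized spectral division. It assumes the two frames are already time-aligned on a shared clock and that the receiving frame contains the full reverberant tail. The helper name `estimate_impulse_response`, the `eps` regularizer, and the synthetic 6-tap "room" are illustrative assumptions, not taken from the patent. A naive DFT keeps the example dependency-free.

```python
import cmath
import random
from typing import List


def dft(x: List[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform; adequate for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec: List[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def estimate_impulse_response(origin: List[float], received: List[float],
                              taps: int, eps: float = 1e-9) -> List[float]:
    """Hypothetical helper: estimate h such that received ~ origin convolved with h.

    Assumes both frames start at the same instant on a shared clock. Computes
    H = Y * conj(X) / (|X|^2 + eps), a frequency-domain deconvolution; eps keeps
    near-empty frequency bins from blowing up.
    """
    size = len(origin) + taps - 1  # zero-pad so circular equals linear convolution
    x = [complex(v) for v in origin] + [0j] * (size - len(origin))
    y = [complex(v) for v in received[:size]]
    y += [0j] * (size - len(y))
    X, Y = dft(x), dft(y)
    H = [yk * xk.conjugate() / (abs(xk) ** 2 + eps) for xk, yk in zip(X, Y)]
    return [v.real for v in idft(H)[:taps]]


# Synthetic check: white noise (standing in for speech) through a known 6-tap room.
random.seed(0)
origin = [random.uniform(-1.0, 1.0) for _ in range(64)]
true_h = [0.0, 0.0, 1.0, 0.0, 0.5, 0.25]  # 2-sample delay plus two reflections
received = [sum(origin[i - j] * true_h[j]
                for j in range(len(true_h)) if 0 <= i - j < len(origin))
            for i in range(len(origin) + len(true_h) - 1)]
estimate = estimate_impulse_response(origin, received, taps=len(true_h))
```

The near-exact recovery here holds only for this noise-free synthetic case. Real recordings add noise, speech leaves some frequency bins nearly empty, and a frame cut from continuous speech loses part of the reverberant tail. A practical system would therefore likely average cross- and auto-spectra over many frame pairs and tune `eps`.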

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: selecting an audio frame of a first recording of a speech signal and an audio frame of a second recording that occur over a same time period and includes at least a portion of the speech signal, wherein an origin audio assembly recorded the first recording of the speech signal and the speech signal originated from a speaking user of the origin audio assembly at a first position in a room and the second recording was recorded at a receiving audio assembly operated by a different user at a second position in the room; and determining a room-impulse response associated with the first position relative to the second position in the room, based in part on the selected audio frame in the first recording and the selected audio frame in the second recording.

2. The method of claim 1, wherein the audio frame in the first recording and the audio frame in the second recording are selected using cross-correlation analysis of the first recording and the second recording.

3. The method of claim 1, wherein selecting the audio frame in the first recording and the audio frame in the second recording comprises: determining a decay level for the selected audio frame of the first recording; comparing the decay level for the selected audio frame to a target decay level; and responsive to determining the decay level of the selected audio frame to be below the target decay level, aggregating the selected audio frame with a second audio frame of the first recording.

4. The method of claim 1, further comprising: determining a threshold number of frames based on a distance between the origin audio assembly and the receiving audio assembly; and responsive to determining that the selected number of frames from each of the first recording and second recording are below the threshold number of frames, determining an intermediate impulse response based on the selected audio frames.

5. The method of claim 1, wherein the room impulse response is determined by deconvolving the audio frame selected from the first recording with the audio frame selected from the second recording.

6. The method of claim 1, further comprising one or more of the following: decomposing the room impulse response into octave bands associated with the speech signal to extend the decay of the signal; determining one or more audio parameters based on the room impulse response; and determining one or more acoustic parameters for the room based on the room impulse response.

7. The method of claim 1, further comprising: determining, for a plurality of positions in the room relative to the second position, a room impulse response associated with each position of the plurality of positions relative to the second position in the room; and determining a transfer function for the room based in part on the room impulse response for each position of the plurality.

8. A device comprising: a microphone array comprising a plurality of acoustic sensors, wherein the microphone array records a first recording of a speech signal originating from a speaking user; and a controller configured to: select an audio frame of a first recording of a speech signal and an audio frame of a second recording that occur over a same time period and includes at least a portion of the speech signal, wherein an origin audio assembly recorded the first recording of the speech signal and the speech signal originated from a speaking user of the origin audio assembly at a first position in a room and the second recording was recorded at a receiving audio assembly operated by a different user at a second position in the room; and determine a room-impulse response associated with the first position relative to the second position in the room, based in part on the selected audio frame in the first recording and the selected audio frame in the second recording.

9. The device of claim 8, wherein the audio frame in the first recording and the audio frame in the second recording are selected using cross-correlation analysis of the first recording and the second recording.

10. The device of claim 8, wherein the controller is further configured to: determining a decay level for the selected audio frame of the first recording; comparing the decay level for the selected audio frame to a target decay level; and responsive to determining the decay level of the selected audio frame to be below the target decay level, aggregating the selected audio frame with a second audio frame of the first recording.

11. The device of claim 8, wherein the controller is further configured to: determine a threshold number of frames based on a distance between the origin audio assembly and the receiving audio assembly; and responsive to determining that the selected number of frames from each of the first recording and second recording are below the threshold number of frames, determine an intermediate impulse response based on the selected audio frames.

12. The device of claim 8, wherein the room impulse response is determined by deconvolving the audio frame selected from the first recording with the audio frame selected from the second recording.

13. The device of claim 8, wherein the controller is further configured to: decompose the room impulse response into octave bands associated with the speech signal to extend the decay of the signal; determine one or more audio parameters based on the room impulse response; and determine one or more acoustic parameters for the room based on the room impulse response.

14. The device of claim 8, wherein the controller is further configured to: determine, for a plurality of positions in the room relative to the second position, a room impulse response associated with each position of the plurality of positions relative to the second position in the room; and determine a transfer function for the room based in part on the room impulse response for each position of the plurality.

15. A non-transitory computer-readable medium configured to store computer-readable instructions that, when executed by a processor, cause the processor to perform steps comprising: selecting an audio frame of a first recording of a speech signal and an audio frame of a second recording that occur over a same time period and includes at least a portion of the speech signal, wherein an origin audio assembly recorded the first recording of the speech signal and the speech signal originated from a speaking user of the origin audio assembly at a first position in a room and the second recording was recorded at a receiving audio assembly operated by a different user at a second position in the room; and determining a room-impulse response associated with the first position relative to the second position in the room, based in part on the selected audio frame in the first recording and the selected audio frame in the second recording.

16. The non-transitory computer-readable medium of claim 15, wherein the audio frame in the first recording and the audio frame in the second recording are selected using cross-correlation analysis on the first recording and the second recording.

17. The non-transitory computer-readable medium of claim 15, wherein selecting the audio frame in the first recording and the audio frame in the second recording comprises: determining a decay level for the selected audio frame of the first recording; comparing the decay level for the selected audio frame to a target decay level; and responsive to determining the decay level of the selected audio frame to be below the target decay
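Claim 6 leaves the "acoustic parameters for the room" unspecified. Reverberation time is a common such parameter, and the patent is classified under G01H7/00 (measuring reverberation time), so the following is a sketch of one way to derive it once an impulse response has been estimated. It assumes Schroeder backward integration and a T20-style least-squares fit between -5 dB and -25 dB, extrapolated to a 60 dB decay. The function names, evaluation range, and synthetic decay are illustrative choices, not the patent's stated method.

```python
import math
from typing import List


def schroeder_decay_db(h: List[float]) -> List[float]:
    """Energy decay curve by Schroeder backward integration, in dB re total energy."""
    remaining, running = [0.0] * len(h), 0.0
    for i in range(len(h) - 1, -1, -1):  # sum of h[k]^2 from k = i to the end
        running += h[i] * h[i]
        remaining[i] = running
    total = remaining[0] if h else 0.0
    if total == 0.0:
        raise ValueError("impulse response has no energy")
    return [10.0 * math.log10(e / total) if e > 0.0 else -math.inf
            for e in remaining]


def estimate_rt60(h: List[float], fs: float,
                  upper_db: float = -5.0, lower_db: float = -25.0) -> float:
    """Reverberation time in seconds from an impulse response sampled at fs Hz.

    Fits a least-squares line to the decay curve between upper_db and
    lower_db (a T20-style range) and extrapolates it to a 60 dB decay.
    """
    points = [(i / fs, d) for i, d in enumerate(schroeder_decay_db(h))
              if lower_db <= d <= upper_db]
    if len(points) < 2:
        raise ValueError("decay curve does not cover the evaluation range")
    mean_t = sum(t for t, _ in points) / len(points)
    mean_d = sum(d for _, d in points) / len(points)
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in points)
             / sum((t - mean_t) ** 2 for t, _ in points))  # dB per second, negative
    return -60.0 / slope


# Synthetic check: an exponential decay that loses 60 dB of energy in 0.5 s.
fs, rt60_true = 1000.0, 0.5
h = [10.0 ** (-3.0 * n / (rt60_true * fs)) for n in range(750)]
rt60_est = estimate_rt60(h, fs)
```

In practice this would likely run once per octave band (claim 6 mentions an octave-band decomposition). Measured responses also have a noise floor, which usually calls for truncating the response before integrating.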

Assignees

Facebook Tech Llc

Classifications

  • Automatic calibration of stereophonic sound system, e.g. with test microphone

  • H04S7/305 (primary): Electronic adaptation of stereophonic audio signals to reverberation of the listening space (H04S7/301 takes precedence)

  • Synergistic effects of band splitting and sub-band processing

  • G01H7/00 (primary): Measuring reverberation time {; room acoustic measurements}

  • Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic (H04R2203/12 takes precedence)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10616706B1 cover?
An audio analysis system receives a first recording of a speech signal from an origin audio assembly and a second recording of at least a portion of the speech signal from a receiving audio assembly. The speech signal originates from a speaking user of the origin audio assembly and the second recording is recorded by a receiving audio assembly operated by a different user. Both the origin audio…
Who is the assignee on this patent?
Facebook Tech Llc
What technology area does this patent fall under?
Primary CPC classification H04S7/305. Mapped technology areas include Electricity.
When was this patent published?
Publication date: April 7, 2020 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).