Methods and Systems for Automatically Equalizing Audio Output based on Room Position
US-2019103849-A1 · Apr 4, 2019 · US
US10616706B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-10616706-B1 |
| Application number | US-201916544202-A |
| Country | US |
| Kind code | B1 |
| Filing date | Aug 19, 2019 |
| Priority date | Nov 5, 2018 |
| Publication date | Apr 7, 2020 |
| Grant date | Apr 7, 2020 |
Official abstract text for this publication.
An audio analysis system receives a first recording of a speech signal from an origin audio assembly and a second recording of at least a portion of the speech signal from a receiving audio assembly. The speech signal originates from a speaking user of the origin audio assembly and the second recording is recorded by a receiving audio assembly operated by a different user. Both the origin audio assembly and the receiving audio assembly are located within a room. The audio analysis system selects one or more audio frames in the first recording and one or more audio frames in the second recording that both occur over the same time period. The audio analysis system determines a transfer function for the room based in part on the selected one or more audio frames in the first recording and the selected one or more audio frames in the second recording.
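The two steps the abstract describes (picking time-aligned frames, then relating them to each other) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `estimate_lag` and `deconvolve`, the regularization constant `eps`, and the choice of a regularized frequency-domain division are all assumptions made for this example. Cross-correlation and deconvolution are the techniques the claims name, but the patent text shown here does not specify how they are computed.

```python
import numpy as np

def estimate_lag(origin, received):
    """Return the lag (in samples) at which `received` best matches `origin`.

    Hypothetical helper: uses the peak of the full cross-correlation.
    """
    corr = np.correlate(received, origin, mode="full")
    # Index len(origin) - 1 of the "full" output corresponds to zero lag.
    return int(np.argmax(corr)) - (len(origin) - 1)

def deconvolve(origin_frame, received_frame, eps=1e-8):
    """Estimate h such that received_frame is approximately origin_frame convolved with h.

    Assumption: regularized spectral division, H = Y * conj(X) / (|X|^2 + eps).
    """
    n = len(origin_frame) + len(received_frame) - 1  # long enough to avoid wrap-around
    X = np.fft.rfft(origin_frame, n)
    Y = np.fft.rfft(received_frame, n)
    H = Y * np.conj(X) / (np.abs(X) ** 2 + eps)
    return np.fft.irfft(H, n)

# Synthetic check: a noise "speech" signal passed through a known two-tap response.
rng = np.random.default_rng(0)
x = rng.standard_normal(2048)          # stand-in for the origin recording
h_true = np.zeros(64)
h_true[5], h_true[20] = 1.0, 0.5       # direct path plus one reflection
y = np.convolve(x, h_true)             # stand-in for the receiving recording

lag = estimate_lag(x, y)
h_est = deconvolve(x, y)
```

In this synthetic setup the lag estimate points at the direct-path delay and the leading samples of `h_est` recover `h_true`. Real speech is far less spectrally flat than white noise, so a practical system would need a larger `eps` or band-limited processing.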
Opening claim text (preview).
What is claimed is:
1. A method comprising: selecting an audio frame of a first recording of a speech signal and an audio frame of a second recording that occur over a same time period and includes at least a portion of the speech signal, wherein an origin audio assembly recorded the first recording of the speech signal and the speech signal originated from a speaking user of the origin audio assembly at a first position in a room and the second recording was recorded at a receiving audio assembly operated by a different user at a second position in the room; and determining a room-impulse response associated with the first position relative to the second position in the room, based in part on the selected audio frame in the first recording and the selected audio frame in the second recording.
2. The method of claim 1, wherein the audio frame in the first recording and the audio frame in the second recording are selected using cross-correlation analysis of the first recording and the second recording.
3. The method of claim 1, wherein selecting the audio frame in the first recording and the audio frame in the second recording comprises: determining a decay level for the selected audio frame of the first recording; comparing the decay level for the selected audio frame to a target decay level; and responsive to determining the decay level of the selected audio frame to be below the target decay level, aggregating the selected audio frame with a second audio frame of the first recording.
4. The method of claim 1, further comprising: determining a threshold number of frames based on a distance between the origin audio assembly and the receiving audio assembly; and responsive to determining that the selected number of frames from each of the first recording and second recording are below the threshold number of frames, determining an intermediate impulse response based on the selected audio frames.
5. The method of claim 1, wherein the room impulse response is determined by deconvolving the audio frame selected from the first recording with the audio frame selected from the second recording.
6. The method of claim 1, further comprising one or more of the following: decomposing the room impulse response into octave bands associated with the speech signal to extend the decay of the signal; determining one or more audio parameters based on the room impulse response; and determining one or more acoustic parameters for the room based on the room impulse response.
7. The method of claim 1, further comprising: determining, for a plurality of positions in the room relative to the second position, a room impulse response associated with each position of the plurality of positions relative to the second position in the room; and determining a transfer function for the room based in part on the room impulse response for each position of the plurality.
8. A device comprising: a microphone array comprising a plurality of acoustic sensors, wherein the microphone array records a first recording of a speech signal originating from a speaking user; and a controller configured to: select an audio frame of a first recording of a speech signal and an audio frame of a second recording that occur over a same time period and includes at least a portion of the speech signal, wherein an origin audio assembly recorded the first recording of the speech signal and the speech signal originated from a speaking user of the origin audio assembly at a first position in a room and the second recording was recorded at a receiving audio assembly operated by a different user at a second position in the room; and determine a room-impulse response associated with the first position relative to the second position in the room, based in part on the selected audio frame in the first recording and the selected audio frame in the second recording.
9. The device of claim 8, wherein the audio frame in the first recording and the audio frame in the second recording are selected using cross-correlation analysis of the first recording and the second recording.
10. The device of claim 8, wherein the controller is further configured to: determining a decay level for the selected audio frame of the first recording; comparing the decay level for the selected audio frame to a target decay level; and responsive to determining the decay level of the selected audio frame to be below the target decay level, aggregating the selected audio frame with a second audio frame of the first recording.
11. The device of claim 8, wherein the controller is further configured to: determine a threshold number of frames based on a distance between the origin audio assembly and the receiving audio assembly; and responsive to determining that the selected number of frames from each of the first recording and second recording are below the threshold number of frames, determine an intermediate impulse response based on the selected audio frames.
12. The device of claim 8, wherein the room impulse response is determined by deconvolving the audio frame selected from the first recording with the audio frame selected from the second recording.
13. The device of claim 8, wherein the controller is further configured to: decompose the room impulse response into octave bands associated with the speech signal to extend the decay of the signal; determine one or more audio parameters based on the room impulse response; and determine one or more acoustic parameters for the room based on the room impulse response.
14. The device of claim 8, wherein the controller is further configured to: determine, for a plurality of positions in the room relative to the second position, a room impulse response associated with each position of the plurality of positions relative to the second position in the room; and determine a transfer function for the room based in part on the room impulse response for each position of the plurality.
15. A non-transitory computer-readable medium configured to store computer-readable instructions that, when executed by a processor, cause the processor to perform steps comprising: selecting an audio frame of a first recording of a speech signal and an audio frame of a second recording that occur over a same time period and includes at least a portion of the speech signal, wherein an origin audio assembly recorded the first recording of the speech signal and the speech signal originated from a speaking user of the origin audio assembly at a first position in a room and the second recording was recorded at a receiving audio assembly operated by a different user at a second position in the room; and determining a room-impulse response associated with the first position relative to the second position in the room, based in part on the selected audio frame in the first recording and the selected audio frame in the second recording.
16. The non-transitory computer-readable medium of claim 15, wherein the audio frame in the first recording and the audio frame in the second recording are selected using cross-correlation analysis on the first recording and the second recording.
17. The non-transitory computer-readable medium of claim 15, wherein selecting the audio frame in the first recording and the audio frame in the second recording comprises: determining a decay level for the selected audio frame of the first recording; comparing the decay level for the selected audio frame to a target decay level; and responsive to determining the decay level of the selected audio frame to be below the target decay
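
The decay-level test in claims 3, 10, and 17 (compare a frame's decay level to a target, and aggregate it with a further frame when it falls short) might look roughly like the sketch below. The text shown here does not define how the decay level is measured, so everything specific in this example is an assumption: the measure itself (the dB drop from the loudest short window to the last one), the window length `win`, the 30 dB default for `target_db`, and aggregation by simple concatenation. The names `decay_level_db` and `select_frames` are hypothetical.

```python
import numpy as np

def decay_level_db(frame, win=256):
    """Hypothetical decay measure: dB drop from the loudest window to the last window.

    Assumes the frame holds at least `win` samples.
    """
    frame = np.asarray(frame, dtype=float)
    n_win = len(frame) // win
    # Mean energy per non-overlapping window of `win` samples.
    energy = (frame[: n_win * win].reshape(n_win, win) ** 2).mean(axis=1)
    energy = np.maximum(energy, 1e-12)  # guard against log of zero
    return 10.0 * np.log10(energy.max() / energy[-1])

def select_frames(frames, target_db=30.0):
    """Concatenate consecutive frames until the decay measure reaches target_db.

    Returns the aggregated samples and the number of frames used.
    """
    selected = np.asarray(frames[0], dtype=float)
    used = 1
    while decay_level_db(selected) < target_db and used < len(frames):
        selected = np.concatenate([selected, frames[used]])
        used += 1
    return selected, used

# Synthetic example: an exponentially decaying signal cut into 1024-sample frames.
t = np.arange(4096)
signal = np.exp(-t / 300.0)
frames = [signal[i : i + 1024] for i in range(0, 4096, 1024)]

selected, used = select_frames(frames)
```

With this synthetic signal the first frame alone decays by roughly 22 dB under the assumed measure, so a second frame is appended before the 30 dB target is met.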
Automatic calibration of stereophonic sound system, e.g. with test microphone · CPC title
Electronic adaptation of stereophonic audio signals to reverberation of the listening space (H04S7/301 takes precedence) · CPC title
Synergistic effects of band splitting and sub-band processing · CPC title
Measuring reverberation time {; room acoustic measurements} · CPC title
Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic (H04R2203/12 takes precedence) · CPC title