US10901063B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10901063-B2 |
| Application number | US-201816014383-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 21, 2018 |
| Priority date | Dec 22, 2015 |
| Publication date | Jan 26, 2021 |
| Grant date | Jan 26, 2021 |
Official abstract text for this publication.
The proposed method for localizing a target sound source from a plurality of sound sources, wherein a multi-channel recording signal of the plurality of sound sources comprises a plurality of microphone channel signals, comprises converting each microphone channel signal into a respective channel spectrogram in a time-frequency domain, blindly separating the channel spectrograms to obtain a plurality of separated source signals, identifying, among the plurality of separated source signals, the separated source signal that best matches a target source model, estimating, based on the identified separated source signal, a binary mask reflecting where the target sound source is active in the channel spectrograms in terms of time and frequency, applying the binary mask on the channel spectrograms to obtain masked channel spectrograms, and localizing the target sound source from the plurality of sound sources based on the masked channel spectrograms.
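The mask estimation and masking steps named in the abstract can be illustrated with a minimal sketch. It assumes the threshold rule stated in claim 1 (a time-frequency cell is marked active when ψ_ij > τ·V_ij) and uses small real-valued toy matrices in place of real spectrograms. The function names `estimate_binary_mask` and `apply_mask`, the list-of-lists layout, and the toy values are illustrative assumptions, not part of the patent text.

```python
from typing import List

Matrix = List[List[float]]


def estimate_binary_mask(psi: Matrix, averaged: Matrix, tau: float) -> List[List[int]]:
    """Mark a cell 1 where psi[i][j] > tau * averaged[i][j], else 0.

    psi      -- identified separated source signal (hypothetical toy layout)
    averaged -- averaged channel spectrogram V, same shape as psi
    tau      -- threshold
    """
    return [
        [1 if p > tau * v else 0 for p, v in zip(psi_row, avg_row)]
        for psi_row, avg_row in zip(psi, averaged)
    ]


def apply_mask(mask: List[List[int]], channel_spectrograms: List[Matrix]) -> List[Matrix]:
    """Multiply every channel spectrogram element-wise by the same binary mask."""
    return [
        [
            [m * x for m, x in zip(mask_row, spec_row)]
            for mask_row, spec_row in zip(mask, spec)
        ]
        for spec in channel_spectrograms
    ]


# Toy data (assumed values): 2 x 2 time-frequency grid, 2 microphone channels.
psi = [[0.9, 0.1], [0.2, 0.8]]
averaged = [[1.0, 1.0], [1.0, 1.0]]
channels = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]

mask = estimate_binary_mask(psi, averaged, tau=0.5)
masked = apply_mask(mask, channels)
print(mask)
print(masked)
```

The same mask is applied to every channel, so the cells that survive are those where the target source dominates, while the inter-channel differences that localization relies on are left in place.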
Opening claim text (preview).
What is claimed is:

1. An apparatus for localizing a target sound source from a plurality of sound sources, wherein a multi-channel recording signal of the plurality of sound sources comprises a plurality of microphone channel signals, the apparatus comprising a computing device and a non-transitory computer-readable medium having program code stored thereon, the program code including a plurality of units, the units including: a block segmentation unit adapted to segment the plurality of microphone channel signals into blocks, a converting unit adapted to convert each microphone channel signal of the plurality of microphone channel signals into a respective channel spectrogram in a time-frequency domain by converting each block into the respective channel spectrogram using a short-time Fourier transform, STFT, an averaging unit adapted to average the channel spectrograms into an averaged channel spectrogram, a blind source separation unit adapted to blindly separate the averaged channel spectrogram to obtain a plurality of separated source signals, an identification unit adapted to identify, among the plurality of separated source signals, an identified separated source signal that best matches a target source model, an estimation unit adapted to estimate, based on the identified separated source signal, a binary mask reflecting where the target sound source is active in the channel spectrograms in terms of time and frequency, wherein an element $V'_{ij}$ of the binary mask is defined as:

$$
V'_{ij} =
\begin{cases}
1, & \psi_{ij} > \tau V_{ij}, \\
0, & \text{otherwise,}
\end{cases}
\qquad i = 1, \dots, N, \quad j = 1, \dots, F
$$

wherein N is a length of the STFT, F is a number of frames comprised in each block, ψ is the identified separated source signal, τ is a threshold, and V is the averaged channel spectrogram, a masking unit adapted to apply the binary mask on the channel spectrograms to obtain masked channel spectrograms, and a localization unit adapted to localize the target sound source from the plurality of sound sources based on the masked channel spectrograms.

2. The apparatus according to claim 1, wherein the target sound source is considered to be active in a given time-frequency slot of the channel spectrograms if its energy in the given time-frequency slot of the channel spectrograms is sufficiently large in relation to the total energy of the channel spectrograms.

3. The apparatus according to claim 1, wherein the localization unit is adapted to localize the target sound source by localizing, from the plurality of sound sources, the sound source having a contribution in terms of energy to the plurality of sound sources in the masked channel spectrograms that is similar to the energy relation of the target sound source in relation to the total energy of the plurality of sound sources.

4. The apparatus according to claim 1, wherein the localization unit comprises: a separation unit adapted to separate the masked channel spectrograms into narrow frequency bands ($f_i$), and each masked channel spectrogram into corresponding narrow band channel spectrograms ($V'_{pf_i}$), a computing unit adapted to, for each narrow frequency band ($f_i$): compute a covariance ($R_{xxf_i}$) of a narrow band spectrogram ($V'_{f_i}$) comprising the narrow band channel spectrograms ($V'_{pf_i}$) of all channels, decompose the covariance ($R_{xxf_i}$) into its eigenvalues (λ) and eigenvectors (U), compute the energy ($E_m$) of each eigenvalue, the sum (E) of the energy ($E_m$) of all eigenvalues, and the corresponding energy ratio ($\delta'_m$) of each eigenvalue, estimate the averaged signal-to-noise ratio, SNR, ($\delta_{f_i}$) through all frames in said narrow frequency band ($f_i$) of the averaged spectrogram (V), and choose, from among all eigenvalues (λ), the eigenvalue that has the energy ratio ($\delta'_m$) that is the most similar to the averaged SNR ($\delta_{f_i}$), and an ESPRIT localization unit adapted to localize the target sound source from the plurality of sound sources based on the chosen eigenvalue by means of an ESPRIT algorithm.

5. The apparatus according to claim 4, wherein the ESPRIT localization unit is adapted to estimate a localization of the target sound source for each narrow frequency band ($f_i$), wherein the computing unit is adapted to collect the estimated localizations in a histogram and to localize the target sound source by means of a peak detection on the histogram.

6. The apparatus according to claim 1, wherein the identification unit adapted to identify, among the plurality of separated source signals, the identified separated source signal (ψ) that best matches the target source model comprises: an extraction unit adapted to extract audio features from each of the separated source signals and a sub-identification unit adapted to identify, from among the separated source signals, a separated source signal (ψ) corresponding to audio features that best match the target source model.

7. The apparatus according to claim 1, wherein the blind source separation unit is adapted to blindly separate the channel spectrograms to obtain a plurality of separated source signals, by: factorizing the channel spectrograms by means of a non-negative matrix factorization, NMF, into bases (W) and activations (H), the bases (W) and activations (H) corresponding to the separated source signals.

8. The apparatus according to claim 7, wherein the identification unit adapted to identify, among the plurality of separated source signals, a separated source signal (ψ) that best matches the
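The eigenvalue selection of claim 4 and the histogram peak detection of claim 5 can be sketched as follows. This is a sketch under stated assumptions, not the patented implementation: the energy of an eigenvalue is taken to be its magnitude, the averaged SNR is treated as a ratio between 0 and 1, the per-band localization estimates are assumed to be angles in degrees, and the covariance decomposition and ESPRIT step are assumed to happen elsewhere. The names `choose_eigenvalue` and `histogram_peak` and the 5-degree bin width are hypothetical.

```python
from collections import Counter
from typing import Sequence


def choose_eigenvalue(eigenvalues: Sequence[float], target_ratio: float) -> int:
    """Return the index of the eigenvalue whose energy ratio is closest to target_ratio.

    Assumption: the energy E_m of an eigenvalue is its magnitude |lambda_m|.
    """
    energies = [abs(lam) for lam in eigenvalues]
    total = sum(energies)
    if total == 0:
        raise ValueError("all eigenvalues are zero")
    ratios = [e / total for e in energies]
    return min(range(len(ratios)), key=lambda m: abs(ratios[m] - target_ratio))


def histogram_peak(estimates_deg: Sequence[float], bin_width_deg: float = 5.0) -> float:
    """Bin per-band estimates and return the center of the most populated bin."""
    if not estimates_deg:
        raise ValueError("no estimates to histogram")
    counts = Counter(int(a // bin_width_deg) for a in estimates_deg)
    best_bin, _ = max(counts.items(), key=lambda kv: kv[1])
    return (best_bin + 0.5) * bin_width_deg


# Toy values (assumed): three eigenvalues for one band, five per-band estimates.
chosen = choose_eigenvalue([6.0, 3.0, 1.0], target_ratio=0.3)
peak = histogram_peak([41.0, 43.5, 44.0, 12.0, 88.0])
print(chosen, peak)
```

Selecting by energy ratio rather than by largest eigenvalue reflects the wording of claim 4: the target source is not assumed to be the loudest one, only the one whose share of the energy matches the estimated SNR.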
microphones · CPC title
determining direction of source · CPC title
for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407) · CPC title
determining other position line of source · CPC title
using properties of sound source · CPC title