Localization algorithm for sound sources with known statistics

US10901063B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10901063-B2
Application number: US-201816014383-A
Country: US
Kind code: B2
Filing date: Jun 21, 2018
Priority date: Dec 22, 2015
Publication date: Jan 26, 2021
Grant date: Jan 26, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The proposed method for localizing a target sound source from a plurality of sound sources, wherein a multi-channel recording signal of the plurality of sound sources comprises a plurality of microphone channel signals, comprises converting each microphone channel signal into a respective channel spectrogram in a time-frequency domain, blindly separating the channel spectrograms to obtain a plurality of separated source signals, identifying, among the plurality of separated source signals, the separated source signal that best matches a target source model, estimating, based on the identified separated source signal, a binary mask reflecting where the target sound source is active in the channel spectrograms in terms of time and frequency, applying the binary mask on the channel spectrograms to obtain masked channel spectrograms, and localizing the target sound source from the plurality of sound sources based on the masked channel spectrograms.
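
As a rough illustration of the masking steps named in the abstract, the sketch below converts each channel to a spectrogram, averages the spectrograms, and applies the thresholding rule stated in claim 1 of this patent (a mask element is 1 where ψ_ij > τ·V_ij, else 0). Blind source separation and target identification are not implemented here: `psi` is a random stand-in for the identified separated source signal. The function names, the Hann window, the hop size, averaging over magnitudes, and τ = 0.5 are all assumptions made for illustration, not details taken from the patent.

```python
import numpy as np

def stft(x, n_fft, hop):
    """Hand-rolled STFT: returns an (n_fft, n_frames) complex spectrogram."""
    win = np.hanning(n_fft)  # window choice is an assumption
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[k * hop:k * hop + n_fft] * win for k in range(n_frames)], axis=1
    )
    return np.fft.fft(frames, axis=0)

def binary_mask(psi, V, tau):
    """Thresholding rule from claim 1: 1 where psi > tau * V, else 0."""
    return (psi > tau * V).astype(np.uint8)

def apply_mask(channel_specs, mask):
    """Zero every time-frequency bin where the target is judged inactive."""
    # channel_specs: (P, N, F) complex, mask: (N, F)
    return channel_specs * mask[np.newaxis, :, :]

# Toy multi-channel block: 4 microphones, 352 samples of noise (hypothetical data).
rng = np.random.default_rng(0)
signals = rng.standard_normal((4, 352))

specs = np.stack([stft(ch, n_fft=64, hop=32) for ch in signals])  # (P, N, F)
V = np.abs(specs).mean(axis=0)          # averaged spectrogram (magnitude averaging assumed)
psi = V * rng.uniform(0.0, 1.0, V.shape)  # stand-in for the identified separated source
mask = binary_mask(psi, V, tau=0.5)
masked = apply_mask(specs, mask)
```

The masked spectrograms keep the original complex values where the mask is 1, so inter-channel phase differences, which a localization step would rely on, are preserved in the retained bins.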

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus for localizing a target sound source from a plurality of sound sources, wherein a multi-channel recording signal of the plurality of sound sources comprises a plurality of microphone channel signals, the apparatus comprising a computing device and a non-transitory computer-readable medium having program code stored thereon, the program code including a plurality of units, the units including: a block segmentation unit adapted to segment the plurality of microphone channel signals into blocks, a converting unit adapted to convert each microphone channel signal of the plurality of microphone channel signals into a respective channel spectrogram in a time-frequency domain by converting each block into the respective channel spectrogram using a short-time Fourier transform, STFT, an averaging unit adapted to average the channel spectrograms into an averaged channel spectrogram, a blind source separation unit adapted to blindly separate the averaged channel spectrogram to obtain a plurality of separated source signals, an identification unit adapted to identify, among the plurality of separated source signals, an identified separated source signal that best matches a target source model, an estimation unit adapted to estimate, based on the identified separated source signal, a binary mask reflecting where the target sound source is active in the channel spectrograms in terms of time and frequency, wherein an element V′_ij of the binary mask is defined as:

$$V'_{ij} = \begin{cases} 1, & \psi_{ij} > \tau V_{ij}, \\ 0, & \text{otherwise,} \end{cases} \qquad i = 1, \dots, N, \quad j = 1, \dots, F$$

wherein N is a length of the STFT, F is a number of frames comprised in each block, ψ is the identified separated source signal, τ is a threshold, and V is the averaged channel spectrogram, a masking unit adapted to apply the binary mask on the channel spectrograms to obtain masked channel spectrograms, and a localization unit adapted to localize the target sound source from the plurality of sound sources based on the masked channel spectrograms.

2. The apparatus according to claim 1, wherein the target sound source is considered to be active in a given time-frequency slot of the channel spectrograms if its energy in the given time-frequency slot of the channel spectrograms is sufficiently large in relation to the total energy of the channel spectrograms.

3. The apparatus according to claim 1, wherein the localization unit is adapted to localize the target sound source by localizing, from the plurality of sound sources, the sound source having a contribution in terms of energy to the plurality of sound sources in the masked channel spectrograms that is similar to the energy relation of the target sound source in relation to the total energy of the plurality of sound sources.

4. The apparatus according to claim 1, wherein the localization unit comprises: a separation unit adapted to separate the masked channel spectrograms into narrow frequency bands (f_i), and each masked channel spectrogram into corresponding narrow band channel spectrograms (V′_pfi), a computing unit adapted to, for each narrow frequency band (f_i): compute a covariance (R_xxfi) of a narrow band spectrogram (V′_fi) comprising the narrow band channel spectrograms (V′_pfi) of all channels, decompose the covariance (R_xxfi) into its eigenvalues (λ) and eigenvectors (U), compute the energy (E_m) of each eigenvalue, the sum (E) of the energy (E_m) of all eigenvalues, and the corresponding energy ratio (δ′_m) of each eigenvalue, estimate the averaged signal-to-noise ratio, SNR, (δ_fi) through all frames in said narrow frequency band (f_i) of the averaged spectrogram (V), and choose, from among all eigenvalues (λ), the eigenvalue that has the energy ratio (δ′_m) that is the most similar to the averaged SNR (δ_fi), and an ESPRIT localization unit adapted to localize the target sound source from the plurality of sound sources based on the chosen eigenvalue by means of an ESPRIT algorithm.

5. The apparatus according to claim 4, wherein the ESPRIT localization unit is adapted to estimate a localization of the target sound source for each narrow frequency band (f_i), wherein the computing unit is adapted to collect the estimated localizations in a histogram and to localize the target sound source by means of a peak detection on the histogram.

6. The apparatus according to claim 1, wherein the identification unit adapted to identify, among the plurality of separated source signals, the identified separated source signal (ψ) that best matches the target source model comprises: an extraction unit adapted to extract audio features from each of the separated source signals and a sub-identification unit adapted to identify, from among the separated source signals, a separated source signal (Ψ) corresponding to audio features that best match the target source model.

7. The apparatus according to claim 1, wherein the blind source separation unit is adapted to blindly separate the channel spectrograms to obtain a plurality of separated source signals, by: factorizing the channel spectrograms by means of a non-negative matrix factorization, NMF, into bases (W) and activations (H), the bases (W) and activations (H) corresponding to the separated source signals.

8. The apparatus according to claim 7, wherein the identification unit adapted to identify, among the plurality of separated source signals, a separated source signal (ψ) that best matches the
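
The per-band eigenvalue selection of claim 4 and the histogram peak detection of claim 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim text does not define how the energy of an eigenvalue or the averaged SNR is computed, so the sketch assumes E_m = |λ_m| and takes the target ratio `delta` as a given number in [0, 1]. The ESPRIT step itself is not implemented, and the per-band angle estimates passed to `histogram_peak` are hypothetical inputs. All function names, the 5-degree bin width, and the toy data are assumptions.

```python
import numpy as np

def select_eigenpair(X, delta):
    """Pick the eigenpair whose energy ratio is closest to delta.

    X: (P, F) complex narrow-band spectrogram (P channels, F frames).
    delta: target energy ratio in [0, 1], standing in for the averaged SNR.
    """
    R = X @ X.conj().T / X.shape[1]   # sample covariance, Hermitian (P, P)
    lam, U = np.linalg.eigh(R)        # real eigenvalues in ascending order
    energy = np.abs(lam)              # assumption: E_m = |lambda_m|
    total = energy.sum()
    if total == 0:
        raise ValueError("band is empty (fully masked); no eigenvalue to select")
    ratio = energy / total            # energy ratio of each eigenvalue
    m = int(np.argmin(np.abs(ratio - delta)))
    return lam[m], U[:, m], ratio

def histogram_peak(estimates_deg, bin_width=5.0):
    """Collect per-band angle estimates in a histogram and return the peak bin center."""
    bins = np.arange(-90.0, 90.0 + bin_width, bin_width)
    counts, edges = np.histogram(estimates_deg, bins=bins)
    k = int(np.argmax(counts))
    return 0.5 * (edges[k] + edges[k + 1])

# Toy band: one dominant source on a 4-element array plus weak noise (hypothetical data).
rng = np.random.default_rng(1)
P, F = 4, 50
a = np.exp(1j * np.pi * np.arange(P) * np.sin(np.deg2rad(20.0)))  # steering vector
s = rng.standard_normal(F) + 1j * rng.standard_normal(F)
noise = 0.01 * (rng.standard_normal((P, F)) + 1j * rng.standard_normal((P, F)))
X = np.outer(a, s) + noise

lam, u, ratio = select_eigenpair(X, delta=0.95)

# Hypothetical per-band angle estimates in degrees, as an ESPRIT step might produce.
peak = histogram_peak([10.0, 11.0, 12.0, 12.5, -40.0, 60.0])
```

In this toy band the dominant source carries nearly all the energy, so a target ratio of 0.95 selects the largest eigenvalue. With several sources of comparable energy, the same rule would select whichever eigenvalue's share is closest to the target's estimated share.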

Assignees

Inventors

Classifications

  • microphones

  • determining direction of source

  • for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407)

  • determining other position line of source

  • using properties of sound source

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10901063B2 cover?
The proposed method for localizing a target sound source from a plurality of sound sources, wherein a multi-channel recording signal of the plurality of sound sources comprises a plurality of microphone channel signals, comprises converting each microphone channel signal into a respective channel spectrogram in a time-frequency domain, blindly separating the channel spectrograms to obtain a plu…
Who is the assignee on this patent?
Huawei Tech Duesseldorf Gmbh
What technology area does this patent fall under?
Primary CPC classification G01S3/8006. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 26, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).