Audio signal section estimating apparatus, audio signal section estimating method, and recording medium

US9208780B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9208780-B2
Application number: US-201013384917-A
Country: US
Kind code: B2
Filing date: Jul 15, 2010
Priority date: Jul 21, 2009
Publication date: Dec 8, 2015
Grant date: Dec 8, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The processing efficiency and estimation accuracy of a voice activity detection apparatus are improved. An acoustic signal analyzer receives a digital acoustic signal containing a speech signal and a noise signal, generates a non-speech GMM and a speech GMM adapted to a noise environment, by using a silence GMM and a clean-speech GMM in each frame of the digital acoustic signal, and calculates the output probabilities of dominant Gaussian distributions of the GMMs. A speech state probability to non-speech state probability ratio calculator calculates a speech state probability to non-speech state probability ratio based on a state transition model of a speech state and a non-speech state, by using the output probabilities; and a voice activity detection unit judges, from the speech state probability to non-speech state probability ratio, whether the acoustic signal in the frame is in the speech state or in the non-speech state and outputs only the acoustic signal in the speech state.
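The abstract's final stage, turning per-frame output probabilities into a speech/non-speech decision through a state transition model, can be illustrated with a two-state forward recursion. This is a minimal sketch, not the patented implementation: the function name `speech_ratio_track`, the transition probability `stay`, the uniform initial state probabilities, and the decision `threshold` are all assumptions chosen for illustration, and the per-frame likelihoods are passed in directly rather than computed from noise-adapted GMMs.

```python
def speech_ratio_track(b_speech, b_nonspeech, stay=0.9, threshold=1.0):
    """Two-state forward recursion (state 0 = non-speech, state 1 = speech).

    b_speech / b_nonspeech: per-frame likelihoods of the observation under
    the speech and non-speech models (hypothetical inputs).
    stay: assumed probability of remaining in the same state.
    Returns (ratios, decisions), one entry per frame.
    """
    # a[i][j] is the assumed probability of moving from state i to state j.
    a = [[stay, 1.0 - stay], [1.0 - stay, stay]]
    alpha = [0.5, 0.5]  # assumed uniform initial state probabilities
    ratios, decisions = [], []
    for b1, b0 in zip(b_speech, b_nonspeech):
        # Predict the state probabilities, then weight by the likelihoods.
        new0 = (alpha[0] * a[0][0] + alpha[1] * a[1][0]) * b0
        new1 = (alpha[0] * a[0][1] + alpha[1] * a[1][1]) * b1
        total = max(new0 + new1, 1e-300)  # guard against division by zero
        alpha = [new0 / total, new1 / total]
        # Speech state probability to non-speech state probability ratio.
        ratio = alpha[1] / max(alpha[0], 1e-300)
        ratios.append(ratio)
        decisions.append(ratio > threshold)
    return ratios, decisions


if __name__ == "__main__":
    ratios, decisions = speech_ratio_track(
        [0.1, 0.1, 0.9, 0.9, 0.1],
        [0.9, 0.9, 0.1, 0.1, 0.9],
    )
    print(decisions)
```

Because the ratio carries state probabilities forward from frame to frame, a single frame with a high speech likelihood has to overcome the accumulated non-speech evidence before the decision flips.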

First claim

Opening claim text (preview).

What is claimed is:

1. A voice activity detection apparatus comprising: an acoustic signal analyzer that receives a digital acoustic signal containing a speech signal and a noise signal; generates a non-speech Gaussian mixture model, a Gaussian mixture model being hereafter referred to as a GMM, and a speech GMM both adapted to a noise environment, by using a silence GMM and a clean-speech GMM both generated beforehand for each frame of the digital acoustic signal; and calculates non-speech probabilities and speech probabilities of Gaussian distributions left after one or more Gaussian distributions having the smallest output probability are pruned from the GMMs; and a speech detection information generator that calculates a speech state probability to non-speech state probability ratio based on a state transition model of a speech state and a non-speech state, by using the non-speech probabilities and the speech probabilities, generates information about a speech period based on the calculated probability ratio, and outputs the information as speech detection information, wherein the acoustic signal analyzer comprises: an initial noise probabilistic model estimation processor that estimates initial noise probabilistic model parameters; a parameter prediction processor that predicts noise probabilistic model parameters of the current frame from estimated noise probabilistic model parameters of a preceding frame by a random walk process; a parameter update processor that receives the noise probabilistic model parameters of the current frame and updates parameters of all Gaussian distributions contained in the silence GMM and the clean-speech GMM; a probabilistic model parameter generation and estimation processor that generates a non-speech GMM and a speech GMM adapted to the noise environment in the current frame by using the updated parameters of the Gaussian distributions and parameters of various Gaussian distributions of the silence GMM and the clean-speech GMM; an output probability calculation processor that calculates the output probability of each Gaussian distribution contained in the generated GMMs; a probability weight calculation processor that calculates probability weights used for weighting the output probabilities of the Gaussian distributions in the non-speech state and the speech state, by parameterizing the distribution of the output probabilities of the Gaussian distributions with a higher-order statistic; a dominant distribution determination processor that prunes Gaussian distributions having an extremely small output probability and extracts only Gaussian distributions having a sufficiently large output probability; a first weighted average processor that obtains a weighted average of the noise probabilistic model parameters of the current frame predicted by the parameter prediction processor, by using the probability weights calculated by the probability weight calculation processor; and a second weighted average processor that obtains a weighted average of noise probabilistic model parameters subjected to weighted averaging by the first weighted average processor, only for the Gaussian distributions extracted by the dominant distribution determination processor.

2. The voice activity detection apparatus according to claim 1, wherein the acoustic signal analyzer comprises a probability weight calculation processor that calculates the degrees of scatter of the non-speech probabilities and the speech probabilities and calculates probability weights used for correcting the non-speech probabilities and the speech probabilities such that the output probabilities of the Gaussian distributions increase as the degrees of scatter decrease.

3. The voice activity detection apparatus according to claim 1, wherein the acoustic signal analyzer comprises a dominant distribution determination processor that calculates a cumulative sum of the output probabilities in descending order and determines a Gaussian distribution whose output probability gives a cumulative sum exceeding a predetermined level, as the one or more Gaussian distributions having the smallest output probability to be pruned.

4. The voice activity detection apparatus according to claim 1, further comprising: a signal averaging unit that averages out the digital acoustic signals of various channels in each frame; and a second acoustic signal analyzer that obtains a speech probability and a non-speech probability by using a periodic component power and an aperiodic component power; wherein the speech detection information generator multiplies a speech probability and a non-speech probability calculated by the acoustic signal analyzer by the speech probability and the non-speech probability obtained by the second acoustic signal analyzer respectively, and calculates the speech state probability to non-speech state probability ratio by using the results of multiplication.

5. The voice activity detection apparatus according to one of claims 1, 2, 3, and 4, wherein the speech detection information generator comprises: a speech state probability to non-speech state probability ratio calculator that calculates the speech state probability to non-speech state probability ratio; and a voice activity detection unit that judges, from the speech state probability to non-speech state probability ratio, whether the acoustic signal of the frame is in the speech state or in the non-speech state and generates the speech detection information based on the judgment result.

6. The voice activity detection apparatus according to one of claims 1, 2, 3, and 4, further comprising a noise suppressor that receives the probability ratio calculated by the speech detection information generator and the output probabilities calculated by the acoustic signal analyzer, generates a noise suppression filter, and suppresses noise in the digital acoustic signal.

7. A voice activity detection method comprising: an acoustic signal analysis step that receives a digital acoustic signal containing a speech signal and a noise signal; generates probabilistic models of a non-speech Gaussian mixture model, a Gaussian mixture model being hereafter referred to as a GMM, and a speech GMM both adapted to a noise environment, by using a silence GMM and a clean-speech GMM both generated beforehand for each frame of the digital acoustic signal; and calculates non-speech probabilities and speech probabilities of Gaussian distributions left after one or more Gaussian distributions having the smallest output probability are pruned from the GMMs; and a speech detection information generation step that calculates a probability ratio based on a state transition model of a speech state and a non-speech, by using the non-speech probabilities and the speech probabilities, generates information about a speech period based on the calculated probability ratio, and outputs the information as speech detection information, wherein the acoustic signal analysis step comprises: an initial noise probabilistic model estimation step of estimating initial noise probabilistic model parameters; a parameter prediction step of predicting noise probabilistic model parameters of the current frame from estimated noise probabilistic model parameters of a preceding frame by a random walk process; a parameter update step of receiving the noise probabilistic model parameters of the current frame and updating parameters of all Gaussian distributions contained in the silence GMM and clean-speech GMM; a probabilistic model parameter generation and estimation step of generating a non-speech GMM and a speech GMM adapted to the noise environment in the current frame by using the updated parameters of the Gaussian distributions and parameters of various Gaussian
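The dominant distribution determination described in claim 3 (a cumulative sum of output probabilities taken in descending order, with the low-probability tail pruned) might look roughly like the sketch below. The function name `dominant_components`, the normalization of the probabilities to sum to one, the default `level` of 0.95, and the choice to keep the component at which the cumulative sum first reaches the level are assumptions made for illustration; the claim text shown here does not specify them.

```python
def dominant_components(probs, level=0.95):
    """Return the indices of Gaussian components kept after pruning.

    probs: output probabilities of the Gaussian components for one frame.
    level: assumed cumulative-probability level (hypothetical default).
    Components are visited from largest to smallest output probability;
    once the normalized cumulative sum reaches `level`, the remaining
    (smallest) components are treated as pruned.
    """
    total = sum(probs)
    if total <= 0.0:
        # Nothing to rank; keep every component (an assumed fallback).
        return list(range(len(probs)))
    # Component indices sorted by descending output probability.
    order = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    kept, cumulative = [], 0.0
    for k in order:
        kept.append(k)
        cumulative += probs[k] / total
        if cumulative >= level:
            break
    return sorted(kept)


if __name__ == "__main__":
    # Components 1 and 3 carry little probability mass and are pruned.
    print(dominant_components([0.5, 0.02, 0.3, 0.01, 0.17], level=0.95))
```

Restricting later computations to the kept indices is one way such pruning could reduce per-frame cost, since the discarded components contribute little to the mixture's output probability.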

Assignees

Inventors

Classifications

  • Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems H04B3/20; echo suppression in hands-free telephones H04M9/08) · CPC title

  • G10L15/20 · Primary

    Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title

  • using statistical models, e.g. Hidden Markov Models [HMMs] (G10L15/18 takes precedence) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9208780B2 cover?
The processing efficiency and estimation accuracy of a voice activity detection apparatus are improved. An acoustic signal analyzer receives a digital acoustic signal containing a speech signal and a noise signal, generates a non-speech GMM and a speech GMM adapted to a noise environment, by using a silence GMM and a clean-speech GMM in each frame of the digital acoustic signal, and calculates …
Who is the assignee on this patent?
Fujimoto Masakiyo, Nakatani Tomohiro, Nippon Telegraph & Telephone
What technology area does this patent fall under?
Primary CPC classification G10L15/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 8, 2015 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).