Detector and method for voice activity detection
US-9773511-B2 · Sep 26, 2017 · US
US2017133041A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2017133041-A1 |
| Application number | US-201515321743-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jul 7, 2015 |
| Priority date | Jul 10, 2014 |
| Publication date | May 11, 2017 |
| Grant date | — |
Official abstract text for this publication.
Many processes for audio signal processing can benefit from voice activity detection, which aims to detect the presence of speech as opposed to silence or noise. The present disclosure describes, among other things, leveraging energy-based features of voice and insights on first and second formant frequencies of vowels to provide a low-complexity and low-power voice activity detector. A pair of two channels is provided whereby each channel is configured to detect voice activity in respective frequency bands of interest. Simultaneous activity detected in both channels can be a sufficient condition for determining that voice is present. More channels or pairs of channels can be used to detect different types of voices to improve detection and/or to detect voices present in different audio streams.
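The two-channel decision in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the patent's implementation: the sample rate, the frame length, the fixed energy threshold, and the use of the Goertzel recurrence as a stand-in for each channel's band energy estimate. The two probe frequencies (400 Hz and 2050 Hz) are the ones named in claim 5 as lying inside the first and second bands.

```python
import math

SAMPLE_RATE = 8000        # Hz, assumed for illustration
FRAME_LEN = 160           # 20 ms frames, assumed
BAND_1_HZ = 400.0         # a frequency inside the first band (claim 5)
BAND_2_HZ = 2050.0        # a frequency inside the second band (claim 5)
ENERGY_THRESHOLD = 100.0  # hypothetical activity threshold


def goertzel_power(frame, sample_rate, freq):
    """Squared DFT magnitude of `frame` at the bin nearest `freq`."""
    n = len(frame)
    k = round(n * freq / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def channel_active(frame, freq):
    """One channel: is there energy near `freq` in this frame?"""
    return goertzel_power(frame, SAMPLE_RATE, freq) > ENERGY_THRESHOLD


def voice_present(frame):
    """Decision module: activity in both channels at once counts as voice."""
    return channel_active(frame, BAND_1_HZ) and channel_active(frame, BAND_2_HZ)


def tone(freq, amplitude=1.0):
    """Helper for the demo: one frame of a pure sine tone."""
    return [amplitude * math.sin(2.0 * math.pi * freq * i / SAMPLE_RATE)
            for i in range(FRAME_LEN)]


if __name__ == "__main__":
    both = [a + b for a, b in zip(tone(BAND_1_HZ), tone(BAND_2_HZ))]
    print(voice_present(both))             # energy in both bands
    print(voice_present(tone(BAND_1_HZ)))  # energy in the first band only
```

A fixed threshold on raw energy is the simplest possible activity test; the claims describe a more adaptive per-channel scheme based on tracking peaks and quiet periods.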
Opening claim text (preview).
1. A voice activity detector comprising: a first channel for processing a first audio stream and detecting activity in a first frequency band, wherein the first frequency band includes a first group of formant frequencies characteristic of vowels; a second channel for processing the first audio stream and detecting activity in a second frequency band, wherein the second frequency band includes a second group of formant frequencies characteristic of vowels; and a first decision module for observing the first channel and the second channel to determine whether voice activity is present in the first audio stream.
2. The voice activity detector of claim 1, wherein detecting activity in the first frequency band and/or the second frequency band indicates voice activity is present in the first audio stream.
3. The voice activity detector of claim 1, wherein detecting activity in both the first frequency band and the second frequency band is sufficient to determine voice activity is present in the first audio stream.
4. (canceled)
5. The voice activity detector of claim 1, wherein: the first frequency band includes a frequency of 400 Hertz; and the second frequency band includes a frequency of 2050 Hertz.
6-8. (canceled)
9. The voice activity detector of claim 1, wherein the first channel comprises: a top tracker for tracking the peaks of the estimated energy of the audio stream in the first frequency band to produce an output of the top tracker; a bottom tracker for tracking the quiet periods of the estimated energy of the audio stream in the first frequency band to produce an output of the bottom tracker; and a modulation tracker for subtracting the output of the top tracker and the output of the bottom tracker to generate a modulation index.
10. The voice activity detector of claim 9, wherein the top tracker is configured to: decrease the output of the top tracker at a first rate if the estimated energy is no longer at a peak; and decrease the output of the top tracker at a second rate faster than the first rate if the estimated energy has not returned to a peak for a particular period of time.
11. The voice activity detector of claim 9, wherein the bottom tracker is configured to: increase the output of the bottom tracker at a first rate if the estimated energy is at a quiet period; and increase the output of the bottom tracker at a second rate faster than the first rate if the estimated energy continued to be in a quiet period for a particular period of time.
12. The voice activity detector of any claim 9, wherein the first channel further comprises: a comparator for comparing the modulation index against a threshold; and a low pass filtering module for processing the output of the comparator.
13. (canceled)
14. (canceled)
15. The voice activity detector of claim 1, further comprises an ambient noise generator configured to artificially generate pre-event audio samples based on the first audio stream.
16. A voice activity detection apparatus for triggering a process in response to detection of voice activity, comprising: a first voice activity detector including: a first channel for processing a first audio stream and detecting activity in a first frequency band, wherein the first frequency band includes a first group of formant frequencies characteristic of vowels; a second channel for processing the first audio stream and detecting activity in a second frequency band, wherein the second frequency band includes a second group of formant frequencies characteristic of vowels; and a first decision module for observing the first channel and the second channel to determine whether voice activity is present in the first audio stream and generating an output of the first decision module to indicate whether voice activity is present in the first audio stream; wherein the process is triggered in response to the output of the first decision module.
17. (canceled)
18. (canceled)
19. (canceled)
20. The voice activity detection apparatus of claim 16, further comprising: a second voice activity detector including: a third channel for processing the first audio stream and detecting activity in an third frequency band, wherein the third frequency band includes a third group of formant frequencies characteristic of vowels; and a second decision module for observing (1) one or more of the first channel and the second channel, and (2) the third channel and generating an output of the second decision module to indicate whether voice activity is present in the first audio stream; wherein the process is triggered in response to the output of the second decision module.
21. (canceled)
22. (canceled)
23. The voice activity detection apparatus of claim 20, wherein the first group of formant frequencies and the second group of formant frequencies are characteristic of a first type of voice, and the third group of formant frequencies is characteristic of a second type of voice different from the first type of voice.
24-27. (canceled)
28. The voice activity detection apparatus of claim 16, further comprising: a second voice activity detector including: a third channel for processing a second audio stream and detecting activity in the first frequency band; a fourth channel for processing the second audio stream and detecting activity in the second frequency band; and a second decision module for observing the third channel and the fourth channel and generating an output of the second decision module to indicate whether voice activity is present in the second audio stream; wherein the process is triggered in response to the output of the second decision module, the first audio stream is generated from a first audio capturing device associated with an electronic system, and the second audio stream is generated from a second audio capturing device associated the same electronic system.
29-38. (canceled)
39. A method for voice activity detection, the method comprising: processing, in a first channel, a first audio stream and detecting activity in a first frequency band, wherein the first frequency band includes a first group of formant frequencies characteristic of one or more first vowels; processing, in a second channel, the first audio stream and detecting activity in a second frequency band, wherein the second frequency band includes a second group of formant frequencies characteristic of one or more second vowels; observing, by a first decision module, the first channel and the second channel to determine whether voice activity is present in the first audio stream; generating, by the first decision module, an output of the first decision module to indicate whether voice activity is present in the first audio stream; triggering the process in response to the output of the first decision module.
40-45. (canceled)
46. The method of claim 39, wherein generating an output of the first decision module comprises: applying an output of the first channel as a gate to an output of a second channel; wherein the gate has a time out which is weighted in time.
47-48. (canceled)
49. The method of claim 39, further comprising: in response to detecting activity in the first channel, adjusting a threshold parameter of the second channel.
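Claims 9 to 12 describe a per-channel structure: a top tracker, a bottom tracker, a modulation index formed as their difference, a comparator, and a low-pass filter. The sketch below is one possible reading of that structure, not the patented implementation. The class name `ChannelTracker`, all rate constants, the hold count, the threshold, and the one-pole smoothing filter are hypothetical choices. In particular, the claims do not define how a "peak" or "quiet period" is recognized, so the sketch assumes a peak is any energy at or above the current top estimate and a quiet period is any energy at or below the current bottom estimate.

```python
class ChannelTracker:
    """Illustrative top/bottom tracking of one band's estimated energy."""

    def __init__(self, slow=0.01, fast=0.1, hold=5, threshold=3.0, smooth=0.5):
        self.slow, self.fast, self.hold = slow, fast, hold  # assumed rates
        self.threshold, self.smooth = threshold, smooth     # assumed values
        self.top = self.bottom = None
        self.since_peak = self.since_floor = 0
        self.activity = 0.0  # low-pass filtered comparator output

    def update(self, energy):
        """Consume one energy estimate; return (modulation index, activity)."""
        if self.top is None:
            self.top = self.bottom = energy

        # Top tracker: follow peaks, decay slowly, then faster if no new
        # peak has arrived for more than `hold` frames.
        if energy >= self.top:
            self.top, self.since_peak = energy, 0
        else:
            self.since_peak += 1
            rate = self.fast if self.since_peak > self.hold else self.slow
            self.top -= rate * (self.top - energy)

        # Bottom tracker: follow minima, rise slowly, then faster if no new
        # minimum has arrived for more than `hold` frames.
        if energy <= self.bottom:
            self.bottom, self.since_floor = energy, 0
        else:
            self.since_floor += 1
            rate = self.fast if self.since_floor > self.hold else self.slow
            self.bottom += rate * (energy - self.bottom)

        # Modulation tracker: difference between the two tracker outputs.
        modulation = self.top - self.bottom

        # Comparator followed by a one-pole low-pass filter.
        raw = 1.0 if modulation > self.threshold else 0.0
        self.activity += self.smooth * (raw - self.activity)
        return modulation, self.activity


if __name__ == "__main__":
    tracker = ChannelTracker()
    for i in range(20):
        # Energy alternating between 0 and 10 mimics a strongly modulated band.
        print(tracker.update(10.0 if i % 2 else 0.0))
```

With a steady input the two trackers coincide and the modulation index stays at zero, whereas an input that swings between loud and quiet frames keeps the trackers apart. That difference is what lets the comparator separate modulated, speech-like energy from stationary noise in this sketch.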
the extracted parameters being formant information · CPC title
for discriminating voice from music · CPC title
Feature extraction for speech recognition; Selection of recognition unit · CPC title
the extracted parameters being spectral information of each sub-band · CPC title
Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title