Low-complexity voice activity detection

US2017133041A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2017133041-A1
Application number  US-201515321743-A
Country             US
Kind code           A1
Filing date         Jul 7, 2015
Priority date       Jul 10, 2014
Publication date    May 11, 2017
Grant date          (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Many processes for audio signal processing can benefit from voice activity detection, which aims to detect the presence of speech as opposed to silence or noise. The present disclosure describes, among other things, leveraging energy-based features of voice and insights on first and second formant frequencies of vowels to provide a low-complexity and low-power voice activity detector. A pair of two channels is provided whereby each channel is configured to detect voice activity in respective frequency bands of interest. Simultaneous activity detected in both channels can be a sufficient condition for determining that voice is present. More channels or pairs of channels can be used to detect different types of voices to improve detection and/or to detect voices present in different audio streams.
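As a rough illustration of the decision rule described in the abstract, the sketch below treats simultaneous activity in both channels as sufficient to declare voice. The function name `decide` and the use of per-frame boolean flags are assumptions made for this example; the publication does not prescribe this interface.

```python
from typing import Iterable, List


def decide(first_channel: Iterable[bool], second_channel: Iterable[bool]) -> List[bool]:
    """Return one voice/no-voice decision per frame.

    Hypothetical helper: each input holds one activity flag per frame,
    produced by a channel watching its own frequency band. Voice is
    declared only for frames where both channels report activity.
    """
    return [a and b for a, b in zip(first_channel, second_channel)]


if __name__ == "__main__":
    low_band = [True, True, False, False]   # e.g. band around the first formant
    high_band = [True, False, True, False]  # e.g. band around the second formant
    print(decide(low_band, high_band))
```

The abstract presents simultaneous activity only as a sufficient condition, so a fuller detector could combine the channels in other ways as well.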

First claim

Opening claim text (preview).

1. A voice activity detector comprising: a first channel for processing a first audio stream and detecting activity in a first frequency band, wherein the first frequency band includes a first group of formant frequencies characteristic of vowels; a second channel for processing the first audio stream and detecting activity in a second frequency band, wherein the second frequency band includes a second group of formant frequencies characteristic of vowels; and a first decision module for observing the first channel and the second channel to determine whether voice activity is present in the first audio stream.

2. The voice activity detector of claim 1, wherein detecting activity in the first frequency band and/or the second frequency band indicates voice activity is present in the first audio stream.

3. The voice activity detector of claim 1, wherein detecting activity in both the first frequency band and the second frequency band is sufficient to determine voice activity is present in the first audio stream.

4. (canceled)

5. The voice activity detector of claim 1, wherein: the first frequency band includes a frequency of 400 Hertz; and the second frequency band includes a frequency of 2050 Hertz.

6-8. (canceled)

9. The voice activity detector of claim 1, wherein the first channel comprises: a top tracker for tracking the peaks of the estimated energy of the audio stream in the first frequency band to produce an output of the top tracker; a bottom tracker for tracking the quiet periods of the estimated energy of the audio stream in the first frequency band to produce an output of the bottom tracker; and a modulation tracker for subtracting the output of the top tracker and the output of the bottom tracker to generate a modulation index.

10. The voice activity detector of claim 9, wherein the top tracker is configured to: decrease the output of the top tracker at a first rate if the estimated energy is no longer at a peak; and decrease the output of the top tracker at a second rate faster than the first rate if the estimated energy has not returned to a peak for a particular period of time.

11. The voice activity detector of claim 9, wherein the bottom tracker is configured to: increase the output of the bottom tracker at a first rate if the estimated energy is at a quiet period; and increase the output of the bottom tracker at a second rate faster than the first rate if the estimated energy continued to be in a quiet period for a particular period of time.

12. The voice activity detector of any claim 9, wherein the first channel further comprises: a comparator for comparing the modulation index against a threshold; and a low pass filtering module for processing the output of the comparator.

13. (canceled)

14. (canceled)

15. The voice activity detector of claim 1, further comprises an ambient noise generator configured to artificially generate pre-event audio samples based on the first audio stream.

16. A voice activity detection apparatus for triggering a process in response to detection of voice activity, comprising: a first voice activity detector including: a first channel for processing a first audio stream and detecting activity in a first frequency band, wherein the first frequency band includes a first group of formant frequencies characteristic of vowels; a second channel for processing the first audio stream and detecting activity in a second frequency band, wherein the second frequency band includes a second group of formant frequencies characteristic of vowels; and a first decision module for observing the first channel and the second channel to determine whether voice activity is present in the first audio stream and generating an output of the first decision module to indicate whether voice activity is present in the first audio stream; wherein the process is triggered in response to the output of the first decision module.

17. (canceled)

18. (canceled)

19. (canceled)

20. The voice activity detection apparatus of claim 16, further comprising: a second voice activity detector including: a third channel for processing the first audio stream and detecting activity in an third frequency band, wherein the third frequency band includes a third group of formant frequencies characteristic of vowels; and a second decision module for observing (1) one or more of the first channel and the second channel, and (2) the third channel and generating an output of the second decision module to indicate whether voice activity is present in the first audio stream; wherein the process is triggered in response to the output of the second decision module.

21. (canceled)

22. (canceled)

23. The voice activity detection apparatus of claim 20, wherein the first group of formant frequencies and the second group of formant frequencies are characteristic of a first type of voice, and the third group of formant frequencies is characteristic of a second type of voice different from the first type of voice.

24-27. (canceled)

28. The voice activity detection apparatus of claim 16, further comprising: a second voice activity detector including: a third channel for processing a second audio stream and detecting activity in the first frequency band; a fourth channel for processing the second audio stream and detecting activity in the second frequency band; and a second decision module for observing the third channel and the fourth channel and generating an output of the second decision module to indicate whether voice activity is present in the second audio stream; wherein the process is triggered in response to the output of the second decision module, the first audio stream is generated from a first audio capturing device associated with an electronic system, and the second audio stream is generated from a second audio capturing device associated the same electronic system.

29-38. (canceled)

39. A method for voice activity detection, the method comprising: processing, in a first channel, a first audio stream and detecting activity in a first frequency band, wherein the first frequency band includes a first group of formant frequencies characteristic of one or more first vowels; processing, in a second channel, the first audio stream and detecting activity in a second frequency band, wherein the second frequency band includes a second group of formant frequencies characteristic of one or more second vowels; observing, by a first decision module, the first channel and the second channel to determine whether voice activity is present in the first audio stream; generating, by the first decision module, an output of the first decision module to indicate whether voice activity is present in the first audio stream; triggering the process in response to the output of the first decision module.

40-45. (canceled)

46. The method of claim 39, wherein generating an output of the first decision module comprises: applying an output of the first channel as a gate to an output of a second channel; wherein the gate has a time out which is weighted in time.

47-48. (canceled)

49. The method of claim 39, further comprising: in response to detecting activity in the first channel, adjusting a threshold parameter of the second channel.

Assignees

Analog Devices Global

Inventors

Classifications

  • the extracted parameters being formant information · CPC title

  • for discriminating voice from music · CPC title

  • Feature extraction for speech recognition; Selection of recognition unit · CPC title

  • the extracted parameters being spectral information of each sub-band · CPC title

  • G10L25/78 · Primary

    Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017133041A1 cover?
It covers a low-complexity, low-power voice activity detector that relies on energy-based features of voice and on the first and second formant frequencies of vowels. Two channels each detect activity in a frequency band of interest, and simultaneous activity in both channels can be sufficient to determine that voice is present.
Who is the assignee on this patent?
Analog Devices Global
What technology area does this patent fall under?
Primary CPC classification G10L25/78. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 11, 2017 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).