Method and system for signal transmission control

US9373343B2 · US · B2

Patent metadata
Field                 Value
Publication number    US-9373343-B2
Application number    US-201314382667-A
Country               US
Kind code             B2
Filing date           Mar 21, 2013
Priority date         Mar 23, 2012
Publication date      Jun 21, 2016
Grant date            Jun 21, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An audio signal with a temporal sequence of blocks or frames is received or accessed. Features are determined as characterizing aggregately the sequential audio blocks/frames that have been processed recently, relative to current time. The feature determination exceeds a specificity criterion and is delayed, relative to the recently processed audio blocks/frames. Voice activity indication is detected in the audio signal. VAD is based on a decision that exceeds a preset sensitivity threshold and is computed over a brief time period, relative to blocks/frames duration, and relates to current block/frame features. The VAD and the recent feature determination are combined with state related information, which is based on a history of previous feature determinations that are compiled from multiple features, determined over a time prior to the recent feature determination time period. Decisions to commence or terminate the audio signal, or related gains, are outputted based on the combination.
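The combination step in the abstract can be illustrated with a small sketch. This is not the patented method: the combination rule, the function name `transmit_gain`, and the parameters `nuisance_limit` and `fade_step` are all assumptions made for illustration. It only shows how a sensitive short-term VAD flag, a delayed high-specificity determination, and a history-based state might jointly drive a gain that fades in and out.

```python
def transmit_gain(prev_gain, vad_active, segment_is_voice, nuisance_level,
                  nuisance_limit=0.5, fade_step=0.25):
    """Return the next transmit gain in [0, 1] for one frame.

    Hypothetical rule (an assumption, not taken from the patent):
    - vad_active: sensitive, short-term VAD flag for the current frame
    - segment_is_voice: delayed, high-specificity determination over
      recently processed frames
    - nuisance_level: state built from the history of earlier determinations
    The gate opens when the VAD fires and either the specific determination
    confirms voice or the history does not indicate a nuisance.
    """
    wanted = vad_active and (segment_is_voice or nuisance_level < nuisance_limit)
    target = 1.0 if wanted else 0.0
    # Move toward the target in fixed steps so that commencement is a
    # fade-in and termination is a fade-out rather than a hard switch.
    if target > prev_gain:
        return min(target, prev_gain + fade_step)
    return max(target, prev_gain - fade_step)


if __name__ == "__main__":
    gain = 0.0
    for _ in range(4):  # four voiced frames with a clean history
        gain = transmit_gain(gain, True, False, 0.0)
        print(gain)
```

A frame would then be transmitted (scaled by the gain) only while the gain is above zero; that mapping is also an assumption of this sketch.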

First claim

Opening claim text (preview).

We claim:

1. A method, comprising:
    receiving or accessing an audio signal that comprises a plurality of temporally sequential frames;
    determining two or more features that characterize aggregately two or more of the sequential audio frames that have been processed previously within a time period that is recent in relation to a current point in time, wherein the feature determination exceeds a specificity criterion and is delayed in relation to the recently processed audio frames;
    detecting an indication of voice activity in the audio signal, wherein the voice activity detection (VAD) is based on a decision that exceeds a preset sensitivity threshold and that is computed over a time period, which is brief in relation to the duration of each of the audio signal frames, and wherein the decision relates to one or more features of a current audio signal frame;
    combining the high sensitivity short term VAD, the recent high specificity audio frame feature determination and information that relates to a state, which is based on a history of one or more previously computed feature determinations that are compiled from a plurality of features that are determined over a time that is prior to the recent high specificity audio frame feature determination time period;
    outputting a decision relating to a commencement or termination of the audio signal, or a gain related thereto, based on the combination, wherein said state information includes a nuisance level associated with the audio signal, the nuisance level indicating a possibility that a nuisance state exists at the present frame, wherein the nuisance level is increased with a first rate if the present frame is the last frame of a present voice segment and a voice ratio of the immediately previous frame is less than a nuisance threshold, the voice ratio representing a prediction made at the time of the present frame, about a possibility that the next frame includes voice, and wherein the nuisance level is decreased with a second rate, the second rate faster than the first rate, if the present frame is within the present voice segment, the voice ratio of the present frame is greater than a voice ratio threshold value, and the portion of the present voice segment from its start to the present frame is longer than a time period threshold value; and
    selectively transmitting the present frame of the audio signal according to the decision.

2. The method as recited in claim 1 wherein the combining step further comprises combining one or more signals or determinations that relate to a feature that comprises a current or previously processed characteristic of the audio signal.

3. The method as recited in claim 1 wherein the state relates to one or more of a nuisance characteristic or a ratio of voice content in the audio signal to a total audio content thereof.

4. The method as recited in claim 1 wherein the combining step further comprises combining information that relates to a far end device or audio condition, which is communicatively coupled with a device that is performing the method.

5. The method as recited in claim 1, further comprising: analyzing the determined features that characterize the recently processed audio frames; based on the determined features analysis, inferring that the recently processed audio frames contain at least one undesired temporal signal segment; and measuring a nuisance characteristic based on the undesirable signal segment inference.

6. The method as recited in claim 5 wherein the measured nuisance characteristic varies.

7. The method as recited in claim 5 further comprising computing a moving statistic that relates to the desired voice content ratio or prevalence in relation to the undesired temporal signal segment.

8. The method as recited in claim 5, further comprising: determining one or more features that identify a nuisance characteristic over the aggregate of two or more of the previously processed sequential audio frames; wherein the nuisance measurement is further based on the nuisance feature identification.

9. The method as recited in claim 1, further comprising: controlling a gain application; and smoothing the desired temporal audio signal segment commencement or termination based on the gain application control.

10. The method as recited in claim 9 wherein: the smoothed desired temporal audio signal segment commencement comprises a fade-in; and the smoothed desired temporal audio signal segment termination comprises a fade-out.

11. The method as recited in claim 3, inclusive, further comprising controlling a gain level based on the measured nuisance characteristic.

12. An apparatus, comprising:
    an inputting unit configured to receive or access an audio signal that comprises a plurality of temporally sequential frames;
    a feature generator configured to determine two or more features that characterize aggregately two or more of the sequential audio frames that have been processed previously within a time period that is recent in relation to a current point in time, wherein the feature determination exceeds a specificity criterion and is delayed in relation to the recently processed audio frames;
    a detector configured to detect an indication of voice activity in the audio signal, wherein the voice activity detection (VAD) is based on a decision that exceeds a preset sensitivity threshold and that is computed over a time period, which is brief in relation to the duration of each of the audio signal frames, and wherein the decision relates to one or more features of a current audio signal frame;
    a combining unit configured to combine the high sensitivity short term VAD, the recent high specificity audio frame feature determination and information that relates to a state, which is based on a history of one or more previously computed feature determinations that are compiled from a plurality of features that are determined over a time that is prior to the recent high specificity audio frame feature determination time period;
    a decision maker configured to output a decision relating to a commencement or termination of the audio signal, or a gain related thereto, based on the combination, wherein said state information includes a nuisance level associated with the audio signal, the nuisance level indicating a possibility that a nuisance state exists at the present frame, wherein the nuisance level is increased with a first rate if the present frame is the last frame of a present voice segment and a voice ratio of the immediately previous frame is less than a nuisance threshold, the voice ratio representing a prediction made at the time of the present frame, about a possibility that the next frame includes voice, and wherein the nuisance level is decreased with a second rate, the second rate faster than the first rate, if the present frame is within the present voice segment, the voice ratio of the present frame is greater than a voice ratio threshold value, and the portion of the present voice segment from its start to the present frame is longer than a time period threshold value; and
    a transmitter configured to selectively transmit the present frame of the audio signal according to the decision.

13. The apparatus as recited in claim 12 wherein the combining unit is further configured to combine one or more signals or determinations that relate to a feature that comprises a current or previously processed characteristic of the audio signal.

14. The apparatus as recited in claim 12 wherein the state relates to one or more of a nuisance characteristic or a ratio of voice content in the audio signal to a total aud…
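The nuisance-level rule recited in claims 1 and 12 can be sketched as a small state update. The claims give the conditions and say only that the second (decrease) rate is faster than the first (increase) rate; the additive update, the clamping to [0, 1], the class name `NuisanceTracker`, and every numeric default below are assumptions chosen for illustration, not values from the patent.

```python
from dataclasses import dataclass


@dataclass
class NuisanceTracker:
    """Hypothetical tracker for the nuisance level described in the claims.

    All defaults are illustrative assumptions.
    """
    rise_rate: float = 0.1              # "first rate" (slower)
    fall_rate: float = 0.3              # "second rate" (faster than the first)
    nuisance_threshold: float = 0.3     # compared with the previous frame's voice ratio
    voice_ratio_threshold: float = 0.7  # compared with the present frame's voice ratio
    min_segment_frames: int = 25        # "time period threshold", counted in frames
    level: float = 0.0                  # kept in [0, 1] by assumption

    def update(self, *, in_segment, is_last_frame, segment_frames,
               voice_ratio, prev_voice_ratio):
        """Apply one frame's update and return the new nuisance level."""
        if is_last_frame and prev_voice_ratio < self.nuisance_threshold:
            # A voice segment just ended and the previous frame's voice
            # ratio was low: raise the level at the slower rate.
            self.level = min(1.0, self.level + self.rise_rate)
        elif (in_segment
              and voice_ratio > self.voice_ratio_threshold
              and segment_frames > self.min_segment_frames):
            # A long segment with a high voice ratio is under way:
            # lower the level at the faster rate.
            self.level = max(0.0, self.level - self.fall_rate)
        return self.level


if __name__ == "__main__":
    tracker = NuisanceTracker()
    # Two short, low-voice-ratio segments end one after the other.
    for _ in range(2):
        tracker.update(in_segment=True, is_last_frame=True, segment_frames=3,
                       voice_ratio=0.1, prev_voice_ratio=0.1)
    print(tracker.level)
    # A long segment with a high voice ratio follows.
    tracker.update(in_segment=True, is_last_frame=False, segment_frames=30,
                   voice_ratio=0.9, prev_voice_ratio=0.9)
    print(tracker.level)
```

Making the fall rate larger than the rise rate means a few bursts of non-voice sound raise the level gradually, while sustained speech clears it quickly.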

Assignees

  • Dolby Lab Licensing Corp

Inventors

Classifications

  • G10L25/78 · Primary

    Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title

  • based on threshold decision · CPC title

  • G10L25/84 · Primary

    for discriminating voice from noise · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9373343B2 cover?
An audio signal with a temporal sequence of blocks or frames is received or accessed. Features are determined as characterizing aggregately the sequential audio blocks/frames that have been processed recently, relative to current time. The feature determination exceeds a specificity criterion and is delayed, relative to the recently processed audio blocks/frames. Voice activity indication is de…
Who is the assignee on this patent?
Dolby Lab Licensing Corp
What technology area does this patent fall under?
Primary CPC classification G10L25/78. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 21, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).