US9373343B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9373343-B2 |
| Application number | US-201314382667-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 21, 2013 |
| Priority date | Mar 23, 2012 |
| Publication date | Jun 21, 2016 |
| Grant date | Jun 21, 2016 |
Official abstract text for this publication.
An audio signal with a temporal sequence of blocks or frames is received or accessed. Features are determined as characterizing aggregately the sequential audio blocks/frames that have been processed recently, relative to current time. The feature determination exceeds a specificity criterion and is delayed, relative to the recently processed audio blocks/frames. Voice activity indication is detected in the audio signal. VAD is based on a decision that exceeds a preset sensitivity threshold and is computed over a brief time period, relative to blocks/frames duration, and relates to current block/frame features. The VAD and the recent feature determination are combined with state related information, which is based on a history of previous feature determinations that are compiled from multiple features, determined over a time prior to the recent feature determination time period. Decisions to commence or terminate the audio signal, or related gains, are outputted based on the combination.
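The abstract names three inputs to the commence/terminate decision but does not say how they are combined. The following is a minimal sketch of one possible combination rule, not the patented method: the function name `next_sending_state`, the boolean form of the inputs, and the `nuisance_limit` cut-off are all hypothetical and chosen only for illustration.

```python
def next_sending_state(sending, vad_flag, specific_speech, nuisance_level,
                       nuisance_limit=0.5):
    """Return True if the current frame should be transmitted.

    sending:         whether the previous frame was transmitted
    vad_flag:        high-sensitivity, short-term VAD result for the current frame
    specific_speech: delayed, high-specificity determination over recent frames
    nuisance_level:  state value in [0, 1] built from earlier determinations
    nuisance_limit:  hypothetical cut-off, not taken from the patent
    """
    if not vad_flag:
        # The sensitive detector sees no activity: terminate (or stay stopped).
        return False
    if sending:
        # Already transmitting and activity continues: keep transmitting.
        return True
    # Commence only if the specific determination confirms speech, or the
    # history-based state suggests the signal is not currently a nuisance.
    return specific_speech or nuisance_level < nuisance_limit
```

In this sketch the sensitive detector alone is enough to keep a transmission going, while starting one needs either confirmation from the slower, more specific determination or a low nuisance state. A real system would likely add hold-over time and gain smoothing on top of this.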
Opening claim text (preview).
We claim:

1. A method, comprising: receiving or accessing an audio signal that comprises a plurality of temporally sequential frames; determining two or more features that characterize aggregately two or more of the sequential audio frames that have been processed previously within a time period that is recent in relation to a current point in time, wherein the feature determination exceeds a specificity criterion and is delayed in relation to the recently processed audio frames; detecting an indication of voice activity in the audio signal, wherein the voice activity detection (VAD) is based on a decision that exceeds a preset sensitivity threshold and that is computed over a time period, which is brief in relation to the duration of each of the audio signal frames, and wherein the decision relates to one or more features of a current audio signal frame; combining the high sensitivity short term VAD, the recent high specificity audio frame feature determination and information that relates to a state, which is based on a history of one or more previously computed feature determinations that are compiled from a plurality of features that are determined over a time that is prior to the recent high specificity audio frame feature determination time period; outputting a decision relating to a commencement or termination of the audio signal, or a gain related thereto, based on the combination, wherein said state information includes a nuisance level associated with the audio signal, the nuisance level indicating a possibility that a nuisance state exists at the present frame, wherein the nuisance level is increased with a first rate if the present frame is the last frame of a present voice segment and a voice ratio of the immediately previous frame is less than a nuisance threshold, the voice ratio representing a prediction made at the time of the present frame, about a possibility that the next frame includes voice, and wherein the nuisance level is decreased with a second rate, the second rate faster than the first rate, if the present frame is within the present voice segment, the voice ratio of the present frame is greater than a voice ratio threshold value, and the portion of the present voice segment from its start to the present frame is longer than a time period threshold value; and selectively transmitting the present frame of the audio signal according to the decision.
2. The method as recited in claim 1 wherein the combining step further comprises combining one or more signals or determinations that relate to a feature that comprises a current or previously processed characteristic of the audio signal.
3. The method as recited in claim 1 wherein the state relates to one or more of a nuisance characteristic or a ratio of voice content in the audio signal to a total audio content thereof.
4. The method as recited in claim 1 wherein the combining step further comprises combining information that relates to a far end device or audio condition, which is communicatively coupled with a device that is performing the method.
5. The method as recited in claim 1, further comprising: analyzing the determined features that characterize the recently processed audio frames; based on the determined features analysis, inferring that the recently processed audio frames contain at least one undesired temporal signal segment; and measuring a nuisance characteristic based on the undesirable signal segment inference.
6. The method as recited in claim 5 wherein the measured nuisance characteristic varies.
7. The method as recited in claim 5 further comprising computing a moving statistic that relates to the desired voice content ratio or prevalence in relation to the undesired temporal signal segment.
8. The method as recited in claim 5, further comprising: determining one or more features that identify a nuisance characteristic over the aggregate of two or more of the previously processed sequential audio frames; wherein the nuisance measurement is further based on the nuisance feature identification.
9. The method as recited in claim 1, further comprising: controlling a gain application; and smoothing the desired temporal audio signal segment commencement or termination based on the gain application control.
10. The method as recited in claim 9 wherein: the smoothed desired temporal audio signal segment commencement comprises a fade-in; and the smoothed desired temporal audio signal segment termination comprises a fade-out.
11. The method as recited in claim 3, inclusive, further comprising controlling a gain level based on the measured nuisance characteristic.
12. An apparatus, comprising: an inputting unit configured to receive or access an audio signal that comprises a plurality of temporally sequential frames; a feature generator configured to determine two or more features that characterize aggregately two or more of the sequential audio frames that have been processed previously within a time period that is recent in relation to a current point in time, wherein the feature determination exceeds a specificity criterion and is delayed in relation to the recently processed audio frames; a detector configured to detect an indication of voice activity in the audio signal, wherein the voice activity detection (VAD) is based on a decision that exceeds a preset sensitivity threshold and that is computed over a time period, which is brief in relation to the duration of each of the audio signal frames, and wherein the decision relates to one or more features of a current audio signal frame; a combining unit configured to combine the high sensitivity short term VAD, the recent high specificity audio frame feature determination and information that relates to a state, which is based on a history of one or more previously computed feature determinations that are compiled from a plurality of features that are determined over a time that is prior to the recent high specificity audio frame feature determination time period; a decision maker configured to output a decision relating to a commencement or termination of the audio signal, or a gain related thereto, based on the combination, wherein said state information includes a nuisance level associated with the audio signal, the nuisance level indicating a possibility that a nuisance state exists at the present frame, wherein the nuisance level is increased with a first rate if the present frame is the last frame of a present voice segment and a voice ratio of the immediately previous frame is less than a nuisance threshold, the voice ratio representing a prediction made at the time of the present frame, about a possibility that the next frame includes voice, and wherein the nuisance level is decreased with a second rate, the second rate faster than the first rate, if the present frame is within the present voice segment, the voice ratio of the present frame is greater than a voice ratio threshold value, and the portion of the present voice segment from its start to the present frame is longer than a time period threshold value; and a transmitter configured to selectively transmit the present frame of the audio signal according to the decision.
13. The apparatus as recited in claim 12 wherein the combining unit is further configured to combine one or more signals or determinations that relate to a feature that comprises a current or previously processed characteristic of the audio signal.
14. The apparatus as recited in claim 12 wherein the state relates to one or more of a nuisance characteristic or a ratio of voice content in the audio signal to a total aud
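
The nuisance-level rule recited in claims 1 and 12 can be sketched in a few lines of Python. The two conditions follow the claim wording. Everything else is an assumption made for illustration: the additive update, the clamping to [0, 1], the measurement of segment length in frames, and every numeric default in the hypothetical `NuisanceParams` class. The claims only require that the second (decrease) rate be faster than the first (increase) rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NuisanceParams:
    # All values are illustrative assumptions, not taken from the patent.
    increase_rate: float = 0.05          # "first rate"
    decrease_rate: float = 0.2           # "second rate", faster than the first
    nuisance_threshold: float = 0.3      # compared with the previous frame's voice ratio
    voice_ratio_threshold: float = 0.6   # compared with the present frame's voice ratio
    min_segment_frames: int = 10         # "time period threshold", counted in frames


def update_nuisance(level, *, in_voice_segment, is_last_frame_of_segment,
                    voice_ratio, prev_voice_ratio, frames_since_segment_start,
                    params=NuisanceParams()):
    """Return the nuisance level after processing the present frame."""
    if is_last_frame_of_segment and prev_voice_ratio < params.nuisance_threshold:
        # A voice segment just ended and the preceding frame looked
        # unlike voice: raise the nuisance level slowly.
        level += params.increase_rate
    elif (in_voice_segment
          and voice_ratio > params.voice_ratio_threshold
          and frames_since_segment_start > params.min_segment_frames):
        # A long segment with a high voice ratio: lower the level quickly.
        level -= params.decrease_rate
    # Clamping to [0, 1] is an assumption of this sketch.
    return min(1.0, max(0.0, level))
```

With these assumed defaults, a short burst that ends with a low voice ratio raises the level by 0.05, while sustained speech pulls it down by 0.2 per frame, so a talker recovers from a nuisance state faster than they entered it.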