Identifying sound from a source of interest based on multiple audio feeds

US9691413B2 · US · B2

Patent metadata
Publication number: US-9691413-B2
Application number: US-201514876666-A
Country: US
Kind code: B2
Filing date: Oct 6, 2015
Priority date: Oct 6, 2015
Publication date: Jun 27, 2017
Grant date: Jun 27, 2017

Abstract

Official abstract text for this publication.

Methods and systems for identifying sound from a source of interest are provided for herein. In some embodiments, a first audio feed is captured by a first microphone and a second audio feed is captured by a second microphone. The first microphone may be located closer in proximity to the source of interest than the second microphone. The first audio feed can be processed utilizing the second audio feed to produce a first processed audio feed that can enable identification of sound originating from the source of interest. In some embodiments, the second audio feed can be additionally processed utilizing the first audio feed to produce a second processed audio feed. In such embodiments, frequencies from the first processed audio feed can be compared against frequencies of the second processed audio feed to identify sound originating from the source of interest. Other embodiments may be described and/or claimed herein.
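The two-feed processing in the abstract (recited more precisely in claim 1) can be illustrated with a short Python sketch. It rests on assumptions that are not in the patent. Each feed is reduced to the magnitude spectrum of one time-aligned frame. "Processing one feed utilizing the other" is modelled as partial magnitude spectral subtraction with a hypothetical factor `alpha`. Frequency bands are equal-width groups of bins. The source confidence level is taken as the fraction of bands in which the first processed feed exceeds the second by more than a hypothetical `threshold_db`. All function names are illustrative.

```python
"""Hypothetical sketch of the two-microphone comparison (abstract, claim 1).

Assumptions, not from the patent: one time-aligned magnitude spectrum per
feed, partial spectral subtraction as the attenuation step, equal-width bands.
"""
import math
from typing import List, Sequence


def cross_attenuate(primary: Sequence[float], reference: Sequence[float],
                    alpha: float = 0.5) -> List[float]:
    """Attenuate bins of `primary` that also appear in `reference` by
    subtracting alpha * reference magnitude, floored at zero."""
    return [max(p - alpha * r, 0.0) for p, r in zip(primary, reference)]


def band_energies(spectrum: Sequence[float], n_bands: int) -> List[float]:
    """Sum of squared magnitudes over n_bands equal-width groups of bins."""
    width = len(spectrum) // n_bands
    return [sum(m * m for m in spectrum[b * width:(b + 1) * width])
            for b in range(n_bands)]


def source_confidence(near: Sequence[float], far: Sequence[float],
                      n_bands: int = 8, threshold_db: float = 6.0) -> float:
    """Fraction of bands in which the attenuated near feed is louder than the
    attenuated far feed by more than threshold_db (hypothetical parameters)."""
    near_att = cross_attenuate(near, far)  # first attenuated audio feed
    far_att = cross_attenuate(far, near)   # second attenuated audio feed
    eps = 1e-12                            # avoids log of zero for silent bands
    count = 0
    for e_near, e_far in zip(band_energies(near_att, n_bands),
                             band_energies(far_att, n_bands)):
        if 10.0 * math.log10((e_near + eps) / (e_far + eps)) > threshold_db:
            count += 1
    return count / n_bands
```

For example, a near feed that is louder in every bin, such as `source_confidence([1.0] * 64, [0.2] * 64)`, gives 1.0, while identical feeds give 0.0. The partial factor is a deliberate choice: with `alpha = 1` the two subtracted spectra can never both be non-zero in the same bin, so the band comparison would collapse to a simple sign test.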

Claims

Claim text (preview).

What is claimed is:

1. A sound processing system comprising: a first audio capture device and a second audio capture device, wherein the first audio capture device is located in closer proximity to a point of interest than the second audio capture device; a voice activity detection module to: receive first and second audio feeds respectively captured by the first and second audio capture devices; attenuate at least a portion of the first audio feed based on a corresponding portion of the second audio feed to generate a first attenuated audio feed; attenuate at least a portion of the second audio feed based on a corresponding portion of the first audio feed to generate a second attenuated audio feed; compare frequency bands of the first attenuated audio feed with corresponding frequency bands of the second attenuated audio feed; and determine a source confidence level based on a number of the frequency bands from the first attenuated audio feed that exceed a predefined threshold of difference from the corresponding frequency bands of the second attenuated audio feed, wherein the source confidence level is indicative of whether sound is originating from the point of interest.

2. The sound processing system of claim 1, wherein a higher value for the source confidence level is more indicative of sound within the first attenuated audio feed originating from the point of interest than a lower value for the source confidence level.

3. The sound processing system of claim 1, wherein to attenuate at least the portion of the first audio feed based on the corresponding portion of the second audio feed is to attenuate one or more frequencies contained within the first audio feed that are contained within the second audio feed, and wherein to attenuate at least the portion of the second audio feed based on the corresponding portion of the first audio feed is to attenuate one or more frequencies contained within the second audio feed that are contained within the first audio feed.

4. The sound processing system of claim 1, wherein the voice activity detection module is further to: time synchronize the first audio feed with the second audio feed prior to attenuating at least the portion of the first audio feed; and time synchronize the second audio feed with the first audio feed prior to attenuating at least the portion of the second audio feed.

5. The sound processing system of claim 1, wherein to time synchronize the first audio feed with the second audio feed is to apply a first delay to the first audio feed, the first delay reflecting the amount of time it takes for sound to travel from the first audio capture device to the second audio capture device, and wherein to time synchronize the second audio feed with the first audio feed is to apply a second delay to the second audio feed, the second delay reflecting the amount of time it takes for sound to travel from the second audio capture device to the first audio capture device.

6. The sound processing system of claim 1, further comprising: a voice recognition module to: receive the first attenuated audio feed; monitor the first attenuated audio feed to identify one or more triggers contained within the first attenuated audio feed; and cause one or more actions to occur in response to identifying the one or more triggers.

7. The sound processing system of claim 6, wherein the voice activity detection module is further to: output the first attenuated audio feed to the voice recognition engine in response to a determination that the source confidence level exceeds a preconfigured limit.

8. The sound processing system of claim 7, wherein the preconfigured limit varies based upon a power level of a computing device that hosts the sound processing system.

9. The sound processing system of claim 1, wherein the voice activity detection module is further to: determine a noise confidence level based on a number of the frequency bands from the first audio feed that are within a predefined threshold of difference from the corresponding frequency bands of the second audio feed, wherein a higher value for the noise confidence level is more indicative of sound within the first audio feed being noise than a lower value for the noise confidence level.

10. The sound processing system of claim 1, further comprising an acoustic echo cancellation (AEC) module that is to: reduce an amount of echo contained within the first attenuated audio feed.

11. One or more computer storage hardware media device having computer-executable instructions embodied thereon that, when executed, by one or more processors of a computing device, causes the one or more processors to: perform a method for processing sound, the method comprising: filtering a first audio feed utilizing a second audio feed to produce a filtered audio feed, wherein the first audio feed is captured by a first microphone and the second audio feed is captured by a second microphone, the first microphone being closer in proximity to an audio source of interest than the second microphone; and identifying whether the first audio feed contains sound originating from a direction of the source of interest based on frequencies contained within the filtered audio feed.

12. The one or more computer storage media of claim 11, wherein the filtered audio feed is a first filtered audio feed the method further comprising: filtering the second audio feed utilizing the first audio feed to produce a second filtered audio feed, wherein identifying whether the first audio feed contains sound originating from the direction of the source of interest includes comparing frequency bands of the first filtered audio feed with corresponding frequency bands of the second filtered audio feed; and determining a source confidence level based on a number of the frequency bands from the first filtered audio feed that exceed a predefined threshold of difference from the corresponding frequency bands of the second filtered audio feed.

13. The one or more computer storage media of claim 12, the method further comprising sending the filtered audio feed to a voice recognition engine of the computing device in response to the source confidence level exceeding a preconfigured limit.

14. The one or more computer storage media of claim 13, wherein the preconfigured limit varies based upon a power level of the computing device.

15. The one or more computer storage media of claim 12, wherein filtering the first audio feed utilizing the second audio feed further comprises filtering frequencies from the first audio feed that are contained within the second audio feed, and wherein filtering the second audio feed utilizing the first audio feed further comprises filtering frequencies from the second audio feed that are contained within the first audio feed.

16. A computer-implemented method for voice activity detection comprising: receiving a first audio feed captured by a first microphone of a computing device and a second audio feed captured by a second microphone of the computing device, wherein the first microphone is closer in proximity to a source of interest than the second microphone; and processing the first audio feed utilizing the second audio feed to enable identification of sound originating from a direction of the source of interest.

17. The computer-implemented method of claim 16, wherein processing the first audio feed utilizing the second audio feed comprises: filtering frequencies of the first audio feed based on corresponding frequencies of the second audio feed to produce a filtered audio feed.
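Several dependent claims add time synchronization by a travel-time delay (claims 4-5), a noise confidence level computed from the raw feeds (claim 9), and a gate that forwards audio to voice recognition only when the source confidence level exceeds a limit that varies with power level (claims 7-8 and 13-14). The hypothetical Python helpers below sketch one way each step could look. The speed-of-sound constant, the equal-width banding, the 3 dB tolerance, and the linear battery-to-limit mapping are assumptions; the claims only say the limit "varies based upon a power level".

```python
"""Hypothetical helpers for dependent claims 4-5, 7-9 and 13-14.

Every name, constant and mapping here is an illustrative assumption.
"""
import math
from typing import List, Sequence

SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees C


def travel_delay_samples(mic_spacing_m: float, sample_rate_hz: int) -> int:
    """Claim 5: samples sound needs to travel from one microphone to the other."""
    return round(mic_spacing_m / SPEED_OF_SOUND_M_S * sample_rate_hz)


def delay(feed: Sequence[float], n: int) -> List[float]:
    """Delay a time-domain feed by n samples, zero-padding the front and
    keeping the original length (claims 4-5 time synchronization)."""
    n = max(0, min(n, len(feed)))
    return [0.0] * n + list(feed[:len(feed) - n])


def band_energies(spectrum: Sequence[float], n_bands: int) -> List[float]:
    """Sum of squared magnitudes over n_bands equal-width groups of bins."""
    width = len(spectrum) // n_bands
    return [sum(m * m for m in spectrum[b * width:(b + 1) * width])
            for b in range(n_bands)]


def noise_confidence(near_spec: Sequence[float], far_spec: Sequence[float],
                     n_bands: int = 8, tolerance_db: float = 3.0) -> float:
    """Claim 9: fraction of bands where the raw (unattenuated) feeds differ by
    no more than tolerance_db; similar levels at both mics suggest noise."""
    eps = 1e-12  # avoids log of zero for silent bands
    similar = 0
    for e_near, e_far in zip(band_energies(near_spec, n_bands),
                             band_energies(far_spec, n_bands)):
        if abs(10.0 * math.log10((e_near + eps) / (e_far + eps))) <= tolerance_db:
            similar += 1
    return similar / n_bands


def confidence_limit(battery_fraction: float,
                     base: float = 0.5, extra: float = 0.3) -> float:
    """Claims 8 and 14: a limit that varies with power level. Assumed mapping:
    the limit rises linearly as the battery drains."""
    battery_fraction = min(max(battery_fraction, 0.0), 1.0)
    return base + extra * (1.0 - battery_fraction)


def should_forward(source_confidence: float, battery_fraction: float) -> bool:
    """Claims 7 and 13: pass the feed to voice recognition only above the limit."""
    return source_confidence > confidence_limit(battery_fraction)
```

With 10 cm microphone spacing at 16 kHz, `travel_delay_samples(0.1, 16000)` returns 5. Under the assumed mapping, a source confidence of 0.6 clears the limit on a full battery (limit 0.5) but not on an empty one (limit 0.8), so a low-power device would wake its recognizer less often.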

Classifications (CPC titles)

  • Automatic adjustment

  • Word spotting

  • Details of processing therefor

  • For synchronising with other signals, e.g. video signals

  • Execution procedure of a spoken command

Frequently asked questions

What does patent US9691413B2 cover?
Methods and systems for identifying sound from a source of interest are provided for herein. In some embodiments, a first audio feed is captured by a first microphone and a second audio feed is captured by a second microphone. The first microphone may be located closer in proximity to the source of interest than the second microphone. The first audio feed can be processed utilizing the second a…
Who is the assignee on this patent?
Zad Issa Syavosh; Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G10L25/78. Mapped technology areas include Physics.
When was this patent published?
Published Jun 27, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).