Interference-free audio pickup in a video conference

US10490202B2 · US · B2

Patent metadata
Publication number: US-10490202-B2
Application number: US-201816178841-A
Country: US
Kind code: B2
Filing date: Nov 2, 2018
Priority date: Jun 30, 2017
Publication date: Nov 26, 2019
Grant date: Nov 26, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A videoconference apparatus at a first location detects audio from a location and determines whether the sound should be included in an audio-video stream sent to a second location, or excluded as an interfering noise. Determining whether to include the audio involves using a face detector to see if there is a face at the source of the sound. If a face is present, the audio data from the location will be transmitted to the second location. If a face is not present, additional motion checks are performed to determine whether the sound corresponds to a person talking, (such as a presenter at a meeting), or whether the sound is instead unwanted noise.
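
The decision described in the abstract can be sketched as a small gating function. This is a minimal illustration, not the patented implementation: the `Observation` fields and the `gate_audio` name are hypothetical, and the "additional motion checks" are reduced to a single boolean, on the assumption that detected motion at the sound source indicates a person talking.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    INCLUDE = "include"   # send the audio to the second location
    EXCLUDE = "exclude"   # treat the audio as interfering noise


@dataclass
class Observation:
    """Hypothetical detector output for one sound-source location."""
    face_present: bool    # a face was detected at the source of the sound
    motion_present: bool  # the motion check found movement at the source


def gate_audio(obs: Observation) -> Decision:
    """Decide whether audio from a located source joins the outgoing stream."""
    if obs.face_present:
        # A face at the sound source: transmit the audio.
        return Decision.INCLUDE
    if obs.motion_present:
        # No face, but movement (assumed here to mean a talker who is
        # turned away from the camera, such as a presenter).
        return Decision.INCLUDE
    # Neither a face nor motion: assume unwanted noise.
    return Decision.EXCLUDE
```

For example, `gate_audio(Observation(face_present=False, motion_present=False))` returns `Decision.EXCLUDE`. A real system would presumably weigh several motion cues over time rather than a single flag.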

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for providing interference-free audio pickup in a video conference, the method comprising: detecting audio data in an environment, using a plurality of microphones; determining, at a processor, a first location of a source of the audio data, using a beamforming algorithm; detecting first facial data in the environment, using a camera; determining, at the processor, a second location of a source of the first facial data; determining, at a first time, using the processor, that the second location corresponds to the first location, and responsive to determining that the first location corresponds to the second location, including the audio data in an audio stream; checking, at a second time, for second facial data corresponding to the first location; determining, responsive to checking at the second time, a first absence of second facial data; checking, at a third time, for motion at the first location, responsive to determining the first absence of second facial data; determining, responsive to checking at the third time, a presence of motion at the first location; and continuing to include the audio data responsive, at least in part, determining the presence of motion at the first location.

2. The method of claim 1, wherein detecting the audio data in the environment using the plurality of microphones comprises detecting the audio data in the environment using at least one array of microphones.

3. The method of claim 1, wherein detecting the audio data in the environment using the plurality of microphones comprises detecting the audio data using a first array of microphones and a second array of microphones, the first array of microphones orthogonal to the second array of microphones.

4. The method of claim 1, further comprising: checking, at a fourth time, for motion in a region proximate the first location, responsive to determining the first absence of second facial data corresponding to the first location; determining, responsive to checking at the fourth time, an absence of motion in the region proximate the first location; and continuing to include the audio data responsive, at least in part, to determining the absence of motion in the region proximate the first location.

5. The method of claim 4, wherein the third time and the fourth time are different.

6. The method of claim 1, wherein detecting first facial data in the environment using the camera comprises detecting a skin tone.

7. The method of claim 1, wherein checking, at the third time, for motion at the first location, responsive to determining the first absence of second facial data corresponding to the first location comprises checking for one or more of eye lid movement, lip movement, head movement and body movement.

8. The method of claim 1, wherein determining the first location of the source of the audio data using the processor comprises using a beamforming algorithm.

9. A non-transitory computer readable medium, the non-transitory computer readable medium storing instructions executable by a processor, the instructions comprising instructions to: detect, using a plurality of microphones, audio data in an environment; determine a first location of a source of the audio data; detect, using a camera, first facial data in the environment; determine a second location of a second source of the first facial data; determine at a first time, that the second location corresponds to the first location and in response, include the audio data in an audio stream; check, at a second time, for second facial data corresponding to the first location; determine, in response to checking at the second time, a first absence of second facial data corresponding to the first location; check, in response to determining the first absence of second facial data corresponding to the first location, for motion at the first location at a third time; determine, in responsive to checking at the third time, a presence of motion at the first location; and continue to include the audio data, in response to determining the presence of motion at the first location responsive.

10. The non-transitory computer readable medium of claim 9, the instructions further comprising instructions to detect audio data in the environment using the plurality of microphones comprise instructions to detect the audio data using a first array of microphones and a second array of microphones, the first array of microphones orthogonal to the second array of microphones.

11. The non-transitory computer readable medium of claim 9, the instructions further comprising instructions to: check, at a fourth time, for motion in a region proximate the first location, responsive to determining the first absence of second facial data corresponding to the first location; determine, responsive to checking at the fourth time, an absence of motion in the region proximate the first location; and continue to include the audio data responsive, at least in part, to determining the absence of motion in the region proximate the first location.

12. The non-transitory computer readable medium of claim 11, wherein the third time precedes and the fourth time.

13. The non-transitory computer readable medium of claim 9, wherein the instructions to detect first facial data in the environment using the camera further comprise instructions to detect a skin tone.

14. The non-transitory computer readable medium of claim 9, wherein the instructions to check, at the third time, for motion at the first location, responsive to determining the first absence of second facial data corresponding to the first location further comprise instructions to check for at least one of eye lid movement, lip movement, head movement or body movement.

15. The non-transitory computer readable medium of claim 9, wherein the instructions to determine the first location of the source of the audio data comprise a beamforming algorithm.

16. A video conferencing apparatus, comprising: a processor; a camera coupled to the processor; a plurality of microphones coupled to the processor; and a memory coupled to the processor and storing instructions executable by the processor, the instructions comprising instructions to: detect, using the plurality of microphones, audio data in an environment; determine a first location of a source of the audio data; detect, using the camera, first facial data in the environment; determine a second location of a second source of the first facial data; determine at a first time, that the second location corresponds to the first location and in response, include the audio data in an audio stream; check, at a second time, for second facial data corresponding to the first location; determine, in response to checking at the second time, a first absence of second facial data corresponding to the first location; check, in response to determining the first absence of second facial data corresponding to the first location, for motion at the first location at a third time; determine, in responsive to checking at the third time, a presence of motion at the first location; and continue to include the audio data, in response to determining the presence of motion at the first location responsive.

17. The video conferencing apparatus of claim 16, wherein the instructions further comprise instructions to: check, at a fourth time, for motion in a region proximate the first location, responsive to determining the first absence of second facial data corresponding to the first location; determine, responsive
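
The time-ordered steps of claim 1 (face matches the audio location, face later absent, motion check, continue including) might be traced with a sketch like the one below. Everything here is an assumption made for illustration: locations are reduced to a single pan angle in degrees, "corresponds" is modelled as an angular tolerance, the `Frame`, `corresponds`, and `track_inclusion` names are hypothetical, and the claim's second and third times are collapsed into one frame. Angle wrap-around and the proximate-region check of claim 4 are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    """Hypothetical detector output at one point in time (angles in degrees)."""
    audio_pan: float           # first location: bearing of the audio source
    face_pan: Optional[float]  # second location: bearing of a face, or None
    motion_at_source: bool     # result of the motion check at the audio source


def corresponds(a: float, b: float, tolerance_deg: float = 5.0) -> bool:
    """Treat two bearings as the same location if they are close enough."""
    return abs(a - b) <= tolerance_deg


def track_inclusion(frames: List[Frame], tolerance_deg: float = 5.0) -> List[bool]:
    """Return, per frame, whether the audio is included in the stream."""
    included = False
    decisions = []
    for frame in frames:
        face_here = frame.face_pan is not None and corresponds(
            frame.audio_pan, frame.face_pan, tolerance_deg
        )
        if face_here:
            # First time: the face location matches the audio location.
            included = True
        elif included and frame.motion_at_source:
            # Face absent, but motion at the source: continue to include.
            included = True
        else:
            # No face and no motion (or audio never included): leave it out.
            included = False
        decisions.append(included)
    return decisions
```

A usage example under the same assumptions:

```python
frames = [
    Frame(audio_pan=30.0, face_pan=31.0, motion_at_source=False),  # face matches
    Frame(audio_pan=30.0, face_pan=None, motion_at_source=True),   # face gone, motion seen
    Frame(audio_pan=30.0, face_pan=None, motion_at_source=False),  # face gone, no motion
]
print(track_inclusion(frames))  # [True, True, False]
```

The sketch only continues audio that was already included; whether motion alone should admit a new source is a design choice the claim text shown here does not settle.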

Assignees

Polycom Inc

Inventors

Classifications

  • Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects · CPC title

  • Control of means for changing angle of the field of view, e.g. optical zoom objectives or electronic zooming · CPC title

  • where the recognised objects include parts of the human body · CPC title

  • Noise filtering · CPC title

  • Tracking of listener position or orientation · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10490202B2 cover?
A videoconference apparatus at a first location detects audio from a location and determines whether the sound should be included in an audio-video stream sent to a second location, or excluded as an interfering noise. Determining whether to include the audio involves using a face detector to see if there is a face at the source of the sound. If a face is present, the audio data from the locati…
Who is the assignee on this patent?
Polycom Inc
What technology area does this patent fall under?
Primary CPC classification G10L21/0216. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 26, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).