US10490202B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10490202-B2 |
| Application number | US-201816178841-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 2, 2018 |
| Priority date | Jun 30, 2017 |
| Publication date | Nov 26, 2019 |
| Grant date | Nov 26, 2019 |
Official abstract text for this publication.
A videoconference apparatus at a first location detects audio from a location and determines whether the sound should be included in an audio-video stream sent to a second location, or excluded as an interfering noise. Determining whether to include the audio involves using a face detector to see if there is a face at the source of the sound. If a face is present, the audio data from the location will be transmitted to the second location. If a face is not present, additional motion checks are performed to determine whether the sound corresponds to a person talking (such as a presenter at a meeting) or whether the sound is instead unwanted noise.
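The gating decision described in the abstract can be sketched roughly as follows. This is an illustrative simplification, not the patented implementation: the `SourceObservation` type, the pan-angle representation of locations, the 5-degree tolerance, and the rule that audio is excluded when neither a face nor motion is found are all assumptions made for the example. The detectors themselves (beamforming, face detection, motion analysis) are assumed to run elsewhere and only their results are passed in.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SourceObservation:
    """Hypothetical snapshot of detector results for one sound source."""
    audio_pan_deg: float            # direction of the sound source, e.g. from beamforming
    face_pan_degs: Sequence[float]  # directions of faces the camera currently detects
    motion_at_source: bool          # result of the follow-up motion check at the source


def face_at_source(obs: SourceObservation, tolerance_deg: float = 5.0) -> bool:
    """True if any detected face lies within tolerance of the sound direction.

    Assumes pan angles stay in a limited range, so no wrap-around handling.
    """
    return any(abs(face - obs.audio_pan_deg) <= tolerance_deg
               for face in obs.face_pan_degs)


def should_include_audio(obs: SourceObservation, tolerance_deg: float = 5.0) -> bool:
    """Decide whether the audio from this source goes into the outgoing stream."""
    # A face at the sound source means a person is likely talking there.
    if face_at_source(obs, tolerance_deg):
        return True
    # No face: fall back to the motion check (e.g. a presenter facing away).
    # Treating "no face and no motion" as noise is an assumption of this sketch.
    return obs.motion_at_source
```

For example, `should_include_audio(SourceObservation(30.0, [28.0], False))` keeps the audio because a face is found within the tolerance, while a source with no nearby face and no motion is dropped as noise.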
Opening claim text (preview).
The invention claimed is:

1. A method for providing interference-free audio pickup in a video conference, the method comprising: detecting audio data in an environment, using a plurality of microphones; determining, at a processor, a first location of a source of the audio data, using a beamforming algorithm; detecting first facial data in the environment, using a camera; determining, at the processor, a second location of a source of the first facial data; determining, at a first time, using the processor, that the second location corresponds to the first location, and responsive to determining that the first location corresponds to the second location, including the audio data in an audio stream; checking, at a second time, for second facial data corresponding to the first location; determining, responsive to checking at the second time, a first absence of second facial data; checking, at a third time, for motion at the first location, responsive to determining the first absence of second facial data; determining, responsive to checking at the third time, a presence of motion at the first location; and continuing to include the audio data responsive, at least in part, determining the presence of motion at the first location.

2. The method of claim 1, wherein detecting the audio data in the environment using the plurality of microphones comprises detecting the audio data in the environment using at least one array of microphones.

3. The method of claim 1, wherein detecting the audio data in the environment using the plurality of microphones comprises detecting the audio data using a first array of microphones and a second array of microphones, the first array of microphones orthogonal to the second array of microphones.

4. The method of claim 1, further comprising: checking, at a fourth time, for motion in a region proximate the first location, responsive to determining the first absence of second facial data corresponding to the first location; determining, responsive to checking at the fourth time, an absence of motion in the region proximate the first location; and continuing to include the audio data responsive, at least in part, to determining the absence of motion in the region proximate the first location.

5. The method of claim 4, wherein the third time and the fourth time are different.

6. The method of claim 1, wherein detecting first facial data in the environment using the camera comprises detecting a skin tone.

7. The method of claim 1, wherein checking, at the third time, for motion at the first location, responsive to determining the first absence of second facial data corresponding to the first location comprises checking for one or more of eye lid movement, lip movement, head movement and body movement.

8. The method of claim 1, wherein determining the first location of the source of the audio data using the processor comprises using a beamforming algorithm.

9. A non-transitory computer readable medium, the non-transitory computer readable medium storing instructions executable by a processor, the instructions comprising instructions to: detect, using a plurality of microphones, audio data in an environment; determine a first location of a source of the audio data; detect, using a camera, first facial data in the environment; determine a second location of a second source of the first facial data; determine at a first time, that the second location corresponds to the first location and in response, include the audio data in an audio stream; check, at a second time, for second facial data corresponding to the first location; determine, in response to checking at the second time, a first absence of second facial data corresponding to the first location; check, in response to determining the first absence of second facial data corresponding to the first location, for motion at the first location at a third time; determine, in responsive to checking at the third time, a presence of motion at the first location; and continue to include the audio data, in response to determining the presence of motion at the first location responsive.

10. The non-transitory computer readable medium of claim 9, the instructions further comprising instructions to detect audio data in the environment using the plurality of microphones comprise instructions to detect the audio data using a first array of microphones and a second array of microphones, the first array of microphones orthogonal to the second array of microphones.

11. The non-transitory computer readable medium of claim 9, the instructions further comprising instructions to: check, at a fourth time, for motion in a region proximate the first location, responsive to determining the first absence of second facial data corresponding to the first location; determine, responsive to checking at the fourth time, an absence of motion in the region proximate the first location; and continue to include the audio data responsive, at least in part, to determining the absence of motion in the region proximate the first location.

12. The non-transitory computer readable medium of claim 11, wherein the third time precedes and the fourth time.

13. The non-transitory computer readable medium of claim 9, wherein the instructions to detect first facial data in the environment using the camera further comprise instructions to detect a skin tone.

14. The non-transitory computer readable medium of claim 9, wherein the instructions to check, at the third time, for motion at the first location, responsive to determining the first absence of second facial data corresponding to the first location further comprise instructions to check for at least one of eye lid movement, lip movement, head movement or body movement.

15. The non-transitory computer readable medium of claim 9, wherein the instructions to determine the first location of the source of the audio data comprise a beamforming algorithm.

16. A video conferencing apparatus, comprising: a processor; a camera coupled to the processor; a plurality of microphones coupled to the processor; and a memory coupled to the processor and storing instructions executable by the processor, the instructions comprising instructions to: detect, using the plurality of microphones, audio data in an environment; determine a first location of a source of the audio data; detect, using the camera, first facial data in the environment; determine a second location of a second source of the first facial data; determine at a first time, that the second location corresponds to the first location and in response, include the audio data in an audio stream; check, at a second time, for second facial data corresponding to the first location; determine, in response to checking at the second time, a first absence of second facial data corresponding to the first location; check, in response to determining the first absence of second facial data corresponding to the first location, for motion at the first location at a third time; determine, in responsive to checking at the third time, a presence of motion at the first location; and continue to include the audio data, in response to determining the presence of motion at the first location responsive.

17. The video conferencing apparatus of claim 16, wherein the instructions further comprise instructions to: check, at a fourth time, for motion in a region proximate the first location, responsive to determining the first absence of second facial data corresponding to the first location; determine, responsive
Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects · CPC title
Control of means for changing angle of the field of view, e.g. optical zoom objectives or electronic zooming · CPC title
where the recognised objects include parts of the human body · CPC title
Noise filtering · CPC title
Tracking of listener position or orientation · CPC title