Active speaker location detection

US9621795B1 · US · B1

Patent metadata
Publication number: US-9621795-B1
Application number: US-201614991847-A
Country: US
Kind code: B1
Filing date: Jan 8, 2016
Priority date: Jan 8, 2016
Publication date: Apr 11, 2017
Grant date: Apr 11, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various examples related to determining a location of an active speaker are provided. In one example, image data of a room from an image capture device is received and a three dimensional model is generated. First audio data from a first microphone array at the image capture device is received. Second audio data from a second microphone array laterally spaced from the image capture device is received. Using the three dimensional model, a location of the second microphone array with respect to the image capture device is determined. Using the audio data and the location and angular orientation of the second microphone array, an estimated location of the active speaker is determined. Using the estimated location, a setting for the image capture device is determined and outputted to highlight the active speaker.
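The abstract's step of combining audio from two microphone arrays with the second array's location and angular orientation can be illustrated with a simple bearing intersection. The sketch below is a minimal 2D illustration, not the patented method: it assumes each array has already produced a direction-of-arrival bearing in its own frame, and the function names, coordinates, and angles are all hypothetical.

```python
import math

def to_camera_frame(array_yaw, local_bearing):
    """Convert a bearing measured in an array's own frame to the camera frame.

    array_yaw is the array's angular orientation relative to the camera (radians).
    """
    return array_yaw + local_bearing

def intersect_bearings(p1, theta1, p2, theta2, eps=1e-9):
    """Intersect two 2D rays p + t * (cos(theta), sin(theta)), t >= 0.

    Returns the (x, y) intersection, or None if the rays are parallel
    or do not meet in front of both arrays.
    """
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2D cross product d1 x d2
    if abs(denom) < eps:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (dx * d2[1] - dy * d2[0]) / denom  # distance along the first ray
    t2 = (dx * d1[1] - dy * d1[0]) / denom  # distance along the second ray
    if t1 < 0 or t2 < 0:
        return None
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])

# Hypothetical layout: the first array sits at the camera (origin); the second
# array is 2 m to the side and rotated 90 degrees relative to the camera.
first_array = (0.0, 0.0)
second_array = (2.0, 0.0)
second_yaw = math.pi / 2

theta1 = to_camera_frame(0.0, math.pi / 4)         # bearing heard by the first array
theta2 = to_camera_frame(second_yaw, math.pi / 4)  # bearing heard by the second array
speaker = intersect_bearings(first_array, theta1, second_array, theta2)
print(speaker)  # approximately (1.0, 1.0)
```

Without the second array's orientation, its local bearing could not be rotated into the camera frame, which is why the abstract lists both the location and the angular orientation as inputs.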

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for determining a location of an active speaker, the method comprising: from an image capture device, receiving image data of a room in which the active speaker and at least one inactive speaker are located; using the image data, generating a three dimensional model of at least a portion of the room; from a first microphone array at the image capture device, receiving first audio data from the room; from a second microphone array that is laterally spaced from the image capture device, receiving second audio data from the room; using the three dimensional model, determining a location of the second microphone array with respect to the image capture device; using at least the first audio data, the second audio data, the location of the second microphone array, and an angular orientation of the second microphone array, determining an estimated location in the three dimensional model of the active speaker; using the estimated location of the active speaker to compute a setting for the image capture device; and outputting the setting to control the image capture device to highlight the active speaker.

2. The method of claim 1, wherein the image capture device comprises a color camera and the image data comprises color image data.

3. The method of claim 1, wherein the image capture device comprises a depth camera and the image data comprises depth data.

4. The method of claim 1, wherein the image data comprises signals corresponding to light emitted from a plurality of light sources of the second microphone array, and the method further comprises using the signals to determine the angular orientation of the second microphone array with respect to the image capture device.

5. The method of claim 4, wherein the plurality of light sources are illuminated in a spatially-recognizable manner.

6. The method of claim 1, further comprising: receiving a signal from a magnetometer in the second microphone array; and using the magnetometer signal, determining the angular orientation of the second microphone array.

7. The method of claim 1, further comprising determining that at least one of the first microphone array and the second microphone array has moved; and based on determining that at least one of the first microphone array and the second microphone array has moved, recomputing one or more of the location and the angular orientation of the second microphone array.

8. The method of claim 7, wherein determining that at least one of the first microphone array and the second microphone array has moved comprises analyzing a signal received from one or more of an accelerometer in the first microphone array, a magnetometer in the first microphone array, an accelerometer in the second microphone array, and a magnetometer in the second microphone array.

9. The method of claim 1, further comprising: determining that the image data does not comprise image data of a plurality of light sources of the second microphone array; and outputting a notification indicating that the second microphone array is occluded from view of the image capture device.

10. A video conferencing device, comprising: an image capture device for capturing image data of a room in which an active speaker and at least one inactive speaker are located; a first microphone array; a processor; and an active speaker location program executable by the processor, the active speaker location program configured to: using the image data, generate a three dimensional model of at least a portion of the room; receive first audio data of the room from the first microphone array; receive second audio data of the room from a second microphone array that is laterally spaced from the image capture device; using the three dimensional model, determine a location of the second microphone array with respect to the image capture device; using at least the first audio data, the second audio data, the location of the second microphone array, and an angular orientation of the second microphone array, determine an estimated three dimensional location of the active speaker; use the estimated location of the active speaker to compute a setting for the image capture device; and output the setting to control the image capture device to highlight the active speaker.

11. The video conferencing device of claim 10, wherein the image capture device comprises a color camera and the image data comprises color image data.

12. The video conferencing device of claim 10, wherein the image capture device comprises a depth camera and the image data comprises depth data.

13. The video conferencing device of claim 10, wherein the image data comprises signals corresponding to light emitted from a plurality of light sources of the second microphone array, and the active speaker location program is configured to determine the angular orientation of the second microphone array using the signals.

14. The video conferencing device of claim 13, wherein the plurality of light sources are illuminated in a spatially-recognizable manner.

15. The video conferencing device of claim 10, wherein the active speaker location program is configured to determine the angular orientation of the second microphone array using a signal received from a magnetometer in the second microphone array.

16. The video conferencing device of claim 10, wherein the active speaker location program is further configured to: determine that the second microphone array has moved from a first location to a second location; and based on determining that that the second microphone array has moved, recompute one or more of the location and the angular orientation of the second microphone array.

17. The video conferencing device of claim 16, wherein determining that the second microphone array has moved comprises receiving a signal from an accelerometer in the second microphone array.

18. The video conferencing device of claim 10, wherein the active speaker location program is further configured to: determine that the image data does not comprise image data of a plurality of light sources of the second microphone array; and output a notification indicating that the second microphone array is occluded from view of the image capture device.

19. A method for determining a location of an active speaker, the method comprising: from an image capture device, receiving image data of a room in which the active speaker and at least one inactive speaker are located; using the image data, generating a three dimensional model of at least a portion of the room; from a first microphone array at the image capture device, receiving first audio data from the room; from a second microphone array that is laterally spaced from the image capture device, receiving second audio data from the room; using the three dimensional model, determining a location of the second microphone array with respect to the image capture device; determining an angular orientation of the second microphone array with respect to the image capture device by receiving light emitted from a plurality of light sources of the second microphone array; using at least the first audio data, the second audio data, the location of the second microphone array, and the angular orientation of the second microphone array, determining an estimated three dimensional location of the active speaker; using the estimated location of the active speaker to compute a setting for the image capture device; and outputting the setting to control the im
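The last two steps of claims 1 and 10 (computing a setting from the estimated location and outputting it to highlight the active speaker) can be pictured as a pan, tilt, and zoom calculation. The claims do not say what form the setting takes, so the following is only a sketch under stated assumptions: a camera-centered frame with x to the right, y up, and z along the optical axis, plus a hypothetical `camera_setting` function and a hypothetical `subject_width_m` framing parameter.

```python
import math

def camera_setting(speaker_xyz, subject_width_m=1.0):
    """Return hypothetical pan/tilt/field-of-view values that frame the speaker.

    speaker_xyz: estimated speaker location in the camera frame, in meters
                 (x right, y up, z forward; an assumed convention).
    subject_width_m: assumed width of the region to keep in frame.
    """
    x, y, z = speaker_xyz
    ground = math.hypot(x, z)         # horizontal distance from the camera
    distance = math.hypot(ground, y)  # straight-line distance
    if distance == 0:
        raise ValueError("speaker location coincides with the camera")
    pan = math.degrees(math.atan2(x, z))        # positive pans right
    tilt = math.degrees(math.atan2(y, ground))  # positive tilts up
    # Narrow the field of view until the assumed subject width fills the frame.
    fov = math.degrees(2 * math.atan2(subject_width_m / 2, distance))
    return {"pan_deg": pan, "tilt_deg": tilt, "fov_deg": fov}

setting = camera_setting((1.0, 0.0, 1.0))
print(setting)  # pan of about 45 degrees, tilt of 0 degrees
```

A real device might instead crop or digitally zoom the captured image; the same angles would then select the crop window rather than drive a motor.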

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Control of means for changing angle of the field of view, e.g. optical zoom objectives or electronic zooming · CPC title

  • where the recognised objects include parts of the human body · CPC title

  • Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic (H04R2203/12 takes precedence) · CPC title

  • Electricity · mapped topic

  • Physics · mapped topic

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9621795B1 cover?
Various examples related to determining a location of an active speaker are provided. In one example, image data of a room from an image capture device is received and a three dimensional model is generated. First audio data from a first microphone array at the image capture device is received. Second audio data from a second microphone array laterally spaced from the image capture device is re…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification H04N5/23219. Mapped technology areas include Electricity.
When was this patent published?
Publication date Apr 11, 2017 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).