Active speaker location detection
US-9621795-B1 · Apr 11, 2017 · US
US9986360B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9986360-B1 |
| Application number | US-201715631327-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jun 23, 2017 |
| Priority date | Jun 23, 2017 |
| Publication date | May 29, 2018 |
| Grant date | May 29, 2018 |
Official abstract text for this publication.
A system that automatically calibrates multiple speaker tracking systems with respect to one another based on detection of an active speaker at a collaboration endpoint is presented herein. The system collects a first data point set of an active speaker at the collaboration endpoint using at least a first camera and a first microphone array. The system then receives a plurality of second data point sets from one or more secondary speaker tracking systems located at the collaboration endpoint. Once enough data points have been collected, a reference coordinate system is determined using the first data point set and the one or more second data point sets. Finally, after a reference coordinate system has been determined, the system generates the locations of the one or more secondary speaker tracking systems with respect to the first speaker tracking system.
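The abstract does not say how the reference coordinate system or the secondary locations are computed. As a rough illustration only, the sketch below assumes a 2D room and assumes that each data point is a (distance, angle from the device normal) pair for the same active speaker, observed at the same instants by the first and one secondary tracking system. It then fits a least-squares rigid transform (a planar Procrustes/Kabsch fit, swapped in here as a stand-in technique) to place the secondary system in the first system's frame. The function names `polar_to_xy` and `estimate_pose` and the axis convention are hypothetical and do not come from the patent.

```python
import math


def polar_to_xy(distance, angle_rad):
    """Convert a (distance, angle-from-normal) reading to x/y in the device frame.

    Assumed convention (not from the patent): the device normal is the +y axis
    and positive angles turn toward +x.
    """
    return (distance * math.sin(angle_rad), distance * math.cos(angle_rad))


def estimate_pose(primary_pts, secondary_pts):
    """Least-squares 2D rigid fit so that primary ~= R(theta) * secondary + t.

    Each list holds the same speaker positions, one entry per time instant,
    expressed in the respective device's own frame. Returns (theta, (tx, ty)):
    the secondary device's heading and position in the primary frame.
    """
    n = len(primary_pts)
    if n < 2 or n != len(secondary_pts):
        raise ValueError("need at least two paired observations")

    # Centroids of both point sets.
    pcx = sum(p[0] for p in primary_pts) / n
    pcy = sum(p[1] for p in primary_pts) / n
    qcx = sum(q[0] for q in secondary_pts) / n
    qcy = sum(q[1] for q in secondary_pts) / n

    # Accumulate dot and cross products of the centered pairs; the optimal
    # rotation angle is atan2(sum of crosses, sum of dots).
    dot_sum = 0.0
    cross_sum = 0.0
    for (px, py), (qx, qy) in zip(primary_pts, secondary_pts):
        ax, ay = qx - qcx, qy - qcy  # centered secondary point
        bx, by = px - pcx, py - pcy  # centered primary point
        dot_sum += ax * bx + ay * by
        cross_sum += ax * by - ay * bx
    theta = math.atan2(cross_sum, dot_sum)

    # Translation maps the rotated secondary centroid onto the primary centroid.
    c, s = math.cos(theta), math.sin(theta)
    tx = pcx - (c * qcx - s * qcy)
    ty = pcy - (s * qcx + c * qcy)
    return theta, (tx, ty)
```

With this formulation the speaker has to be observed at two or more distinct positions before the rotation is determined, which is one possible reading of the abstract's "once enough data points have been collected". A real system would also have to cope with measurement noise, elevation angles, and mismatched timestamps, none of which this sketch addresses.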
Opening claim text (preview).
What is claimed is:
1. A method comprising: collecting a first data point set of an active speaker at a collaboration endpoint using at least a first camera and a first microphone array of a first speaker tracking system located at the collaboration endpoint; receiving a plurality of second data point sets from one or more secondary speaker tracking systems located at the collaboration endpoint, each secondary speaker tracking system including at least a secondary camera and a secondary microphone array; determining a reference coordinate system using the first data point set and one or more of the plurality of second data point sets; and generating locations with respect to the reference coordinate system of the one or more secondary speaker tracking systems.
2. The method of claim 1, wherein collecting the first data point set is performed by the first speaker tracking system.
3. The method of claim 2, wherein receiving a plurality of second data points, determining a reference coordinate system, and generating locations of the one or more secondary speaker tracking systems is performed by the first speaker tracking system.
4. The method of claim 2, wherein receiving a plurality of second data points, determining a reference coordinate system, and generating locations of the one or more secondary speaker tracking systems is performed by a server coupled to the first speaker tracking system and the one or more secondary speaker tracking systems.
5. The method of claim 4, further comprising: receiving, by the server, the first data point set of the active speaker at the collaboration endpoint from the first speaker tracking system.
6. The method of claim 1, wherein generating the locations of the one or more secondary speaker tracking systems is based on the reference coordinate system, the first data point set, and the plurality of second data point sets.
7. The method of claim 1, wherein the first data point set includes an indication of a distance of the active speaker from the first speaker tracking system and one or more angles of the active speaker with respect to a normal of the first speaker tracking system at a first point in time.
8. The method of claim 7, wherein each of the plurality of second data point sets includes an indication of a distance of the active speaker from a respective secondary speaker tracking system and an angle of the active speaker with respect to a normal of the respective secondary speaker tracking system at the first point in time.
9. An apparatus comprising: a network interface unit configured to enable communications over a network; and a processor coupled to the network interface unit, the processor configured to: receive a first data point set associated with an active speaker detected at a collaboration endpoint with at least a first camera and a first microphone array of a first speaker tracking system located at the collaboration endpoint; receive a plurality of second data point sets from one or more secondary speaker tracking systems located at the collaboration endpoint, each secondary speaker tracking system including at least a secondary camera and a secondary microphone array; determine a reference coordinate system using the first data point set and one or more of the plurality of second data point sets; and generate locations with respect to the reference coordinate system of the one or more secondary speaker tracking systems.
10. The apparatus of claim 9, wherein the processor, when receiving the first data point set, causes the first speaker tracking system to collect the first data point set.
11. The apparatus of claim 10, wherein the processor, when receiving the plurality of second data points, determining a reference coordinate system, and generating locations of the one or more secondary speaker tracking systems, causes the first speaker tracking system to receive the plurality of second data point sets from the one or more secondary speaker tracking systems located at the collaboration endpoint, determine the reference coordinate system using the first data point set and the one or more second data point sets, and generate the locations of the one or more secondary speaker tracking systems with respect to the first speaker tracking system.
12. The apparatus of claim 9, wherein the processor is further configured to: receive the first data point set of the active speaker at the collaboration endpoint from the first speaker tracking system.
13. The apparatus of claim 9, wherein the processor is configured to generate the locations of the one or more secondary speaker tracking systems based on the reference coordinate system, the first data point set, and the plurality of second data point sets.
14. The apparatus of claim 9, wherein the first data point set includes an indication of a distance of the active speaker from the first speaker tracking system and one or more angles of the active speaker with respect to a normal of the first speaker tracking system at a first point in time.
15. The apparatus of claim 14, wherein each of the plurality of second data point sets includes an indication of a distance of the active speaker from a respective secondary speaker tracking system and an angle of the active speaker with respect to a normal of the respective secondary speaker tracking system at the first point in time.
16. One or more non-transitory computer readable storage media, the computer readable storage media being encoded with software comprising computer executable instructions, and when the software is executed, operable to: receive a first data point set associated with an active speaker detected at a collaboration endpoint with at least a first camera and a first microphone array of a first speaker tracking system located at the collaboration endpoint; receive a plurality of second data point sets from one or more secondary speaker tracking systems located at the collaboration endpoint, each secondary speaker tracking system including at least a secondary camera and a secondary microphone array; determine a reference coordinate system using the first data point set and one or more of the plurality of second data point sets; and generate locations with respect to the reference coordinate system of the one or more secondary speaker tracking systems.
17. The non-transitory computer readable storage media of claim 16, wherein the instructions are further operable to: receive the first data point set of the active speaker at the collaboration endpoint from the first speaker tracking system.
18. The non-transitory computer readable storage media of claim 16, wherein the instructions are configured to generate the locations of the one or more secondary speaker tracking systems based on the reference coordinate system, the first data point set, and the plurality of second data point sets.
19. The non-transitory computer readable storage media of claim 16, wherein the first data point set includes an indication of a distance of the active speaker from the first speaker tracking system and one or more angles of the active speaker with respect to a normal of the first speaker tracking system at a first point in time.
20. The non-transitory computer readable storage media of claim 19, wherein each of the plurality of second data point sets includes an indication of a distance of the active speaker from a respective secondary speaker tracking system and an angle of the active speaker with respect to a normal of the respective secondary s
using the instant speaker's algorithm (speech detection per se G10L25/78) · CPC title
Focus control based on electronic image sensor signals · CPC title
audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants (echo suppression in two-way loud-speaking telephone systems H04M9/02; sound field processing per se H04S7/30) · CPC title
Automatic calibration of stereophonic sound system, e.g. with test microphone · CPC title
for loudspeakers (H04R29/007 takes precedence) · CPC title