What technology area does this patent fall under?

Primary CPC classification H04N7/15. Mapped technology areas include Electricity.

When was this patent published?

Publication date Oct 24, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

System and method of speaker reidentification in a multiple camera setting conference room

US11800057B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11800057-B2
Application number	US-202117646704-A
Country	US
Kind code	B2
Filing date	Dec 31, 2021
Priority date	Dec 31, 2021
Publication date	Oct 24, 2023
Grant date	Oct 24, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In a multi-camera videoconferencing configuration, the locations of each camera are known. By referencing a known object visible to each camera, a 3D coordinate system is developed, with the position and angle of each camera being associated with that 3D coordinate system. The locations of the conference participants in the 3D coordinate system are determined for each camera. Sound source localization (SSL) from one camera, generally a central camera, is used to determine the speaker. The pose of the speaker is then determined. From the pose and the known locations of the cameras, the camera with the best frontal view of the speaker is determined. The 3D coordinates of the speaker are then used to direct the determined camera to frame the speaker. If the face of the speaker is not sufficiently visible, the next best camera view is determined, and the speaker framed from that camera view.
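The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the "best frontal view" is scored, so the cosine score below is an assumption, as are all names (`frontal_score`, `rank_cameras`, `select_camera`, `face_visible`) and the room layout. The sketch ranks cameras by how directly the speaker's face points at each one, then falls back to the next best camera when the face is not sufficiently visible.

```python
import math

def unit(v):
    """Scale a 3D vector to length 1."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def frontal_score(speaker_pos, face_dir, camera_pos):
    """Cosine of the angle between the face direction and the speaker-to-camera ray.

    1.0 means the speaker looks straight at the camera; values <= 0 mean the
    camera sees the side or back of the head. (Assumed scoring, not from the patent.)
    """
    to_camera = unit(tuple(c - s for c, s in zip(camera_pos, speaker_pos)))
    return sum(a * b for a, b in zip(unit(face_dir), to_camera))

def rank_cameras(speaker_pos, face_dir, cameras):
    """Return camera names ordered from most to least frontal view."""
    return sorted(
        cameras,
        key=lambda name: frontal_score(speaker_pos, face_dir, cameras[name]),
        reverse=True,
    )

def select_camera(speaker_pos, face_dir, cameras, face_visible):
    """Pick the best-ranked camera whose view passes the visibility check."""
    ranked = rank_cameras(speaker_pos, face_dir, cameras)
    for name in ranked:
        if face_visible(name):
            return name
    # Assumption: if no view passes the check, keep the top-ranked camera.
    return ranked[0]

# Hypothetical room layout, world coordinates in metres.
cameras = {
    "left": (-2.0, 0.0, 1.5),
    "center": (0.0, 0.0, 1.5),
    "right": (2.0, 0.0, 1.5),
}
speaker = (0.0, 3.0, 1.2)
facing_right = (1.0, -1.0, 0.0)  # speaker turned toward the right-hand camera

print(rank_cameras(speaker, facing_right, cameras))
print(select_camera(speaker, facing_right, cameras, lambda name: name != "right"))
```

Here `face_visible` stands in for whatever per-camera check decides that the speaker's face is sufficiently visible; in the last call the right-hand camera is treated as blocked, so the next best view is chosen.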

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for selecting a camera of a plurality of cameras, each with a different view of a group of participants in an environment and providing a video stream, one camera of the plurality of cameras having a microphone array, to provide a video stream for provision to a far end, the method comprising: determining world coordinates of each participant for each camera of the plurality of cameras; utilizing sound source localization using the microphone array on the one camera to determine speaker direction information; identifying a speaker in the group of participants using the speaker direction information and an image from the video stream of the one camera; determining world coordinates of the speaker based on the identification; determining facial pose of the speaker in the image from the video stream of the one camera; selecting a camera from the plurality of cameras to provide a video stream for provision to the far end based on the locations of the plurality of cameras other than the one camera and the facial pose of the speaker; and utilizing the determined speaker world coordinates to frame the speaker in the video stream of the selected camera.

2. The method of claim 1, further comprising: determining the rotation and translation of a coordinate system of each of the plurality of cameras to the world coordinate system.

3. The method of claim 1, further comprising: selecting the camera of the plurality of cameras providing the most frontal views of participants when there is not a speaker and there are participants; and selecting a default camera when there are no participants.

4. The method of claim 1, wherein determining the world coordinates of each participant includes storing the determined world coordinates of each participant in a table of cameras and individuals from the perspective of the camera, and wherein utilizing the determined speaker world coordinates to frame the speaker includes using the determined speaker world coordinates to find the appropriate individual for the selected camera from the table.

5. The method of claim 1, further comprising: determining if the frontal view of the speaker provided from the selected camera is satisfactory; and providing a framed view of the speaker from the selected camera when the frontal view of the speaker provided from the selected camera is satisfactory.

6. The method of claim 5, further comprising: utilizing the determined speaker world coordinates to evaluate the facial view of the speaker from each camera of the plurality of cameras other than the selected camera when the frontal view of the speaker provided from the selected camera is not satisfactory; and providing a framed view of the speaker from the camera of the plurality of cameras that has the best frontal view of the speaker when the frontal view of the speaker provided from the selected camera is not satisfactory.

7. The method of claim 1, wherein the one camera is the central camera of the plurality of cameras.

8. A non-transitory processor readable memory containing instructions that when executed cause a processor or processors to perform the following method of selecting a camera of a plurality of cameras, each with a different view of a group of participants in an environment and providing a video stream, one camera of the plurality of cameras having a microphone array, to provide a video stream for provision to a far end, the method comprising: determining the world coordinates of each participant for each camera of the plurality of cameras; utilizing sound source localization using the microphone array on the one camera to determine speaker direction information; identifying a speaker in the group of participants using the speaker direction information and an image from the video stream of the one camera; determining world coordinates of the speaker based on the identification; determining facial pose of the speaker in the image from the video stream of the one camera; selecting a camera from the plurality of cameras to provide a video stream for provision to the far end based on the locations of the plurality of cameras other than the one camera and the facial pose of the speaker; and utilizing the determined speaker world coordinates to frame the speaker in the video stream of the selected camera.

9. The non-transitory processor readable memory of claim 8, the method further comprising: determining the rotation and translation of a coordinate system of each of the plurality of cameras to a world coordinate system.

10. The non-transitory processor readable memory of claim 9, the method further comprising: selecting the camera providing the most frontal views of participants when there is not a speaker and there are participants; and selecting a default camera when there are no participants.

11. The non-transitory processor readable memory of claim 8, wherein determining the world coordinates of each participant includes storing the determined world coordinates of each participant in a table of cameras and individuals from the perspective of the camera, and wherein utilizing the determined speaker world coordinates to frame the speaker includes using the determined speaker world coordinates to find the appropriate individual for the selected camera from the table.

12. The non-transitory processor readable memory of claim 8, the method further comprising: determining if the frontal view of the speaker provided from the selected camera is satisfactory; and providing a framed view of the speaker from the selected camera when the frontal view of the speaker provided from the selected camera is satisfactory.

13. The non-transitory processor readable memory of claim 12, the method further comprising: utilizing the determined speaker world coordinates to evaluate the frontal view of the speaker from each camera of the plurality of cameras other than the selected camera when the frontal view of the speaker provided from the selected camera is not satisfactory; and providing a framed view of the speaker from the camera of the plurality of cameras that has the best frontal view of the speaker when the frontal view of the speaker provided from the selected camera is not satisfactory.

14. The non-transitory processor readable memory of claim 8, wherein the one camera is the central camera of the plurality of cameras.

15. A system for selecting a camera of a plurality of cameras, each with a different view of a group of participants in an environment, to provide a video stream for provision to a far end, the system comprising: a plurality of cameras, each camera including: an imager; a camera output interface for providing data and a video stream; camera random access memory (RAM); a camera processor coupled to the imager, the camera output interface and the camera RAM for executing instructions; and camera memory coupled to the camera processor for storing instructions executed by the processor, the camera memory storing instructions executed by the camera processor to perform the operation of providing a video stream from the camera, one camera of the plurality of cameras further including a microphone array and the camera memory of the one camera further storing instructions to utilize sound source localization using the microphone array to determine direction information and provide the direction information; and a codec coupled to the plurality of cameras, the codec including: a codec input interface for coupling to the plurality of cameras to receive data and video streams; a network interfa
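Claims 2 and 4 (and their counterparts 9 and 11) can be read as a coordinate transform plus a lookup table. The following is a rough sketch of that reading, not the patented implementation: it assumes each camera's rotation `R` and translation `t` are already known, applies `p_world = R * p_cam + t`, stores the results in a table keyed by camera and individual, and matches the speaker to an individual in the selected camera by nearest world position. The nearest-neighbour matching, the function names, and the sample numbers are all hypothetical.

```python
import math

def camera_to_world(rotation, translation, point_cam):
    """Apply p_world = R * p_cam + t for a 3x3 rotation and a 3-vector translation."""
    return tuple(
        sum(rotation[i][j] * point_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

def build_table(extrinsics, detections):
    """Build {camera: {individual id: world coordinates}} from per-camera detections."""
    return {
        cam: {pid: camera_to_world(*extrinsics[cam], p) for pid, p in people.items()}
        for cam, people in detections.items()
    }

def find_individual(table, camera, speaker_world):
    """Return the id of the individual in `camera` closest to the speaker's world position.

    Nearest-neighbour matching is an assumption made for this sketch.
    """
    people = table[camera]
    return min(people, key=lambda pid: math.dist(people[pid], speaker_world))

# Hypothetical extrinsics: (rotation, translation) per camera.
identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
rot_z_90 = ((0, -1, 0), (1, 0, 0), (0, 0, 1))  # 90 degree rotation about the z axis
extrinsics = {
    "center": (identity, (0.0, 0.0, 0.0)),
    "side": (rot_z_90, (2.0, 0.0, 0.0)),
}

# Hypothetical detections in each camera's own coordinates. Each camera numbers
# the individuals independently, so the ids do not line up across cameras.
detections = {
    "center": {0: (2.0, 1.0, 0.0), 1: (-1.0, 2.0, 0.0)},
    "side": {0: (2.0, 3.0, 0.0), 1: (1.0, 0.0, 0.0)},
}

table = build_table(extrinsics, detections)
speaker_world = table["center"][0]  # speaker identified in the central camera
print(find_individual(table, "side", speaker_world))
```

Because each camera numbers its detections independently, the lookup goes through world coordinates rather than ids: individual 0 in the central camera corresponds to individual 1 in the side camera in this example.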

Assignees

Plantronics

Inventors

Classifications

H04N7/15 (Primary)
Conference systems · CPC title
G06V40/161
Detection; Localisation; Normalisation · CPC title
G10L17/06
Decision making techniques; Pattern matching strategies · CPC title
H04L12/1822
Conducting the conference, e.g. admission, detection, selection or grouping of participants, correlating users to one or more conference sessions, prioritising transmission · CPC title
H04N13/282
for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems · CPC title

Patent family

Related publications grouped by family.

View patent family 83506651

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11800057B2 cover?: In a multi-camera videoconferencing configuration, the locations of each camera are known. By referencing a known object visible to each camera, a 3D coordinate system is developed, with the position and angle of each camera being associated with that 3D coordinate system. The locations of the conference participants in the 3D coordinate system are determined for each camera. Sound source local…
Who is the assignee on this patent?: Plantronics

Related patents

User Interface Tile Arrangement Based On Relative Locations Of Conference Participants

Intelligent multi-camera switching with machine learning

Optimal view selection in a teleconferencing system with cascaded cameras

System and method for automatically framing conversations in a meeting or a video conference

Auto-calibration of relative positions of multiple speaker tracking systems

Imaging apparatus, medium, and method for imaging
