Assignment of unique identifications to people in multi-camera field of view

US12412421B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12412421-B2
Application number  US-202317971243-A
Country             US
Kind code           B2
Filing date         Feb 5, 2023
Priority date       Feb 5, 2023
Publication date    Sep 9, 2025
Grant date          Sep 9, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A multi-camera video conference call system is provided with a plurality of cameras connected together over a communication network to generate a corresponding plurality of input frame images taken from different perspectives of a video conference room, where the multi-camera video conference call system detects one or more human heads for any meeting participants captured in the input frame images, generates a head bounding box which surrounds each detected human head, extracts a body bounding box which surrounds the detected human head and at least an upper body portion of a meeting participant belonging to the detected human head, generates a participant identification feature embedding from each body bounding box, and performs person re-identification processing on all generated participant identification feature embeddings to determine a count of the meeting participants in the video conference room.
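
The body bounding box step in the abstract can be illustrated with a minimal sketch. It assumes boxes are axis-aligned pixel rectangles and that the head box is grown sideways and downward by multiples of the head size. The `Box` type, the `head_to_body_box` function, and the `side_scale` and `down_scale` values are hypothetical names chosen for illustration; the abstract only states that the body box surrounds the head and at least the upper body.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (origin at top-left)."""
    left: int
    top: int
    right: int
    bottom: int


def head_to_body_box(head: Box, frame_w: int, frame_h: int,
                     side_scale: float = 1.0, down_scale: float = 2.0) -> Box:
    """Grow a head box sideways and downward, clipped to the frame.

    side_scale and down_scale are assumed multiples of the head size,
    not values taken from the patent text.
    """
    w = head.right - head.left
    h = head.bottom - head.top
    dx = int(w * side_scale)   # extra width on each side for the shoulders
    dy = int(h * down_scale)   # extra height below the head for the torso
    return Box(
        left=max(0, head.left - dx),
        top=head.top,
        right=min(frame_w, head.right + dx),
        bottom=min(frame_h, head.bottom + dy),
    )


body = head_to_body_box(Box(100, 50, 160, 120), frame_w=1920, frame_h=1080)
print(body)
```

Clipping to the frame keeps the crop valid for people seated near an image edge, at the cost of a smaller crop for those detections.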

First claim

Opening claim text (preview).

What is claimed is:

1. A method for identifying meeting participants in a multi-camera video conference room, comprising: generating a plurality of input frame images taken from different perspectives of a video conference room by a corresponding plurality of cameras connected together; detecting, from an input frame image associated with each camera, one or more human heads for any meeting participants captured in the input frame image by applying a machine learning human head detector model to said input image frame; generating, from each detected human head, a head bounding box which surrounds the detected human head; extracting, from each head bounding box, a body bounding box which surrounds the detected human head and at least an upper body portion of a meeting participant belonging to the detected human head, thereby generating a plurality of body bounding boxes from the plurality of input frame images; generating, from each input frame image portion contained within the body bounding box, a participant identification feature embedding which uniquely identifies the meeting participant captured in the body bounding box, thereby generating a plurality of participant identification feature embeddings from the plurality of body bounding boxes; and performing person re-identification processing on the plurality of participant identification feature embeddings to determine a count of the meeting participants in the video conference room, wherein performing person re-identification processing comprises: dividing the plurality of participant identification feature embeddings into a query set and a gallery set, and comparing the query set to the gallery set to identify k top feature embedding matches so that matching feature embeddings are assigned to the same meeting participant, wherein the query set contains participant identification feature embeddings extracted from body bounding boxes generated from a first input frame captured at a primary camera, and wherein the gallery set contains participant identification feature embeddings extracted from body bounding boxes generated from one or more additional input frames captured at one or more secondary cameras.

2. The method of claim 1, where detecting one or more human heads comprises classifying each detected human head as having a frontal, profile, or back head orientation and discarding any detected human head that is classified as a profile or back head orientation before extracting, from each head bounding box, a body bounding box.

3. The method of claim 1, wherein detecting one or more human heads comprises: applying image pre-processing to each input frame image; applying a machine learning human head detector model to each input image frame to generate an output tensor for each detected human head; and applying image post-processing to convert each output tensor to a head bounding box which surrounds a corresponding detected human head.

4. The method of claim 1, wherein extracting each body bounding box comprises extending the head bounding box by predetermined distances in both vertical and horizontal directions to surround the detected human head and at least the upper body portion of the meeting participant belonging to the detected human head.

5. The method of claim 1, wherein generating each participant identification feature embedding comprises applying a deep convolutional neural network (CNN) model to generate a multi-dimensional feature embedding for each body bounding box.

6. The method of claim 1, wherein the plurality of participant identification feature embeddings are generated at the plurality of cameras, and where a central codec performs person re-identification processing on the plurality of participant identification feature embeddings.

7. A system for identifying meeting participants in a multi-camera video conference room, comprising: a plurality of camera input devices connected over a communication network to a video codec device, where each of the camera input devices comprises: a first processor; a first data bus coupled to the first processor; and a non-transitory, computer-readable storage medium embodying computer program code, the non-transitory, computer-readable storage medium being coupled to the first data bus, the computer program code interacting with a plurality of computer operations and comprising first instructions executable by the first processor and configured for: generating an input frame image taken from a different perspective of a video conference room; detecting, from an input frame image associated with each camera, one or more human heads for any meeting participants captured in the input frame image by applying a machine learning human head detector model to said input image frame; generating, from each detected human head, a head bounding box which surrounds the detected human head; extracting, from each head bounding box, a body bounding box which surrounds the detected human head and at least an upper body portion of a meeting participant belonging to the detected human head; and generating, from each input frame image portion contained within the body bounding box, a participant identification feature embedding which uniquely identifies the meeting participant captured in the body bounding box; and where the video codec device comprises: a second processor; a second data bus coupled to the second processor; and a non-transitory, computer-readable storage medium embodying computer program code, the non-transitory, computer-readable storage medium being coupled to the second data bus, the computer program code interacting with a plurality of computer operations and comprising second instructions executable by the second processor and configured for: performing person re-identification processing on participant identification feature embeddings generated by the plurality of input camera devices to determine a count of the meeting participants in the video conference room, wherein the second instructions executable by the processor are configured for performing person re-identification processing by: dividing the plurality of participant identification feature embeddings into a query set and a gallery set, comparing the query set to the gallery set to identify k top feature embedding matches so that matching feature embeddings are assigned to the same meeting participant, wherein the query set contains participant identification feature embeddings extracted from body bounding boxes generated from a first input frame captured at a primary camera input device, and wherein the gallery set contains participant identification feature embeddings extracted from body bounding boxes generated from one or more additional input frames captured at one or more secondary camera input devices.

8. The system of claim 7, wherein the first instructions executable by the processor are configured for detecting one or more human heads by classifying each detected human head as having a frontal, profile, or back head orientation and discarding any detected human head that is classified as a profile or back head orientation before extracting, from each head bounding box, a body bounding box.

9. The system of claim 7, wherein the first instructions executable by the processor are configured for detecting one or more human heads by: applying image pre-processing to each input frame image; applying a machine learning human head detector model to each input image frame to generate an output tensor for each detected human head; and applying image post-processing to convert each output tensor to a head bounding box which surrounds a corresponding detected human head.

10. The system o
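
The query/gallery comparison recited in the independent claims can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes cosine similarity as the match score, treats every query embedding (primary camera) as a distinct person, and groups leftover gallery embeddings (secondary cameras) among themselves so that a person seen only by secondary cameras is counted once. The function names, `k`, and `threshold` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def count_participants(query, gallery, k=2, threshold=0.8):
    """Return (participant_count, gallery_ids).

    query:   embeddings from the primary camera, assumed one per person.
    gallery: embeddings from the secondary cameras.
    k and threshold are assumed tuning values, not taken from the claims.
    """
    gallery_ids = [None] * len(gallery)
    best_sim = [threshold] * len(gallery)

    # For each query embedding, look at its k most similar gallery entries
    # and claim those that clear the threshold (best-scoring query wins).
    for qid, q in enumerate(query):
        ranked = sorted(((cosine(q, g), gi) for gi, g in enumerate(gallery)),
                        reverse=True)
        for sim, gi in ranked[:k]:
            if sim >= best_sim[gi]:
                gallery_ids[gi] = qid
                best_sim[gi] = sim

    # Gallery entries that matched no query are people the primary camera
    # missed; merge them with earlier unmatched entries when similar.
    next_id = len(query)
    for gi, g in enumerate(gallery):
        if gallery_ids[gi] is not None:
            continue
        for gj in range(gi):
            if gallery_ids[gj] >= len(query) and cosine(g, gallery[gj]) >= threshold:
                gallery_ids[gi] = gallery_ids[gj]
                break
        else:
            gallery_ids[gi] = next_id
            next_id += 1
    return next_id, gallery_ids


query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
gallery = [[0.9, 0.1, 0.0], [0.1, 0.95, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
count, ids = count_participants(query, gallery)
print(count, ids)
```

In this toy input, two gallery embeddings match the two primary-camera participants, and the remaining two match each other, so they are counted as a single additional participant.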

Assignees

Hewlett Packard Development Co

Inventors

Classifications

  • G06T7/70 · Primary

    Determining position or orientation of objects or cameras (camera calibration G06T7/80) · CPC title

  • Artificial neural networks [ANN] · CPC title

  • Counting objects in image · CPC title

  • G06V40/172 · Primary

    Classification, e.g. identification · CPC title

  • Surveillance or monitoring of activities, e.g. for recognising suspicious objects (recognising microscopic objects G06V20/69) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12412421B2 cover?
A multi-camera video conference call system is provided with a plurality of cameras connected together over a communication network to generate a corresponding plurality of input frame images taken from different perspectives of a video conference room, where the multi-camera video conference call system detects one or more human heads for any meeting participants captured in the input frame im…
Who is the assignee on this patent?
Hewlett Packard Development Co
What technology area does this patent fall under?
Primary CPC classification G06T7/70. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 9, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).