Subject identification and tracking using image recognition
US-10055853-B1 · Aug 21, 2018 · US
US11030756B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11030756-B2 |
| Application number | US-202017104323-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 25, 2020 |
| Priority date | Oct 26, 2018 |
| Publication date | Jun 8, 2021 |
| Grant date | Jun 8, 2021 |
Official abstract text for this publication.
A tracking system includes a camera subsystem that includes cameras that capture video of a space. Each camera is coupled with a camera client that determines local coordinates of people in the captured video. The camera clients generate frames that include color frames and depth frames labeled with an identifier number of the camera and their corresponding timestamps. The camera clients generate tracks that include metadata describing historical people detections, tracking identifications, timestamps, and the identifier number of the camera. The camera clients send the frames and tracks to cluster servers that maintain the frames and tracks such that they are retrievable using their corresponding labels. A camera server queries the cluster servers to receive the frames and tracks using their corresponding labels. The camera server determines the physical positions of people in the space based on the determined local coordinates.
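The labeling scheme in the abstract can be pictured as a key-value store keyed by camera identifier and timestamp. The following is a minimal sketch under stated assumptions, not the patented implementation: the names `Frame`, `Track`, and `ClusterStore` are hypothetical, and an in-memory dictionary stands in for the cluster servers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Frame:
    camera_id: int     # identifier number of the camera
    timestamp: float   # time the camera client received the frame
    kind: str          # "color" or "depth"
    data: bytes        # placeholder for pixel data (assumption)


@dataclass(frozen=True)
class Track:
    camera_id: int
    timestamp: float
    tracking_id: int
    local_xy: Tuple[float, float]  # local coordinates in the camera's view


class ClusterStore:
    """Hypothetical in-memory stand-in for a cluster server."""

    def __init__(self) -> None:
        self._frames: Dict[Tuple[int, float, str], Frame] = {}
        self._tracks: Dict[Tuple[int, float], List[Track]] = {}

    def put_frame(self, frame: Frame) -> None:
        # Frames are stored under their labels: camera id, timestamp, kind.
        self._frames[(frame.camera_id, frame.timestamp, frame.kind)] = frame

    def put_track(self, track: Track) -> None:
        # Several people may be tracked by one camera at one timestamp.
        key = (track.camera_id, track.timestamp)
        self._tracks.setdefault(key, []).append(track)

    def get_frame(self, camera_id: int, timestamp: float, kind: str) -> Frame:
        return self._frames[(camera_id, timestamp, kind)]

    def get_tracks(self, camera_id: int, timestamp: float) -> List[Track]:
        return list(self._tracks.get((camera_id, timestamp), []))


# A camera client writes; a camera server later queries by the same labels.
store = ClusterStore()
store.put_frame(Frame(camera_id=1, timestamp=10.0, kind="depth", data=b""))
store.put_track(Track(camera_id=1, timestamp=10.0, tracking_id=7,
                      local_xy=(120.0, 80.0)))
depth_frame = store.get_frame(1, 10.0, "depth")
tracks = store.get_tracks(1, 10.0)
```

Keying on the labels rather than on arrival order is what lets a camera server request the frames and tracks of several cameras for the same timestamp.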
Opening claim text (preview).
What is claimed is:
1. A system comprising:
an array of cameras positioned above a space, wherein:
each camera of the array of cameras is operatively coupled with a camera client from an array of camera clients;
each camera of the array of cameras is configured to capture a video of a portion of the space, the space containing a person;
the array of camera clients operably coupled with the array of cameras; wherein:
a first camera client of the array of camera clients is operably coupled with a first camera and configured to:
receive a first plurality of frames of a first video from the first camera, wherein each frame of the first plurality of frames shows the person within the space, the first plurality of frames comprises a first plurality of color frames and a first plurality of depth frames, wherein: the first plurality of color frames corresponds to visual colors of objects in the space; and the first plurality of depth frames corresponds to distances of objects in the space from the first camera;
generate a timestamp when each corresponding color and depth frame is received by the first camera client;
send the first plurality of frames labeled with one or more corresponding timestamps and an identifier number of the first camera client to a first server from among a plurality of cluster servers;
generate a first plurality of tracks by performing a local position tracking of the person in the first plurality of depth frames;
for a first depth frame of the first plurality of depth frames, generate a first track of the first plurality of tracks by: detecting a first contour associated with the person; determining, based on pixel coordinates of the first contour, a first bounding area around the person shown in the first depth frame; determining, based on the first bounding area, first coordinates of the person in the first depth frame; and associating a first tracking identification to the person, wherein the first tracking identification is linked to historical detections associated with the person, wherein the historical detections associated with the person comprise at least one of a contour, a bounding area, and a segmentation mask associated with the person;
for a second depth frame of the first plurality of depth frames, generate a second track of the first plurality of tracks by: detecting a second contour associated with the person; determining, based on pixel coordinates of the second contour, a second bounding area around the person shown in the second depth frame; determining, based on the second bounding area, second coordinates of the person in the second depth frame; determining whether the second bounding area corresponds to the first bounding area; and in response to determining that the second bounding area corresponds to the first bounding area, associating the first tracking identification to the person;
send the first plurality of tracks labeled with one or more corresponding timestamps, the identifier number of the first camera, the historical detections associated with the person, and the first tracking identification associated with the person to a second server from among the plurality of cluster servers;
a second camera client of the array of camera clients is operably coupled with a second camera and separate from the first camera client, the second camera client configured to:
receive a second plurality of frames of a second video from the second camera, wherein each frame of the second plurality of frames shows the person within the space, the second plurality of frames comprises a second plurality of color frames and a second plurality of depth frames, wherein: the second plurality of color frames corresponds to visual colors of objects in the space; and the second plurality of depth frames corresponds to distances of objects in the space from the second camera;
generate a timestamp when each corresponding color and depth frame is received by the second camera client;
send the second plurality of frames labeled with one or more corresponding timestamps and an identifier number of the second camera to the first server from among the plurality of cluster servers;
generate a second plurality of tracks by performing a local position tracking of the person in the second plurality of depth frames;
for a third depth frame of the second plurality of depth frames, generate a third track of the second plurality of tracks by: detecting a third contour associated with the person; determining, based on pixel coordinates of the third contour, a third bounding area around the person shown in the third depth frame; determining, based on the third bounding area, third coordinates of the person in the third depth frame; and associating a second tracking identification to the person, wherein the second tracking identification is linked to the historical detections associated with the person;
for a fourth depth frame of the second plurality of depth frames, generate a fourth track of the second plurality of tracks by: detecting a fourth contour associated with the person; determining, based on pixel coordinates of the fourth contour, a fourth bounding area around the person shown in the fourth depth frame; determining, based on the fourth bounding area, fourth coordinates of the person in the fourth depth frame; determining whether the fourth bounding area corresponds to the third bounding area; and in response to determining that the fourth bounding area corresponds to the third bounding area, associating the second tracking identification to the person;
send the second plurality of tracks labeled with one or more corresponding timestamps, the identification number of the second camera, the historical detections associated with the person, and the second tracking identification associated with the person to the second server from among the plurality of cluster servers; and
each server from among the plurality of cluster servers configured to:
receive the first plurality of frames and the first plurality of tracks from the first camera client;
receive the second plurality of frames and the second plurality of tracks from the second camera client;
store the first and second plurality of frames such that a particular frame from the first and second plurality of frames is retrievable using one or more corresponding labels comprising an identifier number of a camera associated with the particular frame and a timestamp associated with the particular frame; and
store the first and second plurality of tracks such that a particular track from the first and second plurality of tracks is retrievable using one or more corresponding labels comprising an identifier number of a camera associated with the particular track, a timestamp associated with the particular track, a particular historical detection associated with a person detected in the particular track, and a particular tracking identification detected in the particular track.
2. The system of claim 1, wherein:
determining whether the second bounding area corresponds to the first bounding area is based on one or more metrics comprising: an overlapping region between the first bounding area and the second bounding area, a ratio of intersection over union region between the first bounding area and the second bounding area, and a distance between the center of the first bounding area and the center of the second bounding area;
if it is determined that: the overlapping region between the first bounding area and the second bounding area is above a threshold region; the ratio of intersection over union region between the first bounding area and the second bounding area is above a threshold value; and the distance between the center of the first bounding area and the center of the second bounding area is below a threshold distance, determine t
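Claim 2 names three metrics for comparing two bounding areas: the overlapping region, the ratio of intersection over union, and the distance between centers. The sketch below computes them for axis-aligned rectangles and reuses a tracking identification when all three tests pass, as claim 1 describes for corresponding bounding areas. Several parts are assumptions rather than claim language: the claim preview is cut off before it states the outcome of the three tests, so treating them as "the areas correspond" is a guess; the threshold values, the rectangle representation, and the names `Box`, `corresponds`, and `assign_tracking_id` are hypothetical; and issuing a fresh identification on a mismatch is not stated in the text shown here.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding area in pixel coordinates (assumed shape)."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    @property
    def center(self) -> Tuple[float, float]:
        return ((self.x1 + self.x2) / 2.0, (self.y1 + self.y2) / 2.0)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the overlapping region, or 0 if the boxes are disjoint."""
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, w) * max(0.0, h)


def iou(a: Box, b: Box) -> float:
    """Ratio of intersection over union."""
    inter = overlap_area(a, b)
    union = a.area + b.area - inter
    return inter / union if union > 0 else 0.0


def center_distance(a: Box, b: Box) -> float:
    (ax, ay), (bx, by) = a.center, b.center
    return math.hypot(ax - bx, ay - by)


def corresponds(prev: Box, curr: Box,
                min_overlap: float = 100.0,   # assumed threshold region
                min_iou: float = 0.3,         # assumed threshold value
                max_dist: float = 50.0) -> bool:  # assumed threshold distance
    # Assumption: all three tests passing means the areas correspond.
    return (overlap_area(prev, curr) > min_overlap
            and iou(prev, curr) > min_iou
            and center_distance(prev, curr) < max_dist)


def assign_tracking_id(prev: Box, prev_id: int, curr: Box, new_id: int) -> int:
    # Reuse the earlier identification on a match; otherwise use a new one
    # (the fallback is an assumption, not taken from the claim text).
    return prev_id if corresponds(prev, curr) else new_id


first = Box(100, 100, 200, 300)
second = Box(110, 105, 210, 305)   # same person, slightly moved
far = Box(400, 100, 500, 300)      # no overlap with `first`
same_id = assign_tracking_id(first, 7, second, 8)
other_id = assign_tracking_id(first, 7, far, 8)
```

Requiring all three tests together is stricter than intersection over union alone, which may help when two people stand close enough for their bounding areas to overlap.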
Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums · CPC title
Combinations of radar systems, e.g. primary radar and secondary radar · CPC title
Combination of radar systems with cameras · CPC title
Combinations of radar systems with non-radar systems, e.g. sonar, direction finder · CPC title
Multiple target tracking · CPC title