Real-time object detection, tracking and occlusion reasoning

US9904852B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9904852-B2
Application number: US-201414286305-A
Country: US
Kind code: B2
Filing date: May 23, 2014
Priority date: May 23, 2013
Publication date: Feb 27, 2018
Grant date: Feb 27, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system for object detection and tracking includes technologies to, among other things, detect and track moving objects, such as pedestrians and/or vehicles, in a real-world environment, handle static and dynamic occlusions, and continue tracking moving objects across the fields of view of multiple different cameras.
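To make the idea of continuing a track across cameras concrete, here is a minimal sketch of a per-camera track record and a plausibility check for linking two of them. The fields mirror what the first claim says a local track carries (a unique identifier plus geo-locations at different time instants). Everything else is an assumption for illustration: the names `Observation`, `LocalTrack` and `is_continuation`, the flat metric ground-plane coordinates, and the speed-and-time gating rule, which is a simple stand-in and not the method the patent describes.

```python
from dataclasses import dataclass, field
from math import hypot
from typing import List


@dataclass
class Observation:
    t: float  # time instant in seconds
    x: float  # ground-plane position in metres (hypothetical local frame)
    y: float


@dataclass
class LocalTrack:
    track_id: str    # unique identifier of the tracked object
    camera_id: str   # camera whose field of view produced the track
    observations: List[Observation] = field(default_factory=list)


def is_continuation(earlier: LocalTrack, later: LocalTrack,
                    max_speed: float = 3.0, max_gap: float = 30.0) -> bool:
    """Return True if `later` could plausibly continue `earlier`.

    Assumed gating rule (not from the patent): the later track must start
    after the earlier one ends, within `max_gap` seconds, and close enough
    that the object need not have moved faster than `max_speed` m/s.
    """
    if not earlier.observations or not later.observations:
        return False
    last = earlier.observations[-1]
    first = later.observations[0]
    gap = first.t - last.t
    if gap <= 0 or gap > max_gap:
        return False
    distance = hypot(first.x - last.x, first.y - last.y)
    return distance <= max_speed * gap


if __name__ == "__main__":
    cam_a = LocalTrack("a1", "camA", [Observation(0, 0, 0), Observation(1, 1, 0)])
    cam_b = LocalTrack("b1", "camB", [Observation(5, 9, 0)])
    print(is_continuation(cam_a, cam_b))
```

A real system would likely also compare appearance before linking tracks; the gate above only rules out physically implausible matches.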

First claim

Opening claim text (preview).

The invention claimed is:

1. A real-time object detection and tracking system comprising, embodied in one or more computer accessible storage media: an object tracking (OT) node to detect and track one or more objects in a group of objects depicted in a video stream produced by a camera having a field of view arranged to acquire images of a real-world environment, by: detecting a plurality of parts of objects in the group of objects, each of the detected parts being less than an entire object; identifying one or more occlusions depicted in the video stream; inferring a spatial relationship between at least two of the objects in the group of objects based on the detected parts and the identified occlusions; and outputting a local track for one or more of the objects in the group of objects, the local track comprising a unique identifier and a geo-location of one or more of the objects at different time instants of the video stream.

2. The system of claim 1, wherein the OT node infers a spatial relationship between a temporarily and/or partially occluded object of the group of objects and at least one other object in the group of objects based on the detected parts and the identified occlusions.

3. The system of claim 1, comprising a plurality of OT nodes each associated with a field of view of a different camera of a plurality of cameras arranged in the real-world environment, wherein each of the OT nodes is to output a local track for a different field of view, and an OT manager to create a global track by fusing the local tracks output by the OT nodes.

4. The system of claim 3, wherein at least two of the cameras have non-overlapping fields of view and the OT manager is to determine if a local track associated with one of the non-overlapping fields of view is a continuation of a local track associated with another of the non-overlapping fields of view.

5. The system of claim 3, wherein at least one of the cameras is embodied in a mobile consumer electronic device, and the mobile consumer electronic device is to transmit the video stream to the OT node by a communication network.

6. The system of claim 1, wherein the OT node comprises a scene awareness module to determine a dynamic occlusion map for the field of view associated with the OT node, and the OT node uses the dynamic occlusion map to create the local track.

7. The system of claim 6, wherein the scene awareness module is to create the dynamic occlusion map by identifying one or more dynamic occluders in the video stream, the one or more dynamic occluders comprising real-world objects that have a variable geo-location in the video stream.

8. The system of claim 1, wherein the OT node comprises a scene awareness module to determine a static occlusion map for the field of view associated with the OT node, and the OT node uses the static occlusion map to create the local track.

9. The system of claim 8, wherein the scene awareness module creates the static occlusion map by identifying one or more static occluders in the video stream, the static occluders comprising real-world objects that are determined to have a substantially constant geo-location in the video stream.

10. The system of claim 1, wherein the OT node comprises a scene awareness module to create static and dynamic masks for static and dynamic occluders, respectively, wherein the static and dynamic occluders are detected in the video stream, and map the static and dynamic masks to the ground plane.

11. The system of claim 1, wherein the OT node comprises a human detection module to detect a plurality of persons in the field of view including a temporarily and/or partially occluded person, determine a geo-location of the temporarily and/or partially occluded person, and output data indicating the geo-location of the temporarily and/or partially occluded person.

12. The system of claim 11, wherein the human detection module comprises a plurality of part-based detectors, each of the part-based detectors is to detect a different part of a human body that is less than a full human body.

13. The system of claim 12, wherein the human detection module comprises an occlusion reasoning engine to estimate a position of a body part that is at least partially occluded in an image of the video stream based on relative positions of other body parts that are detected by the part-based detectors.

14. The system of claim 11, wherein the human detection module comprises a plurality of part-based detectors, each of the part-based detectors is to generate a detection hypothesis for a different part of a human body, and the human detection module comprises an occlusion reasoning engine to reason over the detection hypotheses for the different body parts and generate a detection hypothesis for the full human body of the temporarily and/or partially occluded person.

15. The system of claim 14, wherein the human detection module comprises a hypothesis fusion engine to algorithmically predict a spatial arrangement of the temporarily and/or partially occluded person in relation to at least one of the other persons detected in the field of view.

16. The system of claim 11, wherein the human detection module comprises a stereo-based human detector to receive a second video stream from a second camera, compute a depth map using images from the video stream and the second video stream, and use the depth map to predict the spatial relationship of the temporarily and/or partially occluded person relative to at least one of the other detected persons.

17. The system of claim 11, wherein the OT node comprises a tracking module to initiate and maintain a local track for the temporarily and/or partially occluded person detected in the field of view.

18. The system of claim 17, wherein the tracking module is to feed the local track back to the human detection module for use by the human detection module in analyzing another time instant of the video stream.

19. The system of claim 17, wherein the tracking module comprises an occlusion reasoning module to determine whether to suspend or terminate the tracking of the temporarily and/or partially occluded person.

20. The system of claim 19, wherein the occlusion reasoning module defines a personal occlusion zone for at least one of the detected persons and uses the personal occlusion zone to track the temporarily and/or partially occluded person.

21. The system of claim 17, wherein the tracking module comprises a reacquisition module to link a suspended or terminated local track for the temporarily and/or partially occluded person with a subsequently-generated local track for the same person.

22. The system of claim 17, wherein the tracking module comprises an appearance model to represent the appearance of different body parts of the temporarily and/or partially occluded person by different spatially-encoded color histograms.

23. An object tracking (OT) node to detect and track at least one person depicted in a video stream produced by a camera having a field of view arranged to acquire images of a real-world environment, the OT node comprising, embodied in one or more computer accessible storage media: a scene awareness module to generate an occlusion map identifying one or more occlusions in a scene in the video stream; a human detection module to (i) use the occlusion map to detect a temporarily and/or partially occluded person in a group of persons detected in the video stream and (ii) determine a spatial relation
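Claim 13 describes estimating the position of an occluded body part from the relative positions of the parts that were detected. The sketch below shows one simple way that idea could work, under stated assumptions: each detected part "votes" for where the missing part should be, using a fixed table of nominal offsets, and the votes are averaged. The part names, the offset values in `NOMINAL_OFFSETS`, and the function `estimate_part` are hypothetical and chosen for illustration; the patent text shown here does not specify them.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical nominal part centres, as offsets from the head in units of
# body height (image y grows downward). Illustrative values only.
NOMINAL_OFFSETS: Dict[str, Point] = {
    "head": (0.0, 0.0),
    "torso": (0.0, 0.3),
    "legs": (0.0, 0.7),
}


def estimate_part(detected: Dict[str, Point], missing: str,
                  body_height: float) -> Optional[Point]:
    """Estimate the image position of an occluded part.

    Each detected part predicts the missing part's position from the
    nominal offset between the two; the predictions are averaged.
    Returns None when no usable part was detected.
    """
    if missing not in NOMINAL_OFFSETS:
        raise KeyError(f"unknown part: {missing}")
    mx, my = NOMINAL_OFFSETS[missing]
    votes = []
    for name, (px, py) in detected.items():
        if name == missing or name not in NOMINAL_OFFSETS:
            continue
        ox, oy = NOMINAL_OFFSETS[name]
        votes.append((px + (mx - ox) * body_height,
                      py + (my - oy) * body_height))
    if not votes:
        return None
    n = len(votes)
    return (sum(v[0] for v in votes) / n, sum(v[1] for v in votes) / n)


if __name__ == "__main__":
    # Head and torso are visible; the legs are hidden behind an occluder.
    visible = {"head": (100.0, 50.0), "torso": (100.0, 110.0)}
    print(estimate_part(visible, "legs", body_height=200.0))
```

Averaging treats every detected part as equally reliable. Weighting each vote by its detector confidence would be a natural refinement, but that is beyond what this sketch assumes.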

Assignees

Inventors

Classifications

  • Physics · mapped topic

  • G06V20/52 · Primary

    Surveillance or monitoring of activities, e.g. for recognising suspicious objects (recognising microscopic objects G06V20/69) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9904852B2 cover?
A system for object detection and tracking includes technologies to, among other things, detect and track moving objects, such as pedestrians and/or vehicles, in a real-world environment, handle static and dynamic occlusions, and continue tracking moving objects across the fields of view of multiple different cameras.
Who is the assignee on this patent?
SRI International (recorded in this data as Stanford Res Inst Int)
What technology area does this patent fall under?
Primary CPC classification G06V20/52 (listed in older records as G06K9/00771). Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 27, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).