System and method for online real-time multi-object tracking

US11830205B2 · US · B2

Patent metadata
Publication number: US-11830205-B2
Application number: US-202217656415-A
Country: US
Kind code: B2
Filing date: Mar 24, 2022
Priority date: Feb 27, 2018
Publication date: Nov 28, 2023
Grant date: Nov 28, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method for online real-time multi-object tracking is disclosed. A particular embodiment can be configured to: receive image frame data from at least one camera associated with an autonomous vehicle; generate similarity data corresponding to a similarity between object data in a previous image frame compared with object detection results from a current image frame; use the similarity data to generate data association results corresponding to a best matching between the object data in the previous image frame and the object detection results from the current image frame; cause state transitions in finite state machines for each object according to the data association results; and provide as an output object tracking output data corresponding to the states of the finite state machines for each object.
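The similarity and data-association steps in the abstract can be illustrated with a small sketch. It is not the patented implementation: the abstract does not name a matching algorithm, so the exhaustive search below, the distance-to-score mapping in `similarity`, and the `min_similarity` gate are all assumptions chosen for readability. Feature vectors stand in for whatever object data a real tracker would compare.

```python
import math
from itertools import permutations


def similarity(template, detection):
    """Map the Euclidean distance between two feature vectors to a score in (0, 1].

    The 1 / (1 + d) mapping is an illustrative assumption, not taken from the patent.
    """
    return 1.0 / (1.0 + math.dist(template, detection))


def best_matching(templates, detections, min_similarity=0.5):
    """Return (track_index, detection_index) pairs with the highest total similarity.

    Exhaustive search over one-to-one assignments; only practical for a
    handful of objects. Pairs scoring below min_similarity (a hypothetical
    gate) are left unmatched.
    """
    sim = [[similarity(t, d) for d in detections] for t in templates]
    n_t, n_d = len(templates), len(detections)
    if n_t <= n_d:
        # Assign every track to a distinct detection.
        candidates = (list(zip(range(n_t), p)) for p in permutations(range(n_d), n_t))
    else:
        # Assign every detection to a distinct track.
        candidates = (list(zip(p, range(n_d))) for p in permutations(range(n_t), n_d))

    best_pairs, best_score = [], 0.0
    for pairs in candidates:
        kept = [(i, j) for i, j in pairs if sim[i][j] >= min_similarity]
        score = sum(sim[i][j] for i, j in kept)
        if score > best_score:
            best_pairs, best_score = kept, score
    return best_pairs


# Two tracked objects from the previous frame, three detections in the current frame.
previous = [(0.0, 0.0), (10.0, 10.0)]
current = [(10.0, 10.5), (0.2, 0.0), (50.0, 50.0)]
matches = best_matching(previous, current)
```

Here the third detection stays unmatched, which in a full tracker would typically start a new track. A production system would likely replace the exhaustive search with a polynomial-time assignment solver.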

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to: receive image data; detect objects in the image data; generate similarity data corresponding to a similarity between objects detected in a first frame of the image data compared with objects detected in a second frame of the image data; maintain a template for each of the objects detected in the image data; and match the objects detected in the first frame with the objects detected in the second frame based on the similarity data.

2. The apparatus of claim 1, wherein the image data is received from at least one camera.

3. The apparatus of claim 2, wherein the at least one camera is coupled to an autonomous vehicle.

4. The apparatus of claim 1 wherein the at least one memory and computer program code are configured to, with the at least one processor, further cause the apparatus to: extract an appearance feature from each of the objects detected in the image data.

5. The apparatus of claim 4 wherein the appearance feature is extracted by a convolutional neural network.

6. The apparatus of claim 4 wherein to generate the similarity data, the at least one memory and computer program code are configured to, with the at least one processor, further cause the apparatus to: calculate a Euclidean distance between the appearance feature of an object defined as the template and the appearance feature of the detected objects; and generate the similarity data based on the calculated Euclidean distance.

7. The apparatus of claim 4 wherein the template corresponds to the appearance feature extracted from each of the objects.

8. The apparatus of claim 4 wherein the similarity data is generated based on a motion feature similarity and an appearance feature similarity.

9. The apparatus of claim 8 wherein the at least one memory and computer program code are configured to, with the at least one processor, further cause the apparatus to: determine a similarity score for a detected object and a templated object based on the appearance feature; determine whether the similarity score is less than a first threshold; and determine whether a confidence level of a bounding box of the detected object is greater than a second threshold, wherein the template is updated in response to the determination that the similarity score is less than the first threshold and the confidence level is greater than the second threshold.

10. The apparatus of claim 8 wherein the motion feature similarity is based on a prediction of a Kalman filter and a bounding box position of a detected object.

11. The apparatus of claim 1 wherein to maintain the template of each of the objects detected in the image data, the at least one memory and computer program code are configured to, with the at least one processor, further cause the apparatus to: update the template based on the similarity data.

12. The apparatus of claim 1 wherein the at least one memory and computer program code are configured to, with the at least one processor, further cause the apparatus to: create a finite state machine for each of the objects detected in the image data, wherein the finite state machine for each of the objects represents a current detection state, wherein the current detection state comprises one of a plurality of possible object detection states for a corresponding object.

13. A method comprising: receiving image data; detecting objects in the image data; generating similarity data corresponding to a similarity between objects detected in a first frame of the image data compared with objects detected in a second frame of the image data; maintaining a template for each of the objects detected in the image data; and matching the objects detected in the first frame with the objects detected in the second frame based on the similarity data.

14. The method of claim 13, further comprising generating, based on the similarity data, match results corresponding to a best match between the objects in the first frame and the objects in the second frame.

15. The method of claim 14, further comprising causing state transitions in finite state machines for each of the objects according to the match results.

16. The method of claim 15, wherein the finite state machines for each of the objects comprise states from the group consisting of: initialized, tracked, lost, and removed.

17. The method of claim 15, further comprising generating output data corresponding to states of the finite state machines for each of the objects.

18. The method of claim 17, further comprising smoothing the output data.

19. The method of claim 17, wherein the output data is adjusted based on a weighted average calculation for data of the objects detected, or a prediction of a Kalman filter.

20. A non-transitory computer-readable medium including computer program instructions which, when executed by at least one processor, cause the at least one processor to: receive image data; detect objects in the image data; generate similarity data corresponding to a similarity between objects detected in a first frame of the image data compared with objects detected in a second frame of the image data; maintain a template for each of the objects detected in the image data; and match the objects detected in the first frame with the objects detected in the second frame based on the similarity data.
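To make the per-object finite state machine (claims 12, 15 and 16) and the template-update condition (claim 9) more concrete, here is a minimal sketch. The four state names come from claim 16 and the two-threshold condition from claim 9. Everything else is an assumption for illustration: the transition rules, the `confirm_hits` and `max_misses` counters, the default threshold values, and replacing the template outright (the claims do not say how the update is performed).

```python
from enum import Enum


class TrackState(Enum):
    INITIALIZED = "initialized"
    TRACKED = "tracked"
    LOST = "lost"
    REMOVED = "removed"


class Track:
    """One tracked object with its appearance template and detection state."""

    def __init__(self, template, confirm_hits=3, max_misses=5):
        self.template = list(template)
        self.state = TrackState.INITIALIZED
        self.hits = 1            # the detection that created the track
        self.misses = 0          # consecutive frames without a match
        self.confirm_hits = confirm_hits   # hypothetical confirmation count
        self.max_misses = max_misses       # hypothetical patience before removal

    def step(self, matched):
        """Advance the state machine using this frame's data association result."""
        if self.state is TrackState.REMOVED:
            return self.state
        if matched:
            self.hits += 1
            self.misses = 0
            if self.state is TrackState.LOST or self.hits >= self.confirm_hits:
                self.state = TrackState.TRACKED
        else:
            self.misses += 1
            if self.state is TrackState.INITIALIZED or self.misses > self.max_misses:
                self.state = TrackState.REMOVED
            else:
                self.state = TrackState.LOST
        return self.state


def maybe_update_template(track, feature, similarity_score, box_confidence,
                          sim_threshold=0.8, conf_threshold=0.9):
    """Update the template when claim 9's two conditions both hold.

    The similarity score must be below the first threshold and the bounding-box
    confidence above the second. Threshold values here are placeholders.
    """
    if similarity_score < sim_threshold and box_confidence > conf_threshold:
        track.template = list(feature)  # assumed update rule: plain replacement
        return True
    return False


track = Track([0.0, 0.0], max_misses=2)
history = [track.step(m) for m in (True, True, False, True, False, False, False)]
```

In this sketch an unconfirmed track is dropped on its first miss, while a confirmed one is kept as lost for up to `max_misses` frames so that a later match can recover it.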

Assignees

Tusimple Inc

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • G06T7/277 · Primary

    involving stochastic approaches, e.g. using Kalman filters · CPC title

  • Architecture, e.g. interconnection topology · CPC title

  • Learning methods · CPC title

  • G06T7/248 · Primary

    involving reference images or patches · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11830205B2 cover?
A system and method for online real-time multi-object tracking is disclosed. A particular embodiment can be configured to: receive image frame data from at least one camera associated with an autonomous vehicle; generate similarity data corresponding to a similarity between object data in a previous image frame compared with object detection results from a current image frame; use the similarit…
Who is the assignee on this patent?
Tusimple Inc
What technology area does this patent fall under?
Primary CPC classification G06T7/277. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 28, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).