System and method for online real-time multi-object tracking

US12387348B2 · US · B2

Patent metadata
Publication number: US-12387348-B2
Application number: US-202318489250-A
Country: US
Kind code: B2
Filing date: Oct 18, 2023
Priority date: Feb 27, 2018
Publication date: Aug 12, 2025
Grant date: Aug 12, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method for online real-time multi-object tracking is disclosed. A particular embodiment can be configured to: receive image frame data from at least one camera associated with an autonomous vehicle; generate similarity data corresponding to a similarity between object data in a previous image frame compared with object detection results from a current image frame; use the similarity data to generate data association results corresponding to a best matching between the object data in the previous image frame and the object detection results from the current image frame; cause state transitions in finite state machines for each object according to the data association results; and provide as an output object tracking output data corresponding to the states of the finite state machines for each object.
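
The pipeline in the abstract (similarity, data association, per-object state machines) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the patented implementation: bounding-box overlap (IoU) stands in for the motion and appearance similarity, a greedy matcher stands in for the "best matching" step, the state names are borrowed from the claims, and `confirm_hits`, `max_misses`, and `min_similarity` are hypothetical parameters.

```python
from enum import Enum


class State(Enum):
    INITIALIZED = "initialized"
    TRACKED = "tracked"
    LOST = "lost"
    REMOVED = "removed"


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_similarity=0.3):
    """Greedy one-to-one matching, highest similarity first (assumed stand-in)."""
    pairs = sorted(
        ((iou(t["box"], d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for sim, ti, di in pairs:
        if sim < min_similarity:
            break  # pairs are sorted, so every remaining pair is weaker
        if ti in used_tracks or di in used_dets:
            continue
        matches[ti] = di
        used_tracks.add(ti)
        used_dets.add(di)
    return matches


def step(tracks, detections, confirm_hits=2, max_misses=2):
    """Advance each track's state machine by one frame; return the live tracks."""
    matches = associate(tracks, detections)
    for ti, track in enumerate(tracks):
        if ti in matches:
            track["box"] = detections[matches[ti]]
            track["hits"] += 1
            track["misses"] = 0
            if track["state"] is State.LOST or track["hits"] >= confirm_hits:
                track["state"] = State.TRACKED
        else:
            track["misses"] += 1
            if track["state"] is State.INITIALIZED:
                # Assumed rule: an unconfirmed object that vanishes is a false detection.
                track["state"] = State.REMOVED
            elif track["misses"] > max_misses:
                track["state"] = State.REMOVED
            else:
                track["state"] = State.LOST
    matched_dets = set(matches.values())
    for di, det in enumerate(detections):
        if di not in matched_dets:
            # Each unmatched detection starts a new object in the initialized state.
            tracks.append({"box": det, "hits": 1, "misses": 0,
                           "state": State.INITIALIZED})
    return [t for t in tracks if t["state"] is not State.REMOVED]


tracks = step([], [(0, 0, 10, 10)])      # new object, initialized
tracks = step(tracks, [(1, 0, 11, 10)])  # matched again, becomes tracked
tracks = step(tracks, [])                # no detection, becomes lost
print([t["state"].value for t in tracks])
```

Calling `step` once per frame and reading each track's `state` corresponds to the "object tracking output data" the abstract mentions. An optimal assignment method could replace the greedy matcher without changing the state logic.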

First claim

Opening claim text (preview).

The invention claimed is:

1. An apparatus, comprising: at least one processor; and at least one memory storing instructions that, upon execution by the at least one processor, cause the at least one processor to: detect an object in an image frame to obtain detection results from the image frame; determine similarity data of the detection results in relation to an object in another image frame, the similarity data including at least one of motion feature similarity or appearance feature similarity; maintain a template corresponds to the appearance feature extracted from each of the objects detected in the image frame; determine whether to update the template according to a similarity score for a detected object and the templated object; and identifying matching between the detection results and the object in another image frame.

2. The apparatus of claim 1, wherein the motion feature similarity is calculated using a Kalman filter.

3. The apparatus of claim 1, wherein the appearance feature similarity is obtained by a pre-trained convolutional neural network.

4. The apparatus of claim 1, wherein the instructions further cause the at least one processor to: filter out the detection results having similarity score less than a predefined threshold.

5. The apparatus of claim 1, wherein the instructions further cause the at least one processor to: cause a state transition in finite state machines for the object according to the matching.

6. A non-transitory computer-readable storage medium having code stored thereon, the code, upon execution by one or more processors, causing the one or more processors to implement a method comprising: calculating a similarity between object detection results in an image data and another object data; performing a data association using the similarity to find matching between the object detection results and the another object data; maintain a template corresponds to an appearance feature extracted from each of the objects detected in the image data; determine whether to update the template according to a similarity score for a detected object and the templated object; and causing state transition for an object in the image data based on a result of the data association.

7. The non-transitory computer-readable storage medium of claim 6, wherein the method further comprises: detecting a new object in the image data, the new object having an initialized state; determining whether the new object is true or false; and transitioning the initialized state of the new object to a tracked state or a removed state based on the determining.

8. The non-transitory computer-readable storage medium of claim 6, wherein the method further comprises: detecting an object in the image data that is currently in a tracked state; and determining whether to remain the object in the tracked state or transit to a lost state.

9. The apparatus of claim 1, wherein the instructions further cause the at least one processor to: calculate a Euclidean distance between the appearance feature of an object defined as the template and the appearance feature of the detected objects; and generate the similarity data based on the calculated Euclidean distance.

10. The apparatus of claim 1, wherein the instructions further cause the at least one processor to: determine a confidence level of a bounding box of the detected object; and update the template in response to the determination that the similarity score is less than a first threshold and the confidence level is greater than a second threshold.

11. The apparatus of claim 1, wherein the instructions further cause the at least one processor to: maintain a plurality of different templates for each of the objects; and update the template by replacing a least recent use or an oldest template.

12. A method, comprising: receiving image data; calculating a similarity between object detection results in the image data and another object data; performing a data association using the similarity to find matching between the object detection results and the another object data; maintain a template corresponds to an appearance feature extracted from each of the objects detected in the image data; determine whether to update the template according to a similarity score for a detected object and the templated object; and causing state transition for an object in the image data based on a result of the data association.

13. The method of claim 12, wherein the similarity is determined based on at least one of motion feature similarity calculated using a Kalman filter or appearance feature similarity obtained by a pre-trained convolutional neural network.

14. The method of claim 12, further comprising: detecting a new object in the image data, the new object having an initialized state; determining whether the new object is true or false; and transitioning the initialized state of the new object to a tracked state or a removed state based on the determining.

15. The method of claim 14, wherein transitioning the initialized state of the new object to the tracked state or the removed state based on the determining comprises: transitioning the initialized state of the new object to the removed state in response to the determination that the new object is false.

16. The method of claim 12, further comprising: detecting an object in the image data that is currently in a tracked state; determining whether to remain the object in the tracked state or transit to a lost state.

17. The method of claim 12, further comprising generating output data corresponding to states of finite state machines for an object in the image data.

18. The method of claim 17, further comprising smoothing the output data.

19. The method of claim 17, wherein the output data is adjusted based on a weighted average of the object detection results and a prediction of a Kalman filter.

20. The non-transitory computer-readable storage medium of claim 6, wherein the updating of the template includes: updating an appearance feature of the object by obtaining the appearance feature extracted from a matching object bounding box.
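
The template handling in claims 9 to 11 and the weighted-average adjustment in claim 19 can be sketched in a few lines of Python. This is an illustrative reading under stated assumptions, not the claimed implementation: the mapping from Euclidean distance to a similarity score, the threshold values, the feature vectors, and the names `maybe_update_templates` and `smooth_box` are all hypothetical, and a fixed-length `deque` is used to model "replacing an oldest template".

```python
import math
from collections import deque


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(feature, templates):
    """Map the smallest template distance to a score in (0, 1] (assumed mapping)."""
    return 1.0 / (1.0 + min(euclidean(feature, t) for t in templates))


def maybe_update_templates(templates, feature, box_confidence,
                           sim_threshold=0.5, conf_threshold=0.8):
    """Store the feature when it looks different but the bounding box is trusted."""
    score = similarity(feature, templates)
    if score < sim_threshold and box_confidence > conf_threshold:
        # A deque created with maxlen discards its oldest entry when full.
        templates.append(feature)
        return True
    return False


def smooth_box(detected, predicted, weight=0.7):
    """Weighted average of a detected box and a filter prediction (weight is assumed)."""
    return tuple(weight * d + (1.0 - weight) * p
                 for d, p in zip(detected, predicted))


templates = deque([(0.0, 0.0)], maxlen=2)  # at most two templates per object
maybe_update_templates(templates, (3.0, 4.0), box_confidence=0.9)    # stored
maybe_update_templates(templates, (3.0, 4.0), box_confidence=0.9)    # too similar, skipped
maybe_update_templates(templates, (30.0, 40.0), box_confidence=0.9)  # stored, oldest dropped
print(list(templates))
print(smooth_box((10, 10, 20, 20), (12, 10, 22, 20), weight=0.5))
```

The update rule follows the condition in claim 10: a low similarity score alone would also be produced by a bad detection, so the template is refreshed only when the bounding-box confidence is high as well.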

Assignees

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking · CPC title

  • using neural networks · CPC title

  • using classification, e.g. of video objects · CPC title

  • Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12387348B2 cover?
A system and method for online real-time multi-object tracking is disclosed. A particular embodiment can be configured to: receive image frame data from at least one camera associated with an autonomous vehicle; generate similarity data corresponding to a similarity between object data in a previous image frame compared with object detection results from a current image frame; use the similarit…
Who is the assignee on this patent?
Tusimple Inc
What technology area does this patent fall under?
Primary CPC classification G06T7/248. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 12, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).