Top-down view object detection and tracking

US12012127B2 · US · B2

Patent metadata

Publication number: US-12012127-B2
Application number: US-202016779576-A
Country: US
Kind code: B2
Filing date: Jan 31, 2020
Priority date: Oct 26, 2019
Publication date: Jun 18, 2024
Grant date: Jun 18, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Tracking a current and/or previous position, velocity, acceleration, and/or heading of an object using sensor data may comprise determining whether to associate a current object detection generated from recently received (e.g., current) sensor data with a previous object detection generated from formerly received sensor data. In other words, a track may identify that an object detected in former sensor data is the same object detected in current sensor data. However, multiple types of sensor data may be used to detect objects and some objects may not be detected by different sensor types or may be detected differently, which may confound attempts to track an object. An ML model may be trained to receive outputs associated with different sensor types and/or a track associated with an object, and determine a data structure comprising a region of interest, object classification, and/or a pose associated with the object.
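The abstract's core step is deciding whether a current detection belongs to an existing track. Claim 3 phrases the test as a "degree of alignment" between the new region of interest and the previous one that meets or exceeds a threshold. The claims shown here do not say how alignment is measured, so this minimal sketch assumes intersection-over-union (IoU) of axis-aligned top-down boxes and a hypothetical threshold of 0.5. The names Roi, iou, and associate are illustrative and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Roi:
    """Hypothetical axis-aligned region of interest in a top-down frame (meters)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Clamp at zero so an empty (inverted) box has no area.
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def iou(a: Roi, b: Roi) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    overlap = Roi(max(a.x_min, b.x_min), max(a.y_min, b.y_min),
                  min(a.x_max, b.x_max), min(a.y_max, b.y_max))
    inter = overlap.area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0.0 else 0.0


def associate(current: Roi, previous_tracks: Dict[int, Roi],
              threshold: float = 0.5) -> Optional[int]:
    """Return the ID of the best-aligned previous track whose IoU meets or
    exceeds `threshold`, or None if no track qualifies (start a new track)."""
    best_id: Optional[int] = None
    best_score = threshold
    for track_id, previous_roi in previous_tracks.items():
        score = iou(current, previous_roi)
        if score >= best_score:
            best_id, best_score = track_id, score
    return best_id


tracks = {1: Roi(0.0, 0.0, 2.0, 2.0), 2: Roi(10.0, 10.0, 12.0, 12.0)}
print(associate(Roi(0.5, 0.0, 2.5, 2.0), tracks))  # 1 (IoU = 0.6)
print(associate(Roi(5.0, 5.0, 6.0, 6.0), tracks))  # None
```

A production tracker would likely compare rotated boxes (the pose includes a yaw) and resolve conflicts when several detections compete for one track, for example with a global assignment step; this greedy per-detection loop omits both.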

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving a first object detection associated with a first sensor type and a second object detection associated with a second sensor type, the first object detection and the second object detection identifying an object in an environment surrounding an autonomous vehicle; determining, based at least in part on previous sensor data, a previous track associated with the object, the previous track identifying at least one of an estimated previous position of the object, a previous region of interest, or a previous velocity of the object; inputting the first object detection, the second object detection, and at least part of the previous track into a machine learning (ML) model; receiving, from the ML model, a data structure comprising a region of interest, object classification, and a pose associated with the object, the pose indicating at least one of a position or a yaw associated with the object; determining, based at least in part on the data structure, a new track associated with the object wherein the new track indicates that the object detected in the previous sensor data is a same object detected in current sensor data; updating, based at least in part on the data structure, one or more previous tracks by retiring the one or more previous tracks, wherein retiring the one or more previous tracks comprises indicating that the object associated with the one or more previous tracks has been occluded for a threshold amount of time; and controlling the autonomous vehicle based at least in part on the new track.

2. The method of claim 1, wherein the data structure additionally comprises at least one of an indication that the object is stationary or dynamic, a top-down segmentation of the environment, a yaw rate, a velocity associated with the object, or an acceleration associated with the object.

3. The method of claim 1, wherein determining the new track comprises: determining a degree of alignment of the region of interest to the previous region of interest; and determining that the degree of alignment meets or exceeds a threshold degree of alignment.

4. The method of claim 1, further comprising: receiving a first prior object detection associated with a first time previous to a second time at which the first object detection was generated; receiving a second prior object detection associated with a third time previous to a fourth time at which the second object detection was generated; and inputting the first prior object detection and the second prior object detection to the ML model in addition to the first object detection, the second object detection, and the previous track.

5. The method of claim 1, wherein inputting the first object detection, the second object detection, and at least part of the previous track comprises: generating a multi-channel data structure based at least in part on the first object detection, the second object detection, and at least part of the previous track, wherein generating the multi-channel data structure comprises encoding attributes associated with the environment into channels of the multi-channel data structure based at least in part on the first object detection, the second object detection, and at least part of the previous track; and inputting the multi-channel data structure to the ML model.

6. The method of claim 1, wherein: the first object detection is based at least in part on sensor data that has a first perspective of the environment; the data structure indicates a top-down perspective of the environment; and the top-down perspective is different than the first perspective.

7. The method of claim 1, further comprising reducing, based at least in part on comparing object detections from each of multiple different sensor modalities to the previous track, jitter associated with the new track.

8. The method of claim 1, wherein updating the one or more previous tracks further comprises at least one of associating one or more of the first object detection or the second object detection with the one or more previous tracks or indicating that the one or more previous tracks associated with an object is partially or fully occluded.

9. A system comprising: one or more processors; and a memory storing processor-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: receiving first sensor data and second sensor data; inputting the first sensor data to a first perception pipeline and inputting the second sensor data to a second perception pipeline; receiving a first output from the first perception pipeline based at least in part on the first sensor data and a second output from the second perception pipeline, the first output and the second output identifying an object in an environment; receiving a previous track associated with the object in the environment, the previous track identifying at least one of an estimated previous position of the object, a previous region of interest, or a previous velocity of the object; inputting the first output, the second output, and at least part of the previous track into a machine-learning (ML) model; receiving, from the ML model, a data structure comprising a region of interest, object classification, and a pose associated with the object, the pose indicating at least one of a position or a yaw associated with the object; determining an updated track associated with the object based at least in part on the data structure, a current position, and at least one of the region of interest or the yaw associated with the object; and updating, based at least in part on the data structure, one or more previous tracks by retiring the one or more previous tracks, wherein retiring the one or more previous tracks comprises indicating that the object associated with the one or more previous tracks has been occluded for a threshold amount of time.

10. The system of claim 9, wherein the data structure additionally comprises at least one of an indication that the object is stationary or dynamic, a top-down segmentation of the environment, a yaw rate, a velocity associated with the object, or an acceleration associated with the object.

11. The system of claim 9, wherein: a third output indicates that a second portion of the environment associated with the first output and the second output is unoccupied; and the third output is provided as input to the ML model in addition to the first output and the second output.

12. The system of claim 9, wherein determining the updated track comprises: determining a degree of alignment of the region of interest to the previous region of interest; and determining that the degree of alignment meets or exceeds a threshold degree of alignment.

13. The system of claim 9, wherein at least one of the first output or the second output comprises at least one of: a first representation of the environment from a top-down perspective; an indication that a second portion of the environment is occupied; a second representation of an occluded portion of the environment; a second region of interest associated with the object; a classification associated with the object; a sensor data segmentation; a three-dimensional discretized representation of sensor data; a yaw associated with the object; a yaw rate associated with the object; a ground height estimation; a set of extents associated with the object; a velocity associated with the object; or an acceleration associated with the object.

14. The system of claim 9, wherein the operations further compr
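Claims 1 and 9 both retire a previous track by indicating that its object has been occluded for a threshold amount of time. The sketch below illustrates that bookkeeping under stated assumptions: association has already produced a mapping from track IDs to the pose (position and yaw) read from the ML model's data structure, and any track not matched in a cycle is counted as occluded. The Track fields, the update_tracks function, and the 1-second limit are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Pose = Tuple[float, float, float]  # (x, y, yaw), assumed to come from the ML model's output


@dataclass
class Track:
    """Hypothetical track record: latest pose and how long the object has been unseen."""
    track_id: int
    pose: Pose
    occluded_s: float = 0.0
    retired: bool = False


def update_tracks(tracks: List[Track], matched: Dict[int, Pose], dt: float,
                  occlusion_limit_s: float = 1.0) -> List[Track]:
    """Refresh matched tracks, age unmatched ones, and retire any track whose
    object has been occluded for at least `occlusion_limit_s` seconds.
    Returns the tracks that remain active."""
    for track in tracks:
        if track.retired:
            continue
        if track.track_id in matched:
            track.pose = matched[track.track_id]
            track.occluded_s = 0.0  # seen again: reset the occlusion timer
        else:
            # Not observed this cycle: counted as occluded (a simplifying assumption).
            track.occluded_s += dt
            if track.occluded_s >= occlusion_limit_s:
                track.retired = True
    return [track for track in tracks if not track.retired]


tracks = [Track(1, (0.0, 0.0, 0.0)), Track(2, (5.0, 5.0, 0.0))]
for _ in range(2):  # two 0.5 s cycles in which only track 1 is observed
    active = update_tracks(tracks, {1: (0.5, 0.0, 0.1)}, dt=0.5)
print([t.track_id for t in active])  # [1]
```

Resetting the timer on every match means only a continuous gap in observations retires a track, which is one plausible reading of "occluded for a threshold amount of time".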

Assignees

Zoox Inc

Inventors

Classifications

  • using signals provided by artificial sources external to the vehicle, e.g. navigation beacons · CPC title

  • Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level (multimodal speaker identification or verification G10L17/10) · CPC title

  • Classification techniques · CPC title

  • Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • Combination of methods, e.g. classifiers, working on different input data, e.g. sensor fusion · CPC title
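The sensor-fusion classifications above line up with claim 5, which encodes the two object detections and the previous track into channels of a single multi-channel data structure before it reaches the ML model. This is a minimal sketch of that kind of input-level fusion, assuming a top-down grid (the claims mention top-down representations), one binary occupancy channel per source, and axis-aligned boxes. The Box type, build_top_down_input, the grid size, the cell size, and the channel order are illustrative choices, not details from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned top-down box in meters."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


Grid = List[List[List[float]]]  # indexed as grid[row][col][channel]


def rasterize(box: Box, grid: Grid, channel: int, cell_size: float,
              origin: Tuple[float, float]) -> None:
    """Mark every cell whose center lies inside `box` as occupied in `channel`."""
    ox, oy = origin
    for row in range(len(grid)):
        cy = oy + (row + 0.5) * cell_size  # rows advance along y
        for col in range(len(grid[row])):
            cx = ox + (col + 0.5) * cell_size  # columns advance along x
            if box.x_min <= cx <= box.x_max and box.y_min <= cy <= box.y_max:
                grid[row][col][channel] = 1.0


def build_top_down_input(first_sensor: Sequence[Box], second_sensor: Sequence[Box],
                         previous_track: Sequence[Box], rows: int, cols: int,
                         cell_size: float,
                         origin: Tuple[float, float] = (0.0, 0.0)) -> Grid:
    """Stack three sources into one grid: channel 0 = first sensor type,
    channel 1 = second sensor type, channel 2 = previous track regions of interest."""
    sources = (first_sensor, second_sensor, previous_track)
    grid = [[[0.0] * len(sources) for _ in range(cols)] for _ in range(rows)]
    for channel, boxes in enumerate(sources):
        for box in boxes:
            rasterize(box, grid, channel, cell_size, origin)
    return grid


grid = build_top_down_input([Box(0, 0, 2, 2)], [Box(1, 1, 3, 3)], [Box(0, 0, 1, 1)],
                            rows=4, cols=4, cell_size=1.0)
print(grid[1][1])  # [1.0, 1.0, 0.0]: both sensors cover this cell, the old track does not
```

Keeping each source in its own channel, rather than merging them into one occupancy layer, lets a learned model weigh disagreements between sensor types and the prior track, which is the situation the abstract describes.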

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12012127B2 cover?
Tracking a current and/or previous position, velocity, acceleration, and/or heading of an object using sensor data may comprise determining whether to associate a current object detection generated from recently received (e.g., current) sensor data with a previous object detection generated from formerly received sensor data. In other words, a track may identify that an object detected in forme…
Who is the assignee on this patent?
Zoox Inc
What technology area does this patent fall under?
Primary CPC classification B60W60/0027. Mapped technology areas include Operations & Transport.
When was this patent published?
Publication date Jun 18, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (cited publications that are in our corpus, or other publications sharing the same primary CPC classification).