Object velocity detection from multi-modal sensor data

US11628855B1 · US · B1

Patent metadata
Field               Value
Publication number  US-11628855-B1
Application number  US-202016866839-A
Country             US
Kind code           B1
Filing date         May 5, 2020
Priority date       May 5, 2020
Publication date    Apr 18, 2023
Grant date          Apr 18, 2023

Abstract

Official abstract text for this publication.

Ground truth data may be too sparse to supervise training of a machine-learned (ML) model enough to achieve an ML model with sufficient accuracy/recall. For example, in some cases, ground truth data may only be available for every third, tenth, or hundredth frame of raw data. Training an ML model to detect a velocity of an object when ground truth data for training is sparse may comprise training the ML model to predict a future position of the object based at least in part on image, radar, and/or lidar data (e.g., for which no ground truth may be available). The ML model may be altered based at least in part on a difference between ground truth data associated with a future time and the future position.
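
The supervision signal described in the abstract can be illustrated with a small sketch. It is not taken from the patent: the constant-velocity projection, the frame period, the label stride, and all function names (`next_labeled_frame`, `project_forward`, `l1_difference`) are assumptions chosen for illustration. The idea is that a velocity predicted at an unlabeled frame is scored by projecting the object forward to the next frame that does have ground truth.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def next_labeled_frame(frame: int, label_stride: int) -> int:
    """Index of the first labeled frame strictly after `frame`.

    Assumes labels exist only at frames 0, label_stride, 2 * label_stride, ...
    """
    return (frame // label_stride + 1) * label_stride


def project_forward(center: Vec3, velocity: Vec3, dt: float) -> Vec3:
    """Constant-velocity projection of an object's center `dt` seconds ahead."""
    return tuple(c + v * dt for c, v in zip(center, velocity))


def l1_difference(predicted: Vec3, ground_truth_center: Vec3) -> float:
    """Sum of absolute per-axis errors between prediction and labeled center."""
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth_center))


# Hypothetical numbers: 10 Hz sensor frames, labels on every tenth frame.
frame, label_stride, frame_period_s = 3, 10, 0.1
target = next_labeled_frame(frame, label_stride)
dt = (target - frame) * frame_period_s

# Center and velocity stand in for an ML model's output at the unlabeled frame.
predicted = project_forward((1.0, 2.0, 0.0), (2.0, 0.0, 0.0), dt)

# The labeled ROI center at the target frame supplies the training signal.
loss = l1_difference(predicted, (2.5, 2.0, 0.0))
```

In a real training loop the loss would be backpropagated to alter the model's parameters; that step is omitted here because the page does not describe the model architecture.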

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving sensor data associated with a first time, the sensor data comprising point cloud data and image data representing a portion of an environment surrounding an autonomous vehicle; receiving an object detection associated with the first time, wherein the object detection identifies an object in the image data; determining, based at least in part on the object detection, a first subset of the sensor data comprising a portion of the image data and a portion of the point cloud data; inputting the first subset of the sensor data into a machine-learned (ML) model; receiving, from the ML model, an output; determining, based at least in part on the output, a velocity associated with the object; determining, based at least in part on at least one of the output or the velocity, a predicted location of the object at a second time after the first time; receiving ground truth data indicative of a three-dimensional ROI associated with the object and the second time; determining a difference between the predicted location and a center of the three-dimensional ROI; altering one or more parameters of the ML model based at least in part on the difference; and transmitting the ML model to a vehicle to control motion of the vehicle.

2. The method of claim 1, wherein inputting the first subset of sensor data further comprises inputting one or more of: a classification associated with the object, a track associated with the object, a motion state associated with the object, or a doppler velocity associated with the object.

3. The method of claim 1, wherein the output comprises the velocity and a second three-dimensional ROI and determining the predicted location comprises: projecting a center of the second three-dimensional ROI forward based at least in part on the velocity.

4. The method of claim 1, wherein the output comprises the predicted location and a second three-dimensional ROI and determining the velocity is based at least in part on a distance between the predicted location and a center of the second three-dimensional ROI.

5. The method of claim 1, wherein altering the one or more parameters comprises determining an L1 loss based on the difference and a learned covariance value.

6. A system comprising: one or more processors; and a memory storing processor-executable instructions that, when executed by the one or more processors, cause the system to perform operations comprising: receiving sensor data associated with a first time, the sensor data comprising point cloud data and image data representing a portion of an environment surrounding an autonomous vehicle and the sensor data being associated with an object in the environment; inputting the sensor data into a machine-learned (ML) model; receiving, from the ML model, an output; determining, based at least in part on the output, a velocity associated with the object; determining, based at least in part on at least one of the output or the velocity, a predicted location of the object at a second time after the first time; receiving ground truth data indicative of a three-dimensional ROI associated with the object and the second time; determining a difference between the predicted location and a center of the three-dimensional ROI; altering one or more parameters of the ML model based at least in part on the difference; and transmitting the ML model to a vehicle to control motion of the vehicle.

7. The system of claim 6, wherein inputting the first subset of sensor data further comprises inputting one or more of: a classification associated with the object, a track associated with the object, a motion state associated with the object, or a doppler velocity associated with the object.

8. The system of claim 6, wherein the output comprises the velocity and a second three-dimensional ROI and determining the predicted location comprises: projecting a center of the second three-dimensional ROI forward based at least in part on the velocity.

9. The system of claim 6, wherein the output comprises the predicted location and a second three-dimensional ROI and determining the velocity is based at least in part on a distance between the predicted location and a center of the second three-dimensional ROI.

10. The system of claim 6, wherein altering the one or more parameters comprises determining an L1 loss based on the difference and a learned covariance value.

11. The system of claim 6, wherein: the operations further comprise receiving an object detection identifying the sensor data as being associated with the object; the object detection further comprises an indication that the object is a pedestrian and a movement state of the pedestrian; and determining the velocity is further based at least in part on the movement state.

12. The system of claim 11, wherein the movement state comprises standing, sitting, lying, walking, or running.

13. The system of claim 11, wherein the object detection further comprises: a location of a particular portion of the pedestrian; and determining the velocity is further based at least in part on the location.

14. A non-transitory computer-readable medium storing processor-executable instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: receiving sensor data associated with a first time, the sensor data comprising point cloud data and image data representing a portion of an environment surrounding an autonomous vehicle and the sensor data being associated with an object in the environment; inputting the sensor data into a machine-learned (ML) model; receiving, from the ML model, a first three-dimensional region of interest (ROI) and a velocity associated with the object; determining, based at least in part on the velocity and the first three-dimensional ROI, a predicted location of the object at a second time after the first time; receiving ground truth data indicative of a second three-dimensional ROI associated with the object and the second time; determining a difference between the predicted location and a center of the second three-dimensional ROI; and altering one or more parameters of the ML model based at least in part on the difference.

15. The non-transitory computer-readable medium of claim 14, wherein inputting the sensor data further comprises inputting one or more of: a classification associated with the object, a track associated with the object, a motion state associated with the object, or a doppler velocity associated with the object.

16. The non-transitory computer-readable medium of claim 14, wherein the output comprises the velocity and a second three-dimensional ROI and determining the predicted location comprises: projecting a center of the second three-dimensional ROI forward based at least in part on the velocity.

17. The non-transitory computer-readable medium of claim 14, wherein the output comprises the predicted location and a second three-dimensional ROI and determining the velocity is based at least in part on a distance between the predicted location and a center of the second three-dimensional ROI.

18. The non-transitory computer-readable medium of claim 14, wherein: the operations further comprise receiving an object detection identifying the sensor data as being associated with the object; the object detection further comprises an indication that the object is a pedestrian and a movement state of the pedestrian; and determining the velocity is f
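
Two of the dependent claims lend themselves to a short illustration. Claims 4, 9, and 17 derive a velocity from the distance between a predicted location and the current ROI center, and claims 5 and 10 combine an L1 loss with a learned covariance value. The sketch below is one possible reading, not the patented method: the claims do not give a formula, so the Laplace-style weighting (absolute error divided by a learned scale, plus the log of that scale) is an assumption, and the names `velocity_from_displacement` and `uncertainty_weighted_l1` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def velocity_from_displacement(
    center: Sequence[float], predicted: Sequence[float], dt: float
) -> Tuple[float, ...]:
    """Velocity implied by moving from `center` to `predicted` in `dt` seconds."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return tuple((p - c) / dt for c, p in zip(center, predicted))


def uncertainty_weighted_l1(difference: Sequence[float], log_scale: float) -> float:
    """L1 error scaled by a learned uncertainty (assumed Laplace-style form).

    `log_scale` stands in for the learned covariance value. The added
    `log_scale` term penalizes the model for claiming large uncertainty
    everywhere just to shrink the first term.
    """
    scale = math.exp(log_scale)
    return sum(abs(d) for d in difference) / scale + log_scale


# Velocity from a predicted location 0.5 s ahead of the current ROI center.
velocity = velocity_from_displacement((0.0, 0.0, 0.0), (1.0, 2.0, 0.0), 0.5)

# Difference between predicted location and ground truth ROI center.
loss_confident = uncertainty_weighted_l1((0.5, -0.5, 0.0), log_scale=0.0)
loss_uncertain = uncertainty_weighted_l1((0.5, -0.5, 0.0), log_scale=math.log(2.0))
```

With `log_scale=0.0` the expression reduces to a plain L1 loss, which makes the effect of the learned term easy to isolate.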

Assignees

  • Zoox Inc

Inventors

Classifications

  • combined with communication equipment with other vehicles or with base stations · CPC title

  • G01S13/867 · Primary

    Combination of radar systems with cameras · CPC title

  • Combinations of radar systems with non-radar systems, e.g. sonar, direction finder · CPC title

  • measuring the velocity vector · CPC title

  • Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Zoox Inc
What technology area does this patent fall under?
Primary CPC classification G01S13/867. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 18, 2023 (kind code B1). Legal status and post-grant events are not shown on this page.