Perception and motion prediction for autonomous devices
US-11548533-B2 · Jan 10, 2023 · US
US11688181B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11688181-B2 |
| Application number | US-202117353231-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 21, 2021 |
| Priority date | Jun 25, 2020 |
| Publication date | Jun 27, 2023 |
| Grant date | Jun 27, 2023 |
In various examples, a multi-sensor fusion machine learning model—such as a deep neural network (DNN)—may be deployed to fuse data from a plurality of individual machine learning models. As such, the multi-sensor fusion network may use outputs from a plurality of machine learning models as input to generate a fused output that represents data from fields of view or sensory fields of each of the sensors supplying the machine learning models, while accounting for learned associations between boundary or overlap regions of the various fields of view of the source sensors. In this way, the fused output may be less likely to include duplicate, inaccurate, or noisy data with respect to objects or features in the environment, as the fusion network may be trained to account for multiple instances of a same object appearing in different input representations.
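The abstract describes a data flow: the outputs of several per-sensor models go in, and one fused output comes out. The toy sketch below illustrates that flow under loud assumptions. It assumes each per-sensor DNN already emits a rasterized occupancy grid in a shared top-down frame. It also replaces the trained fusion DNN with a hypothetical fixed 1x1 "convolution", meaning a per-pixel weighted sum over the stacked sensor channels followed by a sigmoid. The function name `fuse_rasters`, the weights, and the bias are illustrative only. They are not taken from the patent.

```python
import math
from typing import List

Grid = List[List[float]]

def fuse_rasters(rasters: List[Grid], weights: List[float], bias: float) -> Grid:
    """Hypothetical stand-in for a fusion DNN: a 1x1 convolution over stacked
    per-sensor rasters followed by a sigmoid. Weights are hand-set, not learned."""
    if len(rasters) != len(weights):
        raise ValueError("need one weight per input raster")
    height, width = len(rasters[0]), len(rasters[0][0])
    fused: Grid = []
    for y in range(height):
        row = []
        for x in range(width):
            # Per-pixel weighted sum across sensor channels (the 1x1 convolution).
            z = bias + sum(w * r[y][x] for w, r in zip(weights, rasters))
            # Squash to a single fused occupancy score in (0, 1).
            row.append(1.0 / (1.0 + math.exp(-z)))
        fused.append(row)
    return fused

# Two 1x4 rasters from cameras with partially overlapping fields of view.
# Column 1 is seen by both cameras, so the same object appears in both inputs.
front_left = [[1.0, 1.0, 0.0, 0.0]]
front_right = [[0.0, 1.0, 1.0, 0.0]]
fused = fuse_rasters([front_left, front_right], weights=[6.0, 6.0], bias=-3.0)
print([round(v, 2) for v in fused[0]])  # → [0.95, 1.0, 0.95, 0.05]
```

The output covers the union of both fields of view. The overlap cell yields one high score rather than two objects, but only because both inputs fire on exactly the same cell. A trained network with a spatial receptive field is what could reconcile an object seen at slightly different positions near field-of-view boundaries, and this 1x1 stand-in cannot do that.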
What is claimed is:

1. A processor comprising: one or more circuits to: receive first data representative of a plurality of outputs of a plurality of deep neural networks (DNNs), at least one output of the plurality of outputs corresponding to a respective sensor having a respective field of view different from fields of view corresponding to one or more other sensors of a plurality of sensors of an autonomous machine; compute, using a fusion DNN and based at least in part on the first data, second data representative of a fusion of the plurality of outputs; and perform one or more operations using the autonomous machine based at least in part on the second data.

2. The processor of claim 1, wherein the computing the second data is further based at least in part on third data representative of at least one probability distribution function corresponding to at least one point of at least one of the plurality of outputs, the at least one point corresponding to a detected object and the at least one probability distribution function corresponding to one or more potential locations of the detected object.

3. The processor of claim 1, wherein the computing the second data is further based at least in part on third data representative of one or more velocity representations including encoded values corresponding to at least one of a velocity in an x-direction or a velocity in a y-direction.

4. The processor of claim 1, wherein the computing the second data is further based at least in part on third data representative of one or more representations corresponding to at least one of object instances or object appearances determined using the plurality of outputs.

5. The processor of claim 1, wherein each output of the plurality of outputs includes a rasterized image representing one or more objects, and the plurality of outputs includes a fused rasterized image.

6. The processor of claim 5, wherein the one or more objects include at least one of a vehicle, a pedestrian, a bicyclist, a motorist, a lane marker, a road boundary marker, a freespace boundary, or a wait line.

7. The processor of claim 1, wherein: a first output of the plurality of outputs corresponds to a first field of view; a second output of the plurality of outputs corresponds to a second field of view different from the first field of view; and the fusion of the plurality of outputs corresponds to both the first field of view and the second field of view.

8. The processor of claim 7, wherein the first field of view and the second field of view are at least partially overlapping.

9. The processor of claim 1, wherein the first data is further representative of one or more additional outputs generated using a LiDAR sensor, a RADAR sensor, or an ultrasonic sensor, and the one or more additional outputs are generated using another DNN or without using another DNN.

10. The processor of claim 1, wherein: a first output of the plurality of outputs includes a first representation of an object; a second output of the plurality of outputs includes a second representation of the object; and the fusion of the plurality of outputs includes a fused representation of the object.

11. The processor of claim 1, wherein the processor is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.

12. A system comprising: one or more processing units; and one or more memory units storing instructions that, when executed by the one or more processing units, cause the one or more processing units to execute operations comprising: receiving first data representative of at least a first rasterized image generated using a first deep neural network (DNN) and based at least in part on first sensor data generated using a first sensor, the first rasterized image including at least a first object; receiving second data representative of at least a second rasterized image generated using a second deep neural network (DNN) and based at least in part on second sensor data generated using a second sensor, the second rasterized image including at least a second object; computing, using a fusion DNN and based at least in part on the first data and the second data, third data representative of a fused rasterized image including both the first object and the second object; and performing one or more operations using an autonomous machine based at least in part on the third data.

13. The system of claim 12, wherein the first sensor and the second sensor include one of an image sensor, a LiDAR sensor, a RADAR sensor, or an ultrasonic sensor.

14. The system of claim 12, wherein the first sensor and the second sensor include at least partially overlapping fields of view, the first rasterized image includes a first representation of a third object, the second rasterized image includes a second representation of the third object, and the fused rasterized image includes a fused representation of the third object.

15. The system of claim 12, wherein the operations further comprise: receiving fourth data representative of at least one probability distribution function corresponding to at least one pixel of at least one of the first rasterized image or the second rasterized image, the at least one pixel corresponding to at least one of the first object or the second object, and the at least one probability distribution function corresponding to one or more potential locations of a detected object, wherein the computing the third data is further based at least in part on the fourth data.

16. The system of claim 12, wherein the operations further comprise: receiving fourth data representative of one or more velocity representations including encoded values corresponding to at least one of a velocity in an x-direction or a velocity in a y-direction, wherein the computing the third data is further based at least in part on the fourth data.

17. The system of claim 12, wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.

18. A method comprising: receiving first data representative of at least a first rasterized image generated based at least in part on first sensor data generated using a first sensor of a first type, the first rasterized image including at least a first object; receiving second data representative of at least a second rasterized image generated based at least in part on second sensor data generated using a second sensor of a second type different from the first type, the second rasterized image including at least a second object; computing, using a fusion deep neural network (DNN) and based at least in part
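Claims 2, 3, 15, and 16 add auxiliary inputs to the fusion DNN. The first is a probability distribution function over potential locations of a detected object. The second is a set of velocity representations with encoded x- and y-direction values. The claims do not say how either is encoded, so the sketch below makes its own assumptions. It uses an isotropic Gaussian, normalized over the grid, for the location distribution. It uses a pair of rasters that store v_x and v_y at the object's pixel, with zeros elsewhere, for the velocity. The function names, `sigma`, and the grid size are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]

def gaussian_location_pdf(height: int, width: int, center: Tuple[float, float],
                          sigma: float) -> Grid:
    """Hypothetical encoding: rasterize an isotropic Gaussian over potential
    (row, col) locations of one detected object, normalized to sum to 1."""
    cy, cx = center
    raw = [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)] for y in range(height)]
    total = sum(map(sum, raw))
    # Dividing by the total turns the grid into a discrete probability distribution.
    return [[v / total for v in row] for row in raw]

def velocity_channels(height: int, width: int, pixel: Tuple[int, int],
                      vx: float, vy: float) -> Tuple[Grid, Grid]:
    """Hypothetical encoding: store the object's x- and y-velocity components in
    two separate rasters at the object's pixel, zero elsewhere."""
    vel_x = [[0.0] * width for _ in range(height)]
    vel_y = [[0.0] * width for _ in range(height)]
    row, col = pixel
    vel_x[row][col] = vx
    vel_y[row][col] = vy
    return vel_x, vel_y

pdf = gaussian_location_pdf(5, 5, center=(2.0, 2.0), sigma=1.0)
vel_x, vel_y = velocity_channels(5, 5, pixel=(2, 2), vx=3.5, vy=-0.5)
print(round(sum(map(sum, pdf)), 6), max(map(max, pdf)) == pdf[2][2],
      vel_x[2][2], vel_y[2][2])  # → 1.0 True 3.5 -0.5
```

Because these rasters share the grid of the per-sensor outputs, they could be stacked as extra input channels alongside the rasterized images. That is one plausible way to feed the "third data" or "fourth data" of those claims to a fusion network.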
CPC titles:

- Supervised learning
- Weakly supervised learning, e.g. semi-supervised or self-supervised learning
- Convolutional networks [CNN, ConvNet]
- characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads