Methods for spatio-temporal scene-graph embedding for autonomous vehicle applications
US-2023230484-A1 · Jul 20, 2023 · US
US11899099B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11899099-B2 |
| Application number | US-201916698601-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 27, 2019 |
| Priority date | Nov 30, 2018 |
| Publication date | Feb 13, 2024 |
| Grant date | Feb 13, 2024 |
Official abstract text for this publication.
Disclosed are techniques for fusing camera and radar frames to perform object detection in one or more spatial domains. In an aspect, an on-board computer of a host vehicle receives, from a camera sensor of the host vehicle, a plurality of camera frames, receives, from a radar sensor of the host vehicle, a plurality of radar frames, performs a camera feature extraction process on a first camera frame of the plurality of camera frames to generate a first camera feature map, performs a radar feature extraction process on a first radar frame of the plurality of radar frames to generate a first radar feature map, converts the first camera feature map and/or the first radar feature map to a common spatial domain, and concatenates the first radar feature map and the first camera feature map to generate a first concatenated feature map in the common spatial domain.
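The convert-then-concatenate step in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the nearest-neighbour `resize_nearest` helper is a hypothetical stand-in for the conversion to a common spatial domain (the claims mention inverse perspective mapping for the camera side), the feature maps are toy nested lists, and every name here is an assumption made for illustration.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def resize_nearest(fmap: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Resample every channel to (out_h, out_w) with nearest-neighbour lookup.

    Hypothetical stand-in for 'convert to a common spatial domain'; a real
    system would apply a geometric transform rather than a plain resize.
    """
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    return [
        [
            [channel[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)
        ]
        for channel in fmap
    ]


def concatenate_channels(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Stack two feature maps along the channel axis; spatial sizes must match."""
    if (len(a[0]), len(a[0][0])) != (len(b[0]), len(b[0][0])):
        raise ValueError("feature maps must share a spatial grid")
    return a + b


# Toy inputs (assumed shapes): a 2-channel 4x4 camera map, a 1-channel 2x2 radar map.
camera_map = [
    [[float(ch * 100 + r * 4 + c) for c in range(4)] for r in range(4)]
    for ch in range(2)
]
radar_map = [[[1.0, 2.0], [3.0, 4.0]]]

# Bring the camera map onto the radar grid, then fuse along the channel axis.
camera_on_radar_grid = resize_nearest(camera_map, 2, 2)
fused = concatenate_channels(radar_map, camera_on_radar_grid)
print(len(fused), "channels on a", len(fused[0]), "x", len(fused[0][0]), "grid")
```

Using the radar grid as the shared domain mirrors the option described in claim 2. In this sketch a single detector would then consume `fused` directly, with no separate per-sensor detection pass.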
Opening claim text (preview).
What is claimed is:

1. A method of performing early fusion of camera and radar frames to perform object detection in one or more spatial domains performed by an on-board computer of a host vehicle, comprising: receiving, from a camera sensor of the host vehicle, a plurality of camera frames; receiving, from a radar sensor of the host vehicle, a plurality of radar frames; performing a camera feature extraction process on a first camera frame of the plurality of camera frames to generate a first camera feature map; performing a radar feature extraction process on a first radar frame of the plurality of radar frames to generate a first radar feature map, wherein the first radar frame corresponds in time to the first camera frame; converting the first camera feature map, the first radar feature map, or both to a common spatial domain; concatenating the first radar feature map and the first camera feature map to generate a first concatenated feature map in the common spatial domain; performing object detection on the first concatenated feature map to detect one or more objects in the first concatenated feature map without performing object detection on the first camera feature map or the first radar feature map; and estimating a width, length, or both of the one or more objects in a bird's eye view, after inverse perspective mapping, based on a bounding box in the first camera frame encapsulating each of the one or more objects.
2. The method of claim 1, wherein the common spatial domain is a spatial domain of the radar sensor.
3. The method of claim 1, wherein: converting the first camera feature map, the first radar feature map, or both to the common spatial domain comprises converting the first camera feature map to the common spatial domain, and converting the first camera feature map to the common spatial domain comprises performing an explicit inverse perspective mapping transformation on the first camera feature map.
4. The method of claim 1, wherein: converting the first camera feature map, the first radar feature map, or both to the common spatial domain comprises converting the first camera feature map to the common spatial domain, and converting the first camera feature map to the common spatial domain occurs during performing the camera feature extraction process.
5. The method of claim 1, further comprising: hashing a plurality of blocks of the first camera frame to identify one or more blocks that have not changed between a previous camera frame of the plurality of camera frames and the first camera frame; and copying feature map values of a second camera feature map of the previous camera frame to corresponding feature map values of the first feature map.
6. The method of claim 1, wherein the width, length, or both of the one or more objects is estimated based at least in part on a make, model, or both of the one or more objects.
7. The method of claim 1, further comprising: performing the camera feature extraction process on a second camera frame of the plurality of camera frames to generate a second camera feature map; performing the radar feature extraction process on a second radar frame of the plurality of radar frames to generate a second radar feature map; converting the second camera feature map to the common spatial domain to generate a converted camera feature map, the second radar feature map to the common spatial domain to generate a converted radar feature map, or both; and concatenating the converted second radar feature map, the converted second camera feature map, or both to generate a second concatenated feature map, wherein detecting the one or more objects is further based on the second concatenated feature map.
8. The method of claim 1, wherein the radar sensor and the camera sensor are collocated in a shared housing in the host vehicle.
9. The method of claim 1, further comprising: performing an autonomous driving operation based on detecting the one or more objects.
10. The method of claim 9, wherein the autonomous driving operation is one or more of braking, accelerating, steering, adjusting a cruise control setting, or signaling.
11. A method of performing early fusion of camera and radar frames to perform object detection in one or more spatial domains performed by an on-board computer of a host vehicle, comprising: receiving, from a camera sensor of the host vehicle, a plurality of camera frames; receiving, from a radar sensor of the host vehicle, a plurality of radar frames; applying an encoder-decoder network on the first camera frame to generate a first camera feature map in a spatial domain of a first radar frame, wherein the first radar frame corresponds in time to the first camera frame; combining the first radar frame and the first camera feature map to generate a first combined feature map in the spatial domain of the first radar frame; performing object detection on the first combined feature map to detect one or more objects in the first combined feature map without performing object detection on the first camera feature map or the first radar frame; and estimating a width, length, or both of the one or more objects in a bird's eye view, after inverse perspective mapping, based on a bounding box in the first camera frame encapsulating each of the one or more object.
12. The method of claim 11, further comprising: providing the first combined feature map to a neural network.
13. The method of claim 11, further comprising: performing an autonomous driving operation based on detecting the one or more objects.
14. The method of claim 13, wherein the autonomous driving operation is one or more of braking, accelerating, steering, adjusting a cruise control setting, or signaling.
15. An on-board computer of a host vehicle, comprising: at least one processor configured to: receive, from a camera sensor of the host vehicle, a plurality of camera frames; receive, from a radar sensor of the host vehicle, a plurality of radar frames; perform a camera feature extraction process on a first camera frame of the plurality of camera frames to generate a first camera feature map; perform a radar feature extraction process on a first radar frame of the plurality of radar frames to generate a first radar feature map, wherein the first radar frame corresponds in time to the first camera frame; convert the first camera feature map, the first radar feature map, or both to a common spatial domain; concatenate the first radar feature map and the first camera feature map to generate a first concatenated feature map in the common spatial domain; perform object detection on the first concatenated feature map to detect one or more objects in the first concatenated feature map without performance of object detection on the first camera feature map or the first radar feature map; and estimate a width, length, or both of the one or more objects in a bird's eye view, after inverse perspective mapping, based on a bounding box in the first camera frame encapsulating each of the one or more objects.
16. The on-board computer of claim 15, wherein the common spatial domain is a spatial domain of the radar sensor.
17. The on-board computer of claim 15, wherein: the at least one processor being configured to convert the first camera feature map, the first radar feature map, or both to the common spatial domain comprises converting the first camera feature map to the common spatial domain, and the at least one processor being configured to convert the first camera feature map to the common spatial domai
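
The block-hashing step recited in claim 5 (reuse feature values for camera-frame blocks that did not change since the previous frame) can be sketched as follows. This is an illustrative assumption, not the patented method: the claims do not specify the hash function, block size, or feature layout. The BLAKE2 hash, the one-feature-per-block layout, and the `tile_mean` placeholder extractor are all hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Frame = List[List[int]]  # 8-bit grayscale pixels, indexed as [row][col]
Block = Tuple[int, int]  # (block_row, block_col)


def tile_pixels(frame: Frame, br: int, bc: int, block: int) -> List[int]:
    """Return the pixels of one block x block tile in row-major order."""
    return [
        frame[r][c]
        for r in range(br * block, (br + 1) * block)
        for c in range(bc * block, (bc + 1) * block)
    ]


def block_hashes(frame: Frame, block: int) -> Dict[Block, bytes]:
    """Hash every tile; frame dimensions are assumed divisible by `block`."""
    return {
        (br, bc): hashlib.blake2b(
            bytes(tile_pixels(frame, br, bc, block)), digest_size=8
        ).digest()
        for br in range(len(frame) // block)
        for bc in range(len(frame[0]) // block)
    }


def unchanged_blocks(prev: Frame, curr: Frame, block: int) -> List[Block]:
    """Blocks whose hash is identical in the previous and current frame."""
    prev_h, curr_h = block_hashes(prev, block), block_hashes(curr, block)
    return [key for key in curr_h if prev_h.get(key) == curr_h[key]]


def tile_mean(frame: Frame, br: int, bc: int, block: int) -> float:
    """Placeholder 'feature extractor' (hypothetical): mean pixel of one tile."""
    pixels = tile_pixels(frame, br, bc, block)
    return sum(pixels) / len(pixels)


def extract_with_reuse(prev: Frame, prev_feats: Dict[Block, float],
                       curr: Frame, block: int):
    """Copy features for unchanged blocks; recompute only the changed ones."""
    reuse = set(unchanged_blocks(prev, curr, block))
    feats: Dict[Block, float] = {}
    recomputed: List[Block] = []
    for key in block_hashes(curr, block):
        if key in reuse:
            feats[key] = prev_feats[key]          # copied from previous frame
        else:
            feats[key] = tile_mean(curr, *key, block)
            recomputed.append(key)
    return feats, recomputed


# Toy frames: a 4x4 all-black frame, then the same frame with one pixel changed.
prev = [[0] * 4 for _ in range(4)]
curr = [row[:] for row in prev]
curr[0][3] = 200
prev_feats = {key: tile_mean(prev, *key, 2) for key in block_hashes(prev, 2)}
feats, recomputed = extract_with_reuse(prev, prev_feats, curr, 2)
print("recomputed blocks:", recomputed)
```

A hash match is treated here as proof that a block is unchanged; a production system would have to decide whether the small collision risk of a short digest is acceptable or whether a byte-for-byte comparison is needed on a match.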
CPC titles:

- Coherent light, e.g. laser signals
- Radio signals
- from positioning sensors located off-board the vehicle, e.g. from cameras
- using signals provided by artificial sources external to the vehicle, e.g. navigation beacons
- Command input arrangements located on-board unmanned vehicles