Systems and Methods for End-to-End Trajectory Prediction Using Radar, Lidar, and Maps
US-2022035376-A1 · Feb 3, 2022 · US
US2023260266A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2023260266-A1 |
| Application number | US-202318108749-A |
| Country | US |
| Kind code | A1 |
| Filing date | Feb 13, 2023 |
| Priority date | Feb 15, 2022 |
| Publication date | Aug 17, 2023 |
| Grant date | — |
Official abstract text for this publication.
A method includes obtaining, by a processing device, input data derived from a set of sensors associated with an autonomous vehicle (AV); extracting, by the processing device from the input data, a plurality of sets of features; and generating, by the processing device using the plurality of sets of features, a fused bird's-eye view (BEV) grid. The fused BEV grid is generated based on a first BEV grid having a first scale and a second BEV grid having a second scale different from the first scale. The method further includes providing, by the processing device, the fused BEV grid for object detection.
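The fusion step in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not in the publication: "scale" is read as the cell size of a square grid in metres, both grids share the same origin, resampling is nearest-neighbour, and fusion is a plain cell-wise average. The names `resample_nearest` and `fuse_bev` are hypothetical.

```python
from typing import List, Sequence

Grid = List[List[float]]


def resample_nearest(grid: Grid, cell_size: float,
                     out_cell_size: float, out_n: int) -> Grid:
    """Resample a square BEV grid onto an out_n x out_n grid with a
    different cell size, using nearest-neighbour lookup (assumption).
    Both grids are assumed to share the same origin corner."""
    n = len(grid)
    out = [[0.0] * out_n for _ in range(out_n)]
    for r in range(out_n):
        for c in range(out_n):
            # Centre of the output cell in metres.
            y = (r + 0.5) * out_cell_size
            x = (c + 0.5) * out_cell_size
            # Source cell that contains this centre.
            src_r = int(y // cell_size)
            src_c = int(x // cell_size)
            if 0 <= src_r < n and 0 <= src_c < n:
                out[r][c] = grid[src_r][src_c]
    return out


def fuse_bev(grids: Sequence[Grid], cell_sizes: Sequence[float],
             out_cell_size: float, out_n: int) -> Grid:
    """Bring every grid to a common scale, then average cell-wise.
    Averaging is an illustrative choice, not the claimed fusion operator."""
    resampled = [resample_nearest(g, s, out_cell_size, out_n)
                 for g, s in zip(grids, cell_sizes)]
    return [[sum(g[r][c] for g in resampled) / len(resampled)
             for c in range(out_n)]
            for r in range(out_n)]


# Usage: a fine 4x4 grid (0.5 m cells) and a coarse 2x2 grid (1.0 m cells),
# both covering the same 2 m x 2 m area.
fine = [[0.0] * 4 for _ in range(4)]
fine[0][0] = 2.0
coarse = [[1.0, 2.0], [3.0, 4.0]]
fused = fuse_bev([fine, coarse], [0.5, 1.0], out_cell_size=0.5, out_n=4)
```

A learned pipeline would more plausibly concatenate feature channels and pass them through a network; the average here only makes the resample-then-fuse order visible.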
Opening claim text (preview).
What is claimed is:

1. A method comprising: obtaining, by a processing device, input data derived from a set of sensors associated with an autonomous vehicle (AV); extracting, by the processing device from the input data, a plurality of sets of features; generating, by the processing device using the plurality of sets of features, a fused bird's-eye view (BEV) grid, wherein the fused BEV grid is generated based on a first BEV grid having a first scale and a second BEV grid having a second scale different from the first scale; and providing, by the processing device, the fused BEV grid for object detection.

2. The method of claim 1, wherein: the set of sensors comprises at least one camera and at least one radar; the input data comprises a set of camera data obtained from the at least one camera and a set of radar data obtained from the at least one radar; and the plurality of sets of features comprises a set of camera data features generated from the set of camera data and a set of radar data features generated from the set of radar data.

3. The method of claim 1, wherein generating the fused BEV grid further comprises: associating each set of features of the plurality of sets of features with a respective set of points; generating, using each set of points, a set of BEV grids, the set of BEV grids comprising the first BEV grid and the second BEV grid; extracting, for each BEV grid of the set of BEV grids, a respective set of BEV grid features; generating, for each BEV grid of the set of BEV grids using the respective set of BEV grid features, a resampled BEV grid, wherein the first BEV grid is associated with a first resampled BEV grid and wherein the second BEV grid is associated with a second resampled BEV grid; and fusing each resampled BEV grid to generate the fused BEV grid.

4. The method of claim 3, wherein associating each set of features of the plurality of sets of features with a respective set of points further comprises: transforming a set of camera features of the plurality of sets of features into a set of pixel points; and transforming a set of radar features of the plurality of sets of features into a set of radar points, including transforming from a polar coordinate representation to a Cartesian coordinate representation.

5. The method of claim 1, further comprising performing, by the processing device using the fused BEV grid, the object detection to identify at least one object using a set of neural networks.

6. The method of claim 5, wherein performing object detection further comprises: obtaining a set of predictions generated using the fused BEV grid, wherein the set of predictions comprises a heatmap prediction and an attribute prediction; generating, from the set of predictions, a set of candidate bounding boxes, each candidate bounding box of the set of candidate bounding boxes corresponding to the at least one object; and selecting, from the set of candidate bounding boxes, at least one bounding box corresponding to the at least one object.

7. The method of claim 5, further comprising causing, by the processing device, a driving path of the AV to be modified in view of the at least one object.

8. A system comprising: a memory; and a processing device communicatively coupled to the memory, the processing device configured to: obtain input data derived from a set of sensors associated with an autonomous vehicle (AV); extract, from the input data, a plurality of sets of features; generate, using the plurality of sets of features, a fused bird's-eye view (BEV) grid, wherein the fused BEV grid is generated based on a first BEV grid having a first scale and a second BEV grid having a second scale different from the first scale; and provide the fused BEV grid for object detection.

9. The system of claim 8, wherein: the set of sensors comprises at least one camera and at least one radar; the input data comprises a set of camera data obtained from the at least one camera and a set of radar data obtained from the at least one radar; and the plurality of sets of features comprises a set of camera data features generated from the set of camera data and a set of radar data features generated from the set of radar data.

10. The system of claim 8, wherein, to generate the fused BEV grid, the processing device is further configured to: associate each set of features of the plurality of sets of features with a respective set of points; generate, using each set of points, a set of BEV grids, the set of BEV grids comprising the first BEV grid and the second BEV grid; extract, for each BEV grid of the set of BEV grids, a respective set of BEV grid features; generate, for each BEV grid of the set of BEV grids using the respective set of BEV grid features, a resampled BEV grid, wherein the first BEV grid is associated with a first resampled BEV grid and wherein the second BEV grid is associated with a second resampled BEV grid; and fuse each resampled BEV grid to generate the fused BEV grid.

11. The system of claim 10, wherein, to associate each set of features of the plurality of sets of features with a respective set of points, the processing device is further configured to: transform a set of camera features of the plurality of sets of features into a set of pixel points; and transform a set of radar features of the plurality of sets of features into a set of radar points by transforming from a polar coordinate representation to a Cartesian coordinate representation.

12. The system of claim 8, wherein the processing device is further configured to perform, using the fused BEV grid, the object detection to identify at least one object using a set of neural networks.

13. The system of claim 12, wherein, to perform object detection, the processing device is further configured to: obtain a set of predictions generated using the fused BEV grid, wherein the set of predictions comprises a heatmap prediction and an attribute prediction; generate, from the set of predictions, a set of candidate bounding boxes, each candidate bounding box of the set of candidate bounding boxes corresponding to the at least one object; and select, from the set of candidate bounding boxes, at least one bounding box corresponding to the at least one object.

14. The system of claim 12, wherein the processing device is further configured to cause a driving path of the AV to be modified in view of the at least one object.

15. A non-transitory computer-readable storage medium having instructions stored thereon that, when executed by a processing device, cause the processing device to perform operations comprising: obtaining input data derived from a set of sensors associated with an autonomous vehicle (AV), wherein the set of sensors comprises at least one camera and at least one radar, and wherein the input data comprises a set of camera data obtained from the at least one camera and a set of radar data obtained from the at least one radar; extracting, from the input data, a plurality of sets of features, wherein the plurality of sets of features comprises a set of camera data features generated from the set of camera data and a set of radar data features generated from the set of radar data; generating, using the plurality of sets of features, a fused bird's-eye view (BEV) grid, wherein the fused BEV grid is generated based on a first BEV grid having a first scale and a second BEV grid having a second scale different from the first scale; and providing the fused BEV grid for object detection.

16. T
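Claims 4 and 11 recite transforming radar features from a polar to a Cartesian representation before they are placed on a BEV grid. The following is a rough sketch of that idea only. The axis convention (x forward, y left, azimuth measured from the forward axis), the vehicle-centred square grid, and the point-count cell value are all assumptions, and `polar_to_cartesian` and `rasterize` are hypothetical helper names, not terms from the publication.

```python
import math
from typing import Iterable, List, Tuple


def polar_to_cartesian(range_m: float, azimuth_rad: float) -> Tuple[float, float]:
    """Convert one radar return from (range, azimuth) to (x, y) in metres.
    Assumed convention: x forward, y left, azimuth from the forward axis."""
    return (range_m * math.cos(azimuth_rad),
            range_m * math.sin(azimuth_rad))


def rasterize(points: Iterable[Tuple[float, float]],
              extent_m: float, cell_size_m: float) -> List[List[int]]:
    """Count points per cell of a square BEV grid centred on the vehicle.
    The grid spans [-extent_m, +extent_m) on both axes; a different
    cell_size_m yields a grid at a different scale."""
    n = int(round(2 * extent_m / cell_size_m))
    grid = [[0] * n for _ in range(n)]
    for x, y in points:
        col = math.floor((x + extent_m) / cell_size_m)
        row = math.floor((y + extent_m) / cell_size_m)
        # Points outside the grid extent are dropped.
        if 0 <= row < n and 0 <= col < n:
            grid[row][col] += 1
    return grid


# Usage: one return 10 m straight ahead, one 25 m ahead (outside the extent).
radar_returns = [(10.0, 0.0), (25.0, 0.0)]
radar_points = [polar_to_cartesian(r, a) for r, a in radar_returns]
radar_bev = rasterize(radar_points, extent_m=20.0, cell_size_m=10.0)
```

Calling `rasterize` twice on the same points with two different `cell_size_m` values gives two grids at different scales, which is the kind of input the claimed resampling and fusion steps operate on.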
Image mosaicing, e.g. composing plane images from plane sub-images · CPC title
adapted for simultaneous range and velocity measurements · CPC title
Combination of radar systems with cameras · CPC title
of land vehicles · CPC title
involving the use of neural networks · CPC title