Hybrid-view LIDAR-based object detection
US-10809361-B2 · Oct 20, 2020 · US
US11532168B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11532168-B2 |
| Application number | US-202016915346-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 29, 2020 |
| Priority date | Nov 15, 2019 |
| Publication date | Dec 20, 2022 |
| Grant date | Dec 20, 2022 |
Official abstract text for this publication.
A deep neural network(s) (DNN) may be used to detect objects from sensor data of a three dimensional (3D) environment. For example, a multi-view perception DNN may include multiple constituent DNNs or stages chained together that sequentially process different views of the 3D environment. An example DNN may include a first stage that performs class segmentation in a first view (e.g., perspective view) and a second stage that performs class segmentation and/or regresses instance geometry in a second view (e.g., top-down). The DNN outputs may be processed to generate 2D and/or 3D bounding boxes and class labels for detected objects in the 3D environment. As such, the techniques described herein may be used to detect and classify animate objects and/or parts of an environment, and these detections and classifications may be provided to an autonomous vehicle drive stack to enable safe planning and control of the autonomous vehicle.
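The two views described in the abstract can be illustrated with a minimal NumPy sketch: a LiDAR point cloud is projected into a perspective-view range image, and per-pixel class labels are then carried back through the 3D points into a top-down grid. This is an illustration under stated assumptions, not the patented implementation. The function names, field-of-view limits, image size, grid size, and cell size are all hypothetical, `class_map` stands in for the output of a first-stage network, and hard one-hot labels are used here where the claims recite confidence maps.

```python
import numpy as np

def to_range_image(points, height=32, width=360,
                   fov_up=np.radians(15.0), fov_down=np.radians(-25.0)):
    """Project Nx3 LiDAR points into a (height, width) range image.

    Returns the image plus the (row, col) pixel of every point, so that
    per-pixel predictions can later be mapped back to 3D points.
    All sensor parameters are illustrative assumptions.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.linalg.norm(points, axis=1)
    azimuth = np.arctan2(y, x)                           # in [-pi, pi]
    elevation = np.arcsin(z / np.maximum(rng, 1e-6))
    col = ((azimuth + np.pi) / (2 * np.pi) * width).astype(int) % width
    row = ((fov_up - elevation) / (fov_up - fov_down) * height).astype(int)
    row = np.clip(row, 0, height - 1)
    image = np.zeros((height, width), dtype=np.float32)
    image[row, col] = rng                                # later points overwrite earlier ones
    return image, row, col

def labels_to_top_down(points, row, col, class_map, num_classes,
                       grid=(100, 100), cell=0.5):
    """Carry perspective-view class labels into a top-down grid.

    class_map is a (height, width) integer array of per-pixel classes,
    a stand-in for a first-stage segmentation output. Each point takes
    the label of its range-image pixel and marks the top-down cell that
    contains its (x, y) position.
    """
    labels = class_map[row, col]
    gx = np.floor(points[:, 0] / cell).astype(int) + grid[0] // 2
    gy = np.floor(points[:, 1] / cell).astype(int) + grid[1] // 2
    keep = (gx >= 0) & (gx < grid[0]) & (gy >= 0) & (gy < grid[1])
    bev = np.zeros((num_classes,) + tuple(grid), dtype=np.float32)
    bev[labels[keep], gx[keep], gy[keep]] = 1.0          # one channel per class
    return bev

if __name__ == "__main__":
    pts = np.array([[10.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
    image, row, col = to_range_image(pts)
    fake_classes = np.full(image.shape, 1, dtype=int)    # pretend every pixel is class 1
    bev = labels_to_top_down(pts, row, col, fake_classes, num_classes=3)
    print(image.shape, bev.shape)
```

The per-point `(row, col)` indices are what link the two views: a label predicted for a range-image pixel can be assigned to the 3D point that produced that pixel, and from there to a top-down cell.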
Opening claim text (preview).
What is claimed is:
1. A method comprising: converting accumulated sensor data to motion-compensated sensor data corresponding to a position of an ego-actor at a particular time; projecting the motion-compensated sensor data into two-dimensional (2D) image-space to generate first data representing a first 2D view of an environment; extracting, using one or more Neural Networks (NNs), classification data representing one or more classifications of objects or scenery depicted in the first 2D view based at least on the first data; generating transformed classification data representing the one or more classifications in a second 2D view of the environment based at least on projecting the one or more classifications from the first 2D view to the second 2D view; and generating, using the one or more NNs, second data representing one or more bounding shapes of one or more objects detected in the environment based at least on the transformed classification data.
2. The method of claim 1, wherein the first 2D view is a perspective view and the second 2D view is a top-down view.
3. The method of claim 1, wherein the first data representing the first 2D view of the environment comprises a projection of a LiDAR point cloud, the projection representing a perspective view of the environment, and wherein the projecting of the one or more classifications from the first 2D view to the second 2D view comprises using the LiDAR point cloud to project the one or more classifications from the perspective view to a top-down view of the environment.
4. The method of claim 1, wherein the first data represents a LiDAR range image of the first 2D view, and the determining of the first data comprises projecting a LiDAR point cloud into the LiDAR range image.
5. The method of claim 1, wherein the first 2D view is a LiDAR range image having a height in pixels corresponding to a number of horizontal scan lines of a LiDAR sensor that captured the accumulated sensor data.
6. The method of claim 1, wherein the accumulated sensor data comprises sensor data from one or more LiDAR sensors of the ego-actor accumulated over a period of time, and the first 2D view is a LiDAR range image of the environment.
7. The method of claim 1, wherein the projecting of the one or more classifications from the first 2D view to the second 2D view comprises applying a differentiable transformation to 3D locations associated with the classification data.
8. The method of claim 1, wherein the accumulated sensor data represents a LiDAR point cloud, wherein the transformed classification data represents one or more confidence maps in the second 2D view, and the method further comprises: generating third data representing one or more height maps based at least on projecting the LiDAR point cloud into the second 2D view; forming a tensor comprising a first set of one or more channels storing the transformed classification data representing the one or more confidence maps and a second set of one or more channels storing the third data representing the one or more height maps; and extracting, from the tensor using the one or more NNs, second classification data representing one or more second classifications in the second 2D view and fourth data representing object instance geometry of the one or more objects.
9. The method of claim 1, further comprising: decoding an output of one or more NNs to produce candidate bounding shapes for the one or more objects; identifying the second data representing the one or more bounding shapes for the one or more objects based on performing at least one of filtering or clustering of the candidate bounding shapes to remove duplicate candidates from the candidate bounding shapes; and assigning a class label for each of the one or more bounding shapes based on the output of the one or more NNs.
10. The method of claim 1, wherein the determining of the second data representing the one or more bounding shapes comprises: decoding an output of the one or more NNs to produce candidate bounding shapes for the one or more objects; and identifying the second data representing the one or more bounding shapes for the one or more objects based on performing at least one of non-maximum suppression or density-based spatial clustering of applications with noise to remove duplicate candidates from the candidate bounding shapes.
11. The method of claim 1, wherein an output of the one or more NNs comprises a tensor storing regressed geometry data for each detected object, wherein the determining of the second data representing the one or more bounding shapes comprises generating one or more 3D bounding shapes for the one or more objects from the regressed geometry data.
12. The method of claim 1, further comprising training the one or more NNs using training data generated using annotation tracking to track an annotated object between two or more frames of corresponding sensor data.
13. The method of claim 1, further comprising training the one or more NNs using training data generated using a link between object tracks generated for a particular object from corresponding sensor data from two or more sensors.
14. A method comprising: receiving LiDAR data from one or more LiDAR sensors in an environment; generating, from the LiDAR data, first data representing a perspective view of the environment; generating, using one or more Neural Networks (NNs), classification data from the first data, the classification data representing one or more classifications in the perspective view; generating transformed classification data representing the one or more classifications in a top-down view of the environment by projecting the one or more classifications in the perspective view into the top-down view using the LiDAR data; and generating, using the one or more NNs, second data representing one or more bounding shapes of one or more objects detected in the environment based at least on the transformed classification data in the top-down view.
15. The method of claim 14, wherein the generating of the first data representing the perspective view of the environment comprises: accessing accumulated sensor data, from the one or more LiDAR sensors of an ego-actor, accumulated over a period of time; converting the accumulated sensor data to motion-compensated sensor data corresponding to a position of the ego-actor at a particular time; and projecting the motion-compensated sensor data into two-dimensional (2D) image-space to generate the first data representing a LiDAR range image of the perspective view of the environment.
16. The method of claim 14, wherein the one or more NNs includes a first stage configured to evaluate the first data representing the perspective view and a second stage configured to evaluate the transformed classification data representing the top-down view.
17. The method of claim 14, wherein the second data further represents a class label for each of the one or more bounding shapes the one or more objects.
18. A method comprising: generating, using one or more neural networks (NNs), classification data representing one or more classifications of first two-dimensional (2D) points in a first 2D view of an environment; associating the one or more classifications of the first 2D points with corresponding three-dimensional (3D) locations of corresponding sensor data; projecting the one or more classifications of the first 2D points from the corresponding 3D locations to second 2D points in a second 2D view of the environment to generate transformed classificati
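Claims 9 and 10 recite removing duplicate candidate bounding shapes, with non-maximum suppression named as one option. The sketch below shows a generic greedy non-maximum suppression over axis-aligned top-down boxes. It is a textbook version offered for orientation only, not the procedure used in the patent. The box format `[x_min, y_min, x_max, y_max]`, the function names, and the 0.5 overlap threshold are assumptions made for this example.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """Intersection-over-union between one box and an (M, 4) array of boxes.

    Boxes are [x_min, y_min, x_max, y_max]; this layout is an assumption
    made for the example.
    """
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the boxes kept.

    Repeatedly keeps the highest-scoring remaining candidate and drops
    every other candidate that overlaps it by more than iou_threshold.
    """
    order = np.argsort(-scores)          # candidate indices, best score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        ious = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[ious <= iou_threshold]
    return keep

if __name__ == "__main__":
    candidates = np.array([[0.0, 0.0, 2.0, 2.0],
                           [0.1, 0.0, 2.1, 2.0],   # near-duplicate of the first box
                           [5.0, 5.0, 6.0, 6.0]])
    confidences = np.array([0.9, 0.8, 0.7])
    print(nms(candidates, confidences))
```

In the example, the second candidate overlaps the first almost entirely and has a lower score, so it is dropped while the distant third box is kept.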
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion · CPC title
of input or preprocessed data · CPC title
using multiple overlapping images; Image stitching · CPC title
using analysis of echo signal for target characterisation; Target signature; Target cross-section · CPC title