What technology area does this patent fall under?

Primary CPC classification G01S7/4802. Mapped technology areas include Physics.

When was this patent published?

Publication date Dec 20, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus, or other publications sharing the same primary CPC).

Multi-view deep neural network for LiDAR perception

US11532168B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11532168-B2
Application number	US-202016915346-A
Country	US
Kind code	B2
Filing date	Jun 29, 2020
Priority date	Nov 15, 2019
Publication date	Dec 20, 2022
Grant date	Dec 20, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A deep neural network(s) (DNN) may be used to detect objects from sensor data of a three dimensional (3D) environment. For example, a multi-view perception DNN may include multiple constituent DNNs or stages chained together that sequentially process different views of the 3D environment. An example DNN may include a first stage that performs class segmentation in a first view (e.g., perspective view) and a second stage that performs class segmentation and/or regresses instance geometry in a second view (e.g., top-down). The DNN outputs may be processed to generate 2D and/or 3D bounding boxes and class labels for detected objects in the 3D environment. As such, the techniques described herein may be used to detect and classify animate objects and/or parts of an environment, and these detections and classifications may be provided to an autonomous vehicle drive stack to enable safe planning and control of the autonomous vehicle.
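The abstract's core idea is to segment in one view and then carry those labels into a second view through the shared 3D points. The sketch below illustrates that idea under stated assumptions; it is not Nvidia's implementation. The spherical range-image projection, the ±15° vertical field of view, the 1 m top-down cell size, and every function name (`to_range_pixel`, `build_range_image`, `toy_segmenter`, `project_to_top_down`) are hypothetical. A simple height threshold stands in for the first-stage neural network. Each point takes the label of the range-image pixel it falls in, and each top-down cell reports the fraction of its points carrying each label, which acts as a simple per-class confidence map in the second view.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]   # (x forward, y left, z up) in the sensor frame (assumed)
Pixel = Tuple[int, int]              # (row, col) in the perspective-view range image
Cell = Tuple[int, int]               # (ix, iy) in the top-down grid


def to_range_pixel(p: Point, rows: int, cols: int,
                   fov_up_deg: float = 15.0, fov_down_deg: float = -15.0) -> Pixel:
    """Spherical projection of a 3D point into a rows x cols range image (assumed FOV)."""
    x, y, z = p
    r = math.hypot(x, y, z)
    if r == 0.0:
        raise ValueError("a point at the sensor origin has no viewing direction")
    azimuth = math.atan2(y, x)      # horizontal angle in [-pi, pi] -> column
    elevation = math.asin(z / r)    # vertical angle -> row (row 0 = top of the FOV)
    up, down = math.radians(fov_up_deg), math.radians(fov_down_deg)
    col = int((azimuth + math.pi) / (2 * math.pi) * cols) % cols
    row = int((up - elevation) / (up - down) * rows)
    return min(max(row, 0), rows - 1), col


def build_range_image(points: Sequence[Point], rows: int, cols: int) -> Dict[Pixel, int]:
    """First-view input: for each occupied pixel, keep the index of the closest point."""
    image: Dict[Pixel, int] = {}
    for i, p in enumerate(points):
        pix = to_range_pixel(p, rows, cols)
        if pix not in image or math.hypot(*p) < math.hypot(*points[image[pix]]):
            image[pix] = i
    return image


def toy_segmenter(points: Sequence[Point], image: Dict[Pixel, int]) -> Dict[Pixel, str]:
    """Hypothetical stand-in for the first-stage NN: label each pixel by its point's height."""
    return {pix: "obstacle" if points[i][2] > -1.0 else "ground" for pix, i in image.items()}


def project_to_top_down(points: Sequence[Point], pixel_labels: Dict[Pixel, str],
                        rows: int, cols: int,
                        cell_size: float = 1.0) -> Dict[Cell, Dict[str, float]]:
    """Carry perspective-view labels to a top-down grid via each point's 3D location.

    Returns, per occupied cell, the fraction of its points carrying each label.
    """
    counts: Dict[Cell, Counter] = defaultdict(Counter)
    for p in points:
        label = pixel_labels.get(to_range_pixel(p, rows, cols))
        if label is None:
            continue
        cell = (math.floor(p[0] / cell_size), math.floor(p[1] / cell_size))
        counts[cell][label] += 1
    return {cell: {lbl: n / sum(c.values()) for lbl, n in c.items()}
            for cell, c in counts.items()}


if __name__ == "__main__":
    pts = [(10.0, 0.2, 0.0), (10.1, 0.3, 0.5), (5.0, -3.0, -1.6), (10.5, 0.5, -1.5)]
    img = build_range_image(pts, rows=32, cols=1024)
    grid = project_to_top_down(pts, toy_segmenter(pts, img), rows=32, cols=1024)
    print(grid)
```

With these four sample points, the cell at (10, 0) receives two "obstacle" points and one "ground" point, so its confidences are 2/3 and 1/3. A point that shares a pixel with a closer point inherits the closer point's label in this sketch. A production system might resolve such collisions differently.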

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: converting accumulated sensor data to motion-compensated sensor data corresponding to a position of an ego-actor at a particular time; projecting the motion-compensated sensor data into two-dimensional (2D) image-space to generate first data representing a first 2D view of an environment; extracting, using one or more Neural Networks (NNs), classification data representing one or more classifications of objects or scenery depicted in the first 2D view based at least on the first data; generating transformed classification data representing the one or more classifications in a second 2D view of the environment based at least on projecting the one or more classifications from the first 2D view to the second 2D view; and generating, using the one or more NNs, second data representing one or more bounding shapes of one or more objects detected in the environment based at least on the transformed classification data.

2. The method of claim 1, wherein the first 2D view is a perspective view and the second 2D view is a top-down view.

3. The method of claim 1, wherein the first data representing the first 2D view of the environment comprises a projection of a LiDAR point cloud, the projection representing a perspective view of the environment, and wherein the projecting of the one or more classifications from the first 2D view to the second 2D view comprises using the LiDAR point cloud to project the one or more classifications from the perspective view to a top-down view of the environment.

4. The method of claim 1, wherein the first data represents a LiDAR range image of the first 2D view, and the determining of the first data comprises projecting a LiDAR point cloud into the LiDAR range image.

5. The method of claim 1, wherein the first 2D view is a LiDAR range image having a height in pixels corresponding to a number of horizontal scan lines of a LiDAR sensor that captured the accumulated sensor data.

6. The method of claim 1, wherein the accumulated sensor data comprises sensor data from one or more LiDAR sensors of the ego-actor accumulated over a period of time, and the first 2D view is a LiDAR range image of the environment.

7. The method of claim 1, wherein the projecting of the one or more classifications from the first 2D view to the second 2D view comprises applying a differentiable transformation to 3D locations associated with the classification data.

8. The method of claim 1, wherein the accumulated sensor data represents a LiDAR point cloud, wherein the transformed classification data represents one or more confidence maps in the second 2D view, and the method further comprises: generating third data representing one or more height maps based at least on projecting the LiDAR point cloud into the second 2D view; forming a tensor comprising a first set of one or more channels storing the transformed classification data representing the one or more confidence maps and a second set of one or more channels storing the third data representing the one or more height maps; and extracting, from the tensor using the one or more NNs, second classification data representing one or more second classifications in the second 2D view and fourth data representing object instance geometry of the one or more objects.

9. The method of claim 1, further comprising: decoding an output of one or more NNs to produce candidate bounding shapes for the one or more objects; identifying the second data representing the one or more bounding shapes for the one or more objects based on performing at least one of filtering or clustering of the candidate bounding shapes to remove duplicate candidates from the candidate bounding shapes; and assigning a class label for each of the one or more bounding shapes based on the output of the one or more NNs.

10. The method of claim 1, wherein the determining of the second data representing the one or more bounding shapes comprises: decoding an output of the one or more NNs to produce candidate bounding shapes for the one or more objects; and identifying the second data representing the one or more bounding shapes for the one or more objects based on performing at least one of non-maximum suppression or density-based spatial clustering of applications with noise to remove duplicate candidates from the candidate bounding shapes.

11. The method of claim 1, wherein an output of the one or more NNs comprises a tensor storing regressed geometry data for each detected object, wherein the determining of the second data representing the one or more bounding shapes comprises generating one or more 3D bounding shapes for the one or more objects from the regressed geometry data.

12. The method of claim 1, further comprising training the one or more NNs using training data generated using annotation tracking to track an annotated object between two or more frames of corresponding sensor data.

13. The method of claim 1, further comprising training the one or more NNs using training data generated using a link between object tracks generated for a particular object from corresponding sensor data from two or more sensors.

14. A method comprising: receiving LiDAR data from one or more LiDAR sensors in an environment; generating, from the LiDAR data, first data representing a perspective view of the environment; generating, using one or more Neural Networks (NNs), classification data from the first data, the classification data representing one or more classifications in the perspective view; generating transformed classification data representing the one or more classifications in a top-down view of the environment by projecting the one or more classifications in the perspective view into the top-down view using the LiDAR data; and generating, using the one or more NNs, second data representing one or more bounding shapes of one or more objects detected in the environment based at least on the transformed classification data in the top-down view.

15. The method of claim 14, wherein the generating of the first data representing the perspective view of the environment comprises: accessing accumulated sensor data, from the one or more LiDAR sensors of an ego-actor, accumulated over a period of time; converting the accumulated sensor data to motion-compensated sensor data corresponding to a position of the ego-actor at a particular time; and projecting the motion-compensated sensor data into two-dimensional (2D) image-space to generate the first data representing a LiDAR range image of the perspective view of the environment.

16. The method of claim 14, wherein the one or more NNs includes a first stage configured to evaluate the first data representing the perspective view and a second stage configured to evaluate the transformed classification data representing the top-down view.

17. The method of claim 14, wherein the second data further represents a class label for each of the one or more bounding shapes the one or more objects.

18. A method comprising: generating, using one or more neural networks (NNs), classification data representing one or more classifications of first two-dimensional (2D) points in a first 2D view of an environment; associating the one or more classifications of the first 2D points with corresponding three-dimensional (3D) locations of corresponding sensor data; projecting the one or more classifications of the first 2D points from the corresponding 3D locations to second 2D points in a second 2D view of the environment to generate transformed classificati
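Claims 9 and 10 remove duplicate candidate bounding shapes by filtering or clustering, and they name non-maximum suppression (NMS) and DBSCAN as options. The sketch below shows plain greedy NMS only. It rests on simplifying assumptions that are not from the patent: candidates are axis-aligned rectangles in the top-down view rather than oriented 3D boxes, each candidate already carries a decoded score and class label, and suppression runs per class. The `Box` type, the `iou` and `non_max_suppression` names, and the 0.5 IoU threshold are hypothetical.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Hypothetical decoded candidate: axis-aligned top-down rectangle plus score and label."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    score: float
    label: str


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned rectangles."""
    iw = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    ih = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    inter = iw * ih
    area_a = (a.x_max - a.x_min) * (a.y_max - a.y_min)
    area_b = (b.x_max - b.x_min) * (b.y_max - b.y_min)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(candidates: List[Box], iou_threshold: float = 0.5) -> List[Box]:
    """Greedy per-class NMS: visit boxes by descending score, drop same-class overlaps."""
    kept: List[Box] = []
    for box in sorted(candidates, key=lambda b: b.score, reverse=True):
        if all(k.label != box.label or iou(k, box) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


if __name__ == "__main__":
    candidates = [
        Box(0.0, 0.0, 4.0, 2.0, 0.9, "car"),
        Box(0.2, 0.0, 4.2, 2.0, 0.8, "car"),          # near-duplicate of the first car
        Box(10.0, 10.0, 11.0, 11.0, 0.7, "pedestrian"),
    ]
    for box in non_max_suppression(candidates):
        print(box.label, box.score)
```

The second car box overlaps the first with an IoU of about 0.90, so only the 0.9-score car and the pedestrian survive, printed as `car 0.9` and `pedestrian 0.7`. DBSCAN, the other option the claims name, would instead group nearby candidates and keep one per group. This claims preview does not say which variant the patented system uses in practice.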

Assignees

Nvidia Corp

Classifications

Code	CPC title
G06V10/774	Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V10/26	Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
G06V10/803	of input or preprocessed data
G06V10/16	using multiple overlapping images; Image stitching
G01S7/4802 (primary)	using analysis of echo signal for target characterisation; Target signature; Target cross-section

Patent family

Related publications grouped by family.

Patent family ID: 73288488
