Top-down object detection from LiDAR point clouds

US12164059B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12164059-B2
Application number  US-202117377064-A
Country             US
Kind code           B2
Filing date         Jul 15, 2021
Priority date       Nov 15, 2019
Publication date    Dec 10, 2024
Grant date          Dec 10, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A deep neural network(s) (DNN) may be used to detect objects from sensor data of a three dimensional (3D) environment. For example, a multi-view perception DNN may include multiple constituent DNNs or stages chained together that sequentially process different views of the 3D environment. An example DNN may include a first stage that performs class segmentation in a first view (e.g., perspective view) and a second stage that performs class segmentation and/or regresses instance geometry in a second view (e.g., top-down). The DNN outputs may be processed to generate 2D and/or 3D bounding boxes and class labels for detected objects in the 3D environment. As such, the techniques described herein may be used to detect and classify animate objects and/or parts of an environment, and these detections and classifications may be provided to an autonomous vehicle drive stack to enable safe planning and control of the autonomous vehicle.
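
The abstract says the DNN outputs "may be processed to generate 2D and/or 3D bounding boxes" but this page does not show how. As an illustration only, the sketch below thresholds a top-down class confidence grid and draws an axis-aligned 2D box around each connected region. Connected-component grouping is a stand-in chosen for simplicity, not the method disclosed in the patent, and the function name `boxes_from_confidence`, the `threshold` parameter, and the grid layout are all hypothetical.

```python
from collections import deque


def boxes_from_confidence(conf, threshold=0.5):
    """Return one (row_min, col_min, row_max, col_max) box per 4-connected
    region of pixels whose confidence is >= threshold.

    `conf` is a list of equal-length rows of floats (a top-down confidence
    grid for a single class). All names here are hypothetical.
    """
    rows = len(conf)
    cols = len(conf[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or conf[r][c] < threshold:
                continue
            # Breadth-first flood fill over one connected region,
            # tracking the extent of the region as we go.
            queue = deque([(r, c)])
            seen[r][c] = True
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                cr, cc = queue.popleft()
                r0, c0 = min(r0, cr), min(c0, cc)
                r1, c1 = max(r1, cr), max(c1, cc)
                for nr, nc in ((cr - 1, cc), (cr + 1, cc),
                               (cr, cc - 1), (cr, cc + 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and not seen[nr][nc] and conf[nr][nc] >= threshold:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            boxes.append((r0, c0, r1, c1))
    return boxes


if __name__ == "__main__":
    example = [
        [0.9, 0.8, 0.0, 0.0],
        [0.7, 0.1, 0.0, 0.6],
        [0.0, 0.0, 0.0, 0.9],
    ]
    print(boxes_from_confidence(example))  # [(0, 0, 1, 1), (1, 3, 2, 3)]
```

Boxes are in pixel coordinates. Mapping them back to metres would need the grid origin and cell size used for the top-down view, and a 3D box would need height extents as well.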

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: projecting, using one or more processors, a LiDAR point cloud that corresponds to an environment into a representation of height data of one or more points in the LiDAR point cloud; extracting, using the one or more processors, based at least on applying the representation of the height data and one or more view transformed per-pixel classifications corresponding to one or more pixels to one or more Neural Networks (NNs), class confidence data representing one or more classifications of one or more elements in the environment; generating, using the one or more processors, one or more bounding shapes of the one or more elements based at least on the class confidence data; and providing, using the one or more processors, data representing the one or more bounding shapes to a control component of an autonomous vehicle.

2. The method of claim 1, wherein the projecting of the LiDAR point cloud into the representation of the height data comprises generating one or more height maps that represent the LiDAR point cloud from a top-down view, the one or more height maps having pixels storing minimum or maximum height values using one or more columns of the one or more points in the LiDAR point cloud.

3. The method of claim 1, wherein the projecting of the LiDAR point cloud into the representation of the height data comprises collapsing the LiDAR point cloud into a height map that bins the one or more points from the LiDAR point cloud into one or more corresponding pixels of the height map; and for each of the one or more corresponding pixels of the height map corresponding to multiple points from the LiDAR point cloud, storing a height value sampled from the multiple points or determined using a statistical measure of the multiple points.

4. The method of claim 1, further comprising: generating first class confidence data representing one or more initial classifications of an image of a first two-dimensional (2D) view of the environment; and generating the one or more view transformed per-pixel classifications representing the one or more initial classifications in a second 2D view of the environment based at least on transforming the one or more initial classifications from the first 2D view to the second 2D view.

5. The method of claim 1, further comprising: generating first class confidence data representing one or more initial classifications of an image of a perspective view of the environment; and generating the one or more view transformed per-pixel classifications representing the one or more initial classifications in a top-down view of the environment based at least on projecting the one or more initial classifications from the perspective view to the top-down view; wherein the extracting of the class confidence data using the one or more NNs comprises applying the representation of the height data and the one or more view transformed per-pixel classifications to separate input channels of the one or more NNs.

6. The method of claim 1, wherein the one or more NNs include a common trunk connected to: a first stream of layers configured to predict the class confidence data representing the one or more classifications of the one or more elements in the environment; and a second stream of layers configured to regress a location or dimension of a corresponding one of the one or more elements relative to each pixel.

7. The method of claim 1, wherein the extracting of the class confidence data using the one or more NNs comprises applying, to separate input channels of the one or more NNs, the representation of the height data and the one or more view transformed per-pixel classifications representing one or more initial classifications transformed into a top-down view of the environment.

8. A processor comprising one or more circuits to: project a LiDAR point cloud captured from one or more LiDAR sensors of a vehicle in an environment into a representation of a height map corresponding to one or more points in the LiDAR point cloud; generate, using one or more Neural Networks (NNs) to process the representation of the height map and one or more view transformed per-pixel classifications corresponding to one or more pixels, classification data representing one or more classifications of objects or scenery in the environment; determine one or more bounding shapes of the objects or scenery based at least on the classification data; and output data representing the one or more bounding shapes to a control component of the vehicle.

9. The processor of claim 8, wherein the height map represents the LiDAR point cloud from a top-down view and includes multiple channels that represent height values from one or more columns of the one or more points in the LiDAR point cloud in different ways.

10. The processor of claim 8, the one or more circuits further to: generate the representation of the height map by collapsing the LiDAR point cloud to bin the one or more points from the LiDAR point cloud into one or more corresponding pixels of the height map; and for each of the one or more corresponding pixels of the height map that corresponds to multiple points from the LiDAR point cloud, store a height value sampled from the multiple points or determined using a statistical measure of the multiple points.

11. The processor of claim 8, the one or more circuits further to: generate first classification data representing one or more initial per-pixel classifications of an image of a first two-dimensional (2D) view of the environment; and generate the one or more view transformed per-pixel classifications representing the one or more initial per-pixel classifications in a second 2D view of the environment based at least on projecting the one or more initial per-pixel classifications from the first 2D view to the second 2D view.

12. The processor of claim 8, the one or more circuits further to: generate first classification data representing one or more initial per-pixel classifications of an image of a perspective view of the environment; and generate the one or more view transformed per-pixel classifications representing the one or more initial per-pixel classifications in a top-down view of the environment based at least on projecting the one or more initial per-pixel classifications from the perspective view to the top-down view; the one or more circuits further to generate the classification data based at least on feeding the representation of the height map and the one or more view transformed per-pixel classifications into separate input channels of the one or more NNs.

13. The processor of claim 8, wherein the one or more NNs include a common trunk connected to: a first stream of layers configured to predict the classification data representing the one or more classifications of the objects or scenery in the environment; and a second stream of layers configured to regress a location or dimension of a corresponding one of the objects or scenery relative to each pixel.

14. The processor of claim 8, the one or more circuits further to: generate accumulated LiDAR data by accumulating LiDAR data over a period of time from the one or more LiDAR sensors of the vehicle; and convert the accumulated LiDAR data to motion-compensated LiDAR data corresponding to a position of the vehicle at a particular time to generate the LiDAR point cloud.

15. A system comprising: one or more processing units; and one or more memory units storing instructions that, when executed by the one or more processing units, cause the one or more processing units to execute operati
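
Claims 2 and 3 describe collapsing the point cloud into a top-down height map whose pixels store minimum or maximum height values for the column of points that falls into each pixel. A minimal sketch of that binning step follows, assuming points are (x, y, z) tuples in metres and the grid is a fixed axis-aligned window. The function name `height_maps` and the parameters `x_range`, `y_range`, and `cell_size` are hypothetical and do not come from the patent.

```python
import math


def height_maps(points, x_range, y_range, cell_size):
    """Bin (x, y, z) points into a top-down grid and keep, per pixel, the
    minimum and maximum z of the points that land in it.

    Returns (min_h, max_h), each a list of rows; pixels that receive no
    points hold None. All names and the grid layout are assumptions.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    cols = int(math.ceil((x_max - x_min) / cell_size))
    rows = int(math.ceil((y_max - y_min) / cell_size))
    min_h = [[None] * cols for _ in range(rows)]
    max_h = [[None] * cols for _ in range(rows)]
    for x, y, z in points:
        # Drop points outside the top-down window.
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        # Clamp guards against floating-point rounding at the far edge.
        col = min(int((x - x_min) / cell_size), cols - 1)
        row = min(int((y - y_min) / cell_size), rows - 1)
        if min_h[row][col] is None:
            min_h[row][col] = z
            max_h[row][col] = z
        else:
            min_h[row][col] = min(min_h[row][col], z)
            max_h[row][col] = max(max_h[row][col], z)
    return min_h, max_h


if __name__ == "__main__":
    cloud = [(0.2, 0.2, 1.0), (0.3, 0.1, 2.5), (1.6, 0.4, -0.5), (9.0, 9.0, 3.0)]
    lo, hi = height_maps(cloud, x_range=(0.0, 2.0), y_range=(0.0, 1.0), cell_size=1.0)
    print(lo)  # [[1.0, -0.5]]
    print(hi)  # [[2.5, -0.5]]
```

The two returned grids correspond to the "multiple channels that represent height values ... in different ways" of claim 9. Other statistics mentioned in claim 3, such as a mean or a sampled value, could be accumulated in the same loop.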

Assignees

Nvidia Corp

Inventors

Classifications

  • Supervised learning · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Handing over between on-board automatic and on-board manual control · CPC title

  • exterior to a vehicle by using sensors mounted on the vehicle · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12164059B2 cover?
It covers using one or more deep neural networks (DNNs) to detect objects from sensor data of a 3D environment. A multi-view perception DNN chains stages that process different views in sequence: for example, a first stage performs class segmentation in a perspective view, and a second stage performs class segmentation and/or regresses instance geometry in a top-down view. The outputs are processed into 2D and/or 3D bounding boxes and class labels. See the Abstract above for the full text.
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G01S7/4802. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 10, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).