Systems and methods for object classification in autonomous vehicles
US-2019026571-A1 · Jan 24, 2019 · US
US12164059B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12164059-B2 |
| Application number | US-202117377064-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 15, 2021 |
| Priority date | Nov 15, 2019 |
| Publication date | Dec 10, 2024 |
| Grant date | Dec 10, 2024 |
Official abstract text for this publication.
A deep neural network(s) (DNN) may be used to detect objects from sensor data of a three dimensional (3D) environment. For example, a multi-view perception DNN may include multiple constituent DNNs or stages chained together that sequentially process different views of the 3D environment. An example DNN may include a first stage that performs class segmentation in a first view (e.g., perspective view) and a second stage that performs class segmentation and/or regresses instance geometry in a second view (e.g., top-down). The DNN outputs may be processed to generate 2D and/or 3D bounding boxes and class labels for detected objects in the 3D environment. As such, the techniques described herein may be used to detect and classify animate objects and/or parts of an environment, and these detections and classifications may be provided to an autonomous vehicle drive stack to enable safe planning and control of the autonomous vehicle.
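The abstract says the DNN outputs may be processed to generate bounding boxes, but it does not say how. As a minimal sketch of one possible post-processing step, the example below thresholds a top-down class confidence grid and draws one axis-aligned 2D box around each connected region. The function name `boxes_from_confidence`, the 0.5 threshold, and the use of 4-connected flood fill are all assumptions made for illustration, not details taken from the patent.

```python
from collections import deque


def boxes_from_confidence(conf, threshold=0.5):
    """Return one (row_min, col_min, row_max, col_max) box per 4-connected
    region of grid cells whose confidence is at least `threshold`.

    Hypothetical helper: the patent does not specify this procedure.
    """
    rows, cols = len(conf), len(conf[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or conf[r][c] < threshold:
                continue
            # Breadth-first flood fill over one connected region,
            # tracking the extreme row and column indices as we go.
            queue = deque([(r, c)])
            seen[r][c] = True
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and conf[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes


# Toy 3x4 confidence grid for a single class (values are made up).
confidence = [
    [0.1, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.6],
]
boxes = boxes_from_confidence(confidence)
print(boxes)
```

In a multi-class setting, the same routine could be run once per class channel, with the class label attached to each resulting box. Claim 6 describes a second stream that regresses location or dimension per pixel, which suggests the patented approach refines box geometry beyond what this simple grouping does.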
Opening claim text (preview).
What is claimed is:
1. A method comprising: projecting, using one or more processors, a LiDAR point cloud that corresponds to an environment into a representation of height data of one or more points in the LiDAR point cloud; extracting, using the one or more processors, based at least on applying the representation of the height data and one or more view transformed per-pixel classifications corresponding to one or more pixels to one or more Neural Networks (NNs), class confidence data representing one or more classifications of one or more elements in the environment; generating, using the one or more processors, one or more bounding shapes of the one or more elements based at least on the class confidence data; and providing, using the one or more processors, data representing the one or more bounding shapes to a control component of an autonomous vehicle.
2. The method of claim 1, wherein the projecting of the LiDAR point cloud into the representation of the height data comprises generating one or more height maps that represent the LiDAR point cloud from a top-down view, the one or more height maps having pixels storing minimum or maximum height values using one or more columns of the one or more points in the LiDAR point cloud.
3. The method of claim 1, wherein the projecting of the LiDAR point cloud into the representation of the height data comprises collapsing the LiDAR point cloud into a height map that bins the one or more points from the LiDAR point cloud into one or more corresponding pixels of the height map; and for each of the one or more corresponding pixels of the height map corresponding to multiple points from the LiDAR point cloud, storing a height value sampled from the multiple points or determined using a statistical measure of the multiple points.
4. The method of claim 1, further comprising: generating first class confidence data representing one or more initial classifications of an image of a first two-dimensional (2D) view of the environment; and generating the one or more view transformed per-pixel classifications representing the one or more initial classifications in a second 2D view of the environment based at least on transforming the one or more initial classifications from the first 2D view to the second 2D view.
5. The method of claim 1, further comprising: generating first class confidence data representing one or more initial classifications of an image of a perspective view of the environment; and generating the one or more view transformed per-pixel classifications representing the one or more initial classifications in a top-down view of the environment based at least on projecting the one or more initial classifications from the perspective view to the top-down view; wherein the extracting of the class confidence data using the one or more NNs comprises applying the representation of the height data and the one or more view transformed per-pixel classifications to separate input channels of the one or more NNs.
6. The method of claim 1, wherein the one or more NNs include a common trunk connected to: a first stream of layers configured to predict the class confidence data representing the one or more classifications of the one or more elements in the environment; and a second stream of layers configured to regress a location or dimension of a corresponding one of the one or more elements relative to each pixel.
7. The method of claim 1, wherein the extracting of the class confidence data using the one or more NNs comprises applying, to separate input channels of the one or more NNs, the representation of the height data and the one or more view transformed per-pixel classifications representing one or more initial classifications transformed into a top-down view of the environment.
8. A processor comprising one or more circuits to: project a LiDAR point cloud captured from one or more LiDAR sensors of a vehicle in an environment into a representation of a height map corresponding to one or more points in the LiDAR point cloud; generate, using one or more Neural Networks (NNs) to process the representation of the height map and one or more view transformed per-pixel classifications corresponding to one or more pixels, classification data representing one or more classifications of objects or scenery in the environment; determine one or more bounding shapes of the objects or scenery based at least on the classification data; and output data representing the one or more bounding shapes to a control component of the vehicle.
9. The processor of claim 8, wherein the height map represents the LiDAR point cloud from a top-down view and includes multiple channels that represent height values from one or more columns of the one or more points in the LiDAR point cloud in different ways.
10. The processor of claim 8, the one or more circuits further to: generate the representation of the height map by collapsing the LiDAR point cloud to bin the one or more points from the LiDAR point cloud into one or more corresponding pixels of the height map; and for each of the one or more corresponding pixels of the height map that corresponds to multiple points from the LiDAR point cloud, store a height value sampled from the multiple points or determined using a statistical measure of the multiple points.
11. The processor of claim 8, the one or more circuits further to: generate first classification data representing one or more initial per-pixel classifications of an image of a first two-dimensional (2D) view of the environment; and generate the one or more view transformed per-pixel classifications representing the one or more initial per-pixel classifications in a second 2D view of the environment based at least on projecting the one or more initial per-pixel classifications from the first 2D view to the second 2D view.
12. The processor of claim 8, the one or more circuits further to: generate first classification data representing one or more initial per-pixel classifications of an image of a perspective view of the environment; and generate the one or more view transformed per-pixel classifications representing the one or more initial per-pixel classifications in a top-down view of the environment based at least on projecting the one or more initial per-pixel classifications from the perspective view to the top-down view; the one or more circuits further to generate the classification data based at least on feeding the representation of the height map and the one or more view transformed per-pixel classifications into separate input channels of the one or more NNs.
13. The processor of claim 8, wherein the one or more NNs include a common trunk connected to: a first stream of layers configured to predict the classification data representing the one or more classifications of the objects or scenery in the environment; and a second stream of layers configured to regress a location or dimension of a corresponding one of the objects or scenery relative to each pixel.
14. The processor of claim 8, the one or more circuits further to: generate accumulated LiDAR data by accumulating LiDAR data over a period of time from the one or more LiDAR sensors of the vehicle; and convert the accumulated LiDAR data to motion-compensated LiDAR data corresponding to a position of the vehicle at a particular time to generate the LiDAR point cloud.
15. A system comprising: one or more processing units; and one or more memory units storing instructions that, when executed by the one or more processing units, cause the one or more processing units to execute operati
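The height-map projection recited in claims 2, 3, and 10 can be illustrated with a short sketch: points are binned into top-down grid cells, and each cell stores the minimum and maximum height of the points that fall into it. The function name `height_maps`, the grid extents, the cell size, and the use of `None` for empty cells are assumptions chosen for illustration; the claims do not fix any of these. The final list shows one way to read "separate input channels" (claims 5, 7, and 12), with `top_down_classes` as a hypothetical stand-in for view transformed per-pixel classifications.

```python
import math


def height_maps(points, x_range, y_range, cell_size):
    """Bin (x, y, z) points into a top-down grid.

    Returns (min_map, max_map), each a rows x cols list of lists holding the
    lowest / highest z seen in that cell, or None for cells with no points.
    Hypothetical helper, not the patent's implementation.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    cols = math.ceil((x_max - x_min) / cell_size)
    rows = math.ceil((y_max - y_min) / cell_size)
    min_map = [[None] * cols for _ in range(rows)]
    max_map = [[None] * cols for _ in range(rows)]
    for x, y, z in points:
        # Discard points outside the grid extents.
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        # Clamp guards against floating-point rounding at the far edge.
        col = min(int((x - x_min) / cell_size), cols - 1)
        row = min(int((y - y_min) / cell_size), rows - 1)
        if min_map[row][col] is None:
            min_map[row][col] = z
            max_map[row][col] = z
        else:
            min_map[row][col] = min(min_map[row][col], z)
            max_map[row][col] = max(max_map[row][col], z)
    return min_map, max_map


# Toy point cloud in metres (made-up values); the last point is out of range.
points = [(0.5, 0.5, 0.2), (0.6, 0.4, 1.5), (1.5, 0.5, 0.3), (5.0, 0.5, 9.0)]
min_map, max_map = height_maps(points, (0.0, 2.0), (0.0, 1.0), cell_size=1.0)

# Hypothetical top-down class-label grid with the same shape as the height maps.
top_down_classes = [[1, 0]]

# Each grid is kept as its own channel rather than merged into one image.
input_channels = [min_map, max_map, top_down_classes]
print(min_map, max_map)
```

Minimum and maximum are only two of the options the claims allow; a sampled value or another statistical measure, such as the mean, would fit the same binning loop.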
Supervised learning · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Handing over between on-board automatic and on-board manual control · CPC title
exterior to a vehicle by using sensors mounted on the vehicle · CPC title