US11941875B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11941875-B2 |
| Application number | US-202117443674-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 27, 2021 |
| Priority date | Jul 27, 2020 |
| Publication date | Mar 26, 2024 |
| Grant date | Mar 26, 2024 |
Official abstract text for this publication.
Methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for processing a perspective view range image generated from sensor measurements of an environment. The perspective view range image includes a plurality of pixels arranged in a two-dimensional grid and including, for each pixel, (i) features of one or more sensor measurements at a location in the environment corresponding to the pixel and (ii) geometry information comprising range features characterizing a range of the location in the environment corresponding to the pixel relative to the one or more sensors. The system processes the perspective view range image using a first neural network to generate an output feature representation. The first neural network comprises a first perspective point-set aggregation layer comprising a geometry-dependent kernel.
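The range image described in the abstract can be pictured with a small sketch. This is an illustration only, not the patent's implementation: the `Pixel` type, the `build_range_image` function, the use of intensity as the single sensor feature, and the rule of keeping the nearest return per cell are all assumptions made for the example. Rows stand for LiDAR beams and columns for azimuth bins swept by the sensor.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Pixel:
    """One cell of a hypothetical perspective view range image."""
    features: Tuple[float, ...]  # sensor features, here just intensity (assumed)
    range_m: float               # distance from the sensor to the measured location
    valid: bool                  # False when no return landed in this cell


def build_range_image(
    returns: Iterable[Tuple[int, float, float, float]],
    num_beams: int,
    num_azimuth_bins: int,
) -> List[List[Pixel]]:
    """Arrange (beam, azimuth_rad, range_m, intensity) returns in a 2D grid.

    Rows index beams; columns index azimuth bins. Keeping the nearest
    return per cell is an assumption of this sketch.
    """
    image = [
        [Pixel((0.0,), 0.0, False) for _ in range(num_azimuth_bins)]
        for _ in range(num_beams)
    ]
    full_turn = 2.0 * math.pi
    for beam, azimuth, range_m, intensity in returns:
        # Map the azimuth angle onto a column index in [0, num_azimuth_bins).
        col = int((azimuth % full_turn) / full_turn * num_azimuth_bins)
        col %= num_azimuth_bins
        current = image[beam][col]
        if not current.valid or range_m < current.range_m:
            image[beam][col] = Pixel((intensity,), range_m, True)
    return image
```

Each pixel then carries both parts named in the abstract: sensor features and a range feature, plus a validity flag for cells that received no measurement.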
Opening claim text (preview).
What is claimed is:

1. A method performed by one or more computers, the method comprising: obtaining a perspective view range image generated from sensor measurements of an environment by one or more sensors, the perspective view range image comprising a plurality of pixels arranged in a two-dimensional grid and including, for each pixel, (i) features of one or more sensor measurements at a location in the environment corresponding to the pixel and (ii) geometry information comprising range features characterizing a range of the location in the environment corresponding to the pixel relative to the one or more sensors; processing the perspective view range image using a first neural network to generate an output feature representation, wherein the first neural network comprises a first perspective point-set aggregation layer configured to: receive an input feature map, the input feature map comprising a respective feature vector for each of a first subset of the pixels; and generate an output feature map from the input feature map, wherein the output feature map comprises a respective output feature vector for each of the first subset of pixels, and wherein the generating comprises, for each particular pixel in the first subset, generating an initial output feature vector for the particular pixel by applying a geometry-dependent kernel to pixels within a local neighborhood of the particular pixel in the input feature map, wherein the geometry-dependent kernel depends on at least (i) respective input feature vectors for the pixels within the local neighborhood of the particular pixel in the input feature map and (ii) respective range features of the pixels within the local neighborhood of the input feature map; and processing the output feature representation using an output neural network to generate a network output for a neural network task.

2. The method of claim 1, wherein the neural network task is object detection and the network output identifies portions of the environment where objects are located.

3. The method of claim 1 wherein the perspective view range image is generated from sensor measurements from a LiDAR sensor sweeping through the environment, wherein one dimension of the two-dimensional grid corresponds to beams of the LiDAR sensor and wherein the other dimension of the two-dimensional grid corresponds to regions of the environment swept through by the LiDAR sensor.

4. The method of claim 1, wherein the perspective view range image is generated from sensor measurements from an RGBD camera.

5. The method of claim 1, wherein the first neural network further comprises a two-dimensional convolutional layer that has a kernel that depends only on feature vectors and not on range features.

6. The method of claim 1, further comprising: obtaining validity data that indicates, for each pixel in the range image, whether the sensor measurements for the pixel are valid; wherein the geometry-dependent kernel also depends on, for each pixel in the local neighborhood, whether the sensor measurements for the pixel are valid.

7. The method of claim 6, wherein the geometry-dependent kernel is a geometry-dependent convolution kernel that, when generating the initial output feature vector for each particular pixel for which the sensor measurements are valid, applies different convolution weights to input feature vectors of pixels depending on a range difference between the pixel and the particular pixel as reflected by the geometry information.

8. The method of claim 7, wherein the geometry-dependent kernel has k sets of convolution weights, wherein each of the k sets of convolution weights has a respective scalar range, and wherein the convolution weights that are applied to input feature vectors of a given pixel for which the sensor measurements are valid are a combination of sets of convolution weights having respective scalar ranges that are satisfied by the range difference between the pixel and the particular pixel.

9. The method of claim 6, wherein: the geometry-dependent kernel is a self-attention kernel that applies a self-attention mechanism over the local neighborhood using queries, keys, and values for the pixels in the local neighborhood that are generated from the input feature vectors for the pixels in the local neighborhood, and for each pixel in the local neighborhood for which the sensor measurements are valid, at least the key for the pixel is augmented with a positional encoding that represents a relative location of the pixel to the particular pixel as reflected by the geometry information.

10. The method of claim 6, wherein the geometry-dependent kernel is a kernel that: for each pixel in the local neighborhood for each particular pixel for which the sensor measurements are valid: for each pixel in the local neighborhood for which the sensor measurements are valid, processes (i) the input feature vector for the pixel in the local neighborhood and (ii) a positional encoding that represents a relative location of the pixel to the particular pixel as reflected by the respective geometry information for the pixel and the particular pixel using an encoder neural network to generate encoded features for the pixel; and applies max-pooling to the encoded features for the pixels in the local neighborhood to generate the initial output feature vector for the particular pixel.

11. The method of claim 6, wherein the geometry-dependent kernel is a kernel that: for each pixel in the local neighborhood for each particular pixel for which the sensor measurements are valid: for each pixel in the local neighborhood for which the sensor measurements are valid, processes (i) the input feature vector for the pixel in the local neighborhood, (ii) a positional encoding that represents a relative location of the pixel to the particular pixel as reflected by the respective geometry information for the pixel and the particular pixel, and (iii) the input feature vector for the particular pixel using an encoder neural network to generate encoded features for the pixel; and applies max-pooling to the encoded features for the pixels in the local neighborhood to generate the initial output feature vector for the particular pixel.

12. The method of claim 1, further comprising generating the input feature map by fusing camera image features with (i) features from the perspective view range image, (ii) output features generated by another neural network layer in the first neural network, or both.

13. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising: obtaining a perspective view range image generated from sensor measurements of an environment by one or more sensors, the perspective view range image comprising a plurality of pixels arranged in a two-dimensional grid and including, for each pixel, (i) features of one or more sensor measurements at a location in the environment corresponding to the pixel and (ii) geometry information comprising range features characterizing a range of the location in the environment corresponding to the pixel relative to the one or more sensors; processing the perspective view range image using a first neural network to generate an output feature representation, wherein the first neural network comprises a first perspective point-set aggregation layer configured to: receive an input feature map, the input feature map comprising a respective feature vector for each of a first subset of the pixels; and generate an output feature map from the input feature map, wherein
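The range-dependent convolution of claims 7 and 8 can be sketched in a few lines. This is a hedged illustration, not the patented implementation: the function name `range_quantized_conv`, the `weight_sets` layout, summation as the way matching weight sets are combined, and sharing one weight matrix across all spatial offsets in a bucket are assumptions chosen to keep the example short. Invalid pixels are skipped, following the validity data of claim 6.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # shape [out_dim][in_dim]
WeightSet = Tuple[Tuple[float, float], Matrix]  # ((lo, hi), weights)


def matvec(weights: Matrix, x: Sequence[float]) -> Vector:
    """Multiply an [out_dim][in_dim] matrix by an in_dim vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def range_quantized_conv(
    features: List[List[Vector]],
    ranges: List[List[float]],
    valid: List[List[bool]],
    weight_sets: Sequence[WeightSet],
    radius: int = 1,
) -> List[List[Vector]]:
    """Aggregate each valid pixel's neighborhood with range-dependent weights.

    A weight set with interval (lo, hi) applies to a neighbor when
    lo <= neighbor_range - center_range < hi. Matching sets are summed,
    which is one assumed form of "combination".
    """
    rows, cols = len(features), len(features[0])
    out_dim = len(weight_sets[0][1])
    output = [[[0.0] * out_dim for _ in range(cols)] for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not valid[r][c]:
                continue  # no output is computed for invalid center pixels
            acc = [0.0] * out_dim
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue  # neighbor falls outside the grid
                    if not valid[nr][nc]:
                        continue  # invalid neighbors contribute nothing
                    diff = ranges[nr][nc] - ranges[r][c]
                    for (lo, hi), weights in weight_sets:
                        if lo <= diff < hi:
                            contrib = matvec(weights, features[nr][nc])
                            acc = [a + b for a, b in zip(acc, contrib)]
            output[r][c] = acc
    return output
```

With a "near" weight set covering small range differences and a "far" set covering large ones, neighbors at a similar depth and neighbors well behind the center pixel are weighted differently, even though they sit at the same offset in the 2D grid.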
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Scenes; Scene-specific elements (control of digital cameras H04N23/60) · CPC title
using analysis of echo signal for target characterisation; Target signature; Target cross-section · CPC title
for mapping or imaging · CPC title