What technology area does this patent fall under?

Primary CPC classification G06V20/00. Mapped technology areas include Physics.

When was this patent published?

Publication date March 26, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Processing perspective view range images using neural networks

US11941875B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11941875-B2
Application number	US-202117443674-A
Country	US
Kind code	B2
Filing date	Jul 27, 2021
Priority date	Jul 27, 2020
Publication date	Mar 26, 2024
Grant date	Mar 26, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection: read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for processing a perspective view range image generated from sensor measurements of an environment. The perspective view range image includes a plurality of pixels arranged in a two-dimensional grid and including, for each pixel, (i) features of one or more sensor measurements at a location in the environment corresponding to the pixel and (ii) geometry information comprising range features characterizing a range of the location in the environment corresponding to the pixel relative to the one or more sensors. The system processes the perspective view range image using a first neural network to generate an output feature representation. The first neural network comprises a first perspective point-set aggregation layer comprising a geometry-dependent kernel.
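To make the input concrete, the Python sketch below shows one common way a LiDAR sweep can be laid out as a perspective view range image like the one the abstract describes. Rows correspond to beams and columns to the azimuth sweep, as in claim 3, and each pixel stores a range value plus one measurement feature (here, intensity). This is an illustration under stated assumptions, not the patent's method. The function name `build_range_image`, the evenly spaced beam inclinations set by `fov_up_deg` and `fov_down_deg`, the keep-the-nearest-return rule, and the two-channel layout are all hypothetical choices. Many real sensors report range images directly with per-beam inclinations.

```python
import numpy as np


def build_range_image(points, intensity, num_beams, num_cols, fov_up_deg, fov_down_deg):
    """Hypothetical projection of LiDAR returns into a (num_beams, num_cols) range image.

    points:    (N, 3) x, y, z coordinates in the sensor frame (range assumed > 0)
    intensity: (N,) per-return measurement feature
    Returns (image, valid): image is (num_beams, num_cols, 2) holding
    [range, intensity]; valid is a bool mask of pixels that received a return.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.arctan2(y, x)          # horizontal angle in [-pi, pi]
    inclination = np.arcsin(z / rng)    # vertical angle

    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    # Rows: beams, top beam (largest inclination) first. Columns: azimuth sweep.
    row = np.round((fov_up - inclination) / (fov_up - fov_down) * (num_beams - 1)).astype(int)
    col = np.floor((np.pi - azimuth) / (2 * np.pi) * num_cols).astype(int)
    row = np.clip(row, 0, num_beams - 1)
    col = np.clip(col, 0, num_cols - 1)

    # If several returns land in one pixel, keep the nearest: sort by range and
    # take the first occurrence of each flattened pixel index.
    flat = row * num_cols + col
    order = np.argsort(rng, kind="stable")
    _, first = np.unique(flat[order], return_index=True)
    keep = order[first]

    image = np.zeros((num_beams, num_cols, 2), dtype=np.float32)
    valid = np.zeros((num_beams, num_cols), dtype=bool)
    image[row[keep], col[keep], 0] = rng[keep]
    image[row[keep], col[keep], 1] = intensity[keep]
    valid[row[keep], col[keep]] = True
    return image, valid
```

The returned mask is a simple stand-in for the per-pixel validity data mentioned in claim 6: pixels with no return are flagged as invalid rather than being treated as zero range.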

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed by one or more computers, the method comprising: obtaining a perspective view range image generated from sensor measurements of an environment by one or more sensors, the perspective view range image comprising a plurality of pixels arranged in a two-dimensional grid and including, for each pixel, (i) features of one or more sensor measurements at a location in the environment corresponding to the pixel and (ii) geometry information comprising range features characterizing a range of the location in the environment corresponding to the pixel relative to the one or more sensors; processing the perspective view range image using a first neural network to generate an output feature representation, wherein the first neural network comprises a first perspective point-set aggregation layer configured to: receive an input feature map, the input feature map comprising a respective feature vector for each of a first subset of the pixels; and generate an output feature map from the input feature map, wherein the output feature map comprises a respective output feature vector for each of the first subset of pixels, and wherein the generating comprises, for each particular pixel in the first subset, generating an initial output feature vector for the particular pixel by applying a geometry-dependent kernel to pixels within a local neighborhood of the particular pixel in the input feature map, wherein the geometry-dependent kernel depends on at least (i) respective input feature vectors for the pixels within the local neighborhood of the particular pixel in the input feature map and (ii) respective range features of the pixels within the local neighborhood of the input feature map; and processing the output feature representation using an output neural network to generate a network output for a neural network task.

2. The method of claim 1, wherein the neural network task is object detection and the network output identifies portions of the environment where objects are located.

3. The method of claim 1 wherein the perspective view range image is generated from sensor measurements from a LiDAR sensor sweeping through the environment, wherein one dimension of the two-dimensional grid corresponds to beams of the LiDAR sensor and wherein the other dimension of the two-dimensional grid corresponds to regions of the environment swept through by the LiDAR sensor.

4. The method of claim 1, wherein the perspective view range image is generated from sensor measurements from an RGBD camera.

5. The method of claim 1, wherein the first neural network further comprises a two-dimensional convolutional layer that has a kernel that depends only on feature vectors and not on range features.

6. The method of claim 1, further comprising: obtaining validity data that indicates, for each pixel in the range image, whether the sensor measurements for the pixel are valid; wherein the geometry-dependent kernel also depends on, for each pixel in the local neighborhood, whether the sensor measurements for the pixel are valid.

7. The method of claim 6, wherein the geometry-dependent kernel is a geometry-dependent convolution kernel that, when generating the initial output feature vector for each particular pixel for which the sensor measurements are valid, applies different convolution weights to input feature vectors of pixels depending on a range difference between the pixel and the particular pixel as reflected by the geometry information.

8. The method of claim 7, wherein the geometry-dependent kernel has k sets of convolution weights, wherein each of the k sets of convolution weights has a respective scalar range, and wherein the convolution weights that are applied to input feature vectors of a given pixel for which the sensor measurements are valid are a combination of sets of convolution weights having respective scalar ranges that are satisfied by the range difference between the pixel and the particular pixel.

9. The method of claim 6, wherein: the geometry-dependent kernel is a self-attention kernel that applies a self-attention mechanism over the local neighborhood using queries, keys, and values for the pixels in the local neighborhood that are generated from the input feature vectors for the pixels in the local neighborhood, and for each pixel in the local neighborhood for which the sensor measurements are valid, at least the key for the pixel is augmented with a positional encoding that represents a relative location of the pixel to the particular pixel as reflected by the geometry information.

10. The method of claim 6, wherein the geometry-dependent kernel is a kernel that: for each pixel in the local neighborhood for each particular pixel for which the sensor measurements are valid: for each pixel in the local neighborhood for which the sensor measurements are valid, processes (i) the input feature vector for the pixel in the local neighborhood and (ii) a positional encoding that represents a relative location of the pixel to the particular pixel as reflected by the respective geometry information for the pixel and the particular pixel using an encoder neural network to generate encoded features for the pixel; and applies max-pooling to the encoded features for the pixels in the local neighborhood to generate the initial output feature vector for the particular pixel.

11. The method of claim 6, wherein the geometry-dependent kernel is a kernel that: for each pixel in the local neighborhood for each particular pixel for which the sensor measurements are valid: for each pixel in the local neighborhood for which the sensor measurements are valid, processes (i) the input feature vector for the pixel in the local neighborhood, (ii) a positional encoding that represents a relative location of the pixel to the particular pixel as reflected by the respective geometry information for the pixel and the particular pixel, and (iii) the input feature vector for the particular pixel using an encoder neural network to generate encoded features for the pixel; and applies max-pooling to the encoded features for the pixels in the local neighborhood to generate the initial output feature vector for the particular pixel.

12. The method of claim 1, further comprising generating the input feature map by fusing camera image features with (i) features from the perspective view range image, (ii) output features generated by another neural network layer in the first neural network, or both.

13. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising: obtaining a perspective view range image generated from sensor measurements of an environment by one or more sensors, the perspective view range image comprising a plurality of pixels arranged in a two-dimensional grid and including, for each pixel, (i) features of one or more sensor measurements at a location in the environment corresponding to the pixel and (ii) geometry information comprising range features characterizing a range of the location in the environment corresponding to the pixel relative to the one or more sensors; processing the perspective view range image using a first neural network to generate an output feature representation, wherein the first neural network comprises a first perspective point-set aggregation layer configured to: receive an input feature map, the input feature map comprising a respective feature vector for each of a first subset of the pixels; and generate an output feature map from the input feature map, wherein …
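Claims 6 to 8 describe one form of the geometry-dependent kernel: a convolution that picks its weights according to the range difference between each neighbor and the center pixel, and that takes per-pixel validity into account. The NumPy sketch below is a minimal, loop-based reading of that idea, not the patent's implementation. It assumes a square `ksize` by `ksize` neighborhood, the absolute range difference, one `[lo, hi)` interval per weight set with all matching sets summed as the "combination", and zero output at invalid center pixels. The function name `range_binned_conv` and its parameters are hypothetical, and the sketch omits bias, activation, and wrap-around along the azimuth axis.

```python
import numpy as np


def range_binned_conv(features, ranges, valid, weights, range_intervals):
    """Hypothetical geometry-dependent convolution over a range image.

    features:        (H, W, C_in) input feature map
    ranges:          (H, W) per-pixel range
    valid:           (H, W) bool validity mask
    weights:         (K, ksize, ksize, C_in, C_out), one weight set per interval
    range_intervals: (K, 2) [lo, hi) bounds on |range difference| for each set
    Returns an (H, W, C_out) map; invalid center pixels are left at zero.
    """
    H, W, _ = features.shape
    K, ksize, _, _, c_out = weights.shape
    pad = ksize // 2
    out = np.zeros((H, W, c_out))
    for i in range(H):
        for j in range(W):
            if not valid[i, j]:
                continue  # no output for invalid center pixels
            for di in range(-pad, pad + 1):
                for dj in range(-pad, pad + 1):
                    ni, nj = i + di, j + dj
                    # Neighbors outside the image or without a valid return are skipped.
                    if not (0 <= ni < H and 0 <= nj < W) or not valid[ni, nj]:
                        continue
                    dr = abs(ranges[ni, nj] - ranges[i, j])
                    for k in range(K):
                        lo, hi = range_intervals[k]
                        if lo <= dr < hi:
                            # Sum every weight set whose interval contains dr.
                            out[i, j] += features[ni, nj] @ weights[k, di + pad, dj + pad]
    return out
```

With a single interval `[0, inf)` the weights no longer depend on range, so the sketch reduces to a validity-masked 2D convolution, loosely comparable to the range-independent convolutional layer recited in claim 5.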

Assignees

Waymo LLC

Inventors

Classifications

G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/09
Supervised learning · CPC title
G06V20/00 (primary)
Scenes; Scene-specific elements (control of digital cameras H04N23/60) · CPC title
G01S7/4802
using analysis of echo signal for target characterisation; Target signature; Target cross-section · CPC title
G01S17/89
for mapping or imaging · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 80036157

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11941875B2 cover?: Methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for processing a perspective view range image generated from sensor measurements of an environment. The perspective view range image includes a plurality of pixels arranged in a two-dimensional grid and including, for each pixel, (i) features of one or more sensor measurements at a location …
Who is the assignee on this patent?: Waymo LLC