Voxel-based feature learning network

US10970518B1 · US · B1

Patent metadata
Field	Value
Publication number	US-10970518-B1
Application number	US-201816188879-A
Country	US
Kind code	B1
Filing date	Nov 13, 2018
Priority date	Nov 14, 2017
Publication date	Apr 6, 2021
Grant date	Apr 6, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A voxel feature learning network receives a raw point cloud and converts the point cloud into a sparse 4D tensor comprising three-dimensional coordinates (e.g. X, Y, and Z) for each voxel of a plurality of voxels and a fourth voxel feature dimension for each non-empty voxel. In some embodiments, convolutional mid layers further transform the 4D tensor into a high-dimensional volumetric representation of the point cloud. In some embodiments, a region proposal network identifies 3D bounding boxes of objects in the point cloud based on the high-dimensional volumetric representation. In some embodiments, the feature learning network and the region proposal network are trained end-to-end using training data comprising known ground truth bounding boxes, without requiring human intervention.
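As a rough illustration of the "sparse 4D tensor" described in the abstract, the sketch below stores a feature vector only for non-empty voxels and expands it into a dense grid on demand. The names `to_dense_4d`, `VoxelIndex`, and the `[X][Y][Z][C]` axis order are assumptions made for this example; the patent text shown here does not specify a storage layout.

```python
from typing import Dict, List, Tuple

# Hypothetical type: (ix, iy, iz) position of a voxel in the grid.
VoxelIndex = Tuple[int, int, int]


def to_dense_4d(
    sparse: Dict[VoxelIndex, List[float]],
    grid_shape: Tuple[int, int, int],
    feature_dim: int,
) -> List[List[List[List[float]]]]:
    """Expand {voxel index: feature vector} into a dense [X][Y][Z][C] nested list.

    Voxels without an entry in `sparse` receive an all-zero feature vector.
    """
    nx, ny, nz = grid_shape
    dense = [
        [[[0.0] * feature_dim for _ in range(nz)] for _ in range(ny)]
        for _ in range(nx)
    ]
    for (ix, iy, iz), feature in sparse.items():
        if len(feature) != feature_dim:
            raise ValueError("feature vector has the wrong length")
        dense[ix][iy][iz] = list(feature)
    return dense


# Only two of the 2 x 2 x 2 = 8 voxels are non-empty, so only two entries are stored.
sparse_tensor = {(0, 0, 0): [0.5, 1.0], (1, 1, 0): [2.0, 0.0]}
dense_tensor = to_dense_4d(sparse_tensor, grid_shape=(2, 2, 2), feature_dim=2)
```

Keeping the dictionary form until a dense grid is actually needed avoids spending memory on the empty voxels, which is the usual motivation for a sparse representation of Lidar data.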

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising one or more processors and a memory coupled to the one or more processors, wherein the memory comprises program instructions configured to: implement, via the one or more processors, a voxel feature learning network, wherein the voxel feature learning network is configured to: receive a point cloud comprising a plurality of points located in a three-dimensional space; group respective sets of the points of the point cloud into respective voxels, wherein the respective points are grouped into the respective voxels based on locations of the respective points in the three-dimensional space and locations of the voxels in the three-dimensional space, wherein each voxel corresponds to a volume segment of the three-dimensional space; determine, for each of one or more of the respective voxels, a plurality of point-wise concatenated features from the respective points included in the respective voxel, wherein to determine the point-wise concatenated features for a given one of the respective voxels, the program instructions are configured to: identify a plurality of point-wise determined features; determine a locally aggregated feature from the identified plurality of point-wise determined features via element-wise max-pooling across the plurality of point-wise determined features; and augment, based on the locally aggregated feature, the respective ones of the point-wise features for the points to form respective point-wise concatenated features; determine, for each of one or more of the respective voxels, a voxel feature, wherein the voxel feature is determined based on the plurality of point-wise concatenated features determined from the respective points included in the voxel; and provide a four-dimensional (4D) tensor representation of the point cloud comprising the determined voxel features.

2. The system of claim 1, further comprising: one or more Lidar sensors configured to capture the plurality of points that make up the point cloud, wherein the point cloud is received by the voxel feature learning network as a raw point cloud captured by the one or more Lidar sensors.

3. The system of claim 1, wherein the one or more processors and the memory, or one or more additional processors and an additional memory, include program instructions configured to implement: one or more convolutional middle layers configured to process the 4D tensor into a high-dimensional volumetric representation of the point cloud; and a region proposal network configured to generate a three dimensional (3D) object detection output determined based at least in part on the high-dimensional volumetric representation of the point cloud.

4. The system of claim 3, wherein the voxel feature learning network comprises a first fully-connected neural network, and wherein the region proposal network comprises an additional fully-connected neural network.

5. The system of claim 4, wherein the first fully-connected neural network is trained to identify voxel features based on training data, wherein the training data comprises ground truth 3D bounding boxes corresponding to objects included in a training point cloud, and wherein errors between the ground truth 3D bounding boxes and 3D bounding boxes identified by the region proposal network are used to train the first fully connected neural network and the additional fully connected neural network.

6. The system of claim 5, wherein the program instructions are configured to cause the system to augment the training data, wherein to augment the training data, the program instructions cause the one or more processors to: apply a perturbation to each ground truth 3D bounding box and independently apply a perturbation to the points of the training point cloud that are included in the respective ground truth 3D bounding boxes; apply a global scaling to each ground truth 3D bounding box; or apply a global rotation to each ground truth 3D bounding box and to the training point cloud, wherein the global rotation simulates a vehicle making a turn.

7. The system of claim 1, wherein, for each of the one or more voxels, to determine the voxel feature based on the point-wise concatenated features identified from the respective points included in the voxel, the voxel feature learning network is configured to: transform the plurality of concatenated point-wise features, via a fully-connected neural network, into the voxel feature, wherein the fully connected neural network comprises a linear layer, a batch normalization layer, and/or a rectified linear unit (ReLU) layer.

8. The system of claim 7, wherein element-wise max-pooling is applied to transform the plurality of concatenated point-wise features into the voxel feature.

9. The system of claim 1, wherein to group the respective sets of the points of the point cloud into the respective voxels, the voxel feature learning network is configured to: for a voxel for which a set of respective points in three-dimensional space corresponding to the voxel is less than a threshold number of points, assign all of the respective points in three-dimensional space corresponding to the voxel to the voxel; and for another voxel for which another set of respective points in three-dimensional space corresponding to the other voxel is greater than the threshold number of points, randomly sample the respective points in three-dimensional space corresponding to the other voxel for inclusion in the other voxel, such that the number of points included in the other voxel is less than or equal to the threshold number of points.

10. The system of claim 1, wherein the voxel feature learning network is implemented via a plurality of processors, and wherein the plurality of processors are configured to determine, in parallel, respective voxel features for a plurality of respective voxels.

11. A computer implemented method, comprising: performing by one or more computers: subdividing a three-dimensional (3D) space of a point cloud into equally spaced voxels, wherein the point cloud includes information regarding one or more points within a 3D coordinate system, and wherein each of the one or more points resides in one of the voxels; grouping the points of the point cloud according the particular respective voxels in which they reside; determining, for each of one or more of the respective voxels, a plurality of point-wise concatenated features based on the selected points residing in the respective voxel, wherein determining the point-wise concatenated features for a given voxel of the respective voxels comprises: identifying a plurality of point-wise determined features; determining a locally aggregated feature from the identified plurality of point-wise determined features via element-wise max-pooling across the plurality of point-wise determined features; and augmenting, based on the locally aggregated feature, the respective ones of the point-wise features for the points to form respective point-wise concatenated features; determining voxel features based on the point-wise concatenated features determined for the respective one or more voxels; and representing the determined voxel features as a sparse 4D tensor.

12. The computer implemented method of claim 11, wherein said determining the plurality of point-wise concatenated features further comprises: computing a local mean as a centroid of points within the given voxel; augmenting each point residing within the given voxel with an offset relative to the computed centroid; transforming each augmented point into a feature space; encoding, based on the augmented points transformed into the feature space, t
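The steps recited in claims 1, 9, 11, and 12 (grouping points into voxels, random sampling above a threshold, centroid offsets, point-wise features, element-wise max-pooling, and concatenation) can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation: the function names, the random untrained weights, the bias-free linear-plus-ReLU layers, the omission of batch normalization, and the grid origin at zero are all assumptions made for the example.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
VoxelIndex = Tuple[int, int, int]
Matrix = List[List[float]]


def group_points(points: Sequence[Point], voxel_size: float, max_points: int,
                 rng: random.Random) -> Dict[VoxelIndex, List[Point]]:
    """Assign points to equally spaced voxels, keeping at most max_points each."""
    groups: Dict[VoxelIndex, List[Point]] = defaultdict(list)
    for x, y, z in points:
        index = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        groups[index].append((x, y, z))
    # Voxels at or under the threshold keep every point; fuller ones are sampled.
    return {
        index: pts if len(pts) <= max_points else rng.sample(pts, max_points)
        for index, pts in groups.items()
    }


def add_centroid_offsets(points: Sequence[Point]) -> List[List[float]]:
    """Append each point's offset from the voxel centroid: (x, y, z, dx, dy, dz)."""
    n = len(points)
    centroid = [sum(p[axis] for p in points) / n for axis in range(3)]
    return [list(p) + [p[axis] - centroid[axis] for axis in range(3)] for p in points]


def linear_relu(vector: Sequence[float], weights: Matrix) -> List[float]:
    """Bias-free linear layer followed by ReLU (assumed stand-in for the FC layer)."""
    return [max(0.0, sum(w * v for w, v in zip(row, vector))) for row in weights]


def elementwise_max(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise max-pooling across a list of equal-length vectors."""
    return [max(column) for column in zip(*vectors)]


def voxel_feature(points: Sequence[Point], w_point: Matrix, w_voxel: Matrix) -> List[float]:
    augmented = add_centroid_offsets(points)
    pointwise = [linear_relu(p, w_point) for p in augmented]        # point-wise features
    aggregated = elementwise_max(pointwise)                         # locally aggregated feature
    concatenated = [feature + aggregated for feature in pointwise]  # point-wise concatenated features
    return elementwise_max([linear_relu(c, w_voxel) for c in concatenated])


rng = random.Random(0)
cloud = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (0.9, 0.8, 0.7), (1.5, 0.2, 0.1)]
voxels = group_points(cloud, voxel_size=1.0, max_points=2, rng=rng)
w_point = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(2)]  # 6 inputs -> 2 features
w_voxel = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # 2 + 2 inputs -> 3 features
sparse_tensor = {idx: voxel_feature(pts, w_point, w_voxel) for idx, pts in voxels.items()}
```

In this toy run, three points fall in voxel `(0, 0, 0)` and are sampled down to two, while the single point in voxel `(1, 0, 0)` is kept as is. Because each voxel is processed independently, the final dictionary comprehension is the part that could be distributed across processors in the way claim 10 describes.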

Assignees

Apple Inc

Inventors

Classifications

G06V10/7747
Organisation of the process, e.g. bagging or boosting · CPC title
G06V20/58
Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads · CPC title
G06T7/73 (Primary)
using feature-based methods · CPC title
G06V10/82
using neural networks · CPC title
G06V10/50
by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis · CPC title

Patent family

Related publications grouped by family.

View patent family 75275539

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10970518B1 cover?: A voxel feature learning network receives a raw point cloud and converts the point cloud into a sparse 4D tensor comprising three-dimensional coordinates (e.g. X, Y, and Z) for each voxel of a plurality of voxels and a fourth voxel feature dimension for each non-empty voxel. In some embodiments, convolutional mid layers further transform the 4D tensor into a high-dimensional volumetric represen…
Who is the assignee on this patent?: Apple Inc
What technology area does this patent fall under?: Primary CPC classification G06T7/73. Mapped technology areas include Physics.
When was this patent published?: Publication date Apr 6, 2021 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).