Three-Dimensional Object Detection
US-2020025935-A1 · Jan 23, 2020 · US
| Field | Value |
|---|---|
| Publication number | US-10936908-B1 |
| Application number | US-202016869093-A |
| Country | US |
| Kind code | B1 |
| Filing date | May 7, 2020 |
| Priority date | Jul 21, 2017 |
| Publication date | Mar 2, 2021 |
| Grant date | Mar 2, 2021 |
Official abstract text for this publication.
Systems and methods for semantic labeling of point clouds using images. Some implementations may include obtaining a point cloud that is based on lidar data reflecting one or more objects in a space; obtaining an image that includes a view of at least one of the one or more objects in the space; determining a projection of points from the point cloud onto the image; generating, using the projection, an augmented image that includes one or more channels of data from the point cloud and one or more channels of data from the image; inputting the augmented image to a two dimensional convolutional neural network to obtain a semantic labeled image wherein elements of the semantic labeled image include respective predictions; and mapping, by reversing the projection, predictions of the semantic labeled image to respective points of the point cloud to obtain a semantic labeled point cloud.
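The steps in the abstract (project, augment, label, map back) can be illustrated with a minimal NumPy sketch. Everything named here is an assumption made for illustration and does not come from the patent: the pinhole camera model with intrinsics `K` and pose `R`, `t`, the choice of depth as the single point-cloud channel, and the `model` callable standing in for the two-dimensional convolutional neural network. Occlusion handling is deliberately simplified: points that share a pixel also share that pixel's prediction.

```python
import numpy as np

def project_points(points, K, R, t, height, width):
    """Project N x 3 world points to pixels (hypothetical pinhole model).

    Returns pixel rows, pixel cols, depths, and the indices of the points
    that are in front of the camera and fall inside the image bounds.
    """
    cam = points @ R.T + t                  # world frame -> camera frame
    idx = np.nonzero(cam[:, 2] > 1e-6)[0]   # keep points in front of the camera
    uvw = cam[idx] @ K.T                    # apply camera intrinsics
    cols = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    rows = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)
    return rows[inside], cols[inside], cam[idx, 2][inside], idx[inside]

def augment_image(image, rows, cols, depth):
    """Stack the image channels with one depth channel from the point cloud."""
    h, w, _ = image.shape
    depth_channel = np.full((h, w), np.inf)
    # When several points hit the same pixel, keep the nearest depth.
    np.minimum.at(depth_channel, (rows, cols), depth)
    depth_channel[np.isinf(depth_channel)] = 0.0   # pixels with no lidar return
    return np.dstack([image.astype(np.float64), depth_channel])

def label_point_cloud(points, image, K, R, t, model):
    """Label each point by looking up the prediction at its projected pixel.

    `model` is a placeholder callable mapping an H x W x C augmented image
    to H x W x num_classes scores. Points that do not project into the
    image keep the label -1.
    """
    h, w, _ = image.shape
    rows, cols, depth, idx = project_points(points, K, R, t, h, w)
    augmented = augment_image(image, rows, cols, depth)
    scores = model(augmented)
    labels = np.full(len(points), -1, dtype=int)
    labels[idx] = scores[rows, cols].argmax(axis=1)   # reverse the projection
    return labels
```

Calling `label_point_cloud` once per image yields one set of per-point predictions. Keeping `idx` alongside the pixel coordinates is what makes the reverse mapping a simple lookup rather than a second geometric computation.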
Opening claim text (preview).
What is claimed is:

1. A system, comprising: a data processing apparatus; and a data storage device storing instructions executable by the data processing apparatus that upon execution by the data processing apparatus cause the data processing apparatus to perform operations comprising: obtaining a point cloud in three spatial dimensions; obtaining an image in two spatial dimensions; determining a projection of points from the point cloud onto the image; generating, using the projection, an augmented image that includes one or more channels of data from the point cloud and one or more channels of data from the image; inputting the augmented image to a neural network to obtain a semantic labeled image, wherein elements of the semantic labeled image include respective predictions; and mapping, by reversing the projection, predictions of the semantic labeled image to respective points of the point cloud to obtain a semantic labeled point cloud.

2. The system of claim 1, wherein the image is a first image and the semantic labeled image is a first semantic labeled image, and wherein the operations comprise: obtaining a second image in two spatial dimensions; determining a second semantic labeled image based on the second image augmented with data from the point cloud; mapping predictions of the second semantic labeled image to respective points of the point cloud; and accumulating predictions from the first semantic labeled image and from the second semantic labeled image for at least one point of the semantic labeled point cloud.

3. The system of claim 1, wherein the operations comprise: searching a set of images associated with different respective camera locations to identify a subset of images that includes at least two images with views of each point in the point cloud; and wherein the image is obtained from the subset of images.

4. The system of claim 1, wherein the operations comprise: obtaining a training point cloud that includes points labeled with ground truth labels; obtaining a training image, in two spatial dimensions, that includes a view of at least one object that is reflected in the training point cloud; determining a projection of points from the training point cloud onto the training image; generating, using the projection, an augmented training image that includes one or more channels of data from the training point cloud and one or more channels of data from the training image; and training the neural network using the augmented training image and corresponding ground truth labels for projected points from the training point cloud.

5. The system of claim 1, wherein the point cloud is determined using a bundle adjustment process based on lidar scans captured at a plurality of locations and times, and wherein the operations comprise: assigning indications of moving likelihood to respective points of the point cloud based on how frequently the respective points are detected in lidar scans captured at different times; applying a fully connected conditional random field to the indications of moving likelihood for points in the point cloud to obtain moving labels for respective points of the point cloud, wherein the moving labels are binary indications of whether or not a respective point of the point cloud corresponds to a moving object; and wherein the moving labels are included in the augmented image as one of the one or more channels of data from the point cloud.

6. The system of claim 1, wherein the operations comprise: determining a graph based on the semantic labeled point cloud, wherein nodes of the graph are points from the semantic labeled point cloud and edges of the graph connect nodes with respective points that satisfy a pairwise criteria; identifying one or more connected components of the graph; and determining clusters of points from the semantic labeled point cloud by performing a hierarchical segmentation of each of the one or more connected components of the graph.

7. The system of claim 6, wherein the operations comprise: inputting predictions based on predictions for points of one of the clusters to a three dimensional convolutional neural network to obtain a prediction for the cluster; and assigning the prediction for the cluster to all points of the cluster in the semantic labeled point cloud.

8. A method comprising: obtaining a point cloud in three spatial dimensions; obtaining an image in two spatial dimensions; determining a projection of points from the point cloud onto the image; generating, using the projection, an augmented image that includes one or more channels of data from the point cloud and one or more channels of data from the image; inputting the augmented image to a neural network to obtain a semantic labeled image, wherein elements of the semantic labeled image include respective predictions; and mapping, by reversing the projection, predictions of the semantic labeled image to respective points of the point cloud to obtain a semantic labeled point cloud.

9. The method of claim 8, wherein the image is a first image and the semantic labeled image is a first semantic labeled image, and further comprising: obtaining a second image in two spatial dimensions; determining a second semantic labeled image based on the second image augmented with data from the point cloud; mapping predictions of the second semantic labeled image to respective points of the point cloud; and accumulating predictions from the first semantic labeled image and from the second semantic labeled image for at least one point of the semantic labeled point cloud.

10. The method of claim 8, comprising: searching a set of images associated with different respective camera locations to identify a subset of images that includes at least two images with views of each point in the point cloud; and wherein the image is obtained from the subset of images.

11. The method of claim 8, comprising: obtaining a training point cloud that includes points labeled with ground truth labels; obtaining a training image, in two spatial dimensions, that includes a view of at least one object that is reflected in the training point cloud; determining a projection of points from the training point cloud onto the training image; generating, using the projection, an augmented training image that includes one or more channels of data from the training point cloud and one or more channels of data from the training image; and training the neural network using the augmented training image and corresponding ground truth labels for projected points from the training point cloud.

12. The method of claim 8, wherein the point cloud is determined using a bundle adjustment process based on lidar scans captured at a plurality of locations and times, and comprising: assigning indications of moving likelihood to respective points of the point cloud based on how frequently the respective points are detected in lidar scans captured at different times; apply a fully connected conditional random field to the indications of moving likelihood for points in the point cloud to obtain moving labels for respective points of the point cloud, wherein the moving labels are binary indications of whether or not a respective point of the point cloud corresponds to a moving object; and wherein the moving labels are included in the augmented image as one of the one or more channels of data from the point cloud.

13. The method of claim 8, wherein the one or more channels of data from the point cloud that are included in the augmented image include at least one channel from amongst the set of depth, normal, height, li
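
The graph step recited in claim 6 (points as nodes, edges between pairs that satisfy a pairwise criterion, then connected components) might be sketched as follows. The claim does not say what the pairwise criterion is, so this sketch assumes one purely for illustration: two points are connected when they carry the same predicted label and lie within `max_dist` of each other. The function name, the brute-force pair search, and the union-find structure are likewise assumptions, and the hierarchical segmentation that the claim applies to each component afterwards is not shown.

```python
import numpy as np

def connected_components(points, labels, max_dist):
    """Group labeled points into connected components with union-find.

    Assumed pairwise criterion (not taken from the patent): same label and
    Euclidean distance <= max_dist. Returns one component id per point.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        # Compare point i against every later point (O(N^2) overall).
        dist = np.linalg.norm(points[i + 1:] - points[i], axis=1)
        same_label = labels[i + 1:] == labels[i]
        for j in np.nonzero((dist <= max_dist) & same_label)[0] + i + 1:
            ri, rj = find(i), find(int(j))
            if ri != rj:
                parent[rj] = ri          # merge the two components

    roots = np.array([find(i) for i in range(n)])
    # Relabel roots as consecutive ids 0, 1, 2, ...
    _, component_ids = np.unique(roots, return_inverse=True)
    return component_ids
```

For large point clouds the all-pairs distance check would normally be replaced by a spatial index such as a k-d tree; the brute-force loop is kept here only so the example has no dependencies beyond NumPy.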
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Region-based segmentation · CPC title
Three-dimensional [3D] objects · CPC title