Systems and methods for deep localization and segmentation with a 3D semantic map

US11030525B2 · US · B2

Patent metadata
Publication number: US-11030525-B2
Application number: US-201816604548-A
Country: US
Kind code: B2
Filing date: Feb 9, 2018
Priority date: Feb 9, 2018
Publication date: Jun 8, 2021
Grant date: Jun 8, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Presented are deep learning-based systems and methods for fusing sensor data, such as camera images, motion sensors (GPS/IMU), and a 3D semantic map to achieve robustness, real-time performance, and accuracy of camera localization and scene parsing useful for applications such as robotic navigation and augmented reality. In embodiments, a unified framework accomplishes this by jointly using camera poses and scene semantics in training and testing. To evaluate the presented methods and systems, embodiments use a novel dataset that is created from real scenes and comprises dense 3D semantically labeled point clouds, ground truth camera poses obtained from high-accuracy motion sensors, and pixel-level semantic labels of video camera images. As demonstrated by experimental results, the presented systems and methods are mutually beneficial for both camera poses and scene semantics.

First claim

Opening claim text (preview).

What is claimed is:

1. A method for using a network to perform joint scene parsing and camera pose estimation, the method comprising: receiving semantic map data, image data associated with a camera, and sensor data that comprises a coarse camera pose; creating a first semantic label map by using the coarse camera pose and a camera intrinsic parameter; providing both the image data and the first semantic label map to a first pose network to obtain a corrected camera pose; and inputting the image data into a segment network to generate a two-dimensional parsing associated with the inputted image data.

2. The method of claim 1, wherein the sensor data is provided by a motion sensor.

3. The method of claim 1, further comprising using the corrected camera pose in a second pose network to generate a refined camera pose to increase a pose accuracy.

4. The method of claim 3, wherein the first pose network and the segment network are convolutional neural networks, and the second pose network is a recurrent neural network.

5. The method of claim 3, further comprising, based on the refined camera pose, rendering a second semantic label map that is input to the segment network.

6. The method of claim 5, wherein the second semantic label map is embedded into the segment network as a segmentation context.

7. The method of claim 5, further comprising transforming the second semantic label map to a score map through a one-hot operation.

8. The method of claim 1, wherein the two-dimensional parsing comprises a per-pixel semantic label.

9. The method of claim 1, wherein the first pose network calculates a relative rotation and translation, and wherein the corrected camera pose is used to generate temporal correlations.

10. A system for joint scene parsing and camera pose estimation, the system comprising: a camera that has an intrinsic parameter and generates image data; a sensor that generates sensor data comprising a coarse camera pose; a processor comprising instructions that when executed create a first semantic label map based on semantic map data, the image data, and the sensor data; a first pose network that in response to receiving the image data and the first semantic label map generates a corrected camera pose; and a segment network that, based on the image data, generates a two-dimensional parsing that is associated with the image data.

11. The system of claim 10, wherein the sensor data is provided by a motion sensor.

12. The system of claim 10, wherein the sensor data comprises a location estimate.

13. The system of claim 10, further comprising a second pose network that, based on the corrected camera pose, generates a refined camera pose to increase a pose accuracy by rendering a second semantic label map that is input to the segment network.

14. The system of claim 13, wherein the second semantic label map is two-dimensional and embedded into the segment network as a segmentation context.

15. The system of claim 10, wherein the semantic map data comprises a point of a three-dimensional point cloud, the point being enlarged to a two-dimensional square whose size is determined by a semantic class associated with the three-dimensional point cloud.

16. A method for training a network to perform joint scene parsing and camera pose estimation, the method comprising: receiving semantic map data, image data associated with a camera, and sensor data that comprises a coarse camera pose; creating a semantic label map by using the coarse camera pose and a camera intrinsic parameter; providing both the image data and the semantic label map to a first pose network to obtain a corrected camera pose; inputting the image data into a segment network to generate a two-dimensional parsing associated with the inputted image; and using a loss that comprises a weight factor that depends on a semantic class.

17. The method of claim 16, wherein the first pose network and the segment network are convolutional neural networks, further comprising a second pose network that is a recurrent neural network.

18. The method of claim 16, wherein the semantic map data comprises a point of a three-dimensional point cloud, the point being enlarged to a two-dimensional square whose size is determined by a semantic class associated with the three-dimensional point cloud.

19. The method of claim 18, wherein the size of the two-dimensional square is proportional to an average distance between the camera and the semantic class.

20. The method of claim 16, further comprising removing data associated with moving objects from the semantic map data by at least one of repeatedly scanning a road segment, aligning and fusing point clouds in the semantic map data, and removing, from point clouds in the semantic map data, points that have a relatively lower temporal consistency.
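
The claims describe creating a semantic label map from a coarse camera pose and a camera intrinsic parameter, enlarging each 3D point to a 2D square whose size depends on its semantic class, and converting a label map to a score map through a one-hot operation. The following is a minimal illustrative sketch of those three ideas, not the patented implementation. The function names (`render_label_map`, `one_hot`), the pinhole model without skew or distortion, the per-point z-buffer, and the `half_size` lookup table are all assumptions made for illustration.

```python
import numpy as np


def render_label_map(points, labels, R, t, K, height, width, half_size, background=0):
    """Project labeled 3D points into a 2D semantic label map (hypothetical helper).

    points:    (N, 3) array of world coordinates.
    labels:    (N,) array of integer class ids.
    R, t:      assumed world-to-camera rotation (3x3) and translation (3,),
               standing in for the coarse camera pose.
    K:         3x3 pinhole intrinsic matrix (skew and distortion ignored).
    half_size: dict mapping class id -> half side length in pixels of the
               square each point is enlarged to (illustrative values only).
    """
    cam = points @ R.T + t  # world frame -> camera frame
    label_map = np.full((height, width), background, dtype=np.int64)
    depth = np.full((height, width), np.inf)

    for (x, y, z), c in zip(cam, labels):
        if z <= 0:  # point is behind the camera
            continue
        # Pinhole projection to pixel coordinates.
        u = int(round(K[0, 0] * x / z + K[0, 2]))
        v = int(round(K[1, 1] * y / z + K[1, 2]))
        h = half_size.get(int(c), 0)
        # Clip the class-dependent square to the image bounds.
        u0, u1 = max(u - h, 0), min(u + h + 1, width)
        v0, v1 = max(v - h, 0), min(v + h + 1, height)
        if u0 >= u1 or v0 >= v1:
            continue
        # Z-buffer: only overwrite pixels where this point is closer.
        window = depth[v0:v1, u0:u1]  # a view into depth
        closer = z < window
        window[closer] = z
        label_map[v0:v1, u0:u1][closer] = c
    return label_map


def one_hot(label_map, num_classes):
    """Turn an (H, W) integer label map into an (H, W, num_classes) score map."""
    return np.eye(num_classes, dtype=np.float32)[label_map]


# Usage: two points on the optical axis; the nearer one (class 1) occludes
# the centre of the larger square drawn for the farther one (class 2).
K = np.array([[100.0, 0.0, 4.0], [0.0, 100.0, 4.0], [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 5.0], [0.0, 0.0, 10.0]])
cls = np.array([1, 2])
demo_map = render_label_map(pts, cls, np.eye(3), np.zeros(3), K, 9, 9, {1: 1, 2: 2})
demo_scores = one_hot(demo_map, num_classes=3)
print(demo_map)
```

Claim 19 states that the square size is proportional to the average distance between the camera and the semantic class; the fixed values in `half_size` here are placeholders rather than values derived that way.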

Assignees

Inventors

Classifications

  • in augmented reality scenes · CPC title

  • exterior to a vehicle by using sensors mounted on the vehicle · CPC title

  • using neural networks · CPC title

  • G06T7/11 · Primary

    Region-based segmentation · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11030525B2 cover?
Presented are deep learning-based systems and methods for fusing sensor data, such as camera images, motion sensors (GPS/IMU), and a 3D semantic map to achieve robustness, real-time performance, and accuracy of camera localization and scene parsing useful for applications such as robotic navigation and augmented reality. In embodiments, a unified framework accomplishes this by jointly using camer…
Who is the assignee on this patent?
Baidu USA LLC and Baidu.com Times Technology (Beijing) Co., Ltd.
What technology area does this patent fall under?
Primary CPC classification G06T7/11. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 8, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).