Who is the assignee on this patent?

Baidu USA LLC, Baidu Com Times Tech Beijing Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06T7/11 (region-based segmentation). Mapped technology areas include Physics (CPC section G).

When was this patent published?

Publication date Jun 8, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for deep localization and segmentation with a 3D semantic map

US11030525B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11030525-B2
Application number	US-201816604548-A
Country	US
Kind code	B2
Filing date	Feb 9, 2018
Priority date	Feb 9, 2018
Publication date	Jun 8, 2021
Grant date	Jun 8, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Presented are deep learning-based systems and methods for fusing sensor data, such as camera images, motion sensors (GPS/IMU), and a 3D semantic map to achieve robustness, real-time performance, and accuracy of camera localization and scene parsing useful for applications such as robotic navigation and augmented reality. In embodiments, a unified framework accomplishes this by jointly using camera poses and scene semantics in training and testing. To evaluate the presented methods and systems, embodiments use a novel dataset that is created from real scenes and comprises dense 3D semantically labeled point clouds, ground truth camera poses obtained from high-accuracy motion sensors, and pixel-level semantic labels of video camera images. As demonstrated by experimental results, the presented systems and methods are mutually beneficial for both camera poses and scene semantics.

First claim

Opening claim text (preview).

What is claimed is:
1. A method for using a network to perform joint scene parsing and camera pose estimation, the method comprising: receiving semantic map data, image data associated with a camera, and sensor data that comprises a coarse camera pose; creating a first semantic label map by using the coarse camera pose and a camera intrinsic parameter; providing both the image data and the first semantic label map to a first pose network to obtain a corrected camera pose; and inputting the image data into a segment network to generate a two-dimensional parsing associated with the inputted image data.
2. The method of claim 1, wherein the sensor data is provided by a motion sensor.
3. The method of claim 1, further comprising using the corrected camera pose in a second pose network to generate a refined camera pose to increase a pose accuracy.
4. The method of claim 3, wherein the first pose network and the segment network are convolutional neural networks, and the second pose network is a recurrent neural network.
5. The method of claim 3, further comprising, based on the refined camera pose, rendering a second semantic label map that is input to the segment network.
6. The method of claim 5, wherein the second semantic label map is embedded into the segment network as a segmentation context.
7. The method of claim 5, further comprising transforming the second semantic label map to a score map through a one-hot operation.
8. The method of claim 1, wherein the two-dimensional parsing comprises a per-pixel semantic label.
9. The method of claim 1, wherein the first pose network calculates a relative rotation and translation, and wherein the corrected camera pose is used to generate temporal correlations.
10. A system for joint scene parsing and camera pose estimation, the system comprising: a camera that has an intrinsic parameter and generates image data; a sensor that generates sensor data comprising a coarse camera pose; a processor comprising instructions that when executed create a first semantic label map based on semantic map data, the image data, and the sensor data; a first pose network that in response to receiving the image data and the first semantic label map generates a corrected camera pose; and a segment network that, based on the image data, generates a two-dimensional parsing that is associated with the image data.
11. The system of claim 10, wherein the sensor data is provided by a motion sensor.
12. The system of claim 10, wherein the sensor data comprises a location estimate.
13. The system of claim 10, further comprising a second pose network that, based on the corrected camera pose, generates a refined camera pose to increase a pose accuracy by rendering a second semantic label map that is input to the segment network.
14. The system of claim 13, wherein the second semantic label map is two-dimensional and embedded into the segment network as a segmentation context.
15. The system of claim 10, wherein the semantic map data comprises a point of a three-dimensional point cloud, the point being enlarged to a two-dimensional square whose size is determined by a semantic class associated with the three-dimensional point cloud.
16. A method for training a network to perform joint scene parsing and camera pose estimation, the method comprising: receiving semantic map data, image data associated with a camera, and sensor data that comprises a coarse camera pose; creating a semantic label map by using the coarse camera pose and a camera intrinsic parameter; providing both the image data and the semantic label map to a first pose network to obtain a corrected camera pose; inputting the image data into a segment network to generate a two-dimensional parsing associated with the inputted image; and using a loss that comprises a weight factor that depends on a semantic class.
17. The method of claim 16, wherein the first pose network and the segment network are convolutional neural networks, further comprising a second pose network that is a recurrent neural network.
18. The method of claim 16, wherein the semantic map data comprises a point of a three-dimensional point cloud, the point being enlarged to a two-dimensional square whose size is determined by a semantic class associated with the three-dimensional point cloud.
19. The method of claim 18, wherein the size of the two-dimensional square is proportional to an average distance between the camera and the semantic class.
20. The method of claim 16, further comprising removing data associated with moving objects from the semantic map data by at least one of repeatedly scanning a road segment, aligning and fusing point clouds in the semantic map data, and removing, from point clouds in the semantic map data, points that have a relatively lower temporal consistency.
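
Illustrative sketch (not part of the patent text). The claims describe rendering a semantic label map from a 3D point cloud using a camera pose and an intrinsic parameter, enlarging each point to a class-dependent square, and converting a label map to a score map with a one-hot operation. The Python below is one possible reading of those steps under stated assumptions: a pinhole camera model, a world-to-camera convention of p_cam = R * p_world + t, and a z-buffer for occlusion. The function names `render_label_map` and `one_hot`, the `half_size` lookup, and all parameter names are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Sequence, Tuple

# (x, y, z, class_id) in world coordinates; this layout is an assumption.
Point = Tuple[float, float, float, int]


def render_label_map(
    points: Sequence[Point],
    rotation: Sequence[Sequence[float]],  # 3x3 world-to-camera rotation (assumed convention)
    translation: Sequence[float],         # 3-vector world-to-camera translation
    fx: float, fy: float, cx: float, cy: float,  # pinhole intrinsics
    width: int, height: int,
    half_size: Dict[int, int],            # hypothetical per-class square half-width in pixels
    background: int = 0,
) -> List[List[int]]:
    """Project labeled 3D points into a 2D label map, nearest point wins."""
    labels = [[background] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    for x, y, z, cls in points:
        p = (x, y, z)
        # Transform the point into the camera frame.
        xc, yc, zc = [
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ]
        if zc <= 0:
            continue  # point is behind the camera
        # Pinhole projection to pixel coordinates.
        u = int(round(fx * xc / zc + cx))
        v = int(round(fy * yc / zc + cy))
        r = half_size.get(cls, 0)
        # Enlarge the point to a (2r + 1) x (2r + 1) square, clipped to the image.
        for vv in range(max(0, v - r), min(height, v + r + 1)):
            for uu in range(max(0, u - r), min(width, u + r + 1)):
                if zc < depth[vv][uu]:  # z-buffer test
                    depth[vv][uu] = zc
                    labels[vv][uu] = cls
    return labels


def one_hot(labels: List[List[int]], num_classes: int) -> List[List[List[float]]]:
    """Turn an H x W label map into an H x W x num_classes score map."""
    return [
        [[1.0 if c == k else 0.0 for k in range(num_classes)] for c in row]
        for row in labels
    ]


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    cloud = [(0.0, 0.0, 2.0, 1), (0.0, 0.0, 5.0, 2), (2.0, 2.0, 2.0, 2)]
    label_map = render_label_map(
        cloud, identity, [0.0, 0.0, 0.0],
        fx=2.0, fy=2.0, cx=2.0, cy=2.0,
        width=5, height=5, half_size={1: 1, 2: 0},
    )
    for row in label_map:
        print(row)
```

In this sketch the pose passed in could be the coarse sensor pose (for the first label map) or a refined pose (for the second); how the square size is actually chosen per class is left to the caller, since the claims state only that it depends on the semantic class.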

Assignees

Inventors

Classifications

G06V20/20
in augmented reality scenes · CPC title
G06V20/56
exterior to a vehicle by using sensors mounted on the vehicle · CPC title
G06V10/82
using neural networks · CPC title
G06T7/11 · Primary
Region-based segmentation · CPC title
G06N3/08 · Primary
Learning methods · CPC title

Patent family

Related publications grouped by family.

View patent family 67548171
