What technology area does this patent fall under?

Primary CPC classification G06T7/20. Mapped technology areas include Physics.

When was this patent published?

Publication date Mar 30, 2021 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Learning-based camera pose estimation from images of an environment

US10964061B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10964061-B2
Application number	US-202016872752-A
Country	US
Kind code	B2
Filing date	May 12, 2020
Priority date	Oct 6, 2017
Publication date	Mar 30, 2021
Grant date	Mar 30, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A deep neural network (DNN) system learns a map representation for estimating a camera position and orientation (pose). The DNN is trained to learn a map representation corresponding to the environment, defining positions and attributes of structures, trees, walls, vehicles, etc. The DNN system learns a map representation that is versatile and performs well for many different environments (indoor, outdoor, natural, synthetic, etc.). The DNN system receives images of an environment captured by a camera (observations) and outputs an estimated camera pose within the environment. The estimated camera pose is used to perform camera localization, i.e., recover the three-dimensional (3D) position and orientation of a moving camera, which is a fundamental task in computer vision with a wide variety of applications in robot navigation, car localization for autonomous driving, device localization for mobile navigation, and augmented/virtual reality.

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: receiving image pairs at a deep neural network, DNN, wherein a relative camera pose is associated with each image pair of the image pairs; applying, by the DNN, weights that define a map representation of an environment to the image pairs to generate estimated camera pose pairs, wherein each estimated camera pose pair generated by the DNN for each image pair is the estimated camera pose pair for capturing the environment to produce the image pair; computing first differences, wherein each first difference of the first differences is computed between a camera pose for at least one image of each image pair and the estimated camera pose pair generated by the DNN for the image pair; computing, for each image pair, a relative estimated camera pose based on the estimated camera pose pairs; computing second differences, wherein each second difference of the second differences is computed between the relative camera pose associated with each image pair and the relative estimated camera pose computed for the image pair; and updating the weights based on the first differences and the second differences.

2. The computer-implemented method of claim 1, further comprising, for each image pair, computing a second camera pose for a remaining image of the image pair using the relative camera pose and the camera pose for the at least one image, wherein the camera pose for the at least one image and the second camera pose comprise a pair of camera poses.

3. The computer-implemented method of claim 1, wherein each one of the image pairs includes a first image and an additional image in an image sequence, and one or more intervening images may occur between the first image and the additional image.

4. The computer-implemented method of claim 1, wherein the weights are updated to simultaneously reduce the first differences and the second differences.

5. The computer-implemented method of claim 1, wherein a rotation portion of the estimated camera pose pairs is parameterized as a three-dimensional logarithm of a unit quaternion.

6. The computer-implemented method of claim 1, further comprising receiving visual odometry data corresponding to the image pairs, wherein the weights are updated to minimize differences between the visual odometry data and the relative estimated camera pose.

7. The computer-implemented method of claim 1, further comprising receiving global position sensor data corresponding to the image pairs, wherein the weights are updated to minimize differences between the global position sensor data and the estimated camera pose pairs.

8. The computer-implemented method of claim 1, further comprising receiving inertial measurement data corresponding to the image pairs, wherein the weights are updated to minimize differences between the inertial measurement data and the estimated camera pose pairs.

9. The computer-implemented method of claim 1, further comprising post-processing the estimated camera pose pairs using pose graph optimization, PGO, to produce refined camera pose pairs.

10. The computer-implemented method of claim 1, wherein the DNN comprises at least a convolutional neural network layer, followed by a global average pooling layer, followed by a fully-connected layer to output the estimated camera pose pairs.

11. A system, comprising: a deep neural network, DNN, configured to: receive image pairs, wherein a relative camera pose is associated with each one of the image pairs; apply weights that define a map representation of an environment to each one of the image pairs to generate estimated camera pose pairs, wherein each estimated camera pose pair generated by the DNN for each image pair is the estimated camera pose pair for capturing the environment to produce the image pair; compute first differences, wherein each first difference of the first differences is computed between a camera pose for at least one image of each image pair and the estimated camera pose pair generated by the DNN for the image pair; compute, for each image pair, a relative estimated camera pose based on the estimated camera pose pairs; compute second differences, wherein each second difference of the second differences is computed between the relative camera pose associated with each image pair and the relative estimated camera pose computed for the image pair; and update the weights based on the first differences and the second differences.

12. The system of claim 11, wherein, for each image pair, a second camera pose is computed for a remaining image of the image pair using the relative camera pose and the camera pose for the at least one image.

13. The system of claim 11, wherein each one of the image pairs includes a first image and an additional image in an image sequence, and one or more intervening images may occur between the first image and the additional image.

14. The system of claim 11, wherein the weights are updated to simultaneously reduce the first differences and the second differences.

15. The system of claim 11, wherein a rotation portion of the estimated camera pose pairs is parameterized as a three-dimensional logarithm of a unit quaternion.

16. The system of claim 11, wherein the system is further configured to receive visual odometry data corresponding to the image pairs, wherein the weights are updated to minimize differences between the visual odometry data and the relative camera pose.

17. The system of claim 11, wherein the system is further configured to post-process the estimated camera pose pairs using pose graph optimization, PGO, to produce a refined camera pose pairs.

18. A non-transitory computer-readable media storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of: receiving image pairs at a deep neural network, DNN, wherein a relative camera pose is associated with each image pair of the image pairs; applying, by the DNN, weights that define a map representation of an environment to the image pairs to generate estimated camera pose pairs, wherein each estimated camera pose pair generated by the DNN for each image pair is the estimated camera pose pair for capturing the environment to produce the image pair; computing first differences, wherein each first difference of the first differences is computed between a camera pose for at least one image of each image pair and the estimated camera pose pair generated by the DNN for the image pair; computing, for each image pair, a relative estimated camera pose based on the estimated camera pose pairs; computing second differences, wherein each second difference of the second differences is computed between the relative camera pose associated with each image pair and the relative estimated camera pose computed for the image pair; and updating the weights based on the first differences and the second differences.
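For readers who find the claim language dense, the training signal described in claims 1, 2 and 5 can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the patented implementation: the function names (`quat_log`, `pose_vector`, `pair_loss`), the L1 distance, the weighting factor `alpha`, and the choice to express a relative pose as a plain difference of 6-vectors are all assumptions made here for illustration. The quaternion logarithm is the standard formula for a unit quaternion (w, x, y, z).

```python
import numpy as np

def quat_log(q):
    """Map a unit quaternion (w, x, y, z) to its 3-D logarithm (claim 5)."""
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)          # guard against slightly non-unit input
    if q[0] < 0:                       # q and -q encode the same rotation
        q = -q
    w, v = q[0], q[1:]
    n = np.linalg.norm(v)
    if n < 1e-12:                      # identity rotation: log is the zero vector
        return np.zeros(3)
    return v / n * np.arccos(np.clip(w, -1.0, 1.0))

def pose_vector(t, q):
    """Pack a pose as a 6-vector: 3-D position followed by log-quaternion."""
    return np.concatenate([np.asarray(t, dtype=float), quat_log(q)])

def pair_loss(est_a, est_b, gt_a, rel_gt, alpha=1.0):
    """Loss for one image pair (hypothetical helper, assumed L1 distance).

    est_a, est_b: estimated 6-vector poses for the two images of the pair.
    gt_a:         known pose for one image of the pair.
    rel_gt:       relative pose associated with the pair (assumed here to be
                  a simple 6-vector difference, a simplification).
    """
    gt_b = gt_a + rel_gt                                   # claim 2: second pose
    first = np.abs(est_a - gt_a).sum() + np.abs(est_b - gt_b).sum()
    rel_est = est_b - est_a                                # relative estimated pose
    second = np.abs(rel_est - rel_gt).sum()                # second difference
    return first, second, first + alpha * second

# Example: the estimate for the second image is off by 0.5 along x.
gt_a = pose_vector([0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
rel_gt = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
est_b = gt_a + rel_gt + np.array([0.5, 0.0, 0.0, 0.0, 0.0, 0.0])
first, second, total = pair_loss(gt_a, est_b, gt_a, rel_gt)
```

In a real training loop the combined value would be minimized with respect to the network weights, which is what "updating the weights based on the first differences and the second differences" refers to; the sketch only evaluates the two terms for fixed pose estimates.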

Assignees

Nvidia Corp

Inventors

Classifications

G06T7/20 (Primary)
Analysis of motion (motion estimation for coding, decoding, compressing or decompressing digital video signals H04N19/43, H04N19/51) · CPC title
G06T7/80 (Primary)
Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration · CPC title
G06N5/01
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title
G06N3/045
Combinations of networks · CPC title
G06V10/955
using specific electronic processors · CPC title
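The CPC symbols above follow a fixed layout: a section letter, a two-digit class, a subclass letter, then a main group and subgroup separated by a slash. As a rough sketch, the snippet below splits a symbol into those parts and looks up the section title, which is how a symbol such as G06T7/20 maps to the "Physics" technology area. The function name `parse_cpc` and the returned dictionary keys are hypothetical; the section titles are abbreviated from the CPC scheme.

```python
import re

# Top-level CPC sections (titles abbreviated).
CPC_SECTIONS = {
    "A": "Human necessities",
    "B": "Performing operations; transporting",
    "C": "Chemistry; metallurgy",
    "D": "Textiles; paper",
    "E": "Fixed constructions",
    "F": "Mechanical engineering; lighting; heating; weapons; blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General tagging of new technological developments",
}

def parse_cpc(symbol):
    """Split a CPC symbol such as 'G06T7/20' into its hierarchical parts."""
    m = re.fullmatch(r"([A-HY])(\d{2})([A-Z])(\d+)/(\d+)", symbol.strip())
    if m is None:
        raise ValueError(f"not a recognizable CPC symbol: {symbol!r}")
    section, cls, subclass, group, subgroup = m.groups()
    return {
        "section": section,
        "section_title": CPC_SECTIONS[section],
        "class": section + cls,                  # e.g. G06
        "subclass": section + cls + subclass,    # e.g. G06T
        "main_group": group,                     # e.g. 7
        "subgroup": subgroup,                    # e.g. 20
    }

primary = parse_cpc("G06T7/20")
```

Here `primary["section_title"]` is "Physics" and `primary["subclass"]` is "G06T".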

Patent family

Related publications grouped by family.

View patent family 65993974

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10964061B2 cover?: A deep neural network (DNN) system learns a map representation for estimating a camera position and orientation (pose). The DNN is trained to learn a map representation corresponding to the environment, defining positions and attributes of structures, trees, walls, vehicles, etc. The DNN system learns a map representation that is versatile and performs well for many different environments (indo…
Who is the assignee on this patent?: Nvidia Corp