What technology area does this patent fall under?

Primary CPC classification G06T7/74. Mapped technology areas include Physics.

When was this patent published?

Publication date May 11, 2021 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

System and method for training a neural network for visual localization based upon learning objects-of-interest dense match regression

US11003956B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11003956-B2
Application number	US-201916414125-A
Country	US
Kind code	B2
Filing date	May 16, 2019
Priority date	May 16, 2019
Publication date	May 11, 2021
Grant date	May 11, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for training, using a plurality of training images with corresponding six degrees of freedom camera pose for a given environment and a plurality of reference images, each reference image depicting an object-of-interest in the given environment and having a corresponding two-dimensional to three-dimensional correspondence for the given environment, a neural network to provide visual localization by: for each training image, detecting and segmenting object-of-interest in the training image; generating a set of two-dimensional to two-dimensional matches between the detected and segmented objects-of-interest and corresponding reference images; generating a set of two-dimensional to three-dimensional matches from the generated set of two-dimensional to two-dimensional matches and the two-dimensional to three-dimensional correspondences corresponding to the reference images; and determining localization, for each training image, by solving a perspective-n-point problem using the generated set of two-dimensional to three-dimensional matches.
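The step that turns 2D-2D matches into 2D-3D matches can be illustrated with a small sketch of the transitivity idea named in claims 8 and 17. Each regressed query-to-reference match is looked up in the reference image's 2D-3D correspondence, which yields a query-to-scene match. This is a minimal sketch under our own assumptions, not the patented implementation. It assumes the reference correspondence is stored as a dense per-pixel array of 3D points (`ref_xyz`, NaN where no 3D point is known), and it rounds sub-pixel matches to the nearest pixel. The function name `matches_2d3d_by_transitivity` is hypothetical.

```python
import numpy as np


def matches_2d3d_by_transitivity(query_px, ref_px, ref_xyz):
    """Compose 2D-2D matches (query -> reference) with the reference
    image's 2D-3D map to obtain 2D-3D matches (query -> scene).

    query_px: (N, 2) pixel coords (x, y) in the query image.
    ref_px:   (N, 2) regressed pixel coords (x, y) in the reference image.
    ref_xyz:  (H, W, 3) hypothetical dense map of 3D scene points for the
              reference image; NaN where no 3D point is known.
    Returns (query_px_kept, xyz) containing only the valid matches.
    """
    query_px = np.asarray(query_px, dtype=float)
    ref_px = np.asarray(ref_px, dtype=float)
    h, w, _ = ref_xyz.shape
    # Round the regressed (sub-pixel) reference coordinates to the nearest pixel.
    cols = np.rint(ref_px[:, 0]).astype(int)
    rows = np.rint(ref_px[:, 1]).astype(int)
    # Discard matches that land outside the reference image.
    inside = (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    xyz = np.full((len(ref_px), 3), np.nan)
    xyz[inside] = ref_xyz[rows[inside], cols[inside]]
    # Discard matches whose reference pixel has no known 3D point.
    valid = inside & ~np.isnan(xyz).any(axis=1)
    return query_px[valid], xyz[valid]


if __name__ == "__main__":
    ref_xyz = np.full((4, 4, 3), np.nan)
    ref_xyz[1, 2] = [0.5, 1.0, 3.0]  # only one reference pixel has a 3D point
    q, X = matches_2d3d_by_transitivity(
        [[10, 20], [11, 21]], [[2.2, 0.9], [9.0, 9.0]], ref_xyz)
    print(q, X)
```

The surviving `(query_px, xyz)` pairs are the 2D-3D matches that the final step hands to a perspective-n-point solver. With OpenCV, for instance, they could be passed to `cv2.solvePnPRansac` as image points and object points, together with the camera intrinsics, to estimate the pose with random sample consensus.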

First claim

Opening claim text (preview).

What is claimed is:

1. A method, using a data processor, for training, using a plurality of training images with corresponding six degrees of freedom camera pose for a predetermined environment and a plurality of reference images, each reference image depicting an object-of-interest in the predetermined environment and having a corresponding two-dimensional to three-dimensional correspondence for the predetermined environment, a neural network to provide visual localization of a camera pose in the predetermined environment, comprising: (a) for each training image, detecting and segmenting object-of-interest in the training image; (b) generating a set of two-dimensional to two-dimensional matches between the detected and segmented objects-of-interest and corresponding reference images; (c) generating a set of two-dimensional to three-dimensional matches from the generated set of two-dimensional to two-dimensional matches and the two-dimensional to three-dimensional correspondences corresponding to the reference images; and (d) determining localization of the camera pose in the predetermined environment, for each training image, by solving a perspective-n-point problem using the generated set of two-dimensional to three-dimensional matches.
2. The method as claimed in claim 1, wherein the set of two-dimensional to two-dimensional matches between the detected and segmented objects-of-interest and corresponding reference images is generated by a regressing matches between the detected and segmented objects-of-interest and corresponding reference images.
3. The method as claimed in claim 1, wherein localization is determined by solving a perspective-n-point problem using random sample consensus and the generated set of two-dimensional to three-dimensional matches.
4. The method as claimed in claim 1, wherein the training images are artificially generated with homography data augmentation.
5. The method as claimed in claim 1, wherein the training images are artificially generated with color data augmentation to train the neural network with respect to lighting changes.
6. The method as claimed in claim 1, further comprising: (e) when an object-of-interest in the predetermined environment is moved, updating, using structure-from-motion reconstruction, the corresponding two-dimensional to three-dimensional correspondence for the given environment without retraining the neural network.
7. The method as claimed in claim 1, wherein the objects-of-interest are planar objects-of-interest.
8. The method as claimed in claim 1, wherein the set of two-dimensional to three-dimensional matches is generated by transitivity.
9. The method as claimed in claim 1, wherein the training images do not contain occlusions.
10. The method as claimed in claim 1, wherein the neural network is a convolutional neural network.
11. The method as claimed in claim 1, wherein the generated set of two-dimensional to two-dimensional matches is dense.
12. A method, using a trained neural network having a plurality of reference images, each reference image depicting an object-of-interest in a predetermined environment and having a corresponding two-dimensional to three-dimensional correspondence for the predetermined environment, for determining, from a query image generated from a camera pose, localization of the camera pose in the predetermined environment, comprising: (a) detecting and segmenting an object-of-interest in the query image using the trained neural network; (b) generating a set of two-dimensional to two-dimensional matches between the detected and segmented object-of-interest and a corresponding reference image using the trained neural network; (c) generating a set of two-dimensional to three-dimensional matches from the generated set of two-dimensional to two-dimensional matches and the two-dimensional to three-dimensional correspondences corresponding to the reference image; and (d) determining localization of the camera pose in the predetermined environment, for the query image, by solving a perspective-n-point problem using the generated set of two-dimensional to three-dimensional matches.
13. The method as claimed in claim 12, wherein the generated set of two-dimensional to two-dimensional matches is dense.
14. The method as claimed in claim 13, wherein the dense set of two-dimensional to two-dimensional matches between the detected and segmented object-of-interest and corresponding reference image is generated by regressing dense matches between the detected and segmented object-of-interest and corresponding reference image.
15. The method as claimed in claim 12, wherein localization of the camera pose in the predetermined environment is determined by solving a perspective-n-point problem using random sample consensus and the generated set of two-dimensional to three-dimensional matches.
16. The method as claimed in claim 12, wherein the object-of-interest is a planar object-of-interest.
17. The method as claimed in claim 12, wherein the set of two-dimensional to three-dimensional matches is generated by transitivity.
18. The method as claimed in claim 12, wherein the trained neural network is a trained convolutional neural network.
19. A computer-implemented method for camera pose localization, comprising: (a) receiving a query image generated from a camera pose; (b) accessing a neural network trained using a plurality of reference images, each reference image depicting an object-of-interest in a predetermined environment and having a corresponding two-dimensional to three-dimensional correspondence for the predetermined environment; (c) using the trained neural network for (c1) detecting and segmenting an object-of-interest in the query image, and (c2) generating a set of two-dimensional to two-dimensional matches between the detected and segmented object-of-interest and a corresponding reference image; (d) generating a set of two-dimensional to three-dimensional matches from the generated set of two-dimensional to two-dimensional matches and the two-dimensional to three-dimensional correspondences corresponding to the reference image; (e) determining localization of the camera pose in the predetermined environment, for the query image, by solving a perspective-n-point problem using the generated set of two-dimensional to three-dimensional matches; and (f) outputting the localization of the camera pose in the predetermined environment.
20. The method as claimed in claim 19, wherein the generated set of two-dimensional to two-dimensional matches is dense.
21. The method as claimed in claim 19, wherein the trained neural network is a trained convolutional neural network.
22. The method as claimed in claim 19, wherein the trained neural network is used for detecting and segmenting a plurality of objects-of-interest in the query image, and for generating a plurality of sets of two-dimensional to two-dimensional matches between each of the plurality of the detected and segmented objects-of-interest and a corresponding reference image.
23. A method, using a data processor, for training a neural network for determining, from a query image generated from a camera pose, localization of the camera pose in a predetermined environment, comprising: (a) accessing a first frame of training data with two-dimensional pixels having an object-of-interest identified by a bounding box of a manual mask and a class label; (b) using structure-from-motion reconstruction to
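Claim 4 recites homography data augmentation of the training images. One way to keep dense ground-truth matches consistent under such augmentation is shown in the sketch below. A random homography is built by jittering the four image corners, and only the training-image side of each match is moved, because the reference image itself is not warped. This is a minimal sketch under our own assumptions, not the patent's procedure. The corner-jitter scheme, the `max_shift` parameter, and all function names are hypothetical. The image pixels would be warped with the same matrix, for example with OpenCV's `cv2.warpPerspective`.

```python
import numpy as np


def homography_from_points(src, dst):
    """Direct linear transform: 3x3 H with dst ~ H @ src (up to scale), from 4+ point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of the stacked linear constraints.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def apply_homography(H, pts):
    """Map (N, 2) pixel coordinates through H, with homogeneous division."""
    pts = np.asarray(pts, dtype=float)
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]


def random_homography(width, height, max_shift, rng):
    """Hypothetical scheme: jitter each image corner by up to max_shift pixels."""
    corners = np.array([[0, 0], [width - 1, 0],
                        [width - 1, height - 1], [0, height - 1]], dtype=float)
    jitter = rng.uniform(-max_shift, max_shift, size=corners.shape)
    return homography_from_points(corners, corners + jitter)


def augment_matches(train_px, ref_px, H, width, height):
    """Warp the training-image side of each ground-truth 2D-2D match by H.

    The reference-image side is unchanged because only the training image
    is warped; matches that leave the image frame are dropped.
    """
    warped = apply_homography(H, train_px)
    inside = ((warped[:, 0] >= 0) & (warped[:, 0] <= width - 1)
              & (warped[:, 1] >= 0) & (warped[:, 1] <= height - 1))
    return warped[inside], np.asarray(ref_px, dtype=float)[inside]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = random_homography(640, 480, max_shift=32.0, rng=rng)
    train_px = [[100.0, 120.0], [320.0, 240.0], [639.0, 479.0]]
    ref_px = [[12.0, 30.0], [50.0, 60.0], [90.0, 95.0]]
    new_train_px, new_ref_px = augment_matches(train_px, ref_px, H, 640, 480)
    print(new_train_px, new_ref_px)
```

Color data augmentation (claim 5) changes pixel values but not pixel positions. Under this scheme, the match labels would therefore pass through it unchanged.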

Assignees

Naver Corp

Inventors

Classifications

G06V10/774 · Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06T7/74 (primary) · involving reference images or patches
G06V10/82 · using neural networks
G06V10/776 · Validation; Performance evaluation
G06V10/764 · using classification, e.g. of video objects

Patent family

Related publications grouped by family.

Patent family ID: 73228674


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11003956B2 cover?: A method for training, using a plurality of training images with corresponding six degrees of freedom camera pose for a given environment and a plurality of reference images, each reference image depicting an object-of-interest in the given environment and having a corresponding two-dimensional to three-dimensional correspondence for the given environment, a neural network to provide visual loc…
Who is the assignee on this patent?: Naver Corp