Pose estimation and model retrieval for objects in images
US-2019147221-A1 · May 16, 2019 · US
US11568642B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11568642-B2 |
| Application number | US-202017068429-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 12, 2020 |
| Priority date | Oct 12, 2020 |
| Publication date | Jan 31, 2023 |
| Grant date | Jan 31, 2023 |
Official abstract text for this publication.
Methods and systems are provided for facilitating large-scale augmented reality in relation to outdoor scenes using estimated camera pose information. In particular, camera pose information for an image can be estimated by matching the image to a rendered ground-truth terrain model with known camera pose information. To match images with such renders, data driven cross-domain feature embedding can be learned using a neural network. Cross-domain feature descriptors can be used for efficient and accurate feature matching between the image and the terrain model renders. This feature matching allows images to be localized in relation to the terrain model, which has known camera pose information. This known camera pose information can then be used to estimate camera pose information in relation to the image.
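The matching step described in the abstract can be illustrated with a small sketch. It assumes descriptors are already available as NumPy arrays (one row per keypoint) and uses mutual nearest-neighbour matching with Euclidean distance to score each render. The function names `mutual_nearest_matches` and `rank_renders`, the distance metric, and the match-count ranking are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def mutual_nearest_matches(image_desc, render_desc):
    """Return (i, j) pairs where image keypoint i and render keypoint j
    are each other's nearest neighbour in descriptor space."""
    # Pairwise Euclidean distances, shape (num_image, num_render).
    diff = image_desc[:, None, :] - render_desc[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    nn_render = dist.argmin(axis=1)  # best render keypoint per image keypoint
    nn_image = dist.argmin(axis=0)   # best image keypoint per render keypoint
    return [(i, int(j)) for i, j in enumerate(nn_render) if nn_image[j] == i]

def rank_renders(image_desc, renders_desc):
    """Order render indices by number of mutual matches, most matches first."""
    counts = [len(mutual_nearest_matches(image_desc, d)) for d in renders_desc]
    return sorted(range(len(renders_desc)), key=lambda k: counts[k], reverse=True)
```

The top-ranked indices would play the role of the candidate renders, whose known camera poses feed the pose estimate for the image.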
Opening claim text (preview).
What is claimed is:
1. One or more non-transitory computer-readable media having a plurality of executable instructions embodied thereon, which, when executed by one or more processors, cause the one or more processors to perform operations comprising: receiving an image, wherein the image has corresponding location information; receiving a set of renders based on a terrain model, the set of renders related to the location information of the image; determining keypoints from the image and each of the set of renders; generating, with a trained model, a first set of descriptors including a descriptor for each keypoint from the image and a second set of descriptors including a descriptor for each keypoint from the set of renders, the trained model trained, using a cross-domain embedding function, to align an input image to at least a portion of a terrain model; identifying candidate renders from the set of renders based on comparing the first set of descriptors related to the image with the second set of descriptors related to the set of renders; and estimating camera pose related to the image using known camera pose information related to the candidate renders.
2. The media of claim 1, the operations further comprising: generating the set of renders, each of the set of renders comprising a rendered image with a field-of-view of sixty degrees rotated by thirty degrees around a vertical axis.
3. The media of claim 1, the operations further comprising: extracting image patches based on the determined keypoints from the image; extracting render patches based on the determined keypoints from the set of renders; and inputting the image patches and render patches into a trained model related to the cross-domain embedding function to generate the descriptors for each keypoint.
4. The media of claim 1, the operations further comprising: generating an augmented image using the camera pose related to the image, wherein the augmented image comprises the image and an overlay of augmentation information.
5. The media of claim 4, wherein the augmentation information comprises at least one of contour lines, gravel roads, and trails.
6. The media of claim 4, further comprising: outputting the augmentation image for display via a device.
7. The media of claim 1, the operations further comprising: training a model related to the cross-domain embedding function.
8. The media of claim 7, further comprising: generating training data to train the model, wherein the training data comprises aligned pairs of training images and training renders based on the terrain model.
9. A computer-implemented method, the method comprising: receiving one or more image patches related to an image, wherein the image has corresponding location information; receiving one or more render patches related to a set of renders based on a terrain model, the set of renders related to the location information of the image; generating, with a trained model, a first set of descriptors including a descriptor for each of the one or more image patches and a second set of descriptors including a descriptor for each of the one or more render patches, the trained model trained, using a cross-domain embedding function, to align an input image to at least a portion of a terrain model; identifying candidate renders from the set of renders based on comparing the first set of descriptors related to the image with the second set of descriptors related to the set of renders; and estimating camera pose related to the image using known camera pose information related to the candidate renders.
10. The computer-implemented method of claim 9, further comprising: generating the set of renders, each of the set of renders comprising a rendered image with a field-of-view of sixty degrees rotated by thirty degrees around a vertical axis.
11. The computer-implemented method of claim 9, further comprising: determining the image patches based on image keypoints in the image; and determining the render patches based on render keypoints in the set of renders.
12. The computer-implemented method of claim 11, further comprising: generating an augmented image using the camera pose related to the image, wherein the augmented image comprises the image and an overlay of augmentation information; and outputting the augmentation image for display via a mobile device.
13. The computer-implemented method of claim 12, wherein the augmentation information comprises at least one of contour lines, gravel roads, and trails.
14. The computer-implemented method of claim 9, wherein estimating the camera pose further comprises: matching two-dimensional points of the candidate renders in relation to a three-dimensional model using rendered camera parameters and a depth map related to the candidate renders; and determining the camera pose for the image with respect to the three-dimensional model.
15. The computer-implemented method of claim 9, further comprising: training a model related to the cross-domain embedding function.
16. The computer-implemented method of claim 15, wherein the model is trained using a neural network corrected using cross-domain triplet loss.
17. The computer-implemented method of claim 15, further comprising: generating training data to train the model, wherein the training data comprises aligned pairs of training images and training renders based on the terrain model.
18. A computing system comprising: generating, using a trained model, one or more image descriptors related to an image, the trained model trained, using a cross-domain embedding function, to align an input image to at least a portion of a terrain model, and wherein the image is of an outdoor scene in a location; matching the one or more image descriptors related to the image with one or more render descriptors related to a set of renders generated based on the terrain model, the set of renders related to the location of the image; and estimating a camera pose for the image based on the one or more render descriptors matched to the one or more image descriptors.
19. The system of claim 18, further comprising: generating training data for training a model related to the cross-domain embedding function; and training the model related to the cross-domain embedding function.
20. The system of claim 18, further comprising: generating an augmented image using the camera pose related to the image, wherein the augmented image comprises the image and an overlay of augmentation information; and displaying the augmentation image.
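Claim 14 relates two-dimensional points in a candidate render to a three-dimensional model using the render's camera parameters and depth map. A minimal sketch of one way to do that lifting is shown here, assuming a pinhole camera with intrinsics `K`, a world-to-camera pose `x_cam = R @ x_world + t`, and a depth map that stores z-depth along the optical axis. These conventions and the function name `backproject` are assumptions for illustration; the patent text does not specify them.

```python
import numpy as np

def backproject(keypoints_uv, depth_map, K, R, t):
    """Lift 2D render keypoints (u, v) to 3D points in world coordinates.

    Assumes a pinhole camera with x_cam = R @ x_world + t and a depth map
    holding z-depth, indexed as depth_map[v, u] (row = v, column = u).
    """
    uv = np.asarray(keypoints_uv, dtype=float)
    cols = np.rint(uv[:, 0]).astype(int)
    rows = np.rint(uv[:, 1]).astype(int)
    z = depth_map[rows, cols]                      # depth at each keypoint
    pixels_h = np.column_stack([uv, np.ones(len(uv))])
    rays = pixels_h @ np.linalg.inv(K).T           # camera-frame rays with z = 1
    points_cam = rays * z[:, None]                 # scale rays by depth
    return (points_cam - t) @ R                    # row-wise R.T @ (p - t)
```

Pairing these 3D points with the matched 2D keypoints in the photograph gives 2D-3D correspondences, which could then be passed to a perspective-n-point solver (for example OpenCV's `cv2.solvePnPRansac`) to recover the camera pose for the image.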
Determining position or orientation of objects or cameras (camera calibration G06T7/80) · CPC title
Machine learning · CPC title
in augmented reality scenes · CPC title
Supervised learning · CPC title
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title