Large-scale outdoor augmented reality scenes using camera pose based on learned descriptors

US11568642B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11568642-B2
Application number: US-202017068429-A
Country: US
Kind code: B2
Filing date: Oct 12, 2020
Priority date: Oct 12, 2020
Publication date: Jan 31, 2023
Grant date: Jan 31, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems are provided for facilitating large-scale augmented reality in relation to outdoor scenes using estimated camera pose information. In particular, camera pose information for an image can be estimated by matching the image to a rendered ground-truth terrain model with known camera pose information. To match images with such renders, data driven cross-domain feature embedding can be learned using a neural network. Cross-domain feature descriptors can be used for efficient and accurate feature matching between the image and the terrain model renders. This feature matching allows images to be localized in relation to the terrain model, which has known camera pose information. This known camera pose information can then be used to estimate camera pose information in relation to the image.
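The matching step described in the abstract can be illustrated with a minimal sketch. It assumes descriptors are plain lists of floats, compares them by Euclidean distance, and ranks renders by their number of mutual nearest-neighbor matches. The function names (`mutual_matches`, `rank_renders`), the `top_k` parameter, and the ranking rule are hypothetical choices for illustration; the patent does not specify them.

```python
import math


def l2(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mutual_matches(image_desc, render_desc):
    """Return (image_idx, render_idx) pairs that are each other's nearest neighbor."""
    if not image_desc or not render_desc:
        return []
    # Nearest render descriptor for every image descriptor.
    fwd = [min(range(len(render_desc)), key=lambda j: l2(d, render_desc[j]))
           for d in image_desc]
    # Nearest image descriptor for every render descriptor.
    bwd = [min(range(len(image_desc)), key=lambda i: l2(d, image_desc[i]))
           for d in render_desc]
    # Keep only pairs that agree in both directions.
    return [(i, j) for i, j in enumerate(fwd) if bwd[j] == i]


def rank_renders(image_desc, renders, top_k=3):
    """Rank renders by mutual-match count (assumed scoring rule, not from the patent)."""
    scored = [(len(mutual_matches(image_desc, descs)), name)
              for name, descs in renders.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for count, name in scored[:top_k] if count > 0]


if __name__ == "__main__":
    photo = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    renders = {
        "yaw_000": [[0.1, 0.0], [1.0, 1.1], [5.0, 5.1]],
        "yaw_030": [[9.0, 9.0]],
    }
    print(rank_renders(photo, renders, top_k=1))
```

In this sketch the renders returned by `rank_renders` play the role of the candidate renders whose known camera poses are then used to estimate the pose of the photo.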

First claim

Opening claim text (preview).

What is claimed is:

1. One or more non-transitory computer-readable media having a plurality of executable instructions embodied thereon, which, when executed by one or more processors, cause the one or more processors to perform operations comprising: receiving an image, wherein the image has corresponding location information; receiving a set of renders based on a terrain model, the set of renders related to the location information of the image; determining keypoints from the image and each of the set of renders; generating, with a trained model, a first set of descriptors including a descriptor for each keypoint from the image and a second set of descriptors including a descriptor for each keypoint from the set of renders, the trained model trained, using a cross-domain embedding function, to align an input image to at least a portion of a terrain model; identifying candidate renders from the set of renders based on comparing the first set of descriptors related to the image with the second set of descriptors related to the set of renders; and estimating camera pose related to the image using known camera pose information related to the candidate renders.

2. The media of claim 1, the operations further comprising: generating the set of renders, each of the set of renders comprising a rendered image with a filed-of-view of sixty degrees rotated by thirty degrees around a vertical axis.

3. The media of claim 1, the operations further comprising: extracting image patches based on the determined keypoints from the image; extracting render patches based on the determined keypoints from the set of renders; and inputting the image patches and render patches into a trained model related to the cross-domain embedding function to generate the descriptors for each keypoint.

4. The media of claim 1, the operations further comprising: generating an augmented image using the camera pose related to the image, wherein the augmented image comprises the image and an overlay of augmentation information.

5. The media of claim 4, wherein the augmentation information comprises at least one of contour lines, gravel roads, and trails.

6. The media of claim 4, further comprising: outputting the augmentation image for display via a device.

7. The media of claim 1, the operations further comprising: training a model related to the related to the cross-domain embedding function.

8. The media of claim 7, further comprising: generating training data to train the model, wherein the training data comprises aligned pairs of training images and training renders based on the terrain model.

9. A computer-implemented method, the method comprising: receiving one or more image patches related to an image, wherein the image has corresponding location information; receiving one or more render patches related to a set of renders based on a terrain model, the set of renders related to the location information of the image; generating, with a trained model, a first set of descriptors including a descriptor for each of the one or more image patches and a second set of descriptors including a descriptor for each of the one or more render patches, the trained model trained, using a cross-domain embedding function, to align an input image to at least a portion of a terrain model; identifying candidate renders from the set of renders based on comparing the first set of descriptors related to the image with the second set of descriptors related to the set of renders; and estimating camera pose related to the image using known camera pose information related to the candidate renders.

10. The computer-implemented method of claim 9, further comprising: generating the set of renders, each of the set of renders comprising a rendered image with a filed-of-view of sixty degrees rotated by thirty degrees around a vertical axis.

11. The computer-implemented method of claim 9, further comprising: determining the image patches based on image keypoints in the image; and determining the render patches based on render keypoints in the set of renders.

12. The computer-implemented method of claim 11, further comprising: generating an augmented image using the camera pose related to the image, wherein the augmented image comprises the image and an overlay of augmentation information; and outputting the augmentation image for display via a mobile device.

13. The computer-implemented method of claim 12, wherein the augmentation information comprises at least one of contour lines, gravel roads, and trails.

14. The computer-implemented method of claim 9, wherein estimating the camera pose further comprises: matching two-dimensional points of the candidate renders in relation to a three-dimensional model using rendered camera parameters and a depth map related to the candidate renders; and determining the camera pose for the image with respect to the three-dimensional model.

15. The computer-implemented method of claim 9, further comprising: training a model related to the cross-domain embedding function.

16. The computer-implemented method of claim 15, wherein the model is trained using a neural network corrected using cross-domain triplet loss.

17. The computer-implemented method of claim 15, further comprising: generating training data to train the model, wherein the training data comprises aligned pairs of training images and training renders based on the terrain model.

18. A computing system comprising: generating, using a trained model, one or more image descriptors related to an image, the trained model trained, using a cross-domain embedding function, to align an input image to at least a portion of a terrain model, and wherein the image is of an outdoor scene in a location; matching the one or more image descriptors related to the image with one or more render descriptors related to a set of renders generated based on the terrain model, the set of renders related to the location of the image; and estimating a camera pose for the image based on the one or more render descriptors matched to the one or more image descriptors.

19. The system of claim 18, further comprising: generating training data for training a model related to the cross-domain embedding function; and training the model related to the cross-domain embedding function.

20. The system of claim 18, further comprising: generating an augmented image using the camera pose related to the image, wherein the augmented image comprises the image and an overlay of augmentation information; and displaying the augmentation image.
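Three of the dependent claims describe steps that are easy to illustrate numerically: the render layout of claims 2 and 10, the depth-map lifting of claim 14, and the triplet loss of claim 16. The sketch below is one possible reading, not the patented implementation. It assumes that the thirty-degree rotation is the step between successive renders, that the render camera is an ideal pinhole with intrinsics `fx`, `fy`, `cx`, `cy`, that depth is measured along the optical axis, and that the loss is the standard margin-based triplet loss with the anchor taken from a photo patch and the positive and negative taken from render patches. All function and parameter names are hypothetical.

```python
import math


def render_yaws(step_deg=30.0):
    """Yaw headings (degrees) for a full turn around the vertical axis.

    With a 30-degree step this gives 12 headings; at a 60-degree field of
    view, neighboring renders overlap by half (assumed reading of claim 2).
    """
    count = int(round(360.0 / step_deg))
    return [i * step_deg for i in range(count)]


def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift a render pixel (u, v) with known depth to a 3D point in the
    render camera's coordinate frame, assuming a pinhole camera."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss on Euclidean distances.

    Assumed cross-domain setup: the anchor descriptor comes from a photo
    patch, the positive and negative descriptors from render patches.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    print(len(render_yaws()))
    print(backproject(820, 240, 100.0, 500.0, 500.0, 320.0, 240.0))
    print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0]))
```

Under these assumptions, the 3D points returned by `backproject` would be paired with the matched keypoints in the photo, and a perspective-n-point solver (not shown) would recover the photo's camera pose relative to the terrain model.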

Assignees

Adobe Inc

Inventors

Classifications

  • Determining position or orientation of objects or cameras (camera calibration G06T7/80) · CPC title

  • Machine learning · CPC title

  • G06V20/20 (Primary)

    in augmented reality scenes · CPC title

  • Supervised learning · CPC title

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11568642B2 cover?
Methods and systems are provided for facilitating large-scale augmented reality in relation to outdoor scenes using estimated camera pose information. In particular, camera pose information for an image can be estimated by matching the image to a rendered ground-truth terrain model with known camera pose information. To match images with such renders, data driven cross-domain feature embedding …
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06V20/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 31, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).