Neural radiance fields with unposed images using geometric consistency

US2025384581A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2025384581-A1
Application number: US-202519241359-A
Country: US
Kind code: A1
Filing date: Jun 17, 2025
Priority date: Jun 17, 2024
Publication date: Dec 18, 2025
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural radiance field (NeRF) model on unposed images. In particular, the training incorporates a geometric consistency loss to train the encoder neural network that predicts the poses of the unposed images.

First claim

Opening claim text (preview).

1. A method performed by one or more computers, the method comprising: obtaining a plurality of images of a scene in an environment; and training, using the plurality of images, (i) an encoder neural network configured to receive an input image and generate, as output, a pose estimate that estimates a camera pose of a camera that captured the input image and (ii) a neural radiance field (NeRF) model that receives as input the pose estimate generated by the encoder neural network and generates a reconstruction of the input image, the training comprising, at each of a plurality of training iterations: for each of one or more pairs of images of the scene: for each image in the pair, processing the image using the encoder neural network to generate a respective pose estimate for the image; and for each image in the pair, applying an equivalence relation to the respective pose estimate for the image to generate an equivalence class of a plurality of pose estimates for the image; and training the encoder neural network on a consistency loss function that measures, for each of the one or more pairs of images, geometric consistency between the equivalence classes of pose estimates for the images in the pair.

2. The method of claim 1, wherein the correspondence loss measures, for each of the pairs, deviations from an epipolar geometry between the images in the pair that are defined by the equivalence classes of pose estimates for the images in the pair.

3. The method of claim 1, wherein, for each of the pairs, the reconstruction loss function measures a deviation for a pair of pose estimates that results in a minimum deviation of any combination of pose estimates from the equivalence classes for the images of the pair.

4. The method of claim 3, wherein, for each combination of pose estimates from the equivalence classes, the deviation is a disparity between projected keypoints in the image pair computed according to the combination of pose estimates.

5. The method of claim 4, wherein the disparity is a symmetric epipolar distance.

6. The method of claim 4, wherein correspondences between projected keypoints are based on Scale-Invariant Feature Transform (SIFT) features of the images in the image pair.

7. The method of claim 6, wherein the SIFT features are RootSIFT features.

8. The method of claim 6, wherein pairs of images that do not have at least a threshold number of corresponding keypoints are not included in the one or more pairs of images.

9. The method of claim 1, further comprising maintaining a queue of image pairs and wherein the one or more image pairs are the one or more image pairs in the queue having the smallest geometric consistency losses.

10. The method of claim 9, wherein the image pairs in the queue are randomly selected from possible pairs that each include two of the images of the scene.

11. The method of claim 1, further comprising, at each of the plurality of training iterations: for each image in a set of one or more of the plurality of images: processing the image using the encoder neural network to generate a pose estimate for the image; applying the equivalence relation to the pose estimate to generate an equivalence class of a plurality of pose estimates; and for each of the plurality of pose estimates, processing the pose estimate using the NeRF model to generate a respective reconstruction of one or more pixels from the image; and training the encoder neural network and the NeRF model on a reconstruction loss function that measures, for each of the plurality of pose estimates for each of the one or more images, an error between the one or more pixels of the image and the respective reconstruction of the one or more pixels of the image generated from the pose estimate.

12. The method of claim 1, further comprising: after the training, receiving data specifying a new camera pose; and processing the data specifying the new camera pose using the trained NeRF model to generate a new image of the scene that appears to be taken by a camera having the new camera pose.

13. The method of claim 1, further comprising: after the training, receiving a new image of the scene; and processing the new image of the scene using the trained encoder neural network to generate an estimate of a camera pose of a camera that captured the new image.

14. The method of claim 1, wherein the equivalence relation is based on properties of the scene.

15. The method of claim 14, wherein the equivalence relation is based on respective symmetries of one or more objects in the scene.

16. The method of claim 1, wherein the equivalence relation specifies that the equivalence class includes each equivalent pose estimate, and wherein an equivalent pose estimate is any pose estimate for which, for any integer k, the equivalent pose estimate is equal to a sum of the pose estimate and 2kπ/N.

17. The method of claim 16, wherein the value of N is received as input and defines a number of distinct elements of the equivalence class.

18. The method of claim 1, wherein the pose estimate comprises an estimated azimuth of the camera.

19. The method of claim 18, wherein the equivalence relation induces a replication of cameras along the azimuthal dimension.

20. The method of claim 1, wherein the pose estimate comprises an estimated elevation of the camera.

21. The method of claim 1, wherein the pose estimate comprises an estimate camera roll of the camera.

22. The method of claim 1, wherein the pose estimate comprises an estimated location of an origin in a camera reference frame of the camera.

23. The method of claim 1, wherein the encoder neural network is a convolutional neural network.

24. The method of claim 11, wherein the reconstruction loss function measures, for each of the one or more images, a minimum of the errors for each of the plurality of pose estimates for the image.

25. The method of claim 11, wherein the error between the image and the respective reconstruction of the image generated from the pose estimate is a squared L2 error between the image and the respective reconstruction.

26. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising: obtaining a plurality of images of a scene in an environment; and training, using the plurality of images, (i) an encoder neural network configured to receive an input image and generate, as output, a pose estimate that estimates a camera pose of a camera that captured the input image and (ii) a neural radiance field (NeRF) model that receives as input the pose estimate generated by the encoder neural network and generates a reconstruction of the input image, the training comprising, at each of a plurality of training iterations: for each of one or more pairs of images of the scene: for each image in the pair, processing the image using the encoder neural network to generate a respective pose estimate for the image; and for each image in the pair, applying an equivalence relation to the respective pose estimate for the image to generate an equivalence class of a plurality of pose estimates for the image; and training the encoder neural network on a consistency loss function that measures, for each of the one or more
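
As a reading aid, the pairwise consistency idea in claims 1, 3 to 5, and 16 to 18 can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation disclosed in the patent. It assumes a pose that consists solely of an azimuth, cameras placed on a circle of radius 4 looking at the origin, and keypoints given in normalized image coordinates (identity intrinsics). All function names and the camera model are hypothetical choices made for the example.

```python
import itertools
import math


def azimuth_equivalence_class(azimuth, n):
    """The n distinct azimuths azimuth + 2*k*pi/n (mod 2*pi), for k = 0..n-1."""
    return [(azimuth + 2.0 * math.pi * k / n) % (2.0 * math.pi) for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def camera_from_azimuth(azimuth, radius=4.0):
    """World-to-camera (R, t) for an assumed camera on a circle, looking at the origin."""
    s, c = math.sin(azimuth), math.cos(azimuth)
    rotation = [[-s, c, 0.0], [0.0, 0.0, -1.0], [-c, -s, 0.0]]  # rows: right, down, forward
    return rotation, [0.0, 0.0, radius]


def project(azimuth, point):
    """Project a 3D world point to normalized homogeneous image coordinates."""
    rotation, translation = camera_from_azimuth(azimuth)
    x, y, z = (p + q for p, q in zip(matvec(rotation, point), translation))
    return [x / z, y / z, 1.0]


def essential_matrix(azimuth_1, azimuth_2):
    """E = [t]_x R for the relative pose taking camera 1 coordinates to camera 2."""
    r1, t1 = camera_from_azimuth(azimuth_1)
    r2, t2 = camera_from_azimuth(azimuth_2)
    r_rel = matmul(r2, transpose(r1))
    tx, ty, tz = (b - a for a, b in zip(matvec(r_rel, t1), t2))
    skew = [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]
    return matmul(skew, r_rel)


def symmetric_epipolar_distance(e, x1, x2, eps=1e-12):
    """Squared point-to-epipolar-line distance, summed over both images."""
    line_2 = matvec(e, x1)             # epipolar line of x1 in image 2
    line_1 = matvec(transpose(e), x2)  # epipolar line of x2 in image 1
    residual = sum(a * b for a, b in zip(x2, line_2))
    return residual ** 2 * (
        1.0 / (line_2[0] ** 2 + line_2[1] ** 2 + eps)
        + 1.0 / (line_1[0] ** 2 + line_1[1] ** 2 + eps)
    )


def pair_consistency_loss(azimuth_1, azimuth_2, matches, n):
    """Minimum mean distance over all combinations from the two equivalence classes."""
    best = math.inf
    classes = (azimuth_equivalence_class(azimuth_1, n), azimuth_equivalence_class(azimuth_2, n))
    for a1, a2 in itertools.product(*classes):
        e = essential_matrix(a1, a2)
        mean = sum(symmetric_epipolar_distance(e, x1, x2) for x1, x2 in matches) / len(matches)
        best = min(best, mean)
    return best
```

In this sketch, `matches` is a list of corresponding keypoint pairs, such as those obtained from SIFT matching in claim 6. Because the loss takes the minimum over the equivalence classes, an azimuth estimate that is off by a multiple of 2π/N is not penalized, which is the behavior the claims describe for scenes with N-fold symmetry. The small `eps` term only guards against division by zero; a production version would need to handle the degenerate case of two identical poses explicitly.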

Assignees

Google LLC

Inventors

Classifications

Primary CPC: G06T7/74

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025384581A1 cover?
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural radiance field (NeRF) model on unposed images. In particular, the training incorporates a geometric consistency loss to train the encoder neural network that predicts the poses of the unposed images.
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06T7/74. Mapped technology areas include Physics.
When was this patent published?
Published on Dec 18, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).