US2025384581A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025384581-A1 |
| Application number | US-202519241359-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 17, 2025 |
| Priority date | Jun 17, 2024 |
| Publication date | Dec 18, 2025 |
| Grant date | — |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural radiance field (NeRF) model on unposed images. In particular, the training incorporates a geometric consistency loss to train the encoder neural network that predicts the poses of the unposed images.
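The claims below spell out the geometric consistency loss (claims 1, 3 to 5, and 16 to 18). Each pose estimate is expanded into an equivalence class. The loss for an image pair is then the smallest symmetric epipolar distance over any combination of poses drawn from the two classes.

The Python sketch below illustrates that idea under several assumptions that the document does not state:

- The pose is azimuth only (claim 18).
- Cameras orbit the origin at a fixed radius with zero elevation and roll.
- Keypoint correspondences are given in normalized image coordinates, which assumes identity intrinsics.
- The distance is the sum of squared point-to-epipolar-line distances in both images, averaged over matches.
- Combinations whose camera centers coincide are skipped.

Helper names such as `orbit_camera` and `pair_consistency_loss` are hypothetical. No encoder or gradient step is shown.

```python
import math
from itertools import product


def matvec(m, v):
    """3x3 matrix times a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def matmul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


def skew(t):
    """Cross-product matrix [t]_x, so that matvec(skew(t), v) equals t x v."""
    return [[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]


def orbit_camera(azimuth, radius=4.0):
    """Assumed pose model: camera on a circle in the z = 0 plane, looking at the
    origin with zero elevation and roll. Returns world-to-camera (R, t)."""
    c, s = math.cos(azimuth), math.sin(azimuth)
    R = [[-s, c, 0.0],      # camera x axis (right)
         [0.0, 0.0, -1.0],  # camera y axis (down)
         [-c, -s, 0.0]]     # camera z axis (forward, toward the origin)
    t = [-v for v in matvec(R, [radius * c, radius * s, 0.0])]
    return R, t


def project(point, R, t):
    """Pinhole projection to normalized image coordinates (identity intrinsics assumed)."""
    x, y, z = [a + b for a, b in zip(matvec(R, point), t)]
    return (x / z, y / z)


def essential_matrix(az1, az2):
    """E with x2^T E x1 = 0, or None when the two camera centers coincide."""
    R1, t1 = orbit_camera(az1)
    R2, t2 = orbit_camera(az2)
    R = matmul(R2, transpose(R1))
    t = [a - b for a, b in zip(t2, matvec(R, t1))]
    if math.sqrt(sum(v * v for v in t)) < 1e-9:
        return None
    return matmul(skew(t), R)


def symmetric_epipolar_distance(E, p1, p2, eps=1e-12):
    """Sum of squared distances of each point to the other point's epipolar line."""
    x1, x2 = [p1[0], p1[1], 1.0], [p2[0], p2[1], 1.0]
    line2 = matvec(E, x1)             # epipolar line of x1 in image 2
    line1 = matvec(transpose(E), x2)  # epipolar line of x2 in image 1
    residual = sum(a * b for a, b in zip(x2, line2))
    return residual ** 2 * (1.0 / (line2[0] ** 2 + line2[1] ** 2 + eps)
                            + 1.0 / (line1[0] ** 2 + line1[1] ** 2 + eps))


def equivalence_class(azimuth, n):
    """Claims 16-17 for an azimuth-only pose: azimuth + 2*k*pi/n for k = 0..n-1."""
    return [azimuth + 2.0 * math.pi * k / n for k in range(n)]


def pair_consistency_loss(az1, az2, matches, n):
    """Claim 3: minimum over every combination drawn from the two equivalence classes."""
    best = math.inf
    for a1, a2 in product(equivalence_class(az1, n), equivalence_class(az2, n)):
        E = essential_matrix(a1, a2)
        if E is None:  # degenerate combination with no baseline: skipped (assumption)
            continue
        total = sum(symmetric_epipolar_distance(E, p1, p2) for p1, p2 in matches)
        best = min(best, total / len(matches))
    return best


# Usage: the azimuth predicted for image 1 is off by a quarter turn,
# as could happen with a 4-fold symmetric object.
points = [(0.3, -0.2, 0.5), (-0.4, 0.1, -0.3), (0.2, 0.5, 0.1), (-0.1, -0.6, 0.4),
          (0.5, 0.3, -0.5), (0.0, 0.0, 0.7), (-0.5, 0.4, 0.2), (0.1, -0.3, -0.6)]
cam1, cam2 = orbit_camera(0.3), orbit_camera(1.1)
matches = [(project(X, *cam1), project(X, *cam2)) for X in points]
print(pair_consistency_loss(0.3 + math.pi / 2, 1.1, matches, n=4))  # near zero
print(pair_consistency_loss(0.3 + math.pi / 2, 1.1, matches, n=1))  # clearly positive
```

Epipolar geometry depends only on the relative pose. Rotating both replicas by the same multiple of 2π/N therefore leaves the loss unchanged, so the N² combinations hold only N distinct values. The sketch keeps the full product to mirror the wording of claim 3.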
Opening claim text (preview).
1. A method performed by one or more computers, the method comprising: obtaining a plurality of images of a scene in an environment; and training, using the plurality of images, (i) an encoder neural network configured to receive an input image and generate, as output, a pose estimate that estimates a camera pose of a camera that captured the input image and (ii) a neural radiance field (NeRF) model that receives as input the pose estimate generated by the encoder neural network and generates a reconstruction of the input image, the training comprising, at each of a plurality of training iterations: for each of one or more pairs of images of the scene: for each image in the pair, processing the image using the encoder neural network to generate a respective pose estimate for the image; and for each image in the pair, applying an equivalence relation to the respective pose estimate for the image to generate an equivalence class of a plurality of pose estimates for the image; and training the encoder neural network on a consistency loss function that measures, for each of the one or more pairs of images, geometric consistency between the equivalence classes of pose estimates for the images in the pair.
2. The method of claim 1, wherein the correspondence loss measures, for each of the pairs, deviations from an epipolar geometry between the images in the pair that are defined by the equivalence classes of pose estimates for the images in the pair.
3. The method of claim 1, wherein, for each of the pairs, the reconstruction loss function measures a deviation for a pair of pose estimates that results in a minimum deviation of any combination of pose estimates from the equivalence classes for the images of the pair.
4. The method of claim 3, wherein, for each combination of pose estimates from the equivalence classes, the deviation is a disparity between projected keypoints in the image pair computed according to the combination of pose estimates.
5. The method of claim 4, wherein the disparity is a symmetric epipolar distance.
6. The method of claim 4, wherein correspondences between projected keypoints are based on Scale-Invariant Feature Transform (SIFT) features of the images in the image pair.
7. The method of claim 6, wherein the SIFT features are RootSIFT features.
8. The method of claim 6, wherein pairs of images that do not have at least a threshold number of corresponding keypoints are not included in the one or more pairs of images.
9. The method of claim 1, further comprising maintaining a queue of image pairs and wherein the one or more image pairs are the one or more image pairs in the queue having the smallest geometric consistency losses.
10. The method of claim 9, wherein the image pairs in the queue are randomly selected from possible pairs that each include two of the images of the scene.
11. The method of claim 1, further comprising, at each of the plurality of training iterations: for each image in a set of one or more of the plurality of images: processing the image using the encoder neural network to generate a pose estimate for the image; applying the equivalence relation to the pose estimate to generate an equivalence class of a plurality of pose estimates; and for each of the plurality of pose estimates, processing the pose estimate using the NeRF model to generate a respective reconstruction of one or more pixels from the image; and training the encoder neural network and the NeRF model on a reconstruction loss function that measures, for each of the plurality of pose estimates for each of the one or more images, an error between the one or more pixels of the image and the respective reconstruction of the one or more pixels of the image generated from the pose estimate.
12. The method of claim 1, further comprising: after the training, receiving data specifying a new camera pose; and processing the data specifying the new camera pose using the trained NeRF model to generate a new image of the scene that appears to be taken by a camera having the new camera pose.
13. The method of claim 1, further comprising: after the training, receiving a new image of the scene; and processing the new image of the scene using the trained encoder neural network to generate an estimate of a camera pose of a camera that captured the new image.
14. The method of claim 1, wherein the equivalence relation is based on properties of the scene.
15. The method of claim 14, wherein the equivalence relation is based on respective symmetries of one or more objects in the scene.
16. The method of claim 1, wherein the equivalence relation specifies that the equivalence class includes each equivalent pose estimate, and wherein an equivalent pose estimate is any pose estimate for which, for any integer k, the equivalent pose estimate is equal to a sum of the pose estimate and 2kπ/N.
17. The method of claim 16, wherein the value of N is received as input and defines a number of distinct elements of the equivalence class.
18. The method of claim 1, wherein the pose estimate comprises an estimated azimuth of the camera.
19. The method of claim 18, wherein the equivalence relation induces a replication of cameras along the azimuthal dimension.
20. The method of claim 1, wherein the pose estimate comprises an estimated elevation of the camera.
21. The method of claim 1, wherein the pose estimate comprises an estimate camera roll of the camera.
22. The method of claim 1, wherein the pose estimate comprises an estimated location of an origin in a camera reference frame of the camera.
23. The method of claim 1, wherein the encoder neural network is a convolutional neural network.
24. The method of claim 11, wherein the reconstruction loss function measures, for each of the one or more images, a minimum of the errors for each of the plurality of pose estimates for the image.
25. The method of claim 11, wherein the error between the image and the respective reconstruction of the image generated from the pose estimate is a squared L2 error between the image and the respective reconstruction.
26. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising: obtaining a plurality of images of a scene in an environment; and training, using the plurality of images, (i) an encoder neural network configured to receive an input image and generate, as output, a pose estimate that estimates a camera pose of a camera that captured the input image and (ii) a neural radiance field (NeRF) model that receives as input the pose estimate generated by the encoder neural network and generates a reconstruction of the input image, the training comprising, at each of a plurality of training iterations: for each of one or more pairs of images of the scene: for each image in the pair, processing the image using the encoder neural network to generate a respective pose estimate for the image; and for each image in the pair, applying an equivalence relation to the respective pose estimate for the image to generate an equivalence class of a plurality of pose estimates for the image; and training the encoder neural network on a consistency loss function that measures, for each of the one or more
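Claims 8 to 11, 24, and 25 add two more pieces:

- A reconstruction loss that renders every pose in the equivalence class and keeps the smallest squared L2 error.
- A queue of randomly chosen image pairs. Pairs with too few keypoint matches are dropped, and the pairs with the smallest consistency losses are used.

The minimal Python sketch below makes these assumptions, none of which comes from the document:

- The `render` callable is a hypothetical stand-in for the NeRF model, here a toy two-pixel function of azimuth.
- `match_counts`, the per-pair losses, and the threshold of 50 matches are invented inputs.
- No gradient-based training is shown.

```python
import math
import random


def equivalence_class(azimuth, n):
    """Claims 16-17 for an azimuth-only pose: azimuth + 2*k*pi/n for k = 0..n-1."""
    return [azimuth + 2.0 * math.pi * k / n for k in range(n)]


def squared_l2(a, b):
    """Claim 25: squared L2 error between observed and reconstructed pixel values."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def min_over_class_reconstruction_loss(pixels, azimuth, n, render):
    """Claims 11 and 24: render once per equivalent pose and keep the smallest error.

    `render` is a hypothetical stand-in for the NeRF model (azimuth -> pixel values).
    """
    return min(squared_l2(pixels, render(a)) for a in equivalence_class(azimuth, n))


def build_pair_queue(num_images, match_counts, min_matches, queue_size, rng):
    """Claims 8 and 10: random pairs, skipping pairs with too few matched keypoints.

    `match_counts` (hypothetical input) maps (i, j) with i < j to a match count.
    """
    candidates = [(i, j) for i in range(num_images) for j in range(i + 1, num_images)
                  if match_counts.get((i, j), 0) >= min_matches]
    return rng.sample(candidates, min(queue_size, len(candidates)))


def select_pairs(queue, pair_losses, k):
    """Claim 9: the k queued pairs with the smallest geometric consistency losses."""
    return sorted(queue, key=lambda pair: pair_losses[pair])[:k]


def toy_render(azimuth):
    """Toy, non-symmetric 'renderer' used only for this illustration."""
    return [math.cos(azimuth), math.sin(azimuth)]


# Usage: the predicted azimuth is off by half a turn.
observed = toy_render(0.2)
print(min_over_class_reconstruction_loss(observed, 0.2 + math.pi, n=2, render=toy_render))  # about 0
print(min_over_class_reconstruction_loss(observed, 0.2 + math.pi, n=1, render=toy_render))  # about 4

rng = random.Random(0)
counts = {(0, 1): 120, (0, 2): 15, (1, 2): 80, (1, 3): 95, (2, 3): 60}
queue = build_pair_queue(4, counts, min_matches=50, queue_size=10, rng=rng)
losses = {(0, 1): 0.8, (1, 2): 0.1, (1, 3): 0.5, (2, 3): 0.3}
print(select_pairs(queue, losses, k=2))  # [(1, 2), (2, 3)]
```

Taking the minimum over the class means that, in this sketch, a prediction landing on the wrong replica is not penalized. Here the prediction is off by half a turn with N = 2, so the encoder only needs to be correct up to the equivalence relation.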
Artificial neural networks [ANN] · CPC title
Training; Learning · CPC title
involving reference images or patches · CPC title
Two-dimensional [2D] image generation · CPC title
Camera pose · CPC title