Figure-Ground Neural Radiance Fields For Three-Dimensional Object Category Modelling
US-2023130281-A1 · Apr 27, 2023 · US
US12450822B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12450822-B2 |
| Application number | US-202418644653-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 24, 2024 |
| Priority date | Mar 28, 2022 |
| Publication date | Oct 21, 2025 |
| Grant date | Oct 21, 2025 |
Official abstract text for this publication.
Methods and systems are disclosed for performing operations for generating a 3D model of a scene. The operations include: receiving a set of two-dimensional (2D) images representing a first view of a real-world environment; applying a machine learning model comprising a neural light field network to the set of 2D images to predict pixel values of a target image representing a second view of the real-world environment, the machine learning model being trained to map a ray origin and direction directly to a given pixel value; and generating a three-dimensional (3D) model of the real-world environment based on the set of 2D images and the predicted target image.
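The abstract's idea of mapping a ray origin and direction directly to a pixel value can be illustrated with a minimal sketch. It follows one possible reading of claims 2 to 6: points are spaced evenly along the ray, concatenated into a single input vector, and passed through a residual MLP once per ray. The names `ray_to_input` and `TinyResidualMLP`, the layer sizes, the near/far bounds, and the random untrained weights are all assumptions made for illustration; none of them come from the patent.

```python
import math
import random


def ray_to_input(origin, direction, near=0.0, far=1.0, num_points=4):
    """Sample evenly spaced points on a ray and concatenate them into one flat vector."""
    if num_points < 2:
        raise ValueError("need at least two points to span the ray segment")
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]  # normalized ray direction
    features = []
    for i in range(num_points):
        t = near + (far - near) * i / (num_points - 1)
        features.extend(o + t * u for o, u in zip(origin, unit))
    return features


class TinyResidualMLP:
    """Hypothetical stand-in for a light field network (random weights, not trained)."""

    def __init__(self, in_dim, width=16, num_blocks=2, seed=0):
        rng = random.Random(seed)

        def layer(n_out, n_in):
            scale = 1.0 / math.sqrt(n_in)
            return [[rng.uniform(-scale, scale) for _ in range(n_in)]
                    for _ in range(n_out)]

        self.w_in = layer(width, in_dim)
        self.blocks = [layer(width, width) for _ in range(num_blocks)]
        self.w_out = layer(3, width)

    @staticmethod
    def _matvec(w, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    def __call__(self, x):
        h = [max(0.0, v) for v in self._matvec(self.w_in, x)]
        for w in self.blocks:
            # Residual connection: add the block's ReLU output back onto its input.
            h = [hi + max(0.0, v) for hi, v in zip(h, self._matvec(w, h))]
        # A sigmoid keeps each colour channel in the open interval (0, 1).
        return [1.0 / (1.0 + math.exp(-v)) for v in self._matvec(self.w_out, h)]


# One network evaluation per ray, rather than one evaluation per sampled point.
x = ray_to_input(origin=[0.0, 0.0, 0.0], direction=[0.0, 0.0, 2.0])
rgb = TinyResidualMLP(in_dim=len(x))(x)
```

The point of the sketch is the call count: the sampled points only build the input representation of the ray, and the colour comes out of a single forward pass with no per-point radiance values and no compositing step.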
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving, by one or more processors, a set of images representing a first view of a scene; applying, by the one or more processors, a machine learning model comprising a neural light field network to the set of images to predict pixel values of a target image, the pixel values of the target image being predicted without processing multiple points along a camera ray directed to a given pixel value; and generating, by the one or more processors, a model of the scene based on the set of images and the target image.
2. The method of claim 1, wherein the machine learning model comprises a deep residual Multi-Layer Perceptron (MLP) network, the target image representing a second view of the scene, and the machine learning model being trained to map a ray origin and direction directly to the given pixel value.
3. The method of claim 1, the set of images comprising 2D images and the model comprising a 3D model, further comprising: selecting a ray origin and direction associated with a second view of the scene; using a given 2D image of the set of 2D images representing the first view of the scene, generating a ray based on the ray origin and direction corresponding to the second view of the scene; and sampling a plurality of points along the ray.
4. The method of claim 3, wherein the plurality of points is spaced evenly along the ray.
5. The method of claim 3, wherein the plurality of points is randomly sampled based on stratified sampling during training of the machine learning model.
6. The method of claim 3, further comprising: concatenating the plurality of points to generate input data; and processing the input data with the machine learning model to predict one of the pixel values of the target image.
7. The method of claim 1, further comprising training the machine learning model by performing operations comprising: receiving training data comprising training images and associated camera poses, the training images and associated camera poses being associated with a plurality of training ray origins and normalized ray directions and corresponding ground-truth pixel values; obtaining a training ray origin and normalized ray direction of a first training image of the training images and associated camera pose; applying the machine learning model to a set of points along a training ray formed by the training ray origin and normalized ray direction to predict a training pixel value; retrieving a ground-truth pixel value associated with the first training image; computing a deviation between the predicted training pixel value and the ground-truth pixel value; and updating one or more parameters of the machine learning model based on the deviation.
8. The method of claim 1, wherein the machine learning model comprises a first machine learning model, further comprising applying a second machine learning model to a collection of 2D images to generate training data used to train the machine learning model.
9. The method of claim 8, wherein the second machine learning model comprises a trained neural radiance field network.
10. The method of claim 9, wherein the trained neural radiance field network generates an output representing radiance of a sampled point corresponding to a particular ray, wherein a pixel value of the particular ray is generated through alpha-composition of a plurality of points including the sampled point along the particular ray corresponding to respective radiance values.
11. The method of claim 10, further comprising: receiving a first 2D image of the collection of 2D images; based on the first 2D image, selecting a ray origin and normalized direction of a camera pose associated with a second 2D image associated with a different camera pose than the first 2D image; applying the second machine learning model to the ray origin and normalized direction of the camera pose associated with the second 2D image to predict a pixel value for the second 2D image, wherein applying the second machine learning model comprises: applying the second machine learning model to each of a plurality of points along a ray formed by the ray origin and normalized direction of the camera pose associated with the second 2D image to generate a plurality of radiance values; and performing alpha-composition of the plurality of radiance values to predict the pixel value for the second 2D image, wherein the ray origin and normalized direction of the camera pose is randomly selected based on a uniform distribution.
12. The method of claim 8, wherein the training data excludes the collection of 2D images.
13. The method of claim 7, further comprising: generating a plurality of losses for the training data; sorting the plurality of losses in ascending order; and identifying a quantity of samples in the sorted plurality of losses that transgress a specified threshold.
14. The method of claim 13, further comprising augmenting the training data with additional training data corresponding to the quantity of samples.
15. The method of claim 1, further comprising displaying a virtual element associated with an augmented reality or virtual reality experience on a user device within a video comprising the set of images based on the model of the scene.
16. A system comprising: at least one processor; and a memory component having instructions stored thereon that, when executed by the at least one processor, cause the at least one processor to perform operations comprising: receiving a set of images representing a first view of a scene; applying a machine learning model comprising a neural light field network to the set of images to predict pixel values of a target image, the pixel values of the target image being predicted without processing multiple points along a camera ray directed to a given pixel value; and generating a model of the scene based on the set of images and the target image.
17. The system of claim 16, wherein the pixel values of the target image are predicted by the machine learning model without determining radiance values of points along one or more rays.
18. The system of claim 17, the pixel values of the target image being predicted without processing the set of images with a Neural Radiance Field (NeRF) network.
19. The system of claim 18, wherein the NeRF network is used to generate training data for the machine learning model comprising pseudo-generated training images representing different camera poses of an individual scene.
20. A non-transitory computer-readable storage medium having stored thereon instructions that, when executed by at least one processor, cause the at least one processor to perform operations comprising: receiving a set of images representing a first view of a scene; applying a machine learning model comprising a neural light field network to the set of images to predict pixel values of a target image, the pixel values of the target image being predicted without processing multiple points along a camera ray directed to a given pixel value; and generating a model of the scene based on the set of images and the target image.
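Two steps named in the claims lend themselves to a short illustration: the alpha-composition a NeRF-style network uses to turn per-point radiance into a pixel value (claims 10 and 11), and the sorting of training losses to count samples beyond a threshold (claim 13). The sketch below is hedged throughout. The compositing formula is the one commonly used for neural radiance fields and is assumed here, since the claims do not spell it out; "transgress" is read as "exceed"; and the function names `alpha_composite` and `count_hard_samples` are hypothetical.

```python
import bisect
import math


def alpha_composite(densities, colors, deltas):
    """Blend per-point colours front to back along one ray.

    Assumed formulation: alpha_i = 1 - exp(-sigma_i * delta_i), and each point
    is weighted by the transmittance remaining after all earlier points.
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel


def count_hard_samples(losses, threshold):
    """Sort losses in ascending order and count those strictly above the threshold."""
    ordered = sorted(losses)
    first_above = bisect.bisect_right(ordered, threshold)
    return ordered, len(ordered) - first_above


# An empty point in front of an opaque green point: the pixel comes out green.
pixel = alpha_composite(
    densities=[0.0, 1e9],
    colors=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    deltas=[1.0, 1.0],
)

# Two of the four losses lie above the threshold of 0.4.
ordered, num_hard = count_hard_samples([0.9, 0.1, 0.5, 0.3], threshold=0.4)
```

The compositing loop evaluates every sampled point on the ray, which is the per-ray cost that claims 1, 17 and 18 say the light field network avoids at prediction time. Under the reading assumed here, the count from `count_hard_samples` would indicate how much additional training data claim 14 adds.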
Artificial neural networks [ANN] · CPC title
Training; Learning · CPC title
Determining parameters from multiple pictures (depth or shape recovery from multiple images G06T7/55; stereo camera calibration G06T7/85) · CPC title
Ray-tracing · CPC title
Three-dimensional [3D] modelling for computer graphics · CPC title