US12002146B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12002146-B2 |
| Application number | US-202217656778-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 28, 2022 |
| Priority date | Mar 28, 2022 |
| Publication date | Jun 4, 2024 |
| Grant date | Jun 4, 2024 |
Official abstract text for this publication.
Methods and systems are disclosed for performing operations for generating a 3D model of a scene. The operations include: receiving a set of two-dimensional (2D) images representing a first view of a real-world environment; applying a machine learning model comprising a neural light field network to the set of 2D images to predict pixel values of a target image representing a second view of the real-world environment, the machine learning model being trained to map a ray origin and direction directly to a given pixel value; and generating a three-dimensional (3D) model of the real-world environment based on the set of 2D images and the predicted target image.
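The phrase "map a ray origin and direction directly to a given pixel value" can be illustrated with a small sketch. The code below is not the patented implementation: it is an untrained toy residual MLP in plain Python, and the names `make_light_field` and `predict`, the layer width, the number of residual blocks, the 6-value input layout, and the sigmoid output are all assumptions made for illustration. It shows only the shape of the idea: one network evaluation per ray yields one RGB value.

```python
import math
import random

def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def make_layer(rng, n_in, n_out):
    """Random weight matrix (n_out rows, n_in columns); untrained, illustration only."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]

def linear(weights, x):
    """Matrix-vector product without bias."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def relu(x):
    return [max(0.0, v) for v in x]

def make_light_field(seed=0, width=16, blocks=2):
    """Build a hypothetical toy light field: (origin, direction) -> RGB in (0, 1)."""
    rng = random.Random(seed)
    w_in = make_layer(rng, 6, width)
    w_blocks = [(make_layer(rng, width, width), make_layer(rng, width, width))
                for _ in range(blocks)]
    w_out = make_layer(rng, width, 3)

    def predict(origin, direction):
        # Assumed input layout: 3 origin coordinates followed by the unit direction.
        x = list(origin) + normalize(direction)
        h = relu(linear(w_in, x))
        for w1, w2 in w_blocks:
            # Residual block: add the block's output back onto its input.
            h = [a + b for a, b in zip(h, linear(w2, relu(linear(w1, h))))]
        # Sigmoid keeps each colour channel strictly between 0 and 1.
        return [1.0 / (1.0 + math.exp(-v)) for v in linear(w_out, h)]

    return predict

light_field = make_light_field()
rgb = light_field([0.0, 0.0, 0.0], [0.0, 0.0, -2.0])  # one forward pass, one pixel
```

Because the weights are random, the returned colour carries no meaning; a trained model would be fitted so that `predict` reproduces the ground-truth pixel for each training ray.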
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving, by one or more processors, a set of two-dimensional (2D) images representing a first view of a real-world environment; applying, by the one or more processors, a machine learning model comprising a neural light field network to the set of 2D images to predict pixel values of a target image representing a second view of the real-world environment, the machine learning model being trained to map a ray origin and direction directly to a given pixel value, the pixel values of the target image being predicted without processing multiple points along a camera ray directed to the given pixel value; and generating, by the one or more processors, a three-dimensional (3D) model of the real-world environment based on the set of 2D images and the predicted target image.
2. The method of claim 1, wherein the machine learning model comprises a deep residual Multi-Layer Perceptron (MLP) network.
3. The method of claim 1, further comprising: selecting a ray origin and direction associated with the second view of the real-world environment; using a given 2D image of the set of 2D images representing the first view of the real-world environment, generating a ray based on the ray origin and direction corresponding to the second view of the real-world environment; and sampling a plurality of points along the ray.
4. The method of claim 3, wherein the plurality of points is spaced evenly along the ray.
5. The method of claim 3, wherein the plurality of points is randomly sampled based on stratified sampling during training of the machine learning model.
6. The method of claim 3, further comprising: concatenating the plurality of points to generate input data; and processing the input data with the machine learning model to predict one of the pixel values of the 2D target image.
7. The method of claim 1, further comprising training the machine learning model by performing operations comprising: receiving training data comprising training images and associated camera poses, the training images and associated camera poses being associated with a plurality of training ray origins and normalized ray directions and corresponding ground-truth pixel values; obtaining the training ray origin and normalized ray direction of a first training image of the training images and associated camera pose; applying the machine learning model to a set of points along a training ray formed by the training ray origin and normalized ray direction to predict a training pixel value; retrieving the ground-truth pixel value associated with the first training image; computing a deviation between the predicted training pixel value and the ground-truth pixel value; and updating one or more parameters of the machine learning model based on the deviation.
8. The method of claim 1, wherein the machine learning model comprises a first machine learning model, further comprising applying a second machine learning model to a collection of 2D images to generate training data used to train the machine learning model.
9. The method of claim 8, wherein the second machine learning model comprises a trained neural radiance field network.
10. The method of claim 9, wherein the neural radiance field network generates an output representing radiance of a sampled point corresponding to a particular ray, wherein a pixel value of the particular ray is generated through alpha-composition of a plurality of points including the sampled point along the particular ray corresponding to respective radiance values.
11. The method of claim 10, further comprising: receiving a first 2D image of the collection of 2D images; based on the first 2D image, selecting a ray origin and normalized direction of a camera pose associated with a second 2D image associated with a different camera pose than the first 2D image; applying the second machine learning model to the ray origin and normalized direction of the camera pose associated with the second 2D image to predict a pixel value for the second 2D image, wherein applying the second machine learning model comprises: applying the second machine learning model to each of a plurality of points along a ray formed by the ray origin and normalized direction of the camera pose associated with the second 2D image to generate a plurality of radiance values; and performing alpha-composition of the plurality of radiance values to predict the pixel value for the second 2D image, wherein the ray origin and normalized direction of the camera pose is randomly selected based on a uniform distribution.
12. The method of claim 8, wherein the training data excludes the collection of 2D images.
13. The method of claim 7, further comprising: generating a plurality of losses for the training data; sorting the plurality of losses in ascending order; and identifying a quantity of samples in the sorted plurality of losses that transgress a specified threshold.
14. The method of claim 13, further comprising augmenting the training data with additional training data corresponding to the quantity of samples.
15. The method of claim 1, further comprising displaying a virtual element associated with an augmented reality or virtual reality experience on a client device within a video comprising the set of 2D images based on the 3D model of the real-world environment.
16. A system comprising: at least one processor of a device; and a memory component having instructions stored thereon that, when executed by the at least one processor, cause the at least one processor to perform operations comprising: receiving a set of two-dimensional (2D) images representing a first view of a real-world environment; applying a machine learning model comprising a neural light field network to the set of 2D images to predict pixel values of a target image representing a second view of the real-world environment, the machine learning model being trained to map a ray origin and direction directly to a given pixel value, the pixel values of the target image being predicted without processing multiple points along a camera ray directed to the given pixel value; and generating a three-dimensional (3D) model of the real-world environment based on the set of 2D images and the predicted target image.
17. The system of claim 16, wherein the pixel values of the target image are predicted without by the machine learning model without determining radiance values of points along one or more rays.
18. The system of claim 17, the pixel values of the target image being predicted without processing the set of 2D images with a Neural Radiance Field (NeRF) network.
19. The system of claim 18, wherein the NeRF network is used to generate training data for the machine learning model comprising pseudo-generated training images representing different camera poses of an individual real-world environment.
20. A non-transitory computer-readable storage medium having stored thereon instructions that, when executed by at least one processor of a device, cause the at least one processor to perform operations comprising: receiving a set of two-dimensional (2D) images representing a first view of a real-world environment; applying a machine learning model comprising a neural light field network to the set of 2D images to predict pixel values of a target image representing a second view of the real-world environment, the machine learning model being trained to map a ray origin and direction directly to a given pixel value, the p
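Claims 3 to 6 describe sampling points along a ray, either evenly spaced or by stratified sampling, and concatenating them into the model input. The following is a minimal sketch of those steps under stated assumptions: the `near` and `far` depth bounds, the function names, and the flat list layout are hypothetical and do not come from the claim text.

```python
import random

def sample_depths(near, far, count, stratified=False, rng=None):
    """Return `count` depths in [near, far] (assumed bounds, not from the claims)."""
    if stratified:
        # Stratified sampling: split the range into equal bins, one uniform draw per bin.
        rng = rng or random.Random()
        step = (far - near) / count
        return [near + (i + rng.random()) * step for i in range(count)]
    if count == 1:
        return [near]
    # Evenly spaced depths, including both end points.
    return [near + (far - near) * i / (count - 1) for i in range(count)]

def points_on_ray(origin, direction, depths):
    """Evaluate origin + t * direction for each depth t."""
    return [[o + t * d for o, d in zip(origin, direction)] for t in depths]

def ray_to_input(origin, direction, depths):
    """Concatenate the sampled 3D points into one flat input vector."""
    return [c for point in points_on_ray(origin, direction, depths) for c in point]

even = sample_depths(1.0, 3.0, 5)
jittered = sample_depths(1.0, 3.0, 5, stratified=True, rng=random.Random(0))
model_input = ray_to_input([0.0, 0.0, 0.0], [0.0, 0.0, -1.0], even)
```

With five points the flat vector has 15 values. Per claim 5, the stratified variant is the one used during training; the evenly spaced variant corresponds to claim 4.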
Artificial neural networks [ANN] · CPC title
Training; Learning · CPC title
Ray-tracing · CPC title
Determining parameters from multiple pictures (depth or shape recovery from multiple images G06T7/55; stereo camera calibration G06T7/85) · CPC title
Three-dimensional [3D] modelling for computer graphics · CPC title