Generating synthetic images and/or training machine learning model(s) based on the synthetic images
US-2021327127-A1 · Oct 21, 2021 · US
US11967015B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11967015-B2 |
| Application number | US-202117145232-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 8, 2021 |
| Priority date | Feb 6, 2020 |
| Publication date | Apr 23, 2024 |
| Grant date | Apr 23, 2024 |
Official abstract text for this publication.
The subject technology provides a framework for learning neural scene representations directly from images, without three-dimensional (3D) supervision, by a machine-learning model. In the disclosed systems and methods, 3D structure can be imposed by ensuring that the learned representation transforms like a real 3D scene. For example, a loss function can be provided which enforces equivariance of the scene representation with respect to 3D rotations. Because naive tensor rotations may not be used to define models that are equivariant with respect to 3D rotations, a new operation called an invertible shear rotation is disclosed, which has the desired equivariance property. In some implementations, the model can be used to generate a 3D representation, such as mesh, of an object from an image of the object.
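The invertible shear rotation mentioned in the abstract can be illustrated in two dimensions. The sketch below is not the patent's implementation: it assumes the standard three-shear decomposition of a planar rotation (shear factors -tan(θ/2), sin θ, -tan(θ/2)), rounds each shear offset to the nearest integer grid step, and wraps values around the grid edges with `np.roll`. The function names `shear_rows`, `shear_cols`, `shear_rotate` and `shear_rotate_inverse` are hypothetical, and the wrap-around boundary handling is an assumption made here so that every step is a permutation of grid cells.

```python
import numpy as np

def shear_rows(grid, factor):
    """Shift row i sideways by round(factor * offset of row i from the center).

    Every shift is a whole number of cells and wraps around the edge
    (an assumption of this sketch), so no two cells land on the same spot.
    """
    out = np.empty_like(grid)
    center = (grid.shape[0] - 1) / 2.0
    for i in range(grid.shape[0]):
        shift = int(np.rint(factor * (i - center)))
        out[i] = np.roll(grid[i], shift)
    return out

def shear_cols(grid, factor):
    """Same as shear_rows, applied to columns via a transpose."""
    return shear_rows(grid.T, factor).T

def shear_rotate(grid, theta):
    """Approximate a rotation by theta (radians) with three nearest-neighbor shears."""
    a = -np.tan(theta / 2.0)
    b = np.sin(theta)
    return shear_rows(shear_cols(shear_rows(grid, a), b), a)

def shear_rotate_inverse(grid, theta):
    """Undo shear_rotate exactly by applying the opposite shears in reverse order."""
    a = -np.tan(theta / 2.0)
    b = np.sin(theta)
    return shear_rows(shear_cols(shear_rows(grid, -a), -b), -a)

# Usage: rotate a random 9x9 grid and recover it exactly.
grid = np.random.default_rng(0).random((9, 9))
rotated = shear_rotate(grid, 0.7)
restored = shear_rotate_inverse(rotated, 0.7)
print(np.array_equal(restored, grid))  # True
```

Because each shear only moves values between cells and never interpolates, the rotated grid holds exactly the same values as the input, which is what makes the operation invertible. A 3D version could apply the same planar shears slice by slice around each rotation axis; that extension is not shown here.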
Opening claim text (preview).
What is claimed is:
1. A method comprising: providing an input image depicting a view of an object to a machine learning model, wherein the machine learning model utilizes a nearest neighbor shear rotation and has been trained based on a constraint of equivariance under rotations between a training object and a model-generated representation of the training object, the constraint comprising a comparison of a first implicit representation of the training object to a rotated version of a second implicit representation of the training object and a comparison of the second implicit representation to a rotated version of the first implicit representation; and generating, using the machine learning model and based on the provided input image, at least one of an output image that depicts the object from a rotated view that is different from the view of the object in the input image, or a three-dimensional representation of the object.
2. The method of claim 1, wherein the machine learning model utilizes: inverse rendering; and forward rendering.
3. The method of claim 2, wherein generating the at least one of the output image that depicts the object from the rotated view that is different from the view of the object in the input image or the three-dimensional representation of the object comprises generating the at least one of the output image that depicts the object from the rotated view that is different from the view of the object in the input image or the three-dimensional representation of the object with the forward rendering.
4. The method of claim 3, further comprising generating an implicit representation of the object with the inverse rendering based on the input image.
5. The method of claim 4, wherein the forward rendering generates the at least one of the output image that depicts the object from the rotated view that is different from the view of the object in the input image or the three-dimensional representation of the object based on the implicit representation generated by the inverse rendering.
6. The method of claim 5, wherein generating the at least one of the output image that depicts the object from the rotated view that is different from the view of the object in the input image or the three-dimensional representation of the object based on the implicit representation comprises rotating the implicit representation of the object.
7. The method of claim 6, wherein rotating the implicit representation of the object comprises performing the nearest neighbor shear rotation of the implicit representation of the object.
8. The method of claim 7, wherein the three-dimensional representation comprises is an explicit three-dimensional representation including at least one of a voxel grid, a mesh or a point cloud.
9. The method of claim 7, wherein the implicit representation of the object comprises a tensor or a latent space of an autoencoder.
10. The method of claim 4, wherein generating the implicit representation of the object with the inverse rendering based on the input image comprises generating the implicit representation in a single forward pass of the inverse rendering.
11. The method of claim 1, further comprising training the machine learning model based on the constraint of equivariance under rotations between the training object and the model-generated representation of the training object by: providing a first input training image depicting a first view of the training object to the machine learning model; providing a second input training image depicting a second view of the training object to the machine learning model; generating the first implicit representation of the training object based on the first input training image; generating the second implicit representation of the training object based on the second input training image; rotating the first implicit representation of the training object to generate the rotated version of the first implicit representation; rotating the second implicit representation of the training object to generate the rotated version of the second implicit representation; generating a first output training image based on the rotated version of the first implicit representation of the training object; generating a second output training image based on the rotated version of the second implicit representation of the training object; comparing the first input training image to the second output training image; and comparing the second input training image to the first output training image.
12. The method of claim 11, wherein the training further comprises minimizing a loss function based on the comparing of the first input training image to the second output training image and the comparing of the second input training image to the first output training image.
13. The method of claim 12, further comprising: comparing the first implicit representation to the rotated version of the second implicit representation; and comparing the second implicit representation to the rotated version of the first implicit representation.
14. The method of claim 13, wherein the loss function is further based on the comparing of the first implicit representation to the rotated version of the second implicit representation and the comparing of the second implicit representation to the rotated version of first implicit representation.
15. The method of claim 1, further comprising training the machine learning model based on at least two input training images without three-dimensional supervision of the training.
16. The method of claim 15, further comprising testing the trained machine learning model without providing pose information to the trained machine learning model.
17. A system comprising: a processor; a memory device containing instructions, which when executed by the processor cause the processor to: provide an input image depicting a view of an object to a machine learning model, wherein the machine learning model utilizes a nearest neighbor shear rotation and has been trained based on a constraint of equivariance under rotations between a training object and a model-generated representation of the training object, wherein the nearest neighbor shear rotation comprises an invertible shear rotation of an implicit three-dimensional representation of the object in which each voxel of the implicit three-dimensional representation of the object is shifted to a unique nearest neighbor on a grid; and generate, using the machine learning model and based on the provided input image, at least one of an output image that depicts the object from a rotated view that is different from the view of the object in the input image, or a three-dimensional representation of the object.
18. The system of claim 17, wherein a model architecture of the machine learning model, including a shear rotation module, is fully differentiable.
19. A non-transitory machine-readable medium comprising code that, when executed by a processor, causes the processor to: provide an input image depicting a view of an object to a machine learning model, wherein the machine learning model utilizes shear rotation and has been trained based on at least two input training images depicting different views of a training object and a constraint of equivariance under rotations, the constraint comprising a comparison of a first implicit representation of the training object to a rotated version of a second implicit representation of the training object and a comparison of the second implicit representation to a rot
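The training comparisons recited in claims 11 to 14 can be sketched as a pair of loss terms. This is a minimal illustration under stated assumptions, not the patented training procedure: the claims only say "comparing", so the mean squared error used here is an assumption, and the function `equivariance_losses` and its parameters (`inverse_render`, `forward_render`, `rotate_1_to_2`, `rotate_2_to_1`) are hypothetical names. In the toy usage, identity functions stand in for the rendering networks and `np.rot90` stands in for the rotation between views.

```python
import numpy as np

def equivariance_losses(img1, img2, inverse_render, forward_render,
                        rotate_1_to_2, rotate_2_to_1):
    """Return (render_loss, scene_loss) for two views of one training object.

    Mean squared error is an assumption of this sketch; the claims do not
    name a specific comparison.
    """
    # Implicit representations inferred from each input training image.
    scene1 = inverse_render(img1)
    scene2 = inverse_render(img2)

    # Rotate each representation toward the other view.
    scene1_rot = rotate_1_to_2(scene1)
    scene2_rot = rotate_2_to_1(scene2)

    # Output training images rendered from the rotated representations.
    out1 = forward_render(scene1_rot)  # first output image, should match view 2
    out2 = forward_render(scene2_rot)  # second output image, should match view 1

    # Claim 11: first input vs. second output, second input vs. first output.
    render_loss = np.mean((img1 - out2) ** 2) + np.mean((img2 - out1) ** 2)

    # Claim 13: each representation vs. the rotated version of the other.
    scene_loss = (np.mean((scene1 - scene2_rot) ** 2)
                  + np.mean((scene2 - scene1_rot) ** 2))
    return render_loss, scene_loss

# Toy usage: view 2 is view 1 turned by 90 degrees, and the stand-in model
# is exactly equivariant, so both loss terms vanish.
view1 = np.random.default_rng(0).random((8, 8))
view2 = np.rot90(view1)
identity = lambda x: x
render_loss, scene_loss = equivariance_losses(
    view1, view2, identity, identity,
    np.rot90, lambda s: np.rot90(s, -1))
print(render_loss, scene_loss)  # 0.0 0.0
```

In an actual training loop the two terms would presumably be combined and minimized with respect to the model parameters, as claims 12 and 14 describe; how they are weighted is not stated in the text shown here.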
Convolutional networks [CNN, ConvNet] · CPC title
Generative networks · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
Image-based rendering · CPC title