Neural rendering

US11967015B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11967015-B2
Application number	US-202117145232-A
Country	US
Kind code	B2
Filing date	Jan 8, 2021
Priority date	Feb 6, 2020
Publication date	Apr 23, 2024
Grant date	Apr 23, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The subject technology provides a framework for learning neural scene representations directly from images, without three-dimensional (3D) supervision, by a machine-learning model. In the disclosed systems and methods, 3D structure can be imposed by ensuring that the learned representation transforms like a real 3D scene. For example, a loss function can be provided which enforces equivariance of the scene representation with respect to 3D rotations. Because naive tensor rotations may not be used to define models that are equivariant with respect to 3D rotations, a new operation called an invertible shear rotation is disclosed, which has the desired equivariance property. In some implementations, the model can be used to generate a 3D representation, such as mesh, of an object from an image of the object.
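To illustrate why a nearest-neighbor shear rotation can be inverted exactly, here is a minimal 2D sketch; it is not the patent's implementation. It assumes the standard decomposition of a planar rotation into three shears, rounds each row or column shift to a whole number of grid cells, and wraps shifted cells around the border so that every shear is a permutation of the grid. The function names (`shear_rows`, `shear_cols`, `shear_rotate`, `shear_rotate_inverse`) and the wrap-around boundary handling are assumptions made for illustration, and the patent applies the idea to a 3D representation rather than a 2D image.

```python
import numpy as np

def shear_rows(grid, a):
    """Shift row y sideways by round(a * (y - centre)) cells, wrapping around.

    Each row is only rolled, so no two cells land on the same target and
    the whole operation is a permutation of the grid (assumed boundary
    handling: wrap-around).
    """
    out = np.empty_like(grid)
    centre = (grid.shape[0] - 1) / 2.0
    for y in range(grid.shape[0]):
        shift = int(np.rint(a * (y - centre)))
        out[y] = np.roll(grid[y], shift)
    return out

def shear_cols(grid, b):
    """Same as shear_rows, but shifting columns vertically."""
    return shear_rows(grid.T, b).T

def shear_rotate(grid, theta):
    """Approximate a rotation by theta (radians) with three discrete shears.

    Uses R(theta) = Sx(a) Sy(b) Sx(a) with a = -tan(theta / 2), b = sin(theta).
    The rotation direction depends on the image-axis convention.
    """
    a = -np.tan(theta / 2.0)
    b = np.sin(theta)
    return shear_rows(shear_cols(shear_rows(grid, a), b), a)

def shear_rotate_inverse(grid, theta):
    """Undo shear_rotate exactly by applying the negated shears in reverse."""
    a = -np.tan(theta / 2.0)
    b = np.sin(theta)
    return shear_rows(shear_cols(shear_rows(grid, -a), -b), -a)

if __name__ == "__main__":
    grid = np.arange(64).reshape(8, 8)
    rotated = shear_rotate(grid, 0.7)
    restored = shear_rotate_inverse(rotated, 0.7)
    print(np.array_equal(restored, grid))
```

Because every step only moves cells and never interpolates between them, applying the inverse recovers the original grid exactly, which an interpolating tensor rotation generally cannot do.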

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: providing an input image depicting a view of an object to a machine learning model, wherein the machine learning model utilizes a nearest neighbor shear rotation and has been trained based on a constraint of equivariance under rotations between a training object and a model-generated representation of the training object, the constraint comprising a comparison of a first implicit representation of the training object to a rotated version of a second implicit representation of the training object and a comparison of the second implicit representation to a rotated version of the first implicit representation; and generating, using the machine learning model and based on the provided input image, at least one of an output image that depicts the object from a rotated view that is different from the view of the object in the input image, or a three-dimensional representation of the object.

2. The method of claim 1, wherein the machine learning model utilizes: inverse rendering; and forward rendering.

3. The method of claim 2, wherein generating the at least one of the output image that depicts the object from the rotated view that is different from the view of the object in the input image or the three-dimensional representation of the object comprises generating the at least one of the output image that depicts the object from the rotated view that is different from the view of the object in the input image or the three-dimensional representation of the object with the forward rendering.

4. The method of claim 3, further comprising generating an implicit representation of the object with the inverse rendering based on the input image.

5. The method of claim 4, wherein the forward rendering generates the at least one of the output image that depicts the object from the rotated view that is different from the view of the object in the input image or the three-dimensional representation of the object based on the implicit representation generated by the inverse rendering.

6. The method of claim 5, wherein generating the at least one of the output image that depicts the object from the rotated view that is different from the view of the object in the input image or the three-dimensional representation of the object based on the implicit representation comprises rotating the implicit representation of the object.

7. The method of claim 6, wherein rotating the implicit representation of the object comprises performing the nearest neighbor shear rotation of the implicit representation of the object.

8. The method of claim 7, wherein the three-dimensional representation comprises is an explicit three-dimensional representation including at least one of a voxel grid, a mesh or a point cloud.

9. The method of claim 7, wherein the implicit representation of the object comprises a tensor or a latent space of an autoencoder.

10. The method of claim 4, wherein generating the implicit representation of the object with the inverse rendering based on the input image comprises generating the implicit representation in a single forward pass of the inverse rendering.

11. The method of claim 1, further comprising training the machine learning model based on the constraint of equivariance under rotations between the training object and the model-generated representation of the training object by: providing a first input training image depicting a first view of the training object to the machine learning model; providing a second input training image depicting a second view of the training object to the machine learning model; generating the first implicit representation of the training object based on the first input training image; generating the second implicit representation of the training object based on the second input training image; rotating the first implicit representation of the training object to generate the rotated version of the first implicit representation; rotating the second implicit representation of the training object to generate the rotated version of the second implicit representation; generating a first output training image based on the rotated version of the first implicit representation of the training object; generating a second output training image based on the rotated version of the second implicit representation of the training object; comparing the first input training image to the second output training image; and comparing the second input training image to the first output training image.

12. The method of claim 11, wherein the training further comprises minimizing a loss function based on the comparing of the first input training image to the second output training image and the comparing of the second input training image to the first output training image.

13. The method of claim 12, further comprising: comparing the first implicit representation to the rotated version of the second implicit representation; and comparing the second implicit representation to the rotated version of the first implicit representation.

14. The method of claim 13, wherein the loss function is further based on the comparing of the first implicit representation to the rotated version of the second implicit representation and the comparing of the second implicit representation to the rotated version of first implicit representation.

15. The method of claim 1, further comprising training the machine learning model based on at least two input training images without three-dimensional supervision of the training.

16. The method of claim 15, further comprising testing the trained machine learning model without providing pose information to the trained machine learning model.

17. A system comprising: a processor; a memory device containing instructions, which when executed by the processor cause the processor to: provide an input image depicting a view of an object to a machine learning model, wherein the machine learning model utilizes a nearest neighbor shear rotation and has been trained based on a constraint of equivariance under rotations between a training object and a model-generated representation of the training object, wherein the nearest neighbor shear rotation comprises an invertible shear rotation of an implicit three-dimensional representation of the object in which each voxel of the implicit three-dimensional representation of the object is shifted to a unique nearest neighbor on a grid; and generate, using the machine learning model and based on the provided input image, at least one of an output image that depicts the object from a rotated view that is different from the view of the object in the input image, or a three-dimensional representation of the object.

18. The system of claim 17, wherein a model architecture of the machine learning model, including a shear rotation module, is fully differentiable.

19. A non-transitory machine-readable medium comprising code that, when executed by a processor, causes the processor to: provide an input image depicting a view of an object to a machine learning model, wherein the machine learning model utilizes shear rotation and has been trained based on at least two input training images depicting different views of a training object and a constraint of equivariance under rotations, the constraint comprising a comparison of a first implicit representation of the training object to a rotated version of a second implicit representation of the training object and a comparison of the second implicit representation to a rot
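The training comparisons recited in claims 11 to 14 can be sketched roughly as follows. This is an illustrative reading of the claim language, not the patented implementation: `inverse_render`, `forward_render`, and `rotate` are hypothetical callables standing in for the model's components, the mean squared error and the `latent_weight` factor are assumed choices, and the relative rotations between the two training views are assumed to be supplied by the caller.

```python
import numpy as np

def equivariance_loss(img1, img2, inverse_render, forward_render, rotate,
                      rot_1to2, rot_2to1, latent_weight=1.0):
    """Two-view loss loosely following claims 11 to 14 (assumed MSE terms).

    inverse_render: image -> implicit representation (hypothetical)
    forward_render: implicit representation -> image (hypothetical)
    rotate:         (representation, rotation) -> rotated representation
    """
    # Implicit representations of the same object from two views.
    z1 = inverse_render(img1)
    z2 = inverse_render(img2)

    # Rotate each representation into the other view's pose.
    z1_rot = rotate(z1, rot_1to2)
    z2_rot = rotate(z2, rot_2to1)

    # Render from the rotated representations.
    out1 = forward_render(z1_rot)  # expected to resemble img2
    out2 = forward_render(z2_rot)  # expected to resemble img1

    # Image comparisons (claims 11 and 12).
    render_term = np.mean((img1 - out2) ** 2) + np.mean((img2 - out1) ** 2)
    # Representation comparisons (claims 13 and 14).
    latent_term = np.mean((z1 - z2_rot) ** 2) + np.mean((z2 - z1_rot) ** 2)
    return float(render_term + latent_weight * latent_term)

if __name__ == "__main__":
    # Toy check: identity "renderers" and exact 90-degree grid rotations.
    rng = np.random.default_rng(0)
    view1 = rng.random((4, 4))
    view2 = np.rot90(view1, 1)
    identity = lambda x: x
    rot90 = lambda z, k: np.rot90(z, k)
    print(equivariance_loss(view1, view2, identity, identity, rot90, 1, -1))
```

In the toy check the two views differ by an exact quarter turn and the stand-in components are perfectly equivariant, so every term vanishes; a real model would be trained to push this quantity toward zero.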

Assignees

Apple Inc

Inventors

Classifications

G06N3/0464 · Convolutional networks [CNN, ConvNet]
G06N3/0475 · Generative networks
G06N3/0455 · Auto-encoder networks; Encoder-decoder networks
G06N3/0895 · Weakly supervised learning, e.g. semi-supervised or self-supervised learning
G06T15/205 (Primary) · Image-based rendering

Patent family

Related publications grouped by family.

View patent family 77178447

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11967015B2 cover?: The subject technology provides a framework for learning neural scene representations directly from images, without three-dimensional (3D) supervision, by a machine-learning model. In the disclosed systems and methods, 3D structure can be imposed by ensuring that the learned representation transforms like a real 3D scene. For example, a loss function can be provided which enforces equivariance …
Who is the assignee on this patent?: Apple Inc
What technology area does this patent fall under?: Primary CPC classification G06T15/205. Mapped technology areas include Physics.
When was this patent published?: Publication date Apr 23, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Generating synthetic images and/or training machine learning model(s) based on the synthetic images

Three-dimension (3d) assisted personalized home object detection

Learning geometric differentials for matching 3d models to objects in a 2d image

Generating novel views of a three-dimensional object based on a single two-dimensional image

Training and/or using neural network models to generate intermediary output of a spectral image