Systems and methods for generic visual odometry using learned features via neural camera models
US-2022084231-A1 · Mar 17, 2022 · US
US12039657B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12039657-B2 |
| Application number | US-202117204571-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 17, 2021 |
| Priority date | Mar 17, 2021 |
| Publication date | Jul 16, 2024 |
| Grant date | Jul 16, 2024 |
Official abstract text for this publication.
Embodiments of the technology described herein provide a view and time synthesis of dynamic scenes captured by a camera. The technology described herein represents a dynamic scene as a continuous function of both space and time. The technology may parameterize this function with a deep neural network (a multi-layer perceptron (MLP)), and perform rendering using volume tracing. At a very high level, a dynamic scene depicted in the video may be used to train the MLP. Once trained, the MLP is able to synthesize, through prediction, a view of the scene at a time and/or camera pose not found in the video. As used herein, a dynamic scene comprises one or more moving objects.
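The rendering step mentioned in the abstract can be illustrated with a small sketch. The abstract says only "volume tracing"; the code below assumes the widely used quadrature from neural radiance field rendering, in which each sample along a camera ray contributes its color weighted by its opacity and by the light that survives the samples in front of it. The function name `composite_ray` and its parameters are hypothetical and do not come from the patent.

```python
import math


def composite_ray(colors, densities, deltas):
    """Accumulate per-sample colors along one ray into one pixel color.

    Assumed inputs (illustrative, not taken from the patent text):
      colors:    list of (r, g, b) tuples, ordered from near to far
      densities: list of non-negative volume densities, one per sample
      deltas:    list of distances between adjacent samples
    Returns (pixel_rgb, weights), where weights[i] is the contribution
    of sample i to the pixel.
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for color, sigma, delta in zip(colors, densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        weights.append(weight)
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha  # light remaining for farther samples
    return tuple(pixel), weights


# Usage: a fully opaque red sample hides the blue sample behind it.
pixel, weights = composite_ray(
    colors=[(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    densities=[1e9, 1.0],
    deltas=[1.0, 1.0],
)
```

In a full system the colors and densities would come from querying the trained MLP at each sample position, view direction, and time; here they are passed in directly so the accumulation can be read on its own.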
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for synthesizing a view of a dynamic scene, the method comprising: generating a motion estimate for a dynamic element in a dynamic scene depicted in a video; using a neural network to represent the dynamic scene depicted in the video, the neural network comprising a first neural network for representing static elements of the dynamic scene and a second neural network for representing dynamic elements of the dynamic scene; training the second neural network using the motion estimate; receiving from the neural network a set of colors and densities representing the dynamic scene from a novel camera view, wherein the set of colors and densities are derived from a first output from the first neural network and a second output from the second neural network linearly blended based on a blending weight, the blending weight derived as a function of training convergence; and rendering the novel camera view of the dynamic scene from the set of colors and densities, wherein the blending weight causes the first output of the first neural network to be used to render the static elements of the dynamic scene and the second output of the second neural network to be used to render the dynamic elements of the dynamic scene.
2. The method of claim 1, wherein the rendering comprises accumulating color values from different scene depths for each pixel being rendered.
3. The method of claim 1, wherein a representation of the dynamic scene in the second neural network comprises six dimensions and wherein one of the six dimensions is time.
4. The method of claim 1, wherein the method further comprises receiving from the neural network a 3D scene flow and rendering the novel camera view of the dynamic scene using the 3D scene flow.
5. The method of claim 1, wherein the method further comprises training the neural network to represent the dynamic scene depicted in the video through use of a photoconsistency loss to enforce, during training, that a scene representation is temporally consistent with the video.
6. The method of claim 1, wherein the method further comprises training the neural network using a depth estimation of the dynamic scene to encourage an expected termination depth computed along a ray to be close to the depth estimation.
7. One or more computer-storage media having computer-executable instructions embodied thereon that, when executed by a computing system having a processor and memory, cause the processor to perform operations: generating a motion estimate for a dynamic element in a dynamic scene depicted in a video; using a neural network to hold a six dimensional representation of the dynamic scene depicted in the video, wherein the neural network comprises a first neural network for representing static elements of the dynamic scene in a five dimensional representation and a second neural network for representing dynamic elements of the dynamic scene in the six dimensional representation, and wherein one of six dimensions in the six dimensional representation is time; training the second neural network using the motion estimate; receiving from the neural network a 3D scene flow indicative of an object movement within the dynamic scene; receiving from the neural network a set of colors and densities representing the dynamic scene from a novel camera view, wherein the set of colors and densities are derived from a first output from the first neural network and a second output from the second neural network linearly blended based on a blending weight, the blending weight derived as a function of training convergence; and rendering a view of the dynamic scene from the 3D scene flow, wherein the blending weight causes the first output from the first neural network to be used to render the static elements in the dynamic scene and the second output from the second neural network to be used to render the dynamic elements in the dynamic scene.
8. The media of claim 7, wherein the rendering also uses the blending weight, the first set of colors and densities, and the second set of colors and densities, wherein the rendering comprises accumulating color values from different scene depths for each pixel being rendered.
9. The media of claim 7, wherein the operations further comprise training the neural network using camera pose data for the video.
10. The media of claim 7, wherein the operations further comprise training the neural network using a depth estimation of the dynamic scene that encourages an expected termination depth computed along a ray to be close to the depth estimation.
11. The media of claim 7, wherein operations further comprise training the neural network using a photoconsistency loss to enforce, during training, that a scene representation is temporally consistent with the video.
12. A method of synthesizing a view of a dynamic scene comprising: generating a motion estimate for a dynamic element in a dynamic scene depicted in a video; using a neural network to hold a representation of the dynamic scene depicted in the video wherein the neural network comprises a first neural network for representing static elements of the dynamic scene in a five dimensional representation and a second neural network for representing dynamic elements of the dynamic scene in the six dimensional representation, and; training the second neural network using the motion estimate; receiving from the neural network a 3D scene flow indicative of an object movement within the dynamic scene; receiving from the neural network a set of colors and densities representing the dynamic scene from a novel camera view, wherein the set of colors and densities are derived from a first output from the first neural network and a second output from the second neural network linearly blended based on a blending weight derived as a function of training convergence between the first neural network and the second neural network; and rendering the novel camera view of the dynamic scene from the 3D scene flow and the set of colors and densities, wherein the blending weight causes the first output from the first neural network to be used to render the static elements of the dynamic scene and the second output of the second neural network to be used to render the dynamic elements of the dynamic scene.
13. The method of claim 12, wherein the rendering comprises accumulating color values from different scene depths for each pixel being rendered.
14. The method of claim 12, wherein the method further comprises training the neural network using a depth estimation of the dynamic scene to encourage an expected termination depth computed along a ray to be close to the depth estimation.
15. The method of claim 12, wherein the method further comprises training the neural network using a photoconsistency loss to enforce, during training, that a scene representation is temporally consistent with the video.
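Two operations recited in the claims lend themselves to a short sketch: the linear blending of the static and dynamic network outputs (claims 1, 7, and 12) and the expected termination depth used as a training signal (claims 6, 10, and 14). The code below is a minimal illustration under stated assumptions, not the claimed implementation. It assumes each network output is an (r, g, b, density) tuple, that a blending weight of 1.0 selects the static output and 0.0 the dynamic output, and that the depth penalty is an absolute difference. How the blending weight is derived from training convergence is not modeled. All function names are hypothetical.

```python
def blend_sample(static_out, dynamic_out, blend_weight):
    """Linearly blend two (r, g, b, density) tuples for one 3D sample.

    Assumed convention (not stated in the claims): blend_weight = 1.0
    uses only the static network's output, 0.0 only the dynamic one.
    """
    if not 0.0 <= blend_weight <= 1.0:
        raise ValueError("blend_weight must lie in [0, 1]")
    return tuple(
        blend_weight * s + (1.0 - blend_weight) * d
        for s, d in zip(static_out, dynamic_out)
    )


def expected_termination_depth(weights, depths):
    """Weighted average of sample depths along one ray.

    weights: per-sample compositing weights along the ray (assumed given)
    depths:  distance of each sample from the camera
    """
    return sum(w * t for w, t in zip(weights, depths))


def depth_loss(weights, depths, depth_estimate):
    """Penalty that is zero when the expected termination depth matches
    an external depth estimate (absolute difference is an assumption)."""
    return abs(expected_termination_depth(weights, depths) - depth_estimate)


# Usage: an even blend of a static and a dynamic sample, then a depth check.
mixed = blend_sample((1.0, 0.0, 0.0, 2.0), (0.0, 1.0, 0.0, 4.0), 0.5)
loss = depth_loss(weights=[0.25, 0.75], depths=[2.0, 4.0], depth_estimate=3.0)
```

Blending per sample before compositing lets a single accumulation pass handle both static and dynamic content, which is one plausible reading of the claim language.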
Supervised learning · CPC title
Feedforward networks · CPC title
Illumination models · CPC title
Video; Image sequence · CPC title
Camera pose · CPC title