US12469217B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12469217-B2 |
| Application number | US-202218090657-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 29, 2022 |
| Priority date | Dec 29, 2022 |
| Publication date | Nov 11, 2025 |
| Grant date | Nov 11, 2025 |
Official abstract text for this publication.
An environment synthesis framework generates virtual environments from a synthesized two-dimensional (2D) satellite map of a geographic area, a three-dimensional (3D) voxel environment, and a voxel-based neural rendering framework. In an example implementation, the synthesized 2D satellite map is generated by a map synthesis generative adversarial network (GAN) which is trained using sample city datasets. The multi-stage framework lifts the 2D map into a set of 3D octrees, generates an octree-based 3D voxel environment, and then converts it into a texturized 3D virtual environment using a neural rendering GAN and a set of pseudo ground truth images. The resulting 3D virtual environment is texturized, lifelike, editable, traversable in virtual reality (VR) and augmented reality (AR) experiences, and very large in scale.
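The lifting step in the abstract (2D map to octrees to a voxel environment) can be illustrated with a minimal sketch. It is not the patented method: the per-cell height map, the extrusion rule, and the names `lift_height_map`, `build_octree`, `count_leaves`, and `count_occupied` are all assumptions introduced here for illustration. The sketch only shows how an octree collapses uniform regions of a voxel grid into single leaves, which is what makes very large environments tractable.

```python
from dataclasses import dataclass
from typing import List, Optional

Grid3D = List[List[List[int]]]  # indexed as grid[z][y][x]; 1 = occupied, 0 = empty


def lift_height_map(height_map: List[List[int]], size: int) -> Grid3D:
    """Extrude each 2D map cell upward by its height (hypothetical lifting rule)."""
    return [
        [[1 if z < height_map[y][x] else 0 for x in range(size)] for y in range(size)]
        for z in range(size)
    ]


@dataclass
class OctreeNode:
    value: Optional[int]  # 0 or 1 for a uniform leaf, None for a mixed interior node
    children: Optional[List["OctreeNode"]] = None


def build_octree(grid: Grid3D, x0: int, y0: int, z0: int, size: int) -> OctreeNode:
    """Subdivide a cube until every leaf is uniformly empty or uniformly full.

    Assumes `size` is a power of two, so halving always reaches single voxels.
    """
    first = grid[z0][y0][x0]
    uniform = all(
        grid[z][y][x] == first
        for z in range(z0, z0 + size)
        for y in range(y0, y0 + size)
        for x in range(x0, x0 + size)
    )
    if uniform:
        return OctreeNode(value=first)
    half = size // 2
    children = [
        build_octree(grid, x0 + dx, y0 + dy, z0 + dz, half)
        for dz in (0, half)
        for dy in (0, half)
        for dx in (0, half)
    ]
    return OctreeNode(value=None, children=children)


def count_leaves(node: OctreeNode) -> int:
    """Number of leaves, i.e. how many cubes the octree actually stores."""
    if node.children is None:
        return 1
    return sum(count_leaves(child) for child in node.children)


def count_occupied(node: OctreeNode, size: int) -> int:
    """Number of occupied voxels represented by the subtree of edge length `size`."""
    if node.children is None:
        return node.value * size ** 3
    return sum(count_occupied(child, size // 2) for child in node.children)


if __name__ == "__main__":
    # Toy 4x4 map: a 2x2 block of height 2 and a single tower of height 4.
    height_map = [
        [2, 2, 0, 0],
        [2, 2, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 4],
    ]
    grid = lift_height_map(height_map, 4)
    root = build_octree(grid, 0, 0, 0, 4)
    print(count_leaves(root))       # 22 leaves instead of 64 dense voxels
    print(count_occupied(root, 4))  # 12 occupied voxels
```

In this toy case the dense grid holds 64 voxels while the octree stores 22 leaves, because six of the eight octants are uniform and need no further subdivision.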
Opening claim text (preview).
What is claimed is:
1. A framework for generating virtual environments, the framework comprising: a city dataset comprising a set of synthesized two-dimensional (2D) map images associated with a geographic area, wherein the city dataset comprises one or more of a plurality of street view images, a CAD model, and a plurality of GPS-registered camera images; an infinite-pixel image synthesis module operative to utilize a map synthesis generative adversarial network (GAN) and the set of synthesized 2D map images, wherein the infinite-pixel image synthesis module is operative to generate a synthesized two-dimensional (2D) satellite map associated with the geographic area; an octree-based voxel completion module operative to generate a set of octrees based on the set of synthesized 2D map images, to generate an octree-based voxel representation based on the synthesized 2D satellite map, and to convert the octree-based voxel representation into a three-dimensional (3D) voxel environment based on the set of octrees; and a voxel-based neural rendering framework operative to generate a 3D virtual environment based on the 3D voxel environment, such that the 3D virtual environment resembles the geographic area.
2. The framework of claim 1, further comprising: a dataset pre-processing module operative to generate the set of synthesized 2D map images and to generate the set of octrees based on the city dataset; and an octree completion module that is operative to convert the octree-based voxel representation into the 3D voxel environment in accordance with the set of octrees.
3. The framework of claim 1, wherein the infinite-pixel image synthesis module further comprises: a neural implicit generator in operative communication with a patch contrastive discriminator, wherein the infinite-pixel image synthesis module is operative to train the neural implicit generator and the patch contrastive discriminator using the set of synthesized 2D map images.
4. The framework of claim 1, further comprising: a pseudo ground truth synthesis module that is operative to generate a set of pseudo ground truth images in accordance with an image ground truth pre-training generative adversarial network (GAN), wherein the voxel-based neural rendering framework further comprises a neural rendering framework that is operative to generate the 3D virtual environment in accordance with the set of pseudo ground truth images.
5. The framework of claim 4, wherein the pseudo ground truth synthesis module further comprises: a voxel renderer operative to generate a set of rendered images; and a SPADE generator in operative communication with the image ground truth pre-training GAN and the voxel renderer, such that the set of pseudo ground truth images is based on the set of rendered images.
6. The framework of claim 4, wherein the voxel-based neural rendering framework further comprises: a neural rendering generator in operative communication with a neural rendering discriminator, wherein the neural rendering generator and the neural rendering discriminator are trained using the set of pseudo ground truth images.
7. The framework of claim 6, wherein the 3D voxel environment comprises a plurality of features, and wherein the voxel-based neural rendering framework further comprises: a ray sampling tool in operative communication between the neural rendering generator and the 3D voxel environment, such that the neural rendering generator during training retrieves one or more of the plurality features associated with the 3D voxel environment.
8. The framework of claim 1, further comprising: a voxel renderer operative to generate a set of rendered images; a SPADE generator in operative communication with a SPADE discriminator, wherein the SPADE generator and the SPADE discriminator are trained using the set of rendered images; a street view renderer operative to generate a set of segmentation images: an image ground truth pre-training GAN in communication with the SPADE generator, wherein the image ground truth pre-training GAN is operative to use paired data to further train the SPADE generator and the SPADE discriminator, wherein the paired data comprises the plurality of GPS-registered camera images and the set of segmentation images.
9. A method of generating virtual environments, comprising: accessing a city dataset comprising a set of synthesized two-dimensional (2D) map images associated with a geographic area, wherein the city dataset comprises one or more of a plurality of street view images, a CAD model, and a plurality of GPS-registered camera images; generating a synthesized two-dimensional (2D) satellite map associated with the geographic area utilizing a map synthesis generative adversarial network (GAN) and the set of synthesized 2D map images; generating a set of octrees based on the set of synthesized 2D map images; generating an octree-based voxel representation based on the synthesized 2D satellite map; and converting the octree-based voxel representation into a three-dimensional (3D) voxel environment based on the set of octrees; and generating a 3D virtual environment based on the 3D voxel environment, such that the 3D virtual environment resembles the geographic area.
10. The method of claim 9, further comprising: training, using the set of synthesized 2D map images, a neural implicit generator in operative communication with a patch contrastive discriminator.
11. The method of claim 9, further comprising: generating a set of pseudo ground truth images in accordance with an image ground truth pre-training generative adversarial network (GAN); and generating the 3D virtual environment, using a neural rendering framework, in accordance with the set of pseudo ground truth images.
12. The method of claim 11, further comprising: generating a set of rendered images using a voxel renderer; and training, using the set of rendered images, a SPADE generator in operative communication with the image ground truth pre-training GAN, such that the set of pseudo ground truth images is based on the set of rendered images.
13. The method of claim 11, further comprising: training, using the set of pseudo ground truth images, a neural rendering generator in operative communication with a neural rendering discriminator.
14. The method of claim 13, wherein the 3D voxel environment comprises a plurality of features, and wherein the method further comprises: retrieving one or more of the plurality of features associated with the 3D voxel environment using a ray sampling tool, wherein the ray sampling tool is in operative communication between the neural rendering generator and the 3D voxel environment.
15. The method of claim 9, further comprising: generating a set of rendered images using a voxel renderer; and training a SPADE generator in operative communication with a SPADE discriminator using the set of rendered images, wherein the SPADE generator is in operative communication with an image ground truth pre-training GAN; generating a set of segmentation images using a street view renderer; training the SPADE generator and the SPADE discriminator using paired data, wherein the paired data comprises the plurality of GPS-registered camera images and the set of segmentation images.
16. A non-transitory computer-readable medium including instructions for generating virtual environments, wherein the instructions, when executed by a processor, configure the processor to perform functions, including functions to: access a city dataset compri
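Claims 7 and 14 recite a ray sampling tool that retrieves features from the 3D voxel environment for the neural rendering generator. The claims do not say how the rays are traversed, so the following is only a rough sketch under stated assumptions: fixed-step ray marching, a sparse dictionary mapping voxel coordinates to string feature labels, and the hypothetical function name `sample_ray`. None of these come from the patent text.

```python
import math
from typing import Dict, List, Sequence, Tuple

Voxel = Tuple[int, int, int]


def sample_ray(
    features: Dict[Voxel, str],
    origin: Sequence[float],
    direction: Sequence[float],
    max_dist: float,
    step: float = 0.25,
) -> List[str]:
    """Collect the features of occupied voxels along a ray, in near-to-far order.

    Fixed-step marching is an assumption made for brevity; it can skip a voxel
    that the ray only clips at a corner. An exact grid traversal would avoid that.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be a non-zero vector")
    unit = [d / norm for d in direction]

    hits: List[str] = []
    last_voxel = None
    for i in range(int(max_dist / step) + 1):
        t = i * step
        # Voxel containing the current sample point (unit-sized voxels assumed).
        voxel = tuple(math.floor(o + t * u) for o, u in zip(origin, unit))
        if voxel != last_voxel:  # report each voxel once as the ray enters it
            last_voxel = voxel
            if voxel in features:
                hits.append(features[voxel])
    return hits


if __name__ == "__main__":
    # Hypothetical sparse voxel environment with two labeled voxels.
    env = {(2, 0, 0): "building", (4, 0, 0): "road"}
    print(sample_ray(env, origin=(0.5, 0.5, 0.5), direction=(1, 0, 0), max_dist=5.0))
    # ['building', 'road']
```

A renderer built along these lines would cast one such ray per output pixel and pass the ordered features to the generator; how the patented framework actually combines them is not stated in the claim text shown here.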
involving the use of two or more images · CPC title
Artificial neural networks [ANN] · CPC title
Training; Learning · CPC title
Tree description, e.g. octree, quadtree · CPC title
Geographic models · CPC title