US12567177B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12567177-B2 |
| Application number | US-202318132751-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 10, 2023 |
| Priority date | Mar 10, 2023 |
| Publication date | Mar 3, 2026 |
| Grant date | Mar 3, 2026 |
Official abstract text for this publication.
Illustrative embodiments of the present disclosure include a method, an electronic device, and a computer program product for image processing. The method includes extracting a first hidden vector in a first image and acquiring first grid data associated with the first image, wherein the first grid data corresponds to pixel positions of the first image. The method further includes encoding the first hidden vector to acquire first encoded data, generating first inference data based on the first grid data and the first encoded data, decoding the first inference data to generate first decoded data, and generating a second image based on the first decoded data and the first image, wherein the second image has a higher resolution than the first image. The method can continuously enhance image quality by exploring potential correlations between pixels, and can provide powerful and arbitrary super resolution processing capabilities for up sampled images.
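The abstract says the grid data "corresponds to pixel positions" but gives no coordinate convention. The following is a minimal sketch of one possible reading, assuming normalized pixel-center coordinates in [-1, 1]. The function name `pixel_center_grid` and the normalization are illustrative assumptions, not taken from the patent. Building the same kind of grid at a denser size is one way to address the positions of a higher-resolution output image.

```python
def pixel_center_grid(height, width):
    """Return a height x width grid of (y, x) pixel-center coordinates.

    Assumption (not from the patent): coordinates are normalized to [-1, 1],
    so grids of different sizes cover the same image area.
    """
    def centers(n):
        # Center of the i-th of n equal cells spanning [-1, 1].
        return [-1.0 + (2 * i + 1) / n for i in range(n)]

    ys = centers(height)
    xs = centers(width)
    return [[(y, x) for x in xs] for y in ys]


# Grid for a 2x2 input image and for a 4x4 (2x upsampled) target.
low_res_grid = pixel_center_grid(2, 2)
high_res_grid = pixel_center_grid(4, 4)
```

Because both grids span the same normalized range, a non-integer scale factor only changes how many positions are generated, which is in line with the "arbitrary super resolution" wording in the abstract.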
Opening claim text (preview).
What is claimed is:
1. A method for image processing, comprising: extracting a first hidden vector in a first image; acquiring first grid data associated with the first image, wherein the first grid data corresponds to pixel positions of the first image; encoding the first hidden vector to acquire first encoded data; generating first inference data based on the first grid data and the first encoded data, wherein the first inference data is generated at least in part utilizing first sampling data obtained by grid sampling the first grid data in combination with random data obtained from random sampling; decoding the first inference data to generate first decoded data; and generating a second image based on the first decoded data and the first image, the second image having a higher resolution than the first image; wherein generating the first inference data based on the first grid data and the first encoded data comprises: adding the random data from the random sampling to the first sampling data to generate first image random data; decomposing the first encoded data into a plurality of first data blocks; merging the plurality of first data blocks, the first sampling data, and the first image random data to generate a plurality of first merged data blocks; and stacking the plurality of first merged data blocks to form the first inference data.
2. The method according to claim 1, further comprising: high-frequency encoding the first grid data to generate second grid data, the second grid data having high-frequency data in the first grid data; generating second inference data based on the second grid data and the first encoded data; decoding the second inference data to generate second decoded data; and generating a third image based on the second decoded data and the second image, the third image having a higher resolution than the second image.
3. The method according to claim 1, further comprising performing a training process, the training process comprising: extracting an original hidden vector of an original image; extracting a first training hidden vector of a first training image, wherein the original image is associated with the first training image, and the original image has a higher resolution than the first training image; acquiring first training grid data associated with the first training image, wherein the first training grid data corresponds to pixel positions of the first training image; calculating first residual data between the original hidden vector and the first training hidden vector; encoding the first residual data to generate first residual encoded data; encoding the first training hidden vector to acquire first training encoded data; generating first training inference data based on the first training grid data, the first training encoded data, and the first residual data; decoding the first training inference data to generate first training decoded data; and generating a second training image based on the first training decoded data and the first training image, the second training image having a higher resolution than the first training image.
4. The method according to claim 3, wherein the training process further comprises: extracting a second training hidden vector of the second training image; calculating second residual data between the original hidden vector and the second training hidden vector; encoding the second residual data to generate second residual encoded data; high-frequency encoding the first training grid data to generate second training grid data, the second training grid data having high-frequency data in the first training grid data; generating second training inference data based on the second training grid data, the first training encoded data, and the second residual encoded data; decoding the second training inference data to generate second training decoded data; and generating a third training image based on the second training decoded data and the second training image, the third training image having a higher resolution than the second training image.
5. The method according to claim 3, wherein generating first training inference data based on the first training grid data, the first training encoded data, and the first residual data comprises: grid sampling the first training grid data to generate first training grid sampling data; decomposing the first residual encoded data into a plurality of first residual encoded data blocks; sampling the plurality of first residual encoded data blocks to generate a plurality of first residual encoded sampling data blocks; merging the first training grid sampling data with the plurality of first residual encoded sampling data blocks to generate a plurality of first training merged data blocks; and adding random data from random sampling to the plurality of first training merged data blocks to generate a plurality of first image random data blocks.
6. The method according to claim 5, further comprising: decomposing the first training encoded data into a plurality of first training encoded data blocks; and combining and stacking the plurality of first image random data blocks, the plurality of first training encoded data blocks, and the first training grid sampling data to form the first training inference data.
7. The method according to claim 5, further comprising: generating the first training encoded data by an encoder; generating the first training inference data by an inference module; generating the first training decoded data by a decoder; comparing image resolution of the second training image with that of the original image to acquire an image resolution result; performing the training process on the encoder, the inference module, and the decoder in response to the image resolution result being lower than a resolution threshold; and stopping the training process for the encoder, the inference module, and the decoder in response to the image resolution result being higher than the resolution threshold.
8. The method according to claim 1, further comprising: decomposing the first encoded data into the plurality of first data blocks of a same size based on a length of the first encoded data.
9. An electronic device, comprising: at least one processor; and memory coupled to the at least one processor and having instructions stored therein, wherein the instructions, when executed by the at least one processor, cause the electronic device to perform actions comprising: extracting a first hidden vector in a first image; acquiring first grid data associated with the first image, wherein the first grid data corresponds to pixel positions of the first image; encoding the first hidden vector to acquire first encoded data; generating first inference data based on the first grid data and the first encoded data, wherein the first inference data is generated at least in part utilizing first sampling data obtained by grid sampling the first grid data in combination with random data obtained from random sampling; decoding the first inference data to generate first decoded data; and generating a second image based on the first decoded data and the first image, the second image having a higher resolution than the first image; wherein generating the first inference data based on the first grid data and the first encoded data comprises: adding the random data from the random sampling to the first sampling data to generate first image random data; decomposing the first encoded data into a plurality of first data blocks; merging the plurality of first data blocks, the first sampling data, and the first image random data to generate a plura
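The inference-data steps recited in claim 1 (grid sampling, adding random data, decomposing the encoded data, merging, stacking), together with the equal-size decomposition of claim 8, can be illustrated with a small sketch. This is one hedged interpretation in plain Python, not the patented implementation. Every function name, the stride-based sampling rule, the Gaussian noise, and the choice of one data block per sampled grid point are assumptions made for illustration.

```python
import random


def grid_sample(grid, stride=1):
    """Keep every `stride`-th grid point, row-major (assumed sampling rule)."""
    return [point for row in grid[::stride] for point in row[::stride]]


def split_blocks(encoded, num_blocks):
    """Decompose a flat encoded vector into same-size blocks based on its length."""
    if num_blocks <= 0 or len(encoded) % num_blocks:
        raise ValueError("encoded length must divide evenly into num_blocks")
    size = len(encoded) // num_blocks
    return [encoded[i * size:(i + 1) * size] for i in range(num_blocks)]


def build_inference_data(grid, encoded, stride=1, noise_std=0.01, seed=0):
    """Assemble inference data as a list of merged rows (hypothetical layout)."""
    rng = random.Random(seed)
    # "First sampling data": grid points kept by grid sampling.
    sampled = grid_sample(grid, stride)
    # "First image random data": sampling data plus random data (Gaussian assumed).
    noisy = [(y + rng.gauss(0.0, noise_std), x + rng.gauss(0.0, noise_std))
             for y, x in sampled]
    # "First data blocks": one equal-size block per sampled point (assumption).
    blocks = split_blocks(encoded, len(sampled))
    # Merge each block with its sampling data and random data, then stack the
    # merged blocks as rows of one 2-D structure.
    return [list(b) + list(s) + list(n) for b, s, n in zip(blocks, sampled, noisy)]


# Usage: a 2x2 coordinate grid and an 8-element stand-in for the encoded data.
grid = [[(-0.5, -0.5), (-0.5, 0.5)], [(0.5, -0.5), (0.5, 0.5)]]
rows = build_inference_data(grid, list(range(8)))
```

Each of the 4 rows holds 6 values: a 2-element data block, the 2 grid coordinates, and the 2 noise-perturbed coordinates. In a real system the stacked rows would presumably be tensors passed to a learned decoder, which this sketch does not model.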
using two or more images, e.g. averaging or subtraction · CPC title
Proximity, similarity or dissimilarity measures · CPC title
Image fusion; Image merging · CPC title
based on super-resolution, i.e. the output image resolution being higher than the sensor resolution · CPC title
Image coding (bandwidth or redundancy reduction for static pictures H04N1/41; coding or decoding of static colour picture signals H04N1/64; methods or arrangements for coding, decoding, compressing or decompressing digital video signals H04N19/00) · CPC title