Method of selecting 2D input image, image processing apparatus, and image reconstruction apparatus for 3D reconstruction
US-2024212263-A1 · Jun 27, 2024 · US
US12581083B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12581083-B2 |
| Application number | US-202418443679-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 16, 2024 |
| Priority date | Jan 25, 2024 |
| Publication date | Mar 17, 2026 |
| Grant date | Mar 17, 2026 |
Official abstract text for this publication.
The present disclosure relates to a method, a device, and a computer program product for compressing a two-dimensional image. The method includes determining a plurality of importance scores of a plurality of images by a trained image compressor network according to pixel values of the plurality of images. The method further includes selecting an image subset from the plurality of images according to the plurality of importance scores of the plurality of images. In addition, the method further includes compressing the plurality of images by retaining the selected image subset and abandoning the remaining images. In this way, high image reconstruction quality is maintained while a high compression ratio is achieved. Moreover, as a manual labeling or calibration process is avoided, a large-scale data set can be processed with less manual intervention and fewer computing resources.
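The select-and-retain step described in the abstract can be illustrated with a minimal sketch. It assumes that "selecting according to the importance scores" means keeping the `keep` highest-scoring images; the abstract does not state the selection rule, so that choice, the function names `select_subset` and `compress`, and the example file names are all hypothetical. The scores here are placeholder numbers; in the disclosure they would come from the trained image compressor network.

```python
from typing import List, Sequence, Tuple


def select_subset(scores: Sequence[float], keep: int) -> List[int]:
    """Return the indices of the `keep` highest-scoring images, in original order.

    Assumption: a top-k rule. The source only says the subset is chosen
    "according to" the importance scores.
    """
    if not 0 <= keep <= len(scores):
        raise ValueError("keep must be between 0 and the number of images")
    # Rank image indices by descending score; ties keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def compress(
    images: Sequence[str], scores: Sequence[float], keep: int
) -> Tuple[List[str], float]:
    """Retain the selected subset and drop the rest.

    Returns the retained images and the compression ratio, counted in images
    (total images divided by retained images).
    """
    if len(images) != len(scores):
        raise ValueError("images and scores must have the same length")
    kept = select_subset(scores, keep)
    retained = [images[i] for i in kept]
    ratio = len(images) / len(retained) if retained else float("inf")
    return retained, ratio


if __name__ == "__main__":
    # Hypothetical file names and scores, for illustration only.
    names = ["view_0.png", "view_1.png", "view_2.png", "view_3.png"]
    retained, ratio = compress(names, [0.1, 0.9, 0.5, 0.7], keep=2)
    print(retained, ratio)
```

Counting the ratio in images rather than bytes is a simplification; a byte-level ratio would also depend on the size of each retained file.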
Opening claim text (preview).
What is claimed is:
1. A method for compressing an image, comprising: determining a plurality of importance scores of a plurality of images by a trained image compressor network according to pixel values of the plurality of images, wherein the trained image compressor network comprises an encoder and a decoder, with an output of the encoder being coupled to an input of the decoder, the encoder being configured to encode multiple distinct portions of each of the images into a corresponding encoded sequence for that image, the decoder being configured to process the encoded sequences for the respective images to generate the importance scores of the images; selecting an image subset from the plurality of images according to the plurality of importance scores of the plurality of images; and compressing the plurality of images by retaining the selected image subset and abandoning the remaining images.
2. The method according to claim 1, wherein the retained image subset is used for reconstructing a three-dimensional (3D) scene.
3. The method according to claim 1, wherein determining the plurality of importance scores of the plurality of images comprises causing the encoder of the trained image compressor network to perform the following steps: dividing each image of the plurality of images into a plurality of pixel blocks; transforming the plurality of pixel blocks to marked sequences; feeding the marked sequences to a stack of encoder layers including one or more layers that use self-attention and feedforward operations for encoding a global feature and a local feature of each image; and outputting encoded marked sequences, wherein each image corresponds to one encoded marked sequence.
4. The method according to claim 3, wherein determining the plurality of importance scores of the plurality of images further comprises causing the decoder of the trained image compressor network to perform the following steps: using the encoded marked sequences as an input; decoding the features from the encoder by using a masked self-attention and cross-attention mechanism; and generating a set comprising the importance score of each image.
5. The method according to claim 1, further comprising: performing three-dimensional (3D) reconstruction and synthesizing on a two-dimensional (2D) image set of a 3D scene and a posture set corresponding to the 2D image set by using a 3D reconstruction model, to obtain a new 3D view.
6. The method according to claim 5, further comprising: determining the importance score of each image based on a contribution of each image in the 2D image set to the 3D reconstruction.
7. The method according to claim 6, wherein determining the importance score of each image comprises: randomly sampling a subset in the 2D image set; determining a reward difference between a case in which an image in the 2D image set is added to the subset and a case in which the image is not added to the subset; repeating the sampling and the determining of the reward difference one or more times; and calculating the average of obtained results to obtain an estimation value of the importance score of the image.
8. The method according to claim 1, further comprising: creating a new data set comprising an image tuple, a position tuple, and an importance tuple.
9. The method according to claim 8, further comprising: training the image compressor network by using the created new data set.
10. The method according to claim 9, wherein training the image compressor network comprises: minimizing a total loss function by using a gradient descent method; wherein the total loss function is a weighted sum of a position loss function and an importance loss function; wherein the position loss function is determined based on a quantity of images, a true value of a camera position, and an estimation value for the camera position obtained by the image compressor network; and wherein the importance loss function is determined based on the quantity of images, the estimation value of the importance score, and an estimation value for the importance score of the image obtained by the image compressor network.
11. An electronic device, comprising: at least one processor; and memory coupled to the at least one processor and having instructions stored therein, wherein the instructions, when executed by the at least one processor, cause the electronic device to perform actions comprising: determining a plurality of importance scores of a plurality of images by a trained image compressor network according to pixel values of the plurality of images, wherein the trained image compressor network comprises an encoder and a decoder, with an output of the encoder being coupled to an input of the decoder, the encoder being configured to encode multiple distinct portions of each of the images into a corresponding encoded sequence for that image, the decoder being configured to process the encoded sequences for the respective images to generate the importance scores of the images; selecting an image subset from the plurality of images according to the plurality of importance scores of the plurality of images; and compressing the plurality of images by retaining the selected image subset and abandoning the remaining images.
12. The electronic device according to claim 11, wherein the retained image subset is used for reconstructing a three-dimensional (3D) scene.
13. The electronic device according to claim 11, wherein determining the plurality of importance scores of the plurality of images comprises causing the encoder of the trained image compressor network to perform the following steps: dividing each image of the plurality of images into a plurality of pixel blocks; transforming the plurality of pixel blocks to marked sequences; feeding the marked sequences to a stack of encoder layers including one or more layers that use self-attention and feedforward operations for encoding a global feature and a local feature of each image; and outputting encoded marked sequences, wherein each image corresponds to one encoded marked sequence.
14. The electronic device according to claim 13, wherein determining the plurality of importance scores of the plurality of images further comprises causing the decoder of the trained image compressor network to perform the following steps: using the encoded marked sequences as an input; decoding the features from the encoder by using a masked self-attention and cross-attention mechanism; and generating a set comprising the importance score of each image.
15. The electronic device according to claim 11, wherein the actions further comprise: performing three-dimensional (3D) reconstruction and synthesizing on a two-dimensional (2D) image set of a 3D scene and a posture set corresponding to the 2D image set by using a 3D reconstruction model, to obtain a new 3D view.
16. The electronic device according to claim 15, wherein the actions further comprise: determining the importance score of each image based on a contribution of each image in the 2D image set to 3D reconstruction.
17. The electronic device according to claim 16, wherein determining the importance score of each image comprises: randomly sampling a subset in the 2D image set; determining a reward difference between a case in which an image in the 2D image set is added to the subset and a case in which the image is not added to the subset; repeating the sampling and the determining of the reward difference one or more times; and calcu
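The sampling procedure recited in claims 7 and 17 (sample a random subset, measure the reward difference with and without a given image, repeat, average) can be sketched as a Monte Carlo estimate of each image's marginal contribution. This is only an illustration under stated assumptions: the claims do not specify how subsets are drawn, so the sketch picks a uniformly random size and then a uniformly random subset of the other images. The `reward` callable is a hypothetical stand-in for the reconstruction quality that a 3D reconstruction model would report, and the names `estimate_importance` and `n_rounds` are not from the source.

```python
import random
from typing import Callable, FrozenSet, List


def estimate_importance(
    n_images: int,
    reward: Callable[[FrozenSet[int]], float],
    n_rounds: int = 200,
    seed: int = 0,
) -> List[float]:
    """Estimate one importance score per image by averaged reward differences.

    Assumptions (not stated in the claims): subsets are drawn from the images
    other than the one being scored, with a uniformly random subset size.
    """
    rng = random.Random(seed)
    scores: List[float] = []
    for i in range(n_images):
        others = [j for j in range(n_images) if j != i]
        total = 0.0
        for _ in range(n_rounds):
            # Randomly sample a subset that does not contain image i.
            size = rng.randint(0, len(others))
            subset = frozenset(rng.sample(others, size))
            # Reward difference: image i added to the subset vs. not added.
            total += reward(subset | {i}) - reward(subset)
        # Average over all rounds gives the estimated importance of image i.
        scores.append(total / n_rounds)
    return scores


if __name__ == "__main__":
    # Toy additive reward (hypothetical): each image contributes a fixed
    # amount, so its marginal contribution is the same for every subset.
    weights = [1.0, 2.0, 0.5]

    def toy_reward(subset: FrozenSet[int]) -> float:
        return sum(weights[j] for j in subset)

    print(estimate_importance(len(weights), toy_reward))
```

With an additive reward the estimate recovers each image's own weight, which makes a convenient sanity check. A real reconstruction reward is not additive: views that overlap heavily add little once one of them is already in the subset, and that redundancy is what the averaging is meant to capture.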
the region being a block, e.g. a macroblock · CPC title
characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation (H04N19/635 takes precedence) · CPC title
Three-dimensional [3D] modelling for computer graphics · CPC title
Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks · CPC title
specially adapted for multi-view video sequence encoding · CPC title