Method, device, and computer program product for compressing two-dimensional image

US12581083B2 · US · B2

Patent metadata
Publication number: US-12581083-B2
Application number: US-202418443679-A
Country: US
Kind code: B2
Filing date: Feb 16, 2024
Priority date: Jan 25, 2024
Publication date: Mar 17, 2026
Grant date: Mar 17, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to a method, a device, and a computer program product for compressing a two-dimensional image. The method includes determining a plurality of importance scores of a plurality of images by a trained image compressor network according to pixel values of the plurality of images. The method further includes selecting an image subset from the plurality of images according to the plurality of importance scores of the plurality of images. In addition, the method further includes compressing the plurality of images by retaining the selected image subset and abandoning the remaining images. In this way, high image reconstruction quality is maintained while a high compression ratio is achieved. Moreover, as a manual labeling or calibration process is avoided, a large-scale data set can be processed with less manual intervention and fewer computing resources.
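The select-and-retain step described in the abstract can be illustrated with a minimal sketch. It assumes the importance scores have already been produced by the trained network and that the subset is chosen as the `keep` highest-scoring images. The abstract does not state the selection rule, so top-k is an assumption here, and the function names `select_subset` and `compress` are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_subset(scores: Sequence[float], keep: int) -> List[int]:
    """Return the indices of the `keep` highest-scoring images, in original order.

    Top-k selection is an assumption; the abstract only says the subset is
    selected "according to" the importance scores.
    """
    if not 0 <= keep <= len(scores):
        raise ValueError("keep must be between 0 and len(scores)")
    # Rank image indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore the original ordering of the retained images.
    return sorted(ranked[:keep])


def compress(images: Sequence, scores: Sequence[float], keep: int) -> Tuple[list, List[int]]:
    """Retain the selected subset and drop every other image."""
    kept = select_subset(scores, keep)
    return [images[i] for i in kept], kept


if __name__ == "__main__":
    retained, indices = compress(["a", "b", "c", "d"], [0.1, 0.9, 0.4, 0.7], keep=2)
    print(retained, indices)
```

In this sketch the compression ratio is set directly by `keep`; a score threshold would be an equally plausible reading of the abstract.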

First claim

Opening claim text (preview).

What is claimed is:

1. A method for compressing an image, comprising: determining a plurality of importance scores of a plurality of images by a trained image compressor network according to pixel values of the plurality of images, wherein the trained image compressor network comprises an encoder and a decoder, with an output of the encoder being coupled to an input of the decoder, the encoder being configured to encode multiple distinct portions of each of the images into a corresponding encoded sequence for that image, the decoder being configured to process the encoded sequences for the respective images to generate the importance scores of the images; selecting an image subset from the plurality of images according to the plurality of importance scores of the plurality of images; and compressing the plurality of images by retaining the selected image subset and abandoning the remaining images.

2. The method according to claim 1, wherein the retained image subset is used for reconstructing a three-dimensional (3D) scene.

3. The method according to claim 1, wherein determining the plurality of importance scores of the plurality of images comprises causing the encoder of the trained image compressor network to perform the following steps: dividing each image of the plurality of images into a plurality of pixel blocks; transforming the plurality of pixel blocks to marked sequences; feeding the marked sequences to a stack of encoder layers including one or more layers that use self-attention and feedforward operations for encoding a global feature and a local feature of each image; and outputting encoded marked sequences, wherein each image corresponds to one encoded marked sequence.

4. The method according to claim 3, wherein determining the plurality of importance scores of the plurality of images further comprises causing the decoder of the trained image compressor network to perform the following steps: using the encoded marked sequences as an input; decoding the features from the encoder by using a masked self-attention and cross-attention mechanism; and generating a set comprising the importance score of each image.

5. The method according to claim 1, further comprising: performing three-dimensional (3D) reconstruction and synthesizing on a two-dimensional (2D) image set of a 3D scene and a posture set corresponding to the 2D image set by using a 3D reconstruction model, to obtain a new 3D view.

6. The method according to claim 5, further comprising: determining the importance score of each image based on a contribution of each image in the 2D image set to the 3D reconstruction.

7. The method according to claim 6, wherein determining the importance score of each image comprises: randomly sampling a subset in the 2D image set; determining a reward difference between a case in which an image in the 2D image set is added to the subset and a case in which the image is not added to the subset; repeating the sampling and the determining of the reward difference one or more times; and calculating the average of obtained results to obtain an estimation value of the importance score of the image.

8. The method according to claim 1, further comprising: creating a new data set comprising an image tuple, a position tuple, and an importance tuple.

9. The method according to claim 8, further comprising: training the image compressor network by using the created new data set.

10. The method according to claim 9, wherein training the image compressor network comprises: minimizing a total loss function by using a gradient descent method; wherein the total loss function is a weighted sum of a position loss function and an importance loss function; wherein the position loss function is determined based on a quantity of images, a true value of a camera position, and an estimation value for the camera position obtained by the image compressor network; and wherein the importance loss function is determined based on the quantity of images, the estimation value of the importance score, and an estimation value for the importance score of the image obtained by the image compressor network.

11. An electronic device, comprising: at least one processor; and memory coupled to the at least one processor and having instructions stored therein, wherein the instructions, when executed by the at least one processor, cause the electronic device to perform actions comprising: determining a plurality of importance scores of a plurality of images by a trained image compressor network according to pixel values of the plurality of images, wherein the trained image compressor network comprises an encoder and a decoder, with an output of the encoder being coupled to an input of the decoder, the encoder being configured to encode multiple distinct portions of each of the images into a corresponding encoded sequence for that image, the decoder being configured to process the encoded sequences for the respective images to generate the importance scores of the images; selecting an image subset from the plurality of images according to the plurality of importance scores of the plurality of images; and compressing the plurality of images by retaining the selected image subset and abandoning the remaining images.

12. The electronic device according to claim 11, wherein the retained image subset is used for reconstructing a three-dimensional (3D) scene.

13. The electronic device according to claim 11, wherein determining the plurality of importance scores of the plurality of images comprises causing the encoder of the trained image compressor network to perform the following steps: dividing each image of the plurality of images into a plurality of pixel blocks; transforming the plurality of pixel blocks to marked sequences; feeding the marked sequences to a stack of encoder layers including one or more layers that use self-attention and feedforward operations for encoding a global feature and a local feature of each image; and outputting encoded marked sequences, wherein each image corresponds to one encoded marked sequence.

14. The electronic device according to claim 13, wherein determining the plurality of importance scores of the plurality of images further comprises causing the decoder of the trained image compressor network to perform the following steps: using the encoded marked sequences as an input; decoding the features from the encoder by using a masked self-attention and cross-attention mechanism; and generating a set comprising the importance score of each image.

15. The electronic device according to claim 11, wherein the actions further comprise: performing three-dimensional (3D) reconstruction and synthesizing on a two-dimensional (2D) image set of a 3D scene and a posture set corresponding to the 2D image set by using a 3D reconstruction model, to obtain a new 3D view.

16. The electronic device according to claim 15, wherein the actions further comprise: determining the importance score of each image based on a contribution of each image in the 2D image set to 3D reconstruction.

17. The electronic device according to claim 16, wherein determining the importance score of each image comprises: randomly sampling a subset in the 2D image set; determining a reward difference between a case in which an image in the 2D image set is added to the subset and a case in which the image is not added to the subset; repeating the sampling and the determining of the reward difference one or more times; and calcu
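The sampling procedure recited in claim 7 (sample a random subset, measure the reward difference with and without a given image, repeat, average) can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the `reward` callable is a hypothetical stand-in for the reconstruction quality a 3D reconstruction model would report for a subset of images, and the uniform choice of subset size, the number of rounds, and the name `estimate_importance` are all assumptions.

```python
import random
from typing import Callable, FrozenSet, List


def estimate_importance(
    num_images: int,
    reward: Callable[[FrozenSet[int]], float],
    rounds: int = 100,
    seed: int = 0,
) -> List[float]:
    """Monte Carlo estimate of each image's average marginal contribution.

    `reward` is a hypothetical function that maps a set of image indices to a
    quality score (for example, reconstruction quality on held-out views).
    """
    rng = random.Random(seed)
    totals = [0.0] * num_images
    for _ in range(rounds):
        for i in range(num_images):
            # Randomly sample a subset that does not contain image i.
            others = [j for j in range(num_images) if j != i]
            size = rng.randint(0, len(others))
            subset = frozenset(rng.sample(others, size))
            # Reward difference between adding image i and leaving it out.
            totals[i] += reward(subset | {i}) - reward(subset)
    # Average the differences to obtain the estimated importance scores.
    return [total / rounds for total in totals]


if __name__ == "__main__":
    # Toy additive reward: each image contributes a fixed amount.
    weights = [1.0, 0.0, 2.5]
    scores = estimate_importance(3, lambda s: sum(weights[i] for i in s), rounds=20)
    print(scores)
```

With an additive toy reward, every sampled difference equals the image's own weight, so the estimates recover the weights; a real reconstruction-quality reward would be non-additive and far more expensive to evaluate, which is presumably why the claims train a network to predict these scores.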

Assignees

Inventors

Classifications

  • the region being a block, e.g. a macroblock · CPC title

  • characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation (H04N19/635 takes precedence) · CPC title

  • G06T17/00 · Primary

    Three-dimensional [3D] modelling for computer graphics · CPC title

  • H04N19/119 · Primary

    Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks · CPC title

  • specially adapted for multi-view video sequence encoding · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12581083B2 cover?
The present disclosure relates to a method, a device, and a computer program product for compressing a two-dimensional image. The method includes determining a plurality of importance scores of a plurality of images by a trained image compressor network according to pixel values of the plurality of images. The method further includes selecting an image subset from the plurality of images accord…
Who is the assignee on this patent?
Dell Products L.P.
What technology area does this patent fall under?
Primary CPC classification G06T17/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 17, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).