Image classification method and apparatus
US-2025265808-A1 · Aug 21, 2025 · US
US12561846B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12561846-B2 |
| Application number | US-202318371730-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 22, 2023 |
| Priority date | Mar 31, 2021 |
| Publication date | Feb 24, 2026 |
| Grant date | Feb 24, 2026 |
Abstract
An apparatus and method for coding machine vision data using a reduction of feature map are disclosed. To reduce the size of a feature map extracted by a machine task-specialized deep learning model, a Video Coding for Machines (VCM) coding apparatus and a method are provided. The VCM coding apparatus and the method utilize a sparsification method that reduces redundancy in terms of space and channels of the feature map, and the VCM coding apparatus and the method also utilize a feature map decomposition method based on tensor decomposition.
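The tensor decomposition mentioned in the abstract can be illustrated with a small example. The following is a minimal NumPy sketch, not the patented implementation. It assumes the feature map is stored as a (channels, height, width) array and uses a truncated higher-order SVD (HOSVD), which is one common way to obtain a Tucker decomposition into a core (kernel) tensor and three factor matrices. The function names and the choice of ranks are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def tucker_hosvd(fmap, ranks):
    """Truncated HOSVD (assumed method): one core tensor, one factor per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding form the factor matrix.
        u, _, _ = np.linalg.svd(unfold(fmap, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = fmap
    for mode, u in enumerate(factors):
        # Project the tensor onto each factor's column space.
        core = mode_product(core, u.T, mode)
    return core, factors

def tucker_reconstruct(core, factors):
    """Expand the core tensor back to the original feature map shape."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out
```

For a feature map of shape (8, 6, 6) and hypothetical ranks (2, 3, 3), the core tensor and factor matrices together hold 70 values instead of 288. The reconstruction is exact only when the feature map actually has that multilinear rank; otherwise it is an approximation.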
Claims
What is claimed is:

1. An encoding method performed by a machine vision encoding apparatus for encoding a feature map, the encoding method comprising: extracting the feature map from an input image using a deep learning model, wherein the feature map is generated from an intermediate layer of the deep learning model; generating a reduced feature map by reducing a size of the feature map; generating a converted feature map by converting a data type of the reduced feature map and rearranging the reduced feature map; and generating a bitstream by encoding the converted feature map using a video encoder.
2. The encoding method of claim 1, wherein the feature map comprises: as many 2D feature maps having an equal height and an equal width as there are channels.
3. The encoding method of claim 1, wherein generating the reduced feature map comprises: reducing the feature map based on a feature map sparsification in terms of space or channels of the feature map.
4. The encoding method of claim 3, wherein generating the reduced feature map comprises: reducing the feature map with the feature map sparsification and a tensor decomposition combined.
5. The encoding method of claim 3, wherein generating the reduced feature map comprises: when a 2D feature map constituting the feature map has a region with a feature value that is less than a preset threshold, setting the feature value of the region to zero.
6. The encoding method of claim 3, wherein generating the reduced feature map comprises: calculating a distance between two 2D feature maps having a preset channel stride; selecting all pairs of 2D feature maps having a distance that is less than a preset threshold; and for each of the selected all pairs, sparsifying all values of one 2D feature map to zero or deleting the one 2D feature map.
7. The encoding method of claim 6, wherein generating the bitstream comprises: encoding a sparsified 2D feature map and the preset channel stride when the one 2D feature map is sparsified; or when the one 2D feature map is deleted, encoding an index of the deleted 2D feature map and the preset channel stride.
8. The encoding method of claim 1, wherein generating the reduced feature map comprises: reducing the feature map based on tensor decomposition.
9. The encoding method of claim 8, wherein generating the reduced feature map comprises: decomposing the feature map into one kernel tensor and three factor matrices by using a Tucker decomposition.
10. The encoding method of claim 8, wherein generating the reduced feature map comprises: decomposing the feature map into P rank 1 tensors (wherein P is a natural number) by using a Canonical Polyadic (CP) decomposition.
11. A decoding method performed by a machine vision decoding apparatus, the decoding method comprising: decoding a converted feature map using a video decoder from a bitstream; reconstructing a reduced feature map by rearranging the converted feature map and by converting a data type of the rearranged converted feature map; and generating a reconstructed feature map by expanding a size of the reduced feature map, wherein the reconstructed feature map corresponds to a feature map generated from an intermediate layer of a deep learning model in a machine vision encoding apparatus.
12. The decoding method of claim 11, wherein the reconstructed feature map comprises: as many 2D feature maps having an equal height and an equal width as there are channels.
13. The decoding method of claim 11, further comprising: decoding a preset channel stride and a sparsified 2D feature map, or decoding the preset channel stride and an index of a deleted 2D feature map, when the reduced feature map is reduced based on feature map sparsification in terms of channels.
14. The decoding method of claim 13, wherein generating the reconstructed feature map comprises: with respect to the sparsified 2D feature map, generating the reconstructed feature map by copying a reconstructed 2D feature map before or after the preset channel stride to a location of the sparsified 2D feature map, or with respect to the deleted 2D feature map, copying, with reference to a decoded index, a reconstructed 2D feature map before or after the preset channel stride to a location of the deleted 2D feature map.
15. The decoding method of claim 11, wherein generating the reconstructed feature map comprises: when the reduced feature map has been reduced by using a Tucker decomposition, generating the reconstructed feature map by using a kernel tensor and factor matrices constituting the reduced feature map.
16. The decoding method of claim 11, wherein generating the reconstructed feature map comprises: when the reduced feature map has been reduced by using a Canonical Polyadic (CP) decomposition, generating the reconstructed feature map by using P rank 1 tensors (wherein P is a natural number) constituting the reduced feature map.
17. A computer-readable recording medium storing a bitstream generated by a machine vision encoding method for encoding a feature map, the machine vision encoding method comprising: extracting the feature map from an input image using a deep learning model, wherein the feature map is generated from an intermediate layer of the deep learning model; generating a reduced feature map by reducing a size of the feature map; generating a converted feature map by converting a data type of the reduced feature map and rearranging the reduced feature map; and generating a bitstream by encoding the converted feature map using a video encoder.
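The channel sparsification, data type conversion, and rearrangement steps recited in the claims can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the claimed implementation: the distance is assumed to be mean squared error, the data type conversion is assumed to be min-max quantization to 8 bits, and the rearrangement is assumed to tile channels into a single 2D frame. The video encoder and decoder are omitted, and all function names are hypothetical.

```python
import numpy as np

def sparsify_channels(fmap, stride, threshold):
    """Zero channel c + stride when it is close to channel c.
    Distance is mean squared error (an assumption)."""
    out = fmap.copy()
    zeroed = []
    for c in range(fmap.shape[0] - stride):
        if c in zeroed:
            continue  # keep every reference channel intact for the decoder
        dist = np.mean((fmap[c] - fmap[c + stride]) ** 2)
        if dist < threshold:
            out[c + stride] = 0
            zeroed.append(c + stride)
    return out, zeroed

def restore_channels(fmap, zeroed, stride):
    """Decoder side: copy the channel one stride earlier into each zeroed slot."""
    out = fmap.copy()
    for idx in zeroed:
        out[idx] = out[idx - stride]
    return out

def to_uint8_frame(fmap, cols):
    """Quantize to uint8 (assumed min-max scaling) and tile channels into a frame."""
    lo, hi = float(fmap.min()), float(fmap.max())
    scale = (hi - lo) or 1.0  # avoid division by zero for a constant map
    q = np.round((fmap - lo) / scale * 255).astype(np.uint8)
    c, h, w = q.shape
    rows = -(-c // cols)  # ceiling division
    padded = np.zeros((rows * cols, h, w), dtype=np.uint8)
    padded[:c] = q
    frame = (padded.reshape(rows, cols, h, w)
                   .transpose(0, 2, 1, 3)
                   .reshape(rows * h, cols * w))
    return frame, (lo, hi, c, h, w)

def from_uint8_frame(frame, meta, cols):
    """Inverse of to_uint8_frame: untile the frame, then dequantize."""
    lo, hi, c, h, w = meta
    rows = frame.shape[0] // h
    q = (frame.reshape(rows, h, cols, w)
              .transpose(0, 2, 1, 3)
              .reshape(rows * cols, h, w)[:c])
    return q.astype(np.float32) / 255 * (hi - lo) + lo
```

In this sketch the encoder would call `sparsify_channels` and then `to_uint8_frame`, passing the resulting frame to a video encoder. The decoder would apply `from_uint8_frame` followed by `restore_channels`. The quantization step is lossy, so the round trip recovers the feature map only to within the quantization step size.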
using neural networks · CPC title
Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title