Method and apparatus for coding machine vision data using feature map reduction

US12561846B2 · US · B2

Patent metadata
Publication number: US-12561846-B2
Application number: US-202318371730-A
Country: US
Kind code: B2
Filing date: Sep 22, 2023
Priority date: Mar 31, 2021
Publication date: Feb 24, 2026
Grant date: Feb 24, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus and method for coding machine vision data using a reduction of feature map are disclosed. To reduce the size of a feature map extracted by a machine task-specialized deep learning model, a Video Coding for Machines (VCM) coding apparatus and a method are provided. The VCM coding apparatus and the method utilize a sparsification method that reduces redundancy in terms of space and channels of the feature map, and the VCM coding apparatus and the method also utilize a feature map decomposition method based on tensor decomposition.
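
The sparsification named in the abstract is detailed in claims 5, 6, and 14. The following is a minimal sketch of that idea, not the patented implementation. It assumes the feature map is a NumPy array of shape (channels, height, width). The function names, the per-element reading of "region", the mean-squared-error distance, and the rule that an already-zeroed channel is never used as a copy source are all illustrative assumptions; the claims do not fix them. Passing the zeroed indices to the decoder mirrors the deletion variant of claim 7, where indices are signaled.

```python
import numpy as np


def sparsify_spatial(fmap, threshold):
    """Set every feature value below `threshold` to zero.

    Assumption: a "region" is treated as a single element (claim 5 leaves
    the region shape open).
    """
    out = fmap.copy()
    out[out < threshold] = 0.0
    return out


def sparsify_channels(fmap, stride, threshold):
    """Zero channel c + stride when it is close to channel c.

    Returns the sparsified map and the indices of the zeroed channels.
    """
    out = fmap.copy()
    zeroed = []
    num_channels = fmap.shape[0]
    for c in range(num_channels - stride):
        if c in zeroed:
            # Assumption: a zeroed channel cannot serve as a copy source.
            continue
        partner = c + stride
        # Assumption: mean squared error as the distance between 2D maps.
        distance = np.mean((fmap[c] - fmap[partner]) ** 2)
        if distance < threshold:
            out[partner] = 0.0
            zeroed.append(partner)
    return out, zeroed


def restore_channels(fmap, stride, zeroed):
    """Decoder side: fill each zeroed channel with the channel `stride` earlier."""
    out = fmap.copy()
    for c in zeroed:
        out[c] = out[c - stride]
    return out


# Usage: four 2x2 channels where channel 2 nearly duplicates channel 0.
fmap = np.zeros((4, 2, 2))
fmap[0] = [[1.0, 2.0], [3.0, 4.0]]
fmap[1] = [[5.0, 6.0], [7.0, 8.0]]
fmap[2] = fmap[0] + 0.01
fmap[3] = -fmap[1]

sparse, zeroed = sparsify_channels(fmap, stride=2, threshold=0.1)
restored = restore_channels(sparse, stride=2, zeroed=zeroed)
```

The restored channel is a copy of its partner, so the reconstruction error is bounded by how similar the two channels were before sparsification.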

First claim

Opening claim text (preview).

What is claimed is:

1. An encoding method performed by a machine vision encoding apparatus for encoding a feature map, the encoding method comprising: extracting the feature map from an input image using a deep learning model, wherein the feature map is generated from an intermediate layer of the deep learning model; generating a reduced feature map by reducing a size of the feature map; generating a converted feature map by converting a data type of the reduced feature map and rearranging the reduced feature map; and generating a bitstream by encoding the converted feature map using a video encoder.

2. The encoding method of claim 1, wherein the feature map comprises: as many 2D feature maps having an equal height and an equal width as there are channels.

3. The encoding method of claim 1, wherein generating the reduced feature map comprises: reducing the feature map based on a feature map sparsification in terms of space or channels of the feature map.

4. The encoding method of claim 3, wherein generating the reduced feature map comprises: reducing the feature map with the feature map sparsification and a tensor decomposition combined.

5. The encoding method of claim 3, wherein generating the reduced feature map comprises: when a 2D feature map constituting the feature map has a region with a feature value that is less than a preset threshold, setting the feature value of the region to zero.

6. The encoding method of claim 3, wherein generating the reduced feature map comprises: calculating a distance between two 2D feature maps having a preset channel stride; selecting all pairs of 2D feature maps having a distance that is less than a preset threshold; and for each of the selected all pairs, sparsifying all values of one 2D feature map to zero or deleting the one 2D feature map.

7. The encoding method of claim 6, wherein generating the bitstream comprises: encoding a sparsified 2D feature map and the preset channel stride when the one 2D feature map is sparsified; or when the one 2D feature map is deleted, encoding an index of the deleted 2D feature map and the preset channel stride.

8. The encoding method of claim 1, wherein generating the reduced feature map comprises: reducing the feature map based on tensor decomposition.

9. The encoding method of claim 8, wherein generating the reduced feature map comprises: decomposing the feature map into one kernel tensor and three factor matrices by using a Tucker decomposition.

10. The encoding method of claim 8, wherein generating the reduced feature map comprises: decomposing the feature map into P rank 1 tensors (wherein P is a natural number) by using a Canonical Polyadic (CP) decomposition.

11. A decoding method performed by a machine vision decoding apparatus, the decoding method comprising: decoding a converted feature map using a video decoder from a bitstream; reconstructing a reduced feature map by rearranging the converted feature map and by converting a data type of the rearranged converted feature map; and generating a reconstructed feature map by expanding a size of the reduced feature map, wherein the reconstructed feature map corresponds to a feature map generated from an intermediate layer of a deep learning model in a machine vision encoding apparatus.

12. The decoding method of claim 11, wherein the reconstructed feature map comprises: as many 2D feature maps having an equal height and an equal width as there are channels.

13. The decoding method of claim 11, further comprising: decoding a preset channel stride and a sparsified 2D feature map, or decoding the preset channel stride and an index of a deleted 2D feature map, when the reduced feature map is reduced based on feature map sparsification in terms of channels.

14. The decoding method of claim 13, wherein generating the reconstructed feature map comprises: with respect to the sparsified 2D feature map, generating the reconstructed feature map by copying a reconstructed 2D feature map before or after the preset channel stride to a location of the sparsified 2D feature map, or with respect to the deleted 2D feature map, copying, with reference to a decoded index, a reconstructed 2D feature map before or after the preset channel stride to a location of the deleted 2D feature map.

15. The decoding method of claim 11, wherein generating the reconstructed feature map comprises: when the reduced feature map has been reduced by using a Tucker decomposition, generating the reconstructed feature map by using a kernel tensor and factor matrices constituting the reduced feature map.

16. The decoding method of claim 11, wherein generating the reconstructed feature map comprises: when the reduced feature map has been reduced by using a Canonical Polyadic (CP) decomposition, generating the reconstructed feature map by using P rank 1 tensors (wherein P is a natural number) constituting the reduced feature map.

17. A computer-readable recording medium storing a bitstream generated by a machine vision encoding method for encoding a feature map, the machine vision encoding method comprising: extracting the feature map from an input image using a deep learning model, wherein the feature map is generated from an intermediate layer of the deep learning model; generating a reduced feature map by reducing a size of the feature map; generating a converted feature map by converting a data type of the reduced feature map and rearranging the reduced feature map; and generating a bitstream by encoding the converted feature map using a video encoder.
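
Claims 9 and 15 describe reducing a feature map to one kernel tensor and three factor matrices with a Tucker decomposition, then rebuilding it from those parts. The sketch below illustrates that round trip using a truncated higher-order SVD (HOSVD), which is one standard way to compute a Tucker decomposition; the claims do not say which algorithm is used. The function names and the choice of ranks are hypothetical.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize `tensor` so that axis `mode` indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode` (mode-n product)."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def tucker_hosvd(fmap, ranks):
    """Truncated HOSVD: return a kernel (core) tensor and one factor matrix per axis."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding form the factor matrix.
        u, _, _ = np.linalg.svd(unfold(fmap, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = fmap
    for mode, factor in enumerate(factors):
        # Project each axis onto its factor subspace to obtain the core.
        core = mode_product(core, factor.T, mode)
    return core, factors


def tucker_reconstruct(core, factors):
    """Decoder side: expand the core back to full size with the factor matrices."""
    out = core
    for mode, factor in enumerate(factors):
        out = mode_product(out, factor, mode)
    return out


# Usage: build an 8x6x6 (channels, height, width) map with multilinear
# rank (2, 3, 3), so the truncated decomposition can represent it exactly.
rng = np.random.default_rng(0)
true_core = rng.standard_normal((2, 3, 3))
true_factors = [rng.standard_normal((8, 2)),
                rng.standard_normal((6, 3)),
                rng.standard_normal((6, 3))]
fmap = tucker_reconstruct(true_core, true_factors)

core, factors = tucker_hosvd(fmap, ranks=(2, 3, 3))
recon = tucker_reconstruct(core, factors)
stored = core.size + sum(f.size for f in factors)
```

In this example the core and factor matrices hold 70 values in place of the 288 values of the full map. Real feature maps are rarely exactly low rank, so in practice the ranks trade reconstruction error against size.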

Assignees

Inventors

Classifications

  • using neural networks

  • Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods

  • Quantised networks; Sparse networks; Compressed networks

  • Convolutional networks [CNN, ConvNet]

  • Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12561846B2 cover?
An apparatus and method for coding machine vision data using a reduction of feature map are disclosed. To reduce the size of a feature map extracted by a machine task-specialized deep learning model, a Video Coding for Machines (VCM) coding apparatus and a method are provided. The VCM coding apparatus and the method utilize a sparsification method that reduces redundancy in terms of space and c…
Who is the assignee on this patent?
Hyundai Motor Co Ltd, Kia Corp, Ewha University—Industry Collaboration Found
What technology area does this patent fall under?
Primary CPC classification G06T9/002. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 24, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).