Patch based video coding for machines

US12367654B2 · US · B2

Patent metadata

Publication number: US-12367654-B2
Application number: US-202117795178-A
Country: US
Kind code: B2
Filing date: Apr 15, 2021
Priority date: Apr 16, 2020
Publication date: Jul 22, 2025
Grant date: Jul 22, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Devices and techniques related to implementing patch based video coding for machines are discussed. Such patch based video coding includes detecting regions of interest in a frame of video, extracting the detected regions of interest to one or more atlases absent the frame at a resolution not less than the resolution of the regions of interest, and encoding the one or more atlases to a bitstream.
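
The extraction step in the abstract can be illustrated with a minimal sketch. It assumes the regions of interest have already been detected and are supplied as (x, y, width, height) boxes. The `PatchInfo` fields, the `build_atlas` name, and the single-row packing are hypothetical choices made for illustration, not the layout the patent describes. Encoding the atlas to a bitstream is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame stored as rows of pixel values


@dataclass
class PatchInfo:
    """Metadata for one region of interest (hypothetical layout)."""
    frame_x: int  # top-left position in the full frame
    frame_y: int
    width: int
    height: int
    atlas_x: int  # top-left position in the atlas
    atlas_y: int


def build_atlas(frame: Frame, rois: List[Tuple[int, int, int, int]],
                fill: int = 0) -> Tuple[Frame, List[PatchInfo]]:
    """Copy each ROI (x, y, w, h) side by side into one atlas, unscaled."""
    atlas_h = max((h for _, _, _, h in rois), default=0)
    atlas_w = sum(w for _, _, w, _ in rois)
    atlas = [[fill] * atlas_w for _ in range(atlas_h)]
    metadata = []
    cursor = 0  # next free column in the atlas
    for x, y, w, h in rois:
        for row in range(h):
            atlas[row][cursor:cursor + w] = frame[y + row][x:x + w]
        metadata.append(PatchInfo(x, y, w, h, cursor, 0))
        cursor += w
    return atlas, metadata


# Example: an 8x6 frame whose pixel value encodes its (row, column) position.
frame = [[10 * r + c for c in range(8)] for r in range(6)]
atlas, metadata = build_atlas(frame, [(1, 1, 2, 2), (5, 2, 3, 3)])
```

In this toy case the atlas holds 15 pixels instead of the 48 in the full frame, and the metadata is enough to map each patch back to its original position.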

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a memory to store at least a portion of input video; and processor circuitry coupled to the memory, the processor circuitry to: detect a plurality of regions of interest for a machine learning operation in a full frame of the input video; form one or more atlases comprising the regions of interest at a first resolution, wherein a first region of interest of the plurality of regions of interest is in a first atlas; generate metadata corresponding to the one or more atlases and indicative of a size and location of each of the regions of interest in the full frame of video; detect a second region of interest in a subsequent frame of the input video; resize the first atlas and add the second region of interest to the resized first atlas; and encode the one or more atlases and the metadata into one or more bitstreams, the one or more bitstreams absent a representation of the full frame of video at the first resolution or a resolution higher than the first resolution, wherein the processor circuitry to encode the one or more atlases and the metadata comprises the processor circuitry to encode the resized first atlas.

2. The system of claim 1, the processor circuitry to: downscale the full frame of video to a downscaled frame having a second resolution less than the first resolution; and include the downscaled frame of video at the second resolution in the one or more atlases for encode into the one or more bitstreams.

3. The system of claim 1, wherein the processor circuitry to encode a first region of interest of the plurality of regions of interest comprises the processor circuitry to perform scalable video encode based on the first region of interest and a corresponding region of the full frame of video at a resolution lower than the first resolution.

4. The system of claim 1, wherein the metadata comprises, for a first region of interest of the plurality of regions of interest, a top left position of the first region in the full frame and a scaling factor.

5. The system of claim 1, wherein the processor circuitry to detect a first region of interest of the plurality of regions of interest comprises the processor circuitry to: perform lookahead analysis to detect corresponding subsequent first regions of interest in a plurality of temporally subsequent frames relative to the full frame; and size the first region of interest to include the first region of interest and all subsequent first regions of interest.

6. The system of claim 1, wherein the processor circuitry to detect a first region of interest of the plurality of regions of interest comprises the processor circuitry to: determine a detected region around an object in the first region of interest; and expand the detected region of the first region of interest to provide a buffer around the region.

7. The system of claim 1, wherein a first region of interest of the plurality of regions of interest comprises a representation of a face, the processor circuitry to separate the first region of interest into a first atlas and encrypt a first bitstream corresponding to the first atlas.

8. At least one non-transitory machine-readable medium comprising a plurality of instructions that, in response to being executed on a computing device, cause the computing device to code video for machine learning by: detecting a plurality of regions of interest for a machine learning operation in a full frame of video; forming one or more atlases comprising the regions of interest at a first resolution, wherein a first region of interest of the plurality of regions of interest is in a first atlas; generating metadata corresponding to the one or more atlases and indicative of a size and location of each of the regions of interest in the full frame of video; detecting a second region of interest in a subsequent frame of the video; resizing the first atlas and adding the second region of interest to the resized first atlas; and encoding the one or more atlases and the metadata into one or more bitstreams, the one or more bitstreams absent a representation of the full frame of video at the first resolution or a resolution higher than the first resolution, wherein encoding the one or more atlases and the metadata comprises encoding the resized first atlas.

9. The non-transitory machine-readable medium of claim 8, further comprising instructions that, in response to being executed on the computing device, cause the computing device to code video for machine learning by: downscaling the full frame of video to a downscaled frame having a second resolution less than the first resolution; and including the downscaled frame of video at the second resolution in the one or more atlases for encode into the one or more bitstreams, wherein encoding a first region of interest of the plurality of regions of interest comprises scalable video encoding based on the first region of interest and a corresponding region of the full frame of video at a resolution lower than the first resolution.

10. The non-transitory machine-readable medium of claim 8, wherein detecting a first region of interest of the plurality of regions of interest comprises: performing lookahead analysis to detect corresponding subsequent first regions of interest in a plurality of temporally subsequent frames relative to the full frame; and sizing the first region of interest to include the first region of interest and all subsequent first regions of interest.

11. A system, comprising: a memory to store at least a portion of video; and processor circuitry coupled to the memory, the processor circuitry to: detect a plurality of first regions of interest for machine learning operations in a full frame of the video; form an atlas comprising the first regions of interest at a first resolution; generate metadata corresponding to the atlas and indicative of a size and location of each of the first regions of interest in the full frame; detect a second region of interest in a subsequent frame of the video; resize the atlas and add the second region of interest to form a resized atlas; and encode the resized atlas and the metadata into one or more bitstreams, the one or more bitstreams absent a representation of the full frame of video at the first resolution or a resolution higher than the first resolution.

12. The system of claim 11, the processor circuitry to: downscale the full frame to a downscaled frame having a second resolution less than the first resolution; and include the downscaled frame in the atlas for encode into the one or more bitstreams.

13. The system of claim 11, wherein the processor circuitry to encode at least one of the first regions of interest comprises the processor circuitry to perform scalable video encode based on the at least one of the first regions of interest and a corresponding region of the full frame at a resolution lower than the first resolution.

14. The system of claim 11, the processor circuitry to: perform lookahead analysis to detect corresponding one or more subsequent first regions of interest in a plurality of temporally subsequent frames relative to the full frame; and size at least one of the first regions of interest to include all subsequent first regions of interest.

15. The system of claim 11, wherein the processor circuitry to detect at least one of the first regions of interest comprises the processor circuitry to: determine a detected region around an object in the at least one of the first regions of interest; and expand the detected region of the at least one of the first regions
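
Two of the sizing steps recited in the claims, lookahead sizing (a region large enough to cover the same object in later frames) and expanding a detected region to leave a buffer around it, can be sketched with simple box arithmetic. This is an illustrative sketch only: the `Box` layout, the function names, the per-side pixel margin, and the clamping to the frame bounds are assumptions, not details taken from the claims.

```python
from typing import Iterable, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), top-left origin


def union_box(boxes: Iterable[Box]) -> Box:
    """Smallest box that contains every box in `boxes` (lookahead sizing)."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("at least one box is required")
    left = min(x for x, _, _, _ in boxes)
    top = min(y for _, y, _, _ in boxes)
    right = max(x + w for x, _, w, _ in boxes)
    bottom = max(y + h for _, y, _, h in boxes)
    return (left, top, right - left, bottom - top)


def expand_box(box: Box, margin: int, frame_w: int, frame_h: int) -> Box:
    """Grow a box by `margin` pixels on each side, clamped to the frame."""
    x, y, w, h = box
    left = max(0, x - margin)
    top = max(0, y - margin)
    right = min(frame_w, x + w + margin)
    bottom = min(frame_h, y + h + margin)
    return (left, top, right - left, bottom - top)


# Hypothetical detections of one object in the current frame and two
# temporally subsequent frames.
tracked = [(100, 50, 40, 40), (110, 55, 40, 40), (125, 60, 40, 40)]
roi = union_box(tracked)  # (100, 50, 65, 50)
padded = expand_box(roi, margin=8, frame_w=1920, frame_h=1080)  # (92, 42, 81, 66)
```

Sizing the region once for a run of frames keeps the patch dimensions stable over that run, which is one plausible reason to prefer it over re-cropping every frame.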

Assignees

Intel Corp

Inventors

Classifications

  • Scaling of whole images or parts thereof, e.g. expanding or contracting · CPC title

  • in video content (extracting overlay text G06V20/62; video retrieval G06F16/70; processing of video elementary streams in video servers H04N21/234; processing of video elementary streams in video clients H04N21/44) · CPC title

  • Detection; Localisation; Normalisation · CPC title

  • using neural networks · CPC title

  • Machine learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12367654B2 cover?
It covers devices and techniques for patch based video coding for machines: detecting regions of interest in a frame of video, extracting them to one or more atlases that omit the full frame at a resolution equal to or higher than that of the regions of interest, and encoding the atlases to a bitstream.
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification H04N19/33. Mapped technology areas include Electricity.
When was this patent published?
Publication date Jul 22, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).