Patch based video coding for machines

US12367654B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12367654-B2
Application number	US-202117795178-A
Country	US
Kind code	B2
Filing date	Apr 15, 2021
Priority date	Apr 16, 2020
Publication date	Jul 22, 2025
Grant date	Jul 22, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Devices and techniques related to implementing patch based video coding for machines are discussed. Such patch based video coding includes detecting regions of interest in a frame of video, extracting the detected regions of interest to one or more atlases absent the frame at a resolution not less than the resolution of the regions of interest, and encoding the one or more atlases to a bitstream.
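
The flow in the abstract (detect regions, copy them into an atlas, keep metadata that maps each patch back to the frame) can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the `Box` and `PatchInfo` types, the `build_atlas` function, and the single-row packing are all assumptions made for this example. The frame is a plain list of pixel rows, and region detection and bitstream encoding are left out.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical region of interest, in full-frame pixel coordinates."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


@dataclass(frozen=True)
class PatchInfo:
    """Hypothetical metadata entry: where a patch came from and where it sits."""
    frame_box: Box  # size and location in the full frame
    atlas_x: int    # left edge of the patch inside the atlas
    atlas_y: int    # top edge of the patch inside the atlas


def build_atlas(
    frame: Sequence[Sequence[int]], boxes: Sequence[Box], fill: int = 0
) -> Tuple[List[List[int]], List[PatchInfo]]:
    """Copy each region into one atlas at its original resolution.

    Patches are packed left to right in a single row. This packing is an
    assumption chosen for brevity, not a scheme taken from the patent.
    """
    atlas_h = max((b.h for b in boxes), default=0)
    atlas_w = sum(b.w for b in boxes)
    atlas = [[fill] * atlas_w for _ in range(atlas_h)]
    metadata: List[PatchInfo] = []
    cursor = 0
    for b in boxes:
        for row in range(b.h):
            # Same-length slice assignment: copies pixels without rescaling.
            atlas[row][cursor:cursor + b.w] = frame[b.y + row][b.x:b.x + b.w]
        metadata.append(PatchInfo(frame_box=b, atlas_x=cursor, atlas_y=0))
        cursor += b.w
    return atlas, metadata


if __name__ == "__main__":
    # Toy 4x6 frame whose pixel value encodes its own position (10*y + x).
    frame = [[10 * y + x for x in range(6)] for y in range(4)]
    atlas, metadata = build_atlas(frame, [Box(1, 0, 2, 2), Box(3, 1, 3, 3)])
    for row in atlas:
        print(row)
    print(metadata)
```

In this sketch only the atlas and the metadata list would be handed to an encoder; the full-resolution frame itself is never written out, which mirrors the "absent the frame" wording of the abstract.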

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a memory to store at least a portion of input video; and processor circuitry coupled to the memory, the processor circuitry to: detect a plurality of regions of interest for a machine learning operation in a full frame of the input video; form one or more atlases comprising the regions of interest at a first resolution, wherein a first region of interest of the plurality of regions of interest is in a first atlas; generate metadata corresponding to the one or more atlases and indicative of a size and location of each of the regions of interest in the full frame of video; detect a second region of interest in a subsequent frame of the input video; resize the first atlas and add the second region of interest to the resized first atlas; and encode the one or more atlases and the metadata into one or more bitstreams, the one or more bitstreams absent a representation of the full frame of video at the first resolution or a resolution higher than the first resolution, wherein the processor circuitry to encode the one or more atlases and the metadata comprises the processor circuitry to encode the resized first atlas.

2. The system of claim 1, the processor circuitry to: downscale the full frame of video to a downscaled frame having a second resolution less than the first resolution; and include the downscaled frame of video at the second resolution in the one or more atlases for encode into the one or more bitstreams.

3. The system of claim 1, wherein the processor circuitry to encode a first region of interest of the plurality of regions of interest comprises the processor circuitry to perform scalable video encode based on the first region of interest and a corresponding region of the full frame of video at a resolution lower than the first resolution.

4. The system of claim 1, wherein the metadata comprises, for a first region of interest of the plurality of regions of interest, a top left position of the first region in the full frame and a scaling factor.

5. The system of claim 1, wherein the processor circuitry to detect a first region of interest of the plurality of regions of interest comprises the processor circuitry to: perform lookahead analysis to detect corresponding subsequent first regions of interest in a plurality of temporally subsequent frames relative to the full frame; and size the first region of interest to include the first region of interest and all subsequent first regions of interest.

6. The system of claim 1, wherein the processor circuitry to detect a first region of interest of the plurality of regions of interest comprises the processor circuitry to: determine a detected region around an object in the first region of interest; and expand the detected region of the first region of interest to provide a buffer around the region.

7. The system of claim 1, wherein a first region of interest of the plurality of regions of interest comprises a representation of a face, the processor circuitry to separate the first region of interest into a first atlas and encrypt a first bitstream corresponding to the first atlas.

8. At least one non-transitory machine-readable medium comprising a plurality of instructions that, in response to being executed on a computing device, cause the computing device to code video for machine learning by: detecting a plurality of regions of interest for a machine learning operation in a full frame of video; forming one or more atlases comprising the regions of interest at a first resolution, wherein a first region of interest of the plurality of regions of interest is in a first atlas; generating metadata corresponding to the one or more atlases and indicative of a size and location of each of the regions of interest in the full frame of video; detecting a second region of interest in a subsequent frame of the video; resizing the first atlas and adding the second region of interest to the resized first atlas; and encoding the one or more atlases and the metadata into one or more bitstreams, the one or more bitstreams absent a representation of the full frame of video at the first resolution or a resolution higher than the first resolution, wherein encoding the one or more atlases and the metadata comprises encoding the resized first atlas.

9. The non-transitory machine-readable medium of claim 8, further comprising instructions that, in response to being executed on the computing device, cause the computing device to code video for machine learning by: downscaling the full frame of video to a downscaled frame having a second resolution less than the first resolution; and including the downscaled frame of video at the second resolution in the one or more atlases for encode into the one or more bitstreams, wherein encoding a first region of interest of the plurality of regions of interest comprises scalable video encoding based on the first region of interest and a corresponding region of the full frame of video at a resolution lower than the first resolution.

10. The non-transitory machine-readable medium of claim 8, wherein detecting a first region of interest of the plurality of regions of interest comprises: performing lookahead analysis to detect corresponding subsequent first regions of interest in a plurality of temporally subsequent frames relative to the full frame; and sizing the first region of interest to include the first region of interest and all subsequent first regions of interest.

11. A system, comprising: a memory to store at least a portion of video; and processor circuitry coupled to the memory, the processor circuitry to: detect a plurality of first regions of interest for machine learning operations in a full frame of the video; form an atlas comprising the first regions of interest at a first resolution; generate metadata corresponding to the atlas and indicative of a size and location of each of the first regions of interest in the full frame; detect a second region of interest in a subsequent frame of the video; resize the atlas and add the second region of interest to form a resized atlas; and encode the resized atlas and the metadata into one or more bitstreams, the one or more bitstreams absent a representation of the full frame of video at the first resolution or a resolution higher than the first resolution.

12. The system of claim 11, the processor circuitry to: downscale the full frame to a downscaled frame having a second resolution less than the first resolution; and include the downscaled frame in the atlas for encode into the one or more bitstreams.

13. The system of claim 11, wherein the processor circuitry to encode at least one of the first regions of interest comprises the processor circuitry to perform scalable video encode based on the at least one of the first regions of interest and a corresponding region of the full frame at a resolution lower than the first resolution.

14. The system of claim 11, the processor circuitry to: perform lookahead analysis to detect corresponding one or more subsequent first regions of interest in a plurality of temporally subsequent frames relative to the full frame; and size at least one of the first regions of interest to include all subsequent first regions of interest.

15. The system of claim 11, wherein the processor circuitry to detect at least one of the first regions of interest comprises the processor circuitry to: determine a detected region around an object in the at least one of the first regions of interest; and expand the detected region of the at least one of the first regions
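
Two of the region-sizing steps recited above lend themselves to a short worked example: expanding a detected region to leave a buffer around the object (claims 6 and 15), and sizing a region so that it covers the corresponding regions found in later frames by lookahead analysis (claims 5, 10, and 14). The sketch below is one plausible reading under stated assumptions, not the claimed implementation: the `Rect` layout, the function names `expand_with_buffer` and `cover_lookahead`, the fixed pixel margin, and the clamping to frame bounds are all hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical rectangle layout: (left, top, right, bottom) in frame pixels,
# with right and bottom exclusive.
Rect = Tuple[int, int, int, int]


def expand_with_buffer(rect: Rect, margin: int, frame_w: int, frame_h: int) -> Rect:
    """Grow a detected region by `margin` pixels per side, clamped to the frame."""
    left, top, right, bottom = rect
    return (
        max(0, left - margin),
        max(0, top - margin),
        min(frame_w, right + margin),
        min(frame_h, bottom + margin),
    )


def cover_lookahead(rects: Iterable[Rect]) -> Rect:
    """Return the smallest rectangle containing every rectangle in `rects`.

    `rects` is assumed to hold the same object's region in the current frame
    followed by its regions in the lookahead frames.
    """
    rects = list(rects)
    if not rects:
        raise ValueError("at least one rectangle is required")
    return (
        min(r[0] for r in rects),
        min(r[1] for r in rects),
        max(r[2] for r in rects),
        max(r[3] for r in rects),
    )


if __name__ == "__main__":
    # The object drifts right and slightly up over three frames.
    track = [(10, 10, 20, 20), (12, 8, 25, 18), (15, 9, 28, 19)]
    sized = cover_lookahead(track)
    print(sized)                                  # (10, 8, 28, 20)
    print(expand_with_buffer(sized, 4, 30, 100))  # (6, 4, 30, 24)
```

Sizing the region once for the whole lookahead window keeps the patch dimensions stable across those frames, at the cost of carrying some extra background pixels in each patch.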

Assignees

Intel Corp

Inventors

Classifications

Code	CPC title
G06T3/40	Scaling of whole images or parts thereof, e.g. expanding or contracting
G06V20/40	in video content (extracting overlay text G06V20/62; video retrieval G06F16/70; processing of video elementary streams in video servers H04N21/234; processing of video elementary streams in video clients H04N21/44)
G06V40/161	Detection; Localisation; Normalisation
G06T9/002	using neural networks
G06N20/00	Machine learning

Patent family

Related publications grouped by family.

View patent family 78084221

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12367654B2 cover?: Devices and techniques related to implementing patch based video coding for machines are discussed. Such patch based video coding includes detecting regions of interest in a frame of video, extracting the detected regions of interest to one or more atlases absent the frame at a resolution not less than the resolution of the regions of interest, and encoding the one or more atlases to a bitstream.
Who is the assignee on this patent?: Intel Corp
What technology area does this patent fall under?: Primary CPC classification H04N19/33. Mapped technology areas include Electricity.
When was this patent published?: Publication date Jul 22, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Progressive compressed domain computer vision and deep learning systems

System and method for training object classifier by machine learning

Point cloud compression with multi-resolution video encoding

Progressive compressed domain computer vision and deep learning systems

Object classification using machine learning and object tracking

Enhanced siamese trackers
