US12367654B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12367654-B2 |
| Application number | US-202117795178-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 15, 2021 |
| Priority date | Apr 16, 2020 |
| Publication date | Jul 22, 2025 |
| Grant date | Jul 22, 2025 |
Official abstract text for this publication.
Devices and techniques related to implementing patch based video coding for machines are discussed. Such patch based video coding includes detecting regions of interest in a frame of video, extracting the detected regions of interest to one or more atlases absent the frame at a resolution not less than the resolution of the regions of interest, and encoding the one or more atlases to a bitstream.
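As a rough illustration of the abstract, the sketch below copies detected regions of interest out of a frame into a single atlas at their native resolution and records where each one came from. It is not the patented implementation: the `PatchMeta` record, the `build_atlas` function, and the left-to-right packing are assumptions made for this example, region detection is taken as given, and the encode step is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchMeta:
    # Hypothetical metadata record: source position and size in the full
    # frame, plus the patch's position in the atlas.
    frame_x: int
    frame_y: int
    width: int
    height: int
    atlas_x: int
    atlas_y: int


def build_atlas(frame, rois):
    """Copy each (x, y, w, h) region of `frame` into one atlas, left to right.

    Patches are copied without downscaling, so the atlas resolution is not
    less than the resolution of the regions. The full frame is not included.
    """
    atlas_h = max((h for _, _, _, h in rois), default=0)
    atlas_w = sum(w for _, _, w, _ in rois)
    atlas = [[0] * atlas_w for _ in range(atlas_h)]
    metadata = []
    cursor = 0  # next free column in the atlas
    for x, y, w, h in rois:
        for row in range(h):
            atlas[row][cursor:cursor + w] = frame[y + row][x:x + w]
        metadata.append(PatchMeta(x, y, w, h, cursor, 0))
        cursor += w
    return atlas, metadata


# Toy 6x8 frame whose pixel value encodes its own row and column.
frame = [[10 * r + c for c in range(8)] for r in range(6)]
atlas, metadata = build_atlas(frame, [(1, 1, 2, 2), (5, 2, 3, 3)])
```

A decoder holding only the atlas and the metadata can place each patch back at its original frame coordinates, which is why the bitstream does not need the full-resolution frame.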
Opening claim text (preview).
What is claimed is:

1. A system, comprising: a memory to store at least a portion of input video; and processor circuitry coupled to the memory, the processor circuitry to: detect a plurality of regions of interest for a machine learning operation in a full frame of the input video; form one or more atlases comprising the regions of interest at a first resolution, wherein a first region of interest of the plurality of regions of interest is in a first atlas; generate metadata corresponding to the one or more atlases and indicative of a size and location of each of the regions of interest in the full frame of video; detect a second region of interest in a subsequent frame of the input video; resize the first atlas and add the second region of interest to the resized first atlas; and encode the one or more atlases and the metadata into one or more bitstreams, the one or more bitstreams absent a representation of the full frame of video at the first resolution or a resolution higher than the first resolution, wherein the processor circuitry to encode the one or more atlases and the metadata comprises the processor circuitry to encode the resized first atlas.

2. The system of claim 1, the processor circuitry to: downscale the full frame of video to a downscaled frame having a second resolution less than the first resolution; and include the downscaled frame of video at the second resolution in the one or more atlases for encode into the one or more bitstreams.

3. The system of claim 1, wherein the processor circuitry to encode a first region of interest of the plurality of regions of interest comprises the processor circuitry to perform scalable video encode based on the first region of interest and a corresponding region of the full frame of video at a resolution lower than the first resolution.

4. The system of claim 1, wherein the metadata comprises, for a first region of interest of the plurality of regions of interest, a top left position of the first region in the full frame and a scaling factor.

5. The system of claim 1, wherein the processor circuitry to detect a first region of interest of the plurality of regions of interest comprises the processor circuitry to: perform lookahead analysis to detect corresponding subsequent first regions of interest in a plurality of temporally subsequent frames relative to the full frame; and size the first region of interest to include the first region of interest and all subsequent first regions of interest.

6. The system of claim 1, wherein the processor circuitry to detect a first region of interest of the plurality of regions of interest comprises the processor circuitry to: determine a detected region around an object in the first region of interest; and expand the detected region of the first region of interest to provide a buffer around the region.

7. The system of claim 1, wherein a first region of interest of the plurality of regions of interest comprises a representation of a face, the processor circuitry to separate the first region of interest into a first atlas and encrypt a first bitstream corresponding to the first atlas.

8. At least one non-transitory machine-readable medium comprising a plurality of instructions that, in response to being executed on a computing device, cause the computing device to code video for machine learning by: detecting a plurality of regions of interest for a machine learning operation in a full frame of video; forming one or more atlases comprising the regions of interest at a first resolution, wherein a first region of interest of the plurality of regions of interest is in a first atlas; generating metadata corresponding to the one or more atlases and indicative of a size and location of each of the regions of interest in the full frame of video; detecting a second region of interest in a subsequent frame of the video; resizing the first atlas and adding the second region of interest to the resized first atlas; and encoding the one or more atlases and the metadata into one or more bitstreams, the one or more bitstreams absent a representation of the full frame of video at the first resolution or a resolution higher than the first resolution, wherein encoding the one or more atlases and the metadata comprises encoding the resized first atlas.

9. The non-transitory machine-readable medium of claim 8, further comprising instructions that, in response to being executed on the computing device, cause the computing device to code video for machine learning by: downscaling the full frame of video to a downscaled frame having a second resolution less than the first resolution; and including the downscaled frame of video at the second resolution in the one or more atlases for encode into the one or more bitstreams, wherein encoding a first region of interest of the plurality of regions of interest comprises scalable video encoding based on the first region of interest and a corresponding region of the full frame of video at a resolution lower than the first resolution.

10. The non-transitory machine-readable medium of claim 8, wherein detecting a first region of interest of the plurality of regions of interest comprises: performing lookahead analysis to detect corresponding subsequent first regions of interest in a plurality of temporally subsequent frames relative to the full frame; and sizing the first region of interest to include the first region of interest and all subsequent first regions of interest.

11. A system, comprising: a memory to store at least a portion of video; and processor circuitry coupled to the memory, the processor circuitry to: detect a plurality of first regions of interest for machine learning operations in a full frame of the video; form an atlas comprising the first regions of interest at a first resolution; generate metadata corresponding to the atlas and indicative of a size and location of each of the first regions of interest in the full frame; detect a second region of interest in a subsequent frame of the video; resize the atlas and add the second region of interest to form a resized atlas; and encode the resized atlas and the metadata into one or more bitstreams, the one or more bitstreams absent a representation of the full frame of video at the first resolution or a resolution higher than the first resolution.

12. The system of claim 11, the processor circuitry to: downscale the full frame to a downscaled frame having a second resolution less than the first resolution; and include the downscaled frame in the atlas for encode into the one or more bitstreams.

13. The system of claim 11, wherein the processor circuitry to encode at least one of the first regions of interest comprises the processor circuitry to perform scalable video encode based on the at least one of the first regions of interest and a corresponding region of the full frame at a resolution lower than the first resolution.

14. The system of claim 11, the processor circuitry to: perform lookahead analysis to detect corresponding one or more subsequent first regions of interest in a plurality of temporally subsequent frames relative to the full frame; and size at least one of the first regions of interest to include all subsequent first regions of interest.

15. The system of claim 11, wherein the processor circuitry to detect at least one of the first regions of interest comprises the processor circuitry to: determine a detected region around an object in the at least one of the first regions of interest; and expand the detected region of the at least one of the first regions
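
The lookahead sizing of claims 5, 10, and 14 and the buffer expansion of claims 6 and 15 can be pictured with simple box arithmetic. The sketch below is only one possible reading: the `(x, y, w, h)` box layout, the function names `union_box` and `expand_box`, the fixed pixel margin, and the clamping to the frame bounds are all assumptions for illustration, not details taken from the claims.

```python
def union_box(boxes):
    """Smallest (x, y, w, h) box containing every box in `boxes`.

    Models sizing a region of interest so that it covers the same object's
    regions in the current frame and in the lookahead frames.
    """
    x0 = min(x for x, _, _, _ in boxes)
    y0 = min(y for _, y, _, _ in boxes)
    x1 = max(x + w for x, _, w, _ in boxes)
    y1 = max(y + h for _, y, _, h in boxes)
    return (x0, y0, x1 - x0, y1 - y0)


def expand_box(box, margin, frame_w, frame_h):
    """Grow `box` by `margin` pixels per side, clamped to the frame (assumed)."""
    x, y, w, h = box
    x0 = max(0, x - margin)
    y0 = max(0, y - margin)
    x1 = min(frame_w, x + w + margin)
    y1 = min(frame_h, y + h + margin)
    return (x0, y0, x1 - x0, y1 - y0)


# One object tracked over the current frame and two lookahead frames.
tracked = [(10, 10, 4, 4), (12, 11, 4, 4), (14, 12, 4, 4)]
sized = union_box(tracked)                                   # (10, 10, 8, 6)
buffered = expand_box(sized, margin=2, frame_w=20, frame_h=16)  # (8, 8, 12, 8)
```

Sizing the region once for the whole lookahead window keeps the patch dimensions stable across those frames, at the cost of carrying some extra pixels per frame.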
Scaling of whole images or parts thereof, e.g. expanding or contracting · CPC title
in video content (extracting overlay text G06V20/62; video retrieval G06F16/70; processing of video elementary streams in video servers H04N21/234; processing of video elementary streams in video clients H04N21/44) · CPC title
Detection; Localisation; Normalisation · CPC title
using neural networks · CPC title
Machine learning · CPC title