Dynamic network quantization for efficient video inference

US12548333B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12548333-B2
Application number: US-202117566782-A
Country: US
Kind code: B2
Filing date: Dec 31, 2021
Priority date: Dec 31, 2021
Publication date: Feb 10, 2026
Grant date: Feb 10, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A recognition network is trained for a selected video frame at a desired highest precision using back-propagation and a policy network is trained using back-propagation from the trained recognition network. The recognition network is trained at a lower precision specified by a policy recommended for the selected video frame by the trained policy network. A frame of a given video is inputted to the trained policy network for determination of a precision policy for processing the frame. Video inferencing is performed utilizing the trained policy network and the trained recognition network based on the precision policy.
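The inference step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: `CANDIDATE_BITS`, `run_video_inference`, `toy_policy`, and `toy_recognize` are hypothetical names, the candidate bit-widths are made up, and averaging per-frame scores into a video-level prediction is an assumption the abstract does not state. A policy of zero is treated as "skip the frame", as in the dependent claims.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical candidate precisions; 0 means "skip this frame".
CANDIDATE_BITS = (32, 4, 2, 0)


def run_video_inference(
    frames: Sequence[float],
    policy: Callable[[float], int],
    recognize: Callable[[float, int], List[float]],
) -> Tuple[Optional[List[float]], List[int]]:
    """Pick a bit-width per frame, run recognition at that precision,
    and average the per-frame scores (averaging is an assumption)."""
    chosen: List[int] = []
    per_frame: List[List[float]] = []
    for frame in frames:
        bits = policy(frame)
        if bits not in CANDIDATE_BITS:
            raise ValueError(f"unsupported bit-width: {bits}")
        chosen.append(bits)
        if bits == 0:
            continue  # precision policy of zero: frame is skipped
        per_frame.append(recognize(frame, bits))
    if not per_frame:
        return None, chosen  # every frame was skipped
    count = len(per_frame)
    scores = [sum(column) / count for column in zip(*per_frame)]
    return scores, chosen


def toy_policy(frame: float) -> int:
    """Stand-in for the trained policy network (a frame is one float here)."""
    if frame < 0.1:
        return 0
    if frame < 0.5:
        return 4
    return 32


def toy_recognize(frame: float, bits: int) -> List[float]:
    """Stand-in for the recognition network; ignores `bits` for simplicity."""
    return [frame, 1.0 - frame]


scores, used = run_video_inference([0.05, 0.25, 0.75], toy_policy, toy_recognize)
print(used, scores)
```

In a real system the recognizer would load or select weights quantized to the chosen bit-width, so cheap frames cost less compute than frames the policy routes to higher precision.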

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: training a recognition network for a selected video frame at a desired highest precision using back-propagation by selecting appropriate quantization levels of weights of the recognition network determined by a policy network trained using back-propagation from the trained recognition network; training the policy network using the back-propagation from the trained recognition network; training the recognition network at a lower precision specified by a policy recommended for the selected video frame by the trained policy network by selecting appropriate quantization levels of the weights of the recognition network; inputting a frame of a given video to the trained policy network for determination of a precision policy for processing the frame; and performing video inferencing utilizing a corresponding quantization level of the weights of the trained recognition network selected using the trained policy network.

2. The method of claim 1, wherein the policy network is trained using standard back-propagation through Gumbel SoftMax sampling.

3. The method of claim 1, wherein the recognition network is trained using standard back-propagation through Gumbel SoftMax sampling.

4. The method of claim 1, further comprising skipping video frames corresponding to a precision policy of zero during the performance of the inferencing.

5. The method of claim 1, wherein the policy network is trained based on an overall loss $\mathcal{L}_g$, where the loss $\mathcal{L}_g$ is defined as: $\mathcal{L}_g(V) = \mathcal{L}_{ce}(V \mid A) + \mathcal{L}_{kd}(V \mid A) + w_1 \mathcal{L}_e(A) + w_2 \mathcal{L}_b(A) + w_3 \mathcal{L}_d(\pi)$, where $V$ is a given input video, $A = g(V)$, $\pi$ is a distribution, and $w_1$, $w_2$ and $w_3$ are hyperparameters to balance loss terms.

6. The method of claim 1, wherein the recognition network is trained based on an overall loss $\mathcal{L}_f$, where the loss $\mathcal{L}_f$ is defined as: $\mathcal{L}_f(V) = \sum_{A = b_1^T, \ldots, b_n^T} \mathcal{L}_{ce}(V \mid A) + \mathcal{L}_{kd}(V \mid A)$, where $V$ is a given input video, $A = g(V)$, $b$ is a bit-width, $T$ is a count of video frames, and $n$ is a count of candidate bit-widths.

7. The method of claim 1, wherein the training of the recognition network at the precision specified by the policy further comprises quantizing a full precision weight $W$ of the recognition network to a largest bit-width $b_1$ and truncating a least significant $b_1 - b$ bits to derive a quantized weight $\hat{W}_b$ and the method further comprising aligning $E[\hat{W}_b]$ with $E[\hat{W}_{b_1}]$ to minimize a mean discrepancy caused by discarded bits.

8. An apparatus comprising: a memory; and at least one processor, coupled to said memory, and operative to perform operations comprising: training a recognition network for a selected video frame at a highest precision using back-propagation by selecting appropriate quantization levels of weights of the recognition network determined by a policy network trained using back-propagation from the trained recognition network; training the policy network using the back-propagation from the recognition network; training the recognition network at a precision specified by a policy recommended for the selected video frame by the trained policy network by selecting appropriate quantization levels of the weights of the recognition network; inputting a frame of a given video to the policy network for determination of a precision policy for processing the frame; and performing video inferencing utilizing a corresponding quantization level of the weights of the trained recognition network selected using the trained policy network.

9. The apparatus of claim 8, wherein the policy network is trained using standard back-propagation through Gumbel SoftMax sampling.

10. The apparatus of claim 8, wherein the recognition network is trained using standard back-propagation through Gumbel SoftMax sampling.

11. The apparatus of claim 8, the operations further comprising skipping video frames corresponding to a precision policy of zero during the performance of the inferencing.

12. The apparatus of claim 8, wherein the policy network is trained based on an overall loss $\mathcal{L}_g$, where the loss $\mathcal{L}_g$ is defined as: $\mathcal{L}_g(V) = \mathcal{L}_{ce}(V \mid A) + \mathcal{L}_{kd}(V \mid A) + w_1 \mathcal{L}_e(A) + w_2 \mathcal{L}_b(A) + w_3 \mathcal{L}_d(\pi)$, where $V$ is a given input video, $A = g(V)$, $\pi$ is a distribution, and $w_1$, $w_2$ and $w_3$ are hyperparameters to balance loss terms.

13. The apparatus of claim 8, wherein the recognition network is trained based on an overall loss $\mathcal{L}_f$, where the loss $\mathcal{L}_f$ is defined as: $\mathcal{L}_f(V) = \sum_{A = b_1^T,}$ …
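The truncation step in claim 7 can be illustrated with a small sketch. It assumes uniform min/max quantization to unsigned integer codes, and it aligns the means by adding one constant offset; the claim does not specify either choice, so both are assumptions made for illustration. The function names (`quantize_codes`, `truncate_codes`, `aligned_values`) are hypothetical.

```python
from typing import List, Sequence


def quantize_codes(weights: Sequence[float], bits: int) -> List[int]:
    """Map weights to unsigned integer codes in [0, 2**bits - 1].
    Uniform min/max scaling is an assumption, not taken from the claim."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return [0] * len(weights)
    levels = (1 << bits) - 1
    return [round((w - lo) / (hi - lo) * levels) for w in weights]


def truncate_codes(codes: Sequence[int], b1: int, b: int) -> List[int]:
    """Drop the least significant (b1 - b) bits of each b1-bit code."""
    if not 0 < b <= b1:
        raise ValueError("need 0 < b <= b1")
    shift = b1 - b
    return [c >> shift for c in codes]


def aligned_values(codes_b1: Sequence[int], b1: int, b: int) -> List[float]:
    """Express the b-bit weights on the b1-bit code scale, then add a
    constant so their mean matches the mean of the b1-bit codes.
    A single constant offset is one simple way to align the means."""
    shift = b1 - b
    coarse = [c << shift for c in truncate_codes(codes_b1, b1, b)]
    offset = (sum(codes_b1) - sum(coarse)) / len(codes_b1)
    return [v + offset for v in coarse]


codes = [0, 3, 5, 6, 9, 15]            # example 4-bit codes
low = truncate_codes(codes, 4, 2)      # 2-bit codes after truncation
aligned = aligned_values(codes, 4, 2)  # mean-aligned low-precision values
print(low, sum(aligned) / len(aligned), sum(codes) / len(codes))
```

Truncation always rounds toward zero, so without the offset the low-precision weights would be biased low on average; the offset removes that bias while keeping the low-bit codes a prefix of the high-bit ones.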

Assignees

IBM, Univ Boston

Inventors

Classifications

  • Backpropagation, e.g. using gradient descent · CPC title

  • using neural networks · CPC title

  • Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title

  • modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12548333B2 cover?
A recognition network is trained for a selected video frame at a desired highest precision using back-propagation and a policy network is trained using back-propagation from the trained recognition network. The recognition network is trained at a lower precision specified by a policy recommended for the selected video frame by the trained policy network. A frame of a given video is inputted to …
Who is the assignee on this patent?
IBM, Univ Boston
What technology area does this patent fall under?
Primary CPC classification G06V20/46. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 10, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).