Methods for training and analysing input data using a machine learning model
US-2022284240-A1 · Sep 8, 2022 · US
US12548333B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12548333-B2 |
| Application number | US-202117566782-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 31, 2021 |
| Priority date | Dec 31, 2021 |
| Publication date | Feb 10, 2026 |
| Grant date | Feb 10, 2026 |
Official abstract text for this publication.
A recognition network is trained for a selected video frame at a desired highest precision using back-propagation and a policy network is trained using back-propagation from the trained recognition network. The recognition network is trained at a lower precision specified by a policy recommended for the selected video frame by the trained policy network. A frame of a given video is inputted to the trained policy network for determination of a precision policy for processing the frame. Video inferencing is performed utilizing the trained policy network and the trained recognition network based on the precision policy.
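The inference flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: `run_video_inference`, `toy_policy`, and `toy_recognize` are hypothetical names, the thresholds in the toy policy are arbitrary, and averaging the per-frame scores is an assumption (the abstract does not say how per-frame results are combined). A bit-width of zero is treated as "skip this frame", which is how the claims describe a precision policy of zero.

```python
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]  # hypothetical stand-in for a decoded video frame


def run_video_inference(
    frames: List[Frame],
    policy: Callable[[Frame], int],
    recognize: Callable[[Frame, int], List[float]],
) -> Optional[List[float]]:
    """Average per-frame scores, each computed at the bit-width the policy picks.

    Averaging is an assumption made for this sketch only.
    """
    total: Optional[List[float]] = None
    used = 0
    for frame in frames:
        bits = policy(frame)          # precision policy for this frame
        if bits == 0:                 # a policy of zero means: skip the frame
            continue
        scores = recognize(frame, bits)
        if total is None:
            total = list(scores)
        else:
            total = [t + s for t, s in zip(total, scores)]
        used += 1
    if total is None:                 # every frame was skipped
        return None
    return [t / used for t in total]


def toy_policy(frame: Frame) -> int:
    """Hypothetical policy: more 'energetic' frames get more bits."""
    energy = sum(abs(x) for x in frame)
    if energy < 1.0:
        return 0
    return 8 if energy > 3.0 else 4


def toy_recognize(frame: Frame, bits: int) -> List[float]:
    """Hypothetical recognizer: returns dummy scores that depend on the bit-width."""
    return [float(bits), float(len(frame))]


if __name__ == "__main__":
    video = [[0.1, 0.2], [1.0, 1.0], [2.0, 2.0]]
    print(run_video_inference(video, toy_policy, toy_recognize))
```

In a real system the policy and recognizer would be trained networks; the point here is only the control flow: one lightweight policy decision per frame, followed by recognition at the chosen precision or no recognition at all.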
Opening claim text (preview).
What is claimed is:

1. A method comprising: training a recognition network for a selected video frame at a desired highest precision using back-propagation by selecting appropriate quantization levels of weights of the recognition network determined by a policy network trained using back-propagation from the trained recognition network; training the policy network using the back-propagation from the trained recognition network; training the recognition network at a lower precision specified by a policy recommended for the selected video frame by the trained policy network by selecting appropriate quantization levels of the weights of the recognition network; inputting a frame of a given video to the trained policy network for determination of a precision policy for processing the frame; and performing video inferencing utilizing a corresponding quantization level of the weights of the trained recognition network selected using the trained policy network.

2. The method of claim 1, wherein the policy network is trained using standard back-propagation through Gumbel SoftMax sampling.

3. The method of claim 1, wherein the recognition network is trained using standard back-propagation through Gumbel SoftMax sampling.

4. The method of claim 1, further comprising skipping video frames corresponding to a precision policy of zero during the performance of the inferencing.

5. The method of claim 1, wherein the policy network is trained based on an overall loss $\mathcal{L}_g$, where the loss $\mathcal{L}_g$ is defined as: $\mathcal{L}_g(V) = \mathcal{L}_{ce}(V \mid A) + \mathcal{L}_{kd}(V \mid A) + w_1 \mathcal{L}_e(A) + w_2 \mathcal{L}_b(A) + w_3 \mathcal{L}_d(\pi)$, where $V$ is a given input video, $A = g(V)$, $\pi$ is a distribution, and $w_1$, $w_2$ and $w_3$ are hyperparameters to balance loss terms.

6. The method of claim 1, wherein the recognition network is trained based on an overall loss $\mathcal{L}_f$, where the loss $\mathcal{L}_f$ is defined as: $\mathcal{L}_f(V) = \sum_{A = b_1^T, \ldots, b_n^T} \mathcal{L}_{ce}(V \mid A) + \mathcal{L}_{kd}(V \mid A)$, where $V$ is a given input video, $A = g(V)$, $b$ is a bit-width, $T$ is a count of video frames, and $n$ is a count of candidate bit-widths.

7. The method of claim 1, wherein the training of the recognition network at the precision specified by the policy further comprises quantizing a full precision weight $W$ of the recognition network to a largest bit-width $b_1$ and truncating a least significant $b_1 - b$ bits to derive a quantized weight $\hat{W}_b$ and the method further comprising aligning $E[\hat{W}_b]$ with $E[\hat{W}_{b_1}]$ to minimize a mean discrepancy caused by discarded bits.

8. An apparatus comprising: a memory; and at least one processor, coupled to said memory, and operative to perform operations comprising: training a recognition network for a selected video frame at a highest precision using back-propagation by selecting appropriate quantization levels of weights of the recognition network determined by a policy network trained using back-propagation from the trained recognition network; training the policy network using the back-propagation from the recognition network; training the recognition network at a precision specified by a policy recommended for the selected video frame by the trained policy network by selecting appropriate quantization levels of the weights of the recognition network; inputting a frame of a given video to the policy network for determination of a precision policy for processing the frame; and performing video inferencing utilizing a corresponding quantization level of the weights of the trained recognition network selected using the trained policy network.

9. The apparatus of claim 8, wherein the policy network is trained using standard back-propagation through Gumbel SoftMax sampling.

10. The apparatus of claim 8, wherein the recognition network is trained using standard back-propagation through Gumbel SoftMax sampling.

11. The apparatus of claim 8, the operations further comprising skipping video frames corresponding to a precision policy of zero during the performance of the inferencing.

12. The apparatus of claim 8, wherein the policy network is trained based on an overall loss $\mathcal{L}_g$, where the loss $\mathcal{L}_g$ is defined as: $\mathcal{L}_g(V) = \mathcal{L}_{ce}(V \mid A) + \mathcal{L}_{kd}(V \mid A) + w_1 \mathcal{L}_e(A) + w_2 \mathcal{L}_b(A) + w_3 \mathcal{L}_d(\pi)$, where $V$ is a given input video, $A = g(V)$, $\pi$ is a distribution, and $w_1$, $w_2$ and $w_3$ are hyperparameters to balance loss terms.

13. The apparatus of claim 8, wherein the recognition network is trained based on an overall loss $\mathcal{L}_f$, where the loss $\mathcal{L}_f$ is defined as: $\mathcal{L}_f(V) = \sum_{A = b_1^T, \ldots}$
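The weight quantization recited in claim 7 (quantize to the largest bit-width, drop the least significant bits, then align the means) might look something like the sketch below. Several things here are assumptions for illustration only: weights are taken to lie in [0, 1] and are mapped to unsigned integer codes, and the mean alignment is done by adding back the empirical average of the discarded bits. The claim itself does not specify the quantizer or the alignment procedure, and all function names are hypothetical.

```python
from typing import List


def quantize_to_int(weights: List[float], bits: int) -> List[int]:
    """Uniformly quantize weights (assumed to lie in [0, 1]) to unsigned codes."""
    levels = (1 << bits) - 1
    return [min(levels, max(0, round(w * levels))) for w in weights]


def truncate_codes(codes_b1: List[int], b1: int, b: int) -> List[int]:
    """Derive b-bit codes by dropping the (b1 - b) least significant bits."""
    if not 0 < b <= b1:
        raise ValueError("need 0 < b <= b1")
    return [c >> (b1 - b) for c in codes_b1]


def align_mean(codes_b1: List[int], b1: int, b: int) -> List[float]:
    """Rescale truncated codes to the b1-bit range and shift them so their
    mean equals the mean of the original b1-bit codes.

    Assumption: alignment uses one empirical offset shared by all weights.
    """
    shift = b1 - b
    rescaled = [c << shift for c in truncate_codes(codes_b1, b1, b)]
    # Average value carried by the discarded low-order bits.
    offset = (sum(codes_b1) - sum(rescaled)) / len(codes_b1)
    return [r + offset for r in rescaled]


if __name__ == "__main__":
    codes8 = [0, 37, 128, 201, 255]          # example 8-bit codes
    print(truncate_codes(codes8, 8, 4))      # 4-bit codes
    print(align_mean(codes8, 8, 4))          # mean-aligned values on the 8-bit scale
```

Because every lower-precision code is obtained by a bit shift of the same stored high-precision code, a single set of weights can serve all candidate bit-widths; the offset compensates for the systematic downward bias that truncation introduces.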
Backpropagation, e.g. using gradient descent · CPC title
using neural networks · CPC title
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title