Reinforcement learning method for video encoder

US2020344472A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2020344472-A1
Application number: US-202016855190-A
Country: US
Kind code: A1
Filing date: Apr 22, 2020
Priority date: Apr 23, 2019
Publication date: Oct 29, 2020
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A reinforcement learning method for frame-level bit allocation is disclosed. The reinforcement learning method includes steps of: (a) at a testing time, computing a state according to a plurality of features; (b) determining an action according to a policy; (c) determining a number of bits allocated to an i-th frame in a group of pictures (GOP) according to the action, a GOP-level bit budget and the state, wherein i is a positive integer; (d) encoding the i-th frame according to the number of bits allocated to the i-th frame in the GOP; and (e) repeating the steps (a)~(d) until an end of the GOP.
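Steps (a) to (e) of the abstract can be read as a per-frame loop. The following is only a minimal sketch under stated assumptions, not the patented implementation: `allocate_gop_bits`, `toy_policy`, and `toy_encode` are hypothetical names, the state layout is invented for illustration, and the bit target is taken as the action times the GOP-level bit budget (the ratio interpretation of claim 7), clipped to the bits still available.

```python
from typing import Callable, List, Sequence


def allocate_gop_bits(
    frame_features: Sequence[Sequence[float]],
    gop_bit_budget: float,
    policy: Callable[[List[float]], float],
    encode: Callable[[int, float], float],
) -> List[float]:
    """Run steps (a)-(e) over one GOP and return the bits spent per frame."""
    num_frames = len(frame_features)
    remaining_bits = gop_bit_budget
    spent: List[float] = []
    for i in range(num_frames):
        # (a) State: per-frame features plus budget bookkeeping.
        #     This layout is an assumption made for the sketch.
        state = list(frame_features[i]) + [
            remaining_bits / gop_bit_budget,  # fraction of bits remaining
            float(num_frames - i),            # frames left, including this one
        ]
        # (b) Action: a real number in [0, 1] chosen by the policy.
        action = min(max(policy(state), 0.0), 1.0)
        # (c) Bit target: ratio of the GOP budget, clipped to what is left
        #     (the clipping is an assumption, not stated in the source).
        target_bits = min(action * gop_bit_budget, remaining_bits)
        # (d) Encode the frame; the encoder reports the bits actually used.
        used_bits = encode(i, target_bits)
        spent.append(used_bits)
        remaining_bits -= used_bits
        # (e) Repeat until the GOP ends; stop early if the budget runs out.
        if remaining_bits <= 0:
            break
    return spent


def toy_policy(state: List[float]) -> float:
    """Hypothetical stand-in for a trained actor: split remaining bits evenly."""
    remaining_fraction, frames_left = state[-2], state[-1]
    return remaining_fraction / frames_left


def toy_encode(frame_index: int, target_bits: float) -> float:
    """Hypothetical encoder that hits its bit target exactly."""
    return target_bits


features = [[0.0], [0.0], [0.0], [0.0]]  # placeholder per-frame features
spent = allocate_gop_bits(features, 1000.0, toy_policy, toy_encode)
print(spent)  # [250.0, 250.0, 250.0, 250.0]
```

A real encoder would rarely match the target exactly, which is why the loop tracks the bits reported by `encode` rather than the requested target.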

First claim

Opening claim text (preview).

What is claimed is:

1. A reinforcement learning method for a video encoder, comprising steps of: (a) at a testing time, computing a state according to a plurality of features; (b) determining an action according to a policy; (c) determining a number of bits allocated to an i-th frame in a group of pictures (GOP) according to the action, a GOP-level bit budget and the state, wherein i is a positive integer; (d) encoding the i-th frame according to the number of bits allocated to the i-th frame in the GOP; and (e) repeating the steps (a)~(d) until an end of the GOP.

2. The reinforcement learning method of claim 1, wherein the reinforcement learning method is used for frame-level bit allocation or intra-frame bit allocation; the reinforcement learning method is applied to a reinforcement learning system comprising an agent and an environment to allocate appropriate bits to each frame in the GOP, so that a GOP-level distortion is minimized subject to a GOP-level bit budget.

3. The reinforcement learning method of claim 2, wherein at a time step, the agent is configured to observe the state from the environment and take the action according to the policy.

4. The reinforcement learning method of claim 2, wherein the policy describes a behavior of the agent and the policy is considered a stochastic mapping from the state to the action to define a distribution over the action conditioned on the state.

5. The reinforcement learning method of claim 3, wherein upon taking the action, the agent receives an immediate reward and a new state from the environment, and dynamics of the environment is defined by a transition distribution.

6. The reinforcement learning method of claim 1, wherein the agent is a frame-adaptive bit allocation algorithm and the environment is an encoder for encoding the i-th frame to match the number of bits allocated to the i-th frame in the GOP.

7. The reinforcement learning method of claim 2, wherein the action is a real number between 0 and 1 specifying a ratio of the number of bits allocated to the i-th frame in the GOP to the GOP-level bit budget.

8. The reinforcement learning method of claim 5, wherein after the i-th frame is encoded, the immediate reward is computed to be a negative mean squared error of the i-th frame due to compression.

9. The reinforcement learning method of claim 1, wherein the plurality of features comprises an intra-frame feature (mean and variance of pixel values), an inter-frame feature (mean and variance of residuals), an average of intra-frame features over remaining frames, an average of inter-frame features over the remaining frames, a percentage of remaining bits, a temporal identification of a current frame, a number of the remaining frames in the GOP and bits per pixel (a bit rate/a frame rate).

10. The reinforcement learning method of claim 1, wherein an interaction between the agent and the environment is ended in a terminal state.

11. The reinforcement learning method of claim 10, wherein the terminal state corresponding to underflow of the GOP-level bit budget is that when all frames in the GOP are successfully encoded, there are still leftover bits.

12. The reinforcement learning method of claim 11, wherein an immediate reward for the frames in the GOP is penalized by a value proportional to a percentage of the leftover bits.

13. The reinforcement learning method of claim 10, wherein the terminal state corresponding to overflow is that all bits are run out, but remaining frames in the GOP are not encoded.

14. The reinforcement learning method of claim 13, wherein an immediate reward for a last encoded frame in the GOP is penalized by a large negative value proportional to a number of the remaining frames.

15. The reinforcement learning method of claim 1, wherein each frame in the GOP is characterized with an intra-frame feature and an inter-frame feature; the intra-frame feature and the inter-frame feature of the frame are computed before the frame is encoded.

16. The reinforcement learning method of claim 15, wherein the intra-frame feature summarizes a texture complexity of the frame in terms of mean and variance of pixel values of the frame, while the inter-frame feature collects the texture complexity of the frame from mean and variance of prediction residuals of the frame.

17. The reinforcement learning method of claim 16, wherein the prediction residuals of the frame are approximated by forming a zero-motion prediction of the frame in question from reference frames.

18. The reinforcement learning method of claim 17, wherein the zero-motion prediction is uni-prediction or bi-prediction.

19. The reinforcement learning method of claim 1, wherein when the agent is trained with Deep Deterministic Policy Gradient (DDPG) algorithm in a continuous action space, the agent comprises an actor and a critic implemented with two dedicated neural networks; the actor is configured to determine the action and the critic is configured to evaluate a value of the action taken by the actor in the state.

20. The reinforcement learning method of claim 19, wherein at a training time, the critic is learned by minimizing a loss between a predicted immediate reward and an actual immediate reward, while the actor is updated by using a policy gradient to maximize the value evaluated by the critic; at the testing time, the actor plays a role of the agent.
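The feature and reward definitions in claims 8, 12, 14, and 16 to 18 can be illustrated with a short sketch. This is not the patented implementation. It assumes frames are flat lists of pixel values, that a zero-motion prediction is the co-located reference frame (uni-prediction) or the average of two references (bi-prediction), and that residual statistics are taken over signed differences. All function names and the `scale` constants are hypothetical.

```python
from statistics import fmean, pvariance
from typing import Optional, Sequence, Tuple


def intra_feature(frame: Sequence[float]) -> Tuple[float, float]:
    """Mean and variance of pixel values (texture complexity, claim 16)."""
    values = [float(v) for v in frame]
    return fmean(values), pvariance(values)


def inter_feature(
    frame: Sequence[float],
    ref0: Sequence[float],
    ref1: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Mean and variance of zero-motion prediction residuals (claims 16-18)."""
    if ref1 is None:
        prediction = [float(p) for p in ref0]                   # uni-prediction
    else:
        prediction = [(a + b) / 2 for a, b in zip(ref0, ref1)]  # bi-prediction
    residual = [float(x) - p for x, p in zip(frame, prediction)]
    return fmean(residual), pvariance(residual)


def frame_reward(original: Sequence[float], decoded: Sequence[float]) -> float:
    """Immediate reward: negative mean squared error of the frame (claim 8)."""
    return -fmean([(o - d) ** 2 for o, d in zip(original, decoded)])


def underflow_penalty(leftover_bits: float, gop_bit_budget: float,
                      scale: float = 1.0) -> float:
    """Penalty proportional to the fraction of leftover bits (claim 12).

    The value of `scale` is an assumption; the source gives no constant.
    """
    return -scale * (leftover_bits / gop_bit_budget)


def overflow_penalty(remaining_frames: int, scale: float = 100.0) -> float:
    """Large negative value proportional to unencoded frames (claim 14).

    The value of `scale` is an assumption; the source gives no constant.
    """
    return -scale * remaining_frames


frame = [10, 20, 30, 40]
print(intra_feature(frame))                                   # (25.0, 125.0)
print(inter_feature(frame, [8, 18, 28, 38]))                  # (2.0, 0.0)
print(inter_feature(frame, [8, 18, 28, 38], [12, 22, 32, 42]))  # (0.0, 0.0)
print(frame_reward(frame, [11, 19, 31, 39]))                  # -1.0
```

Because both features depend only on the current frame and its references, they can be computed before the frame is encoded, as claim 15 requires.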

Assignees

Univ National Chiao Tung

Inventors

Classifications

  • based on feedback from supervisors · CPC title

  • using neural networks · CPC title

  • H04N19/114 · Primary

    Adapting the group of pictures [GOP] structure, e.g. number of B-frames between two anchor frames (H04N19/107 takes precedence) · CPC title

  • Validation; Performance evaluation; Active pattern learning techniques · CPC title

  • Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020344472A1 cover?
A reinforcement learning method for frame-level bit allocation in a video encoder. For each frame in a group of pictures (GOP), the method computes a state from a plurality of features, determines an action according to a policy, derives the number of bits allocated to the frame from the action, the GOP-level bit budget, and the state, and encodes the frame with that allocation, repeating until the end of the GOP.
Who is the assignee on this patent?
Univ National Chiao Tung
What technology area does this patent fall under?
Primary CPC classification G06V10/7784. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 29, 2020 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).