Hybrid neural network based end-to-end image and video coding method
US-2023096567-A1 · Mar 30, 2023 · US
US12526430B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12526430-B2 |
| Application number | US-202318235769-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 18, 2023 |
| Priority date | Aug 18, 2023 |
| Publication date | Jan 13, 2026 |
| Grant date | Jan 13, 2026 |
Official abstract text for this publication.
A computer-implemented method for optimizing encoding of video frames from a video is provided. In an embodiment, the method comprises receiving a video frame to be encoded. The method further comprises using one or more machine learning models to generate an encoding parameter value for encoding the video frame. The method further comprises comparing a first set of delta encoding values, based on the encoding parameter value, representing differences between groups of pixels of the video frame to a second set of delta encoding values, based on an alternative encoding parameter value, representing differences between the groups of pixels of the video frame, and in response to determining that the first set of delta encoding values is less than the second set of delta encoding values, selecting the encoding parameter value. The method further comprises, based on the encoding parameter value, encoding the video frame to generate an encoded video frame.
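The comparison step in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several things here are assumptions made only for illustration: the function names `delta_values`, `select_parameter`, and `flat_predictor` are hypothetical, a "set of delta encoding values" is taken to be one sum of absolute pixel differences per group of pixels, one set is treated as "less than" another when its total is smaller, and the fallback to the alternative value when the comparison fails is a guess, since the abstract does not say what happens in that case. The value proposed by the machine learning model is passed in as a plain argument.

```python
from typing import Callable, List, Sequence

Block = Sequence[int]                      # one flattened group of pixels
Predictor = Callable[[Block, int], List[int]]


def flat_predictor(block: Block, param: int) -> List[int]:
    """Hypothetical toy predictor: predicts every pixel as the constant `param`."""
    return [param] * len(block)


def delta_values(blocks: Sequence[Block], predict: Predictor, param: int) -> List[int]:
    """One delta value per block: sum of absolute differences between
    the block and its prediction under `param` (an assumed metric)."""
    return [
        sum(abs(pixel - pred) for pixel, pred in zip(block, predict(block, param)))
        for block in blocks
    ]


def select_parameter(
    blocks: Sequence[Block],
    predict: Predictor,
    model_param: int,         # stands in for the value generated by the ML model
    alternative_param: int,   # the alternative encoding parameter value
) -> int:
    """Keep the model's value only if its deltas total less than the alternative's."""
    first = delta_values(blocks, predict, model_param)
    second = delta_values(blocks, predict, alternative_param)
    if sum(first) < sum(second):
        return model_param
    return alternative_param  # assumed fallback; not specified in the abstract


blocks = [[10, 10, 12, 11], [9, 10, 10, 10]]
chosen = select_parameter(blocks, flat_predictor, model_param=10, alternative_param=0)
```

With these toy blocks, the model's value 10 yields much smaller residuals than the alternative 0, so it is the value selected for encoding.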
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for optimizing encoding of video frames from a video, the method comprising: receiving a video frame to be encoded; using one or more machine learning models: generating an encoding parameter value for encoding the video frame; comparing a first set of delta encoding values, based on the encoding parameter value, representing differences between groups of pixels of the video frame to a second set of delta encoding values, based on an alternative encoding parameter value, representing differences between the groups of pixels of the video frame; in response to determining that the first set of delta encoding values is less than the second set of delta encoding values, selecting the encoding parameter value; wherein the one or more machine learning models are trained using a reference encoder; and based on the encoding parameter value, encoding the video frame to generate an encoded video frame.

2. The computer-implemented method of claim 1, wherein comparing the first set of delta encoding values to the second set of delta encoding values, comprises: generating the first set of delta encoding values based on the encoding parameter value; selecting the alternative encoding parameter value for encoding the video frame; generating the second set of delta encoding values based on the alternative encoding parameter value; and comparing the first set of delta encoding values to the second set of delta encoding values.

3. The computer-implemented method of claim 1, wherein generating the encoding parameter value for encoding the video frame, comprises: using the one or more machine learning models to generate a set of candidate encoding parameter values; wherein each candidate encoding parameter value of the set of candidate encoding parameter values represents one type of encoding method, wherein types of encoding methods comprise intra-block encoding, inter-block encoding, intra-block-copy encoding, and palette encoding; for each candidate encoding parameter value from the set of candidate encoding value, generating sets of delta encoding values; determining, from the sets of delta encoding values, a particular set of delta encoding values with lowest delta encoding values; and selecting the encoding parameter value that corresponds to the particular set of delta encoding values.

4. The computer-implemented method of claim 1, wherein the one or more machine learning models are convolutional neural network models.

5. The computer-implemented method of claim 1, wherein the one or more machine learning models are transformer neural network models.

6. The computer-implemented method of claim 1, wherein the encoding parameter value is a parameter value for at least one of a block prediction and block transformation.

7. The computer-implemented method of claim 1, wherein the one or more machine learning models are further trained using bit-string lengths.

8. A non-transitory, computer-readable medium storing a set of instructions for optimizing encoding of video frames, that, when executed by a processor, cause: receiving a video frame to be encoded; using one or more machine learning models: generating an encoding parameter value for encoding the video frame; comparing a first set of delta encoding values, based on the encoding parameter value, representing differences between groups of pixels of the video frame to a second set of delta encoding values, based on an alternative encoding parameter value, representing differences between the groups of pixels of the video frame; in response to determining that the first set of delta encoding values is less than the second set of delta encoding values, selecting the encoding parameter value; wherein the one or more machine learning models are trained using a reference encoder; and based on the encoding parameter value, encoding the video frame to generate an encoded video frame.

9. The non-transitory, computer-readable medium of claim 8, wherein comparing the first set of delta encoding values to the second set of delta encoding values, comprises: generating the first set of delta encoding values based on the encoding parameter value; selecting the alternative encoding parameter value for encoding the video frame; generating the second set of delta encoding values based on the alternative encoding parameter value; and comparing the first set of delta encoding values to the second set of delta encoding values.

10. The non-transitory, computer-readable medium of claim 8, wherein generating the encoding parameter value for encoding the video frame, comprises: using the one or more machine learning models to generate a set of candidate encoding parameter values; wherein each candidate encoding parameter value of the set of candidate encoding parameter values represents one type of encoding method, wherein types of encoding methods comprise intra-block encoding, inter-block encoding, intra-block-copy encoding, and palette encoding; for each candidate encoding parameter value from the set of candidate encoding value, generating sets of delta encoding values; determining, from the sets of delta encoding values, a particular set of delta encoding values with lowest delta encoding values; and selecting the encoding parameter value that corresponds to the particular set of delta encoding values.

11. The non-transitory, computer-readable medium of claim 8, wherein the one or more machine learning models are convolutional neural network models.

12. The non-transitory, computer-readable medium of claim 8, wherein the one or more machine learning models are transformer neural network models.

13. The non-transitory, computer-readable medium of claim 8, wherein the encoding parameter value is a parameter value for at least one of a block prediction and block transformation.

14. The non-transitory, computer-readable medium of claim 8, wherein the one or more machine learning models are further trained using bit-string lengths.

15. A network-based system for optimizing encoding of video frames, the system comprising: a processor; a memory operatively connected to the processor and storing instructions that, when executed by the processor, cause: receiving a video frame to be encoded; using one or more machine learning models: generating an encoding parameter value for encoding the video frame; comparing a first set of delta encoding values, based on the encoding parameter value, representing differences between groups of pixels of the video frame to a second set of delta encoding values, based on an alternative encoding parameter value, representing differences between the groups of pixels of the video frame; in response to determining that the first set of delta encoding values is less than the second set of delta encoding values, selecting the encoding parameter value; wherein the one or more machine learning models are trained using a reference encoder; and based on the encoding parameter value, encoding the video frame to generate an encoded video frame.

16. The system of claim 15, wherein comparing the first set of delta encoding values to the second set of delta encoding values, comprises: generating the first set of delta encoding values based on the encoding parameter value; selecting the alternative encoding parameter value for encoding the video frame; generating the second set of delta encoding values based on the alternative encoding parameter value; and comparing the first set of delta encoding v
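
Claims 3 and 10 extend the two-way comparison to a set of candidates, one per type of encoding method, and keep the candidate whose delta set is lowest. The Python sketch below illustrates that selection only. The `Mode` enum and `pick_lowest` are hypothetical names, the delta values are made-up inputs rather than model output, and ranking sets by the sum of absolute deltas is an assumption, since the claims do not define how sets are ordered.

```python
from enum import Enum
from typing import Dict, List


class Mode(Enum):
    """The four encoding-method types listed in claims 3 and 10."""
    INTRA = "intra-block"
    INTER = "inter-block"
    INTRA_BLOCK_COPY = "intra-block-copy"
    PALETTE = "palette"


def pick_lowest(deltas_by_mode: Dict[Mode, List[int]]) -> Mode:
    """Return the mode whose delta set has the smallest total magnitude.

    Ties go to the mode that appears first in the dictionary, because
    min() returns the first minimal element it encounters.
    """
    return min(deltas_by_mode, key=lambda mode: sum(abs(d) for d in deltas_by_mode[mode]))


# Illustrative delta sets, one per candidate (not real encoder data).
candidate_deltas = {
    Mode.INTRA: [5, 7, 6],
    Mode.INTER: [2, 3, 1],
    Mode.INTRA_BLOCK_COPY: [9, 9, 9],
    Mode.PALETTE: [4, 4, 4],
}
best_mode = pick_lowest(candidate_deltas)
```

Here the inter-block candidate has the smallest total, so its parameter value would be the one carried forward to encoding.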
Incoming video signal characteristics or properties · CPC title
the region being a picture, frame or field · CPC title
Data rate or code amount at the encoder output · CPC title