Video coding using camera motion compensation and object motion compensation
US-2024013441-A1 · Jan 11, 2024 · US
US12598331B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12598331-B2 |
| Application number | US-202318137352-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 20, 2023 |
| Priority date | Feb 14, 2023 |
| Publication date | Apr 7, 2026 |
| Grant date | Apr 7, 2026 |
Official abstract text for this publication.
A computer-implemented method of transmitting video data. A sequence of video frames is received. A warp operation for a first frame and a reference frame of the sequence of video frames is determined, wherein the warp operation defines a transformation of the reference frame to give an approximation of the first frame. One or more regions of interest of the first frame are identified. Encoded image data from the image data of the one or more regions of interest of the first frame is generated using an image encoder. The warp operation and the encoded image data are transmitted.
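The abstract does not say what form the warp operation takes. As a minimal sketch only, the following assumes it can be written as a 2x3 affine matrix applied to the reference frame by inverse mapping with nearest-neighbour sampling. The function name `warp_frame` and the matrix layout are hypothetical and are not taken from the patent.

```python
import numpy as np

def warp_frame(reference: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Warp `reference` with a 2x3 affine matrix (assumed form of the warp).

    Each output pixel looks up its source pixel in the reference frame
    (inverse mapping, nearest neighbour). Pixels whose source falls
    outside the reference frame are left at zero.
    """
    h, w = reference.shape[:2]
    # Promote the 2x3 matrix to 3x3 so it can be inverted.
    full = np.vstack([matrix, [0.0, 0.0, 1.0]])
    inv = np.linalg.inv(full)

    # Homogeneous (x, y, 1) coordinates of every output pixel.
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = inv @ coords
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)

    # Work on flattened pixel lists so grey and colour frames both work.
    tail = reference.shape[2:]
    flat_ref = reference.reshape((h * w,) + tail)
    flat_out = np.zeros((h * w,) + tail, dtype=reference.dtype)
    flat_out[valid] = flat_ref[sy[valid] * w + sx[valid]]
    return flat_out.reshape(reference.shape)

# Usage: shift a tiny 3x4 frame one pixel to the right.
ref = np.arange(12, dtype=np.uint8).reshape(3, 4)
shift_right = np.array([[1.0, 0.0, 1.0],
                        [0.0, 1.0, 0.0]])
approx = warp_frame(ref, shift_right)
```

Under this assumption the warp is only six numbers per frame, which illustrates why sending a warp plus a few encoded regions could be far smaller than sending a fully encoded frame.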
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving a sequence of frames; determining a warp operation for a particular frame and a reference frame from the sequence of frames, wherein the warp operation defines a transformation of the reference frame to approximate the particular frame; generating, using an encoder, encoded image data of one or more regions of interest of the particular frame; generating a reconstructed frame using the encoded image data and the warp operation; generating a score that reflects a similarity between the reconstructed frame and the particular frame; determining that the score satisfies a threshold value; and in response to determining that the score satisfies the threshold value, transmitting the reconstructed frame.
2. The method of claim 1, in response to determining that the score does not satisfy the threshold value, the method comprises: generating, using an encoder, an encoded frame of the particular frame; transmitting the encoded frame.
3. The method of claim 2, further comprising dynamically transmitting, for each frame of the sequence of frames, either the reconstructed frame or the encoded frame of a respective frame.
4. The method of claim 1, wherein generating the score that represents a similarity between the reconstructed frame and the particular frame comprises generating the score that represents the similarity between the reconstructed frame and the particular frame using at least one of a structural similarity index metric or a video multimethod assessment fusion method.
5. The method of claim 1, wherein the one or more regions of interest comprise at least one of an eye region or a mouth region of a face in the particular frame.
6. The method of claim 1, wherein generating, using an encoder, encoded image data of one or more regions of interest of the particular frame further comprises: prior to using the encoder, generating another frame by replacing image data outside the regions of interest of the particular frame with default image data; and generating the encoded image data by encoding the other frame with the encoder.
7. The method of claim 1, further comprising estimating, using a neural network, the score of the particular frame without generating the reconstruction frame.
8. A system comprising: one or more computers and one or more storage devices on which are stored instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving a sequence of video frames; determining a warp operation for a particular frame and a reference frame from the sequence of video frames, wherein the warp operation defines a transformation of the reference frame to approximate the particular frame; generating, using an encoder, encoded image data of one or more regions of interest of the particular frame; generating a reconstructed frame using the encoded image data and the warp operation; generating a score that reflects a similarity between the reconstructed frame and the particular frame; determining that the score satisfies a threshold value; and in response to determining that the score satisfies the threshold value, transmitting the reconstructed frame.
9. The system of claim 8, in response to determining that the score does not satisfy the threshold value, the operations comprise: generating, using an encoder, an encoded frame of the particular frame; transmitting the encoded frame.
10. The system of claim 9, further comprising dynamically transmitting, for each frame of the sequence of frames, either the reconstructed frame or the encoded frame of a respective frame.
11. The system of claim 8, wherein generating the score that represents a similarity between the reconstructed frame and the particular frame comprises generating the score that represents the similarity between the reconstructed frame and the particular frame using at least one of a structural similarity index metric or a video multimethod assessment fusion method.
12. The system of claim 8, wherein the one or more regions of interest comprise at least one of an eye region or a mouth region of a face in the particular frame.
13. The system of claim 8, wherein generating, using an encoder, encoded image data of one or more regions of interest of the particular frame further comprises: prior to using the encoder, generating another frame by replacing image data outside the regions of interest of the particular frame with default image data; and generating the encoded image data by encoding the other frame with the encoder.
14. The system of claim 8, further comprising estimating, using a neural network, the score of the particular frame without generating the reconstruction frame.
15. One or more non-transitory computer storage media encoded with computer program instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: receiving a sequence of video frames; determining a warp operation for a particular frame and a reference frame from the sequence of video frames, wherein the warp operation defines a transformation of the reference frame to approximate the particular frame; generating, using an encoder, encoded image data of one or more regions of interest of the particular frame; generating a reconstructed frame using the encoded image data and the warp operation; generating a score that reflects a similarity between the reconstructed frame and the particular frame; determining that the score satisfies a threshold value; and in response to determining that the score satisfies the threshold value, transmitting the reconstructed frame.
16. The one or more non-transitory computer storage media of claim 15, in response to determining that the score does not satisfy the threshold value, the operations comprise: generating, using an encoder, an encoded frame of the particular frame; transmitting the encoded frame.
17. The one or more non-transitory computer storage media of claim 16, further comprising dynamically transmitting, for each frame of the sequence of frames, either the reconstructed frame or the encoded frame of a respective frame.
18. The one or more non-transitory computer storage media of claim 15, wherein generating the score that represents a similarity between the reconstructed frame and the particular frame comprises generating the score that represents the similarity between the reconstructed frame and the particular frame using at least one of a structural similarity index metric or a video multimethod assessment fusion method.
19. The one or more non-transitory computer storage media of claim 15, wherein the one or more regions of interest comprise at least one of an eye region or a mouth region of a face in the particular frame.
20. The one or more non-transitory computer storage media of claim 15, wherein generating, using an encoder, encoded image data of one or more regions of interest of the particular frame further comprises: prior to using the encoder, generating another frame by replacing image data outside the regions of interest of the particular frame with default image data; and generating the encoded image data by encoding the other frame with the encoder.
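The per-frame decision recited in the independent claims, together with the region-of-interest masking of claims 6, 13, and 20, can be illustrated with a rough sketch. Everything named here is an assumption made for illustration: the helper names, the `(top, left, bottom, right)` rectangle format for regions of interest, the default value of 0, the threshold of 0.9, and the reading of "satisfies a threshold" as "greater than or equal to". The score is a single-window SSIM computed over the whole frame, which is a simplification of the usual windowed SSIM, and the real encode/decode step is omitted (the masked frame stands in for decoded data).

```python
import numpy as np

def mask_outside_rois(frame, rois, default=0):
    """Keep pixels inside the ROIs and fill the rest with default data."""
    out = np.full_like(frame, default)
    for top, left, bottom, right in rois:
        out[top:bottom, left:right] = frame[top:bottom, left:right]
    return out

def reconstruct(warped_reference, roi_data, rois):
    """Paste the ROI pixels over the warped reference frame."""
    out = warped_reference.copy()
    for top, left, bottom, right in rois:
        out[top:bottom, left:right] = roi_data[top:bottom, left:right]
    return out

def global_ssim(a, b, data_range=255.0):
    """Single-window SSIM over the whole frame (simplified variant)."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilising constants
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den

def choose_payload(frame, warped_reference, rois, threshold=0.9):
    """Return which payload to send for this frame, plus the score."""
    roi_data = mask_outside_rois(frame, rois)  # stand-in for encode + decode
    recon = reconstruct(warped_reference, roi_data, rois)
    score = global_ssim(recon, frame)
    kind = "reconstructed" if score >= threshold else "encoded_frame"
    return kind, score

# Usage: a perfect warp passes the threshold, an inverted frame does not.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(16, 16)).astype(np.uint8)
rois = [(4, 4, 8, 8)]
good_kind, good_score = choose_payload(frame, frame.copy(), rois)
bad_kind, bad_score = choose_payload(frame, 255 - frame, rois)
```

Running `choose_payload` once per frame gives the dynamic, frame-by-frame selection described in claims 3, 10, and 17. A production system would replace `global_ssim` with a windowed SSIM or VMAF implementation and the masking stand-in with an actual image encoder.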
Backpropagation, e.g. using gradient descent · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation (H04N19/635, H04N19/86 take precedence) · CPC title
using transform coding · CPC title
characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation (H04N19/635 takes precedence) · CPC title