Method, apparatus, and device for video frame interpolation
US-11354541-B2 · Jun 7, 2022 · US
US12185023B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12185023-B2 |
| Application number | US-202318206459-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 6, 2023 |
| Priority date | Jun 14, 2022 |
| Publication date | Dec 31, 2024 |
| Grant date | Dec 31, 2024 |
Abstract:
A method for generating a video intermediate frame, including obtaining a target video frame pair; constructing an image pyramid for each video frame in the target video frame pair; and generating an intermediate frame of the target video frame pair by using a bidirectional optical flow estimation model and a pixel synthesis model in a layer-by-layer recursive calling manner according to an order of the image pyramid from a high layer to a low layer based on the image pyramid, wherein the generating of the intermediate frame of the target video frame pair comprises: repairing a bidirectional optical flow corresponding to a previous layer using the bidirectional optical flow estimation model, and repairing a previous intermediate frame corresponding to the previous layer using the pixel synthesis model.
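The coarse-to-fine recursion described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: images are plain grayscale lists, the pyramid is built with 2x2 average pooling, and `zero_flow_model` and `blend_model` are hypothetical stand-ins for the trained bidirectional optical flow estimation model and pixel synthesis model, whose internals the abstract does not give. Only the control flow follows the text: start at the highest (coarsest) layer, and at each lower layer repair the previous layer's bidirectional flow and intermediate frame.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]               # grayscale image, rows of pixel values
Flow = List[List[Tuple[float, float]]]  # per-pixel (dx, dy)
BiFlow = Tuple[Flow, Flow]              # (frame 0 -> mid, frame 1 -> mid)


def downsample(img: Image) -> Image:
    """Halve the resolution with 2x2 average pooling (odd edge rows/cols dropped)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def build_pyramid(img: Image, levels: int) -> List[Image]:
    """Index 0 is the lowest (full-resolution) layer, index -1 the highest (coarsest)."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def zero_flow_model(img0: Image, img1: Image, prev: Optional[BiFlow]) -> BiFlow:
    """Hypothetical stand-in: a trained model would upsample `prev` and refine it."""
    zeros = [[(0.0, 0.0)] * len(img0[0]) for _ in img0]
    return zeros, [list(row) for row in zeros]


def blend_model(img0: Image, img1: Image, flow: BiFlow,
                prev: Optional[Image]) -> Image:
    """Hypothetical stand-in: a trained model would warp with `flow` and refine `prev`."""
    return [[(a + b) / 2.0 for a, b in zip(r0, r1)] for r0, r1 in zip(img0, img1)]


def interpolate(frame0: Image, frame1: Image, levels: int,
                repair_flow: Callable = zero_flow_model,
                repair_frame: Callable = blend_model) -> Tuple[Image, BiFlow]:
    pyr0, pyr1 = build_pyramid(frame0, levels), build_pyramid(frame1, levels)
    flow: Optional[BiFlow] = None  # no layer exists above the highest one
    mid: Optional[Image] = None
    for layer in reversed(range(levels)):  # high layer -> low layer
        img0, img1 = pyr0[layer], pyr1[layer]
        flow = repair_flow(img0, img1, flow)       # repair previous layer's flow
        mid = repair_frame(img0, img1, flow, mid)  # repair previous intermediate frame
    return mid, flow  # final frame plus the lowest-layer bidirectional flow
```

Because the two models are passed in as callables, the same loop can be reused when the placeholders are swapped for real networks; the design point being illustrated is that each layer only refines what the layer above produced.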
Claims (opening text, preview):
The invention claimed is:
1. A method for generating a video intermediate frame, comprising: obtaining a target video frame pair; constructing an image pyramid for each video frame in the target video frame pair; and generating an intermediate frame of the target video frame pair by using a bidirectional optical flow estimation model and a pixel synthesis model in a layer-by-layer recursive calling manner according to an order of the image pyramid from a high layer to a low layer based on the image pyramid, wherein the generating of the intermediate frame of the target video frame pair comprises: repairing a bidirectional optical flow corresponding to a previous layer using the bidirectional optical flow estimation model, and repairing a previous intermediate frame corresponding to the previous layer using the pixel synthesis model.
2. The method of claim 1, wherein the generating of the intermediate frame of the target video frame pair further comprises: generating a first number of pixel-level feature maps having different resolutions for an image of a current layer in each image pyramid using a feature coding network, in order to provide the pixel-level feature maps to the bidirectional optical flow estimation model and the pixel synthesis model.
3. The method of claim 2, wherein the first number is greater than or equal to 3, wherein the feature coding network comprises a convolutional network having at least a second number of down samplings, and wherein the second number is equal to the first number minus one.
4. The method of claim 1, wherein the repairing of the bidirectional optical flow corresponding to the previous layer comprises: inputting a pixel-level feature map corresponding to an image of a current layer and the bidirectional optical flow corresponding to the previous layer into the bidirectional optical flow estimation model, wherein the pixel-level feature map comprises a feature map output by convolution of a last layer of a feature coding network as a result of the image of the current layer being input to the feature coding network, and wherein the bidirectional optical flow comprises an optical flow from each video frame to the intermediate frame.
5. The method of claim 4, wherein the repairing of the bidirectional optical flow comprises: linearly weighting the bidirectional optical flow corresponding to the previous layer to obtain an initial estimation value of a bidirectional optical flow corresponding to the current layer; based on the initial estimation value, performing forward-warping on the pixel-level feature map corresponding to each image of the current layer using a forward-warping layer of the bidirectional optical flow estimation model; based on a forward-warped feature map obtained by the forward-warping, constructing a partial cost volume using a cost volume layer of the bidirectional optical flow estimation model; performing channel stacking based on the initial estimation value, the forward-warped feature map, the partial cost volume, and a convolutional neural network (CNN) feature of the bidirectional optical flow corresponding to the previous layer; inputting a result of the channel stacking into an optical flow estimation layer of the bidirectional optical flow estimation model; and performing optical flow estimation to obtain a bidirectional optical flow repairing result corresponding to the current layer.
6. The method of claim 1, wherein repairing the previous intermediate frame comprises: linearly weighting the repaired bidirectional optical flow; for each video frame, performing forward-warping for an image of a current layer in the video frame and a context feature of the image using a forward-warping layer of the pixel synthesis model based on the linearly weighted optical flow corresponding to the video frame, wherein the context feature includes a feature map output by a feature coding network before each down sampling and a feature map output by convolution of a last layer after the image of the current layer in the video frame is input to the feature coding network for processing; and inputting a result of the forward-warping and the previous intermediate frame to a pixel synthesis network of the pixel synthesis model to obtain an intermediate frame repairing result corresponding to the current layer.
7. The method of claim 1, further comprising: after the intermediate frame is obtained based on an image of the lowest layer in the image pyramid, outputting the bidirectional optical flow.
8. The method of claim 2, wherein the feature coding network is shared by the bidirectional optical flow estimation model and the pixel synthesis model.
9. The method of claim 1, wherein the generated video intermediate frame is used for single-frame video frame interpolation or multi-frame video frame interpolation.
10. An apparatus for generating a video intermediate frame, comprising: at least one processor; and a memory configured to store instructions which, when executed by the at least one processor, cause the at least one processor to: obtain a target video frame pair; construct an image pyramid for each video frame in the target video frame pair; and generate an intermediate frame of the target video frame pair by using a bidirectional optical flow estimation model and a pixel synthesis model in a layer-by-layer recursive calling manner according to an order of the image pyramid from a high layer to a low layer based on the image pyramid, wherein the at least one processor is configured, when generating the intermediate frame of the target video frame pair, to: repair a bidirectional optical flow corresponding to a previous layer by using the bidirectional optical flow estimation model, and repair a previous intermediate frame corresponding to the previous layer by using the pixel synthesis model.
11. The apparatus of claim 10, wherein the at least one processor is further configured, when generating the intermediate frame of the target video frame pair, to: generate a first number of pixel-level feature maps having different resolutions for an image of a current layer in each image pyramid using a feature coding network, in order to provide the pixel-level feature maps to the bidirectional optical flow estimation model and the pixel synthesis model.
12. The apparatus of claim 11, wherein the first number is greater than or equal to 3, wherein the feature coding network comprises a convolutional network having at least a second number of down samplings, and wherein the second number is equal to the first number minus one.
13. The apparatus of claim 10, wherein the at least one processor is configured, when repairing the bidirectional optical flow corresponding to the previous layer, to: input a pixel-level feature map corresponding to an image of a current layer and the bidirectional optical flow corresponding to the previous layer into the bidirectional optical flow estimation model, wherein the pixel-level feature map comprises a feature map output by convolution of a last layer of a feature coding network as a result of the image of the current layer being input to the feature coding network, and wherein the bidirectional optical flow comprises an optical flow from each video frame to the intermediate frame.
14. The apparatus of claim 13, wherein the at least one processor is further configured, when repairing the bidirectional optical flow, to: linearly weigh the bidirectional optical flow corresponding to the previous layer to obtain an initial estimation value of a bidirectional optical flow corresponding to the current layer; base…
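Several of the operations named in claims 3, 5, and 6 (feature-map resolutions, linear weighting of a flow field, forward-warping) are simple enough to sketch directly. The claims give neither the weights used in "linearly weighting" nor the warping kernel, so the following is a sketch under labeled assumptions, with hypothetical function names throughout. `feature_map_sizes` assumes each down sampling halves the resolution. `upscale_flow` assumes adjacent pyramid layers differ by a factor of 2, so a coarse flow is upsampled 2x and its vectors doubled. `scale_flow` assumes linear weighting means multiplying a flow by a scalar such as a time step t. `forward_warp` uses nearest-neighbour average splatting, a simplified form of forward-warping; a production layer would more likely use bilinear or softmax splatting.

```python
from typing import List, Tuple

Image = List[List[float]]               # grayscale image, rows of pixel values
Flow = List[List[Tuple[float, float]]]  # per-pixel (dx, dy) in pixels


def feature_map_sizes(h: int, w: int, num_maps: int = 3) -> List[Tuple[int, int]]:
    """Claim 3: num_maps >= 3 feature maps from num_maps - 1 down samplings.
    Assumption: each down sampling halves height and width (e.g. stride-2 conv)."""
    if num_maps < 3:
        raise ValueError("claim 3 requires at least 3 feature maps")
    return [(h >> k, w >> k) for k in range(num_maps)]


def upscale_flow(flow: Flow) -> Flow:
    """Nearest-neighbour 2x upsampling; vectors are doubled because one
    coarse pixel spans two fine pixels in each direction."""
    out: Flow = []
    for row in flow:
        fine_row = [(2.0 * dx, 2.0 * dy) for (dx, dy) in row for _ in range(2)]
        out.append(fine_row)
        out.append(list(fine_row))
    return out


def scale_flow(flow: Flow, weight: float) -> Flow:
    """Assumed form of 'linear weighting': multiply every vector by `weight`."""
    return [[(weight * dx, weight * dy) for (dx, dy) in row] for row in flow]


def forward_warp(img: Image, flow: Flow) -> Image:
    """Average splatting: push each source pixel to round(p + F(p)).
    Target pixels hit by several sources get their mean; unhit pixels stay 0.0."""
    h, w = len(img), len(img[0])
    acc = [[0.0] * w for _ in range(h)]
    cnt = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            tx, ty = int(round(x + dx)), int(round(y + dy))
            if 0 <= tx < w and 0 <= ty < h:
                acc[ty][tx] += img[y][x]
                cnt[ty][tx] += 1
    return [[acc[y][x] / cnt[y][x] if cnt[y][x] else 0.0 for x in range(w)]
            for y in range(h)]
```

For example, with t = 0.5, `forward_warp(frame0, scale_flow(upscale_flow(coarse_flow), 0.5))` pushes frame 0 halfway along the upsampled flow. Target pixels that receive no source sample stay at 0.0; holes of this kind are one reason a learned synthesis step usually follows forward-warping.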
by changing the field or frame frequency of the incoming video signal, e.g. frame rate converter · CPC title
Artificial neural networks [ANN] · CPC title
Video; Image sequence · CPC title
using feature-based methods, e.g. the tracking of corners or segments · CPC title
Combinations of networks · CPC title