Context-aware synthesis for video frame interpolation
US-2020394752-A1 · Dec 17, 2020 · US
US11354541B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11354541-B2 |
| Application number | US-201916626409-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 7, 2019 |
| Priority date | Mar 1, 2019 |
| Publication date | Jun 7, 2022 |
| Grant date | Jun 7, 2022 |
Official abstract text for this publication.
The present specification discloses a method, apparatus, and device for video frame interpolation. The method of an embodiment of the present specification comprises: acquiring a video frame training sample, wherein the video frame training sample includes an even number of consecutive video frames and a first key frame, and the first key frame is an intermediate frame of the even number of consecutive video frames; constructing a pyramid deep learning model, wherein each level of the pyramid deep learning model is used to generate an intermediate frame at a different resolution and has a plurality of convolutional neural network layers; inputting the even number of consecutive video frames to the pyramid deep learning model to generate a second key frame; modifying the pyramid deep learning model according to the second key frame and the first key frame to generate a modified pyramid deep learning model; and inputting a plurality of video frames to be processed into the modified pyramid deep learning model to generate an intermediate frame of the plurality of video frames. The invention fully exploits the spatio-temporal information across multiple video frames and adopts a pyramid refinement strategy to effectively estimate the motion information and the occlusion regions, thereby greatly improving the quality of the intermediate frame.
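The training sample described in the abstract pairs an even number of input frames with a ground-truth intermediate frame. The following is a minimal sketch of one way such samples could be cut from a longer clip. The function name `make_training_samples`, the `num_inputs` parameter, and the choice to take every second raw frame as an input (so that the held-out key frame sits exactly midway between the two central inputs) are assumptions made for illustration; the source does not say how samples are assembled.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def make_training_samples(
    frames: Sequence[Frame], num_inputs: int = 4
) -> List[Tuple[List[Frame], Frame]]:
    """Cut (inputs, key_frame) pairs from a clip (hypothetical helper).

    Assumed layout: each window spans 2 * num_inputs - 1 raw frames.
    Every second frame of the window is an input, and the raw frame at
    the centre of the window is held out as the ground-truth key frame.
    """
    if num_inputs < 2 or num_inputs % 2 != 0:
        raise ValueError("num_inputs must be an even number >= 2")
    window = 2 * num_inputs - 1
    centre = num_inputs - 1  # odd index, so never one of the inputs
    samples = []
    for start in range(len(frames) - window + 1):
        chunk = list(frames[start:start + window])
        inputs = chunk[0::2]       # num_inputs evenly spaced frames
        key_frame = chunk[centre]  # midway between the two central inputs
        samples.append((inputs, key_frame))
    return samples
```

With integer stand-ins for frames, `make_training_samples(list(range(10)))` yields four windows, the first pairing inputs `[0, 2, 4, 6]` with key frame `3`. During training, the model output for those inputs (the "second key frame") would be compared against the held-out frame.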
Opening claim text (preview).
We claim:

1. A method for video frame interpolation, comprising: acquiring a video frame training sample, wherein the video frame training sample includes an even number of consecutive video frames and a first key frame, and the first key frame is an intermediate frame of the even number of consecutive video frames; constructing a pyramid deep learning model, wherein each level of the pyramid deep learning model being used to generate intermediate frames of different resolutions has a plurality of convolutional neural network layers; from a lower level to an upper level, the resolution is gradually increased, and video frame parameters of a lower level resolution are used for the calculation of the intermediate frame of a higher resolution; inputting the even number of consecutive video frames to the pyramid deep learning model to generate a second key frame; modifying the pyramid deep learning model according to the second key frame and the first key frame to generate a modified pyramid deep learning model; inputting a plurality of video frames to be processed into the modified pyramid deep learning model to generate an intermediate frame of the plurality of video frames.

2. The method according to claim 1, the modifying the pyramid deep learning model according to the second key frame and the first key frame comprises: extracting a first characteristic parameter of the first key frame; extracting a second characteristic parameter of the second key frame; generating a difference result between the first key frame and the second key frame according to the first feature parameter and the second feature parameter; adjusting weight parameters of the pyramid deep learning model according to the difference result.

3. A method for video frame interpolation, comprising: acquiring a video frame training sample, wherein the video frame training sample includes an even number of consecutive video frames and a first key frame, and the first key frame is an intermediate frame of the even number of consecutive video frames; constructing a pyramid deep learning model, wherein each level of the pyramid deep learning model being used to generate intermediate frames of different resolutions has a plurality of convolutional neural network layers; inputting the even number of consecutive video frames to the pyramid deep learning model to generate a second key frame; modifying the pyramid deep learning model according to the second key frame and the first key frame to generate a modified pyramid deep learning model; inputting a plurality of video frames to be processed into the modified pyramid deep learning model to generate an intermediate frame of the plurality of video frames; wherein the inputting the even number of consecutive video frames to the pyramid deep learning model comprises: determining a first resolution of a video frame inputted to the first level of the pyramid deep learning model according to a preset rule; processing the even number of consecutive video frames according to the first resolution; inputting the processed even number of consecutive video frames to the first level of the pyramid deep learning model to generate an optical flow set and an occlusion mask set of the intermediate frame to each video frame of the processed even number of consecutive video frames; generating a calculated intermediate frame of the first level according to the optical flow set and the occlusion mask set; modifying parameters of the first level of the pyramid deep learning model according to the calculated intermediate frame of the first level and the real intermediate frame with the resolution of the first level.

4. A method for video frame interpolation, comprising: acquiring a video frame training sample, wherein the video frame training sample includes an even number of consecutive video frames and a first key frame, and the first key frame is an intermediate frame of the even number of consecutive video frames; constructing a pyramid deep learning model, wherein each level of the pyramid deep learning model being used to generate intermediate frames of different resolutions has a plurality of convolutional neural network layers; inputting the even number of consecutive video frames to the pyramid deep learning model to generate a second key frame; modifying the pyramid deep learning model according to the second key frame and the first key frame to generate a modified pyramid deep learning model; inputting a plurality of video frames to be processed into the modified pyramid deep learning model to generate an intermediate frame of the plurality of video frames; wherein the inputting the even number of consecutive video frames to the pyramid deep learning model comprises: determining a second resolution of the video frame inputted to the K-th level of the pyramid deep learning model according to a preset rule, wherein a resolution of the video frame inputted to the K-th level is higher than a resolution of a video frame inputted to the (K-1)th level, the resolution of the last inputted video frame of the pyramid deep learning model is the original resolution of the even number of consecutive video frames, and K is a natural number greater than or equal to 2; processing the even number of consecutive video frames according to the second resolution to generate a video frame inputted to the K-th level; interpolation of each optical stream in the optical flow set generated by the (K-1)th level by upsampling by 2 times to generate a first optical flow set; processing the video frame inputted to the K-th level by using each optical flow in the first optical flow set to generate a first warped image set; generating a residual flow set and an occlusion mask set of the K-th level according to the first optical flow set and the first warped image set; generating an optical flow set of the K-th level according to the first optical flow set and the residual flow set; generating a calculated intermediate frame of the K-th level according to the optical flow set of the K-th level and the occlusion mask set of the K-th level; modifying parameters of the first level to the K-th level of the pyramid deep learning model according to the calculated intermediate frame of the K-th level and the real intermediate frame with the resolution of the K-th level.

5. The method according to claim 4, the generating a calculated intermediate frame of the K-th level according to the optical flow set of the K-th level and the occlusion mask set of the K-th level comprises: generating a second warped image set through warping the inputted video frames by optical flow set at the K-th level; generating a calculated intermediate frame of the K-th level according to the second warped image set and the occlusion mask set of the K-th level.

6. The method according to claim 5, the generating a calculated intermediate frame of the K-th level according to the second warped image set and the occlusion mask set of the K-th level comprises: the calculated intermediate frame of the K-th level is calculated by the following formula: $I_{t,k} = \sum_{i=1}^{4} M_{k,i} \otimes \ldots$
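The level-K steps recited in claims 4 to 6 (upsample the previous level's flows by 2, warp the inputs, add a residual flow, warp again, and blend with occlusion masks) can be sketched with NumPy as below. This is an illustrative sketch under stated assumptions, not the patented implementation: `predict_residual_and_masks` is a hypothetical callable standing in for the level's convolutional layers, nearest-neighbour upsampling and sampling replace whatever interpolation the patent uses, flow vectors are doubled on upsampling (a common convention that the claims do not state), and the blend reads the truncated formula as a mask-weighted sum of the warped inputs.

```python
import numpy as np


def upsample_flow_x2(flow):
    """Nearest-neighbour 2x upsampling of an (H, W, 2) flow field.

    Vectors are doubled so they stay in pixel units of the finer grid
    (assumed convention, not stated in the claims).
    """
    return 2.0 * flow.repeat(2, axis=0).repeat(2, axis=1)


def backward_warp(image, flow):
    """Sample image at (x + u, y + v), nearest neighbour, clamped at borders."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]


def refine_level(frames_k, flows_prev, predict_residual_and_masks):
    """One coarse-to-fine step for pyramid level K >= 2."""
    # First optical flow set: previous level's flows upsampled by 2.
    first_flows = [upsample_flow_x2(f) for f in flows_prev]
    # First warped image set: level-K inputs warped by the upsampled flows.
    first_warped = [backward_warp(img, f) for img, f in zip(frames_k, first_flows)]
    # Hypothetical network call: residual flows and occlusion masks.
    residuals, masks = predict_residual_and_masks(first_flows, first_warped)
    # Optical flow set of level K: upsampled flow plus residual flow.
    flows_k = [f + r for f, r in zip(first_flows, residuals)]
    # Second warped image set, then a mask-weighted sum (assumed reading).
    second_warped = [backward_warp(img, f) for img, f in zip(frames_k, flows_k)]
    frame_k = sum(m[..., None] * w for m, w in zip(masks, second_warped))
    return frame_k, flows_k


def uniform_stub(first_flows, first_warped):
    """Placeholder for the CNN: zero residuals and equal-weight masks."""
    n = len(first_flows)
    residuals = [np.zeros_like(f) for f in first_flows]
    masks = [np.full(f.shape[:2], 1.0 / n) for f in first_flows]
    return residuals, masks
```

With the stub in place of a trained network, `refine_level` reduces to averaging the warped inputs, which is enough to check shapes and data flow. In training, the returned frame would be compared with the real intermediate frame at the level's resolution, and levels 1 through K would be updated.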
based on interpolation, e.g. bilinear interpolation (image demosaicing G06T3/4015; edge-driven or edge-based scaling G06T3/403) · CPC title
characterised by the process organisation or structure, e.g. boosting cascade · CPC title
based on criteria of topology preservation, e.g. multidimensional scaling or self-organising maps · CPC title
Validation; Performance evaluation; Active pattern learning techniques · CPC title
Activation functions · CPC title