US11475536B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11475536-B2 |
| Application number | US-201916971478-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 22, 2019 |
| Priority date | Feb 27, 2018 |
| Publication date | Oct 18, 2022 |
| Grant date | Oct 18, 2022 |
Official abstract text for this publication.
Systems, methods, and computer-readable media for context-aware synthesis for video frame interpolation are provided. Bidirectional flow may be used in combination with flexible frame synthesis neural network to handle occlusions and the like, and to accommodate inaccuracies in motion estimation. Contextual information may be used to enable frame synthesis neural network to perform informative interpolation. Optical flow may be used to provide initialization for interpolation. Other embodiments may be described and/or claimed.
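The pre-warping step named in the abstract and claims can be illustrated with a minimal sketch of forward warping, in which each source pixel is pushed along its flow vector to where it would sit at the target time. Everything here is an assumption made for illustration and is not taken from the patent: the function name `forward_warp`, the nearest-pixel rounding, the linear scaling of the flow by `t`, the averaging of pixels that land on the same target, and the use of `None` to mark holes left by occlusion.

```python
def forward_warp(frame, flow, t):
    """Push each source pixel along its flow vector to time t (0 <= t <= 1).

    frame: 2D list of pixel values (one channel).
    flow:  2D list of (dx, dy) displacements covering the full frame interval.
    Hypothetical helper for illustration only.
    """
    height, width = len(frame), len(frame[0])
    sums = [[0.0] * width for _ in range(height)]
    counts = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            # Assumption: motion is linear, so the flow is scaled by t.
            tx = int(round(x + t * dx))
            ty = int(round(y + t * dy))
            if 0 <= tx < width and 0 <= ty < height:
                sums[ty][tx] += frame[y][x]
                counts[ty][tx] += 1
    # Average pixels that collide; targets that receive nothing stay None.
    return [
        [sums[y][x] / counts[y][x] if counts[y][x] else None
         for x in range(width)]
        for y in range(height)
    ]


# A 1x4 frame whose pixels all move 2 px right over the full interval,
# warped to the midpoint (t = 0.5), shifts by 1 px and leaves one hole.
warped = forward_warp([[10, 20, 30, 40]], [[(2, 0)] * 4], 0.5)
```

Under the same assumptions, the function could be applied once per channel of a context map, using the same flow, so that frames and context maps stay aligned. The holes it leaves are the kind of gap a synthesis network would be expected to fill.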
Opening claim text (preview).
The invention claimed is:
1. A computer system comprising: processor circuitry coupled with memory circuitry, the memory circuitry is arranged to store program code of flow estimation logic, context extraction logic, warping logic, and frame synthesis neural network (FSNN) logic, and the processor circuitry is arranged to: operate the flow estimation logic to estimate a bidirectional optical flow between at least two input frames; operate the context extraction logic to extract context maps based on the estimated bidirectional optical flow; operate the warp logic to pre-warp the at least two input frames and corresponding context maps of the at least two input frames; operate the warp logic to feed the pre-warped frames and the corresponding context maps into an FSNN of the FSNN logic; and operate the FSNN logic to generate an output frame at a desired temporal position based on the pre-warped frames.
2. The computer system of claim 1, wherein, to extract the context maps, the processor circuitry is arranged to operate the context extraction logic to extract per-pixel context information from the input frames as the context maps.
3. The computer system of claim 2, wherein, to pre-warp the at least two input frames, the processor circuitry is arranged to operate the warping logic to use the bidirectional optical flow as a guide for the pre-warping of the input frames.
4. The computer system of claim 1, wherein the processor circuitry is arranged to operate the flow estimation logic to generate an intermediate frame at a temporal position in between the at least two input frames.
5. The computer system of claim 1, wherein the processor circuitry is arranged to operate the flow estimation logic to estimate the bidirectional optical flow using a Pyramidal processing, Warping, and Cost volume-Network (PWC-Net) mechanism.
6. The computer system of claim 1, wherein the processor circuitry is arranged to operate the warping logic to use forward warping, wherein the estimated bidirectional optical flow is used to warp each of the at least two input frames to obtain corresponding pre-warped frames.
7. The computer system of claim 1, wherein the processor circuitry is arranged to operate the FSNN logic to generate the output frame without performing pixel-wise blending.
8. The computer system of claim 1, wherein the processor circuitry is to operate the context extraction logic to extract contextual information using a response of a convolutional layer of an 18 layer residual network (ResNet-18).
9. The computer system of claim 8, wherein the FSNN comprises an extended grid network (GridNet), wherein the GridNet comprises a grid of one or more rows and one or more columns, wherein each row and each column comprise one or more Parametric Rectified Linear Units (PReLUs) and one or more convolution layers, wherein each convolution layer is disposed between the PReLUs.
10. The computer system of claim 1, wherein the processor circuitry is arranged to operate the FSNN logic to measure a difference between the output frame and a ground truth frame during a training period, and wherein the ground truth frame comprises a center frame of a set of frames from among a plurality of frame sets of a training dataset.
11. A computer-implemented method comprising: estimating a bidirectional optical flow between at least two input frames; extracting context maps based on the estimated bidirectional optical flow, wherein the context maps comprise per-pixel context information from the at least two input frames; warping the at least two input frames and corresponding context maps of the at least two input frames, wherein the warping comprises using the bidirectional optical flow as a guide for the warping; feeding the warped frames and the corresponding context maps into a frame synthesis neural network (FSNN); and operating the FSNN to generate an output frame at a desired temporal position based on the warped frames.
12. The method of claim 11, wherein the method comprises: estimating the bidirectional optical flow using a Pyramidal processing, Warping, and Cost volume-Network (PWC-Net) mechanism.
13. The method of claim 11, wherein the method comprises: operating the FSNN to generate the output frame without performing pixel-wise blending.
14. The method of claim 11, wherein the method comprises: generating an intermediate frame at the temporal position in between the at least two input frames.
15. The method of claim 11, wherein the method comprises: operating the FSNN to measure a difference between the output frame and a ground truth frame during a training period, wherein the ground truth frame comprises a center frame of a set of frames from among a plurality of frame sets of a training dataset.
16. One or more non-transitory computer-readable media (NTCRM) comprising instructions, wherein execution of the instructions by one or more processors of a computing system is operable to cause the computing system to: estimate a bidirectional optical flow between at least two input frames; extract context maps based on the estimated bidirectional optical flow; warp the at least two input frames and corresponding context maps of the at least two input frames; feed the warped frames and the corresponding context maps into a frame synthesis neural network (FSNN); and operate the FSNN to generate an output frame at a desired temporal position based on the warped frames.
17. The one or more NTCRM of claim 16, wherein execution of the instructions is further operable to cause the computing system to: extract per-pixel context information from the input frames as the context maps; and use the bidirectional optical flow as a guide to warp the input frames.
18. The one or more NTCRM of claim 17, wherein execution of the instructions is further operable to cause the computing system to: generate an intermediate frame at a temporal position in between the at least two input frames.
19. The one or more NTCRM of claim 16, wherein execution of the instructions is further operable to cause the computing system to: estimate the bidirectional optical flow using a Pyramidal processing, Warping, and Cost volume-Network (PWC-Net) mechanism.
20. The one or more NTCRM of claim 16, wherein, to warp the at least two input frames, execution of the instructions is further operable to cause the computing system to: perform forward warping on the at least two input frames, wherein, to perform forward warping, execution of the instructions is further operable to cause the computing system to: use the estimated bidirectional optical flow to warp each of the at least two input frames to obtain corresponding warped frames.
21. The one or more NTCRM of claim 16, wherein, to operate the FSNN, execution of the instructions is further operable to cause the computing system to: generate the output frame without resorting to pixel-wise blending.
22. The one or more NTCRM of claim 16, wherein execution of the instructions is further operable to cause the computing system to: extract contextual information using a response of a convolutional layer of a multi-layer residual network.
23. The one or more NTCRM of claim 22, wherein the multi-layer residual network is an 18 layer residual network (ResNet-18), the FSNN comprises an extended grid network (GridNet), and the GridNet comprises a grid of one or more rows and one or more columns.
24. The one
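Claims 10 and 15 describe measuring a difference between the output frame and a ground truth frame that is the center frame of a frame set. A minimal sketch of that training-time bookkeeping follows. The helper names `split_frame_set` and `mean_abs_diff` are hypothetical, and the mean absolute difference is an assumed stand-in, since the claims shown here do not name a specific metric.

```python
def split_frame_set(frames):
    """Return (first, center, last) from an odd-length frame set.

    Assumption: the first and last frames serve as the two inputs and the
    center frame serves as the ground truth for the interpolated output.
    """
    if len(frames) < 3 or len(frames) % 2 == 0:
        raise ValueError("expected an odd number of frames, at least 3")
    return frames[0], frames[len(frames) // 2], frames[-1]


def mean_abs_diff(output, truth):
    """Mean absolute per-pixel difference between two same-sized 2D frames."""
    total = 0.0
    count = 0
    for out_row, truth_row in zip(output, truth):
        for out_px, truth_px in zip(out_row, truth_row):
            total += abs(out_px - truth_px)
            count += 1
    return total / count


# A toy triplet of 1x2 frames; the middle one is the ground truth.
first, center, last = split_frame_set([[[0, 0]], [[1, 3]], [[2, 6]]])
# Pretend the network produced [[1, 2]] for the middle time step.
loss = mean_abs_diff([[1, 2]], center)
```

In an actual training loop, the value computed this way would be the quantity minimized over many frame sets; the toy frames above only show how the center frame is selected and compared.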
Activation functions · CPC title
Combinations of networks · CPC title
Learning methods · CPC title
Training; Learning · CPC title
using two or more images, e.g. averaging or subtraction · CPC title