Context-aware synthesis for video frame interpolation

US11475536B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11475536-B2
Application number: US-201916971478-A
Country: US
Kind code: B2
Filing date: Feb 22, 2019
Priority date: Feb 27, 2018
Publication date: Oct 18, 2022
Grant date: Oct 18, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and computer-readable media for context-aware synthesis for video frame interpolation are provided. Bidirectional flow may be used in combination with flexible frame synthesis neural network to handle occlusions and the like, and to accommodate inaccuracies in motion estimation. Contextual information may be used to enable frame synthesis neural network to perform informative interpolation. Optical flow may be used to provide initialization for interpolation. Other embodiments may be described and/or claimed.
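The data flow the abstract describes can be sketched as a wiring diagram in code. This is a minimal illustration only: every function below is a hypothetical stand-in (the flow estimator returns zero motion, the context extractor returns image gradients, the warp is an identity), and both the channel-wise concatenation and the scaling of the two flows by t and 1 - t are assumptions, not details taken from this page.

```python
import numpy as np

def estimate_flow(src, dst):
    """Stand-in flow estimator (hypothetical): zero motion, shape (H, W, 2)."""
    h, w = src.shape[:2]
    return np.zeros((h, w, 2), dtype=np.float32)

def extract_context(frame):
    """Stand-in context extractor (hypothetical): per-pixel image gradients."""
    gray = frame.mean(axis=2)
    gy, gx = np.gradient(gray)  # derivatives along rows, then columns
    return np.stack([gx, gy], axis=2).astype(np.float32)

def pre_warp(image, flow):
    """Stand-in warp (hypothetical): identity, valid only for zero flow."""
    return image

def build_synthesis_input(frame0, frame1, t):
    """Assemble what a frame synthesis network would receive for time t."""
    flow01 = estimate_flow(frame0, frame1)  # frame0 -> frame1
    flow10 = estimate_flow(frame1, frame0)  # frame1 -> frame0
    ctx0, ctx1 = extract_context(frame0), extract_context(frame1)
    parts = [
        pre_warp(frame0, t * flow01), pre_warp(ctx0, t * flow01),
        pre_warp(frame1, (1 - t) * flow10), pre_warp(ctx1, (1 - t) * flow10),
    ]
    # Assumption: frames and context maps are stacked along the channel axis.
    return np.concatenate(parts, axis=2)

rng = np.random.default_rng(0)
frame0 = rng.random((4, 5, 3)).astype(np.float32)
frame1 = rng.random((4, 5, 3)).astype(np.float32)
synthesis_input = build_synthesis_input(frame0, frame1, t=0.5)
```

Each frame contributes its 3 color channels plus 2 stand-in context channels, so the assembled input has 10 channels at the original resolution. A learned network would take this tensor and produce the interpolated frame directly.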

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer system comprising: processor circuitry coupled with memory circuitry, the memory circuitry is arranged to store program code of flow estimation logic, context extraction logic, warping logic, and frame synthesis neural network (FSNN) logic, and the processor circuitry is arranged to: operate the flow estimation logic to estimate a bidirectional optical flow between at least two input frames; operate the context extraction logic to extract context maps based on the estimated bidirectional optical flow; operate the warp logic to pre-warp the at least two input frames and corresponding context maps of the at least two input frames; operate the warp logic to feed the pre-warped frames and the corresponding context maps into an FSNN of the FSNN logic; and operate the FSNN logic to generate an output frame at a desired temporal position based on the pre-warped frames.

2. The computer system of claim 1, wherein, to extract the context maps, the processor circuitry is arranged to operate the context extraction logic to extract per-pixel context information from the input frames as the context maps.

3. The computer system of claim 2, wherein, to pre-warp the at least two input frames, the processor circuitry is arranged to operate the warping logic to use the bidirectional optical flow as a guide for the pre-warping of the input frames.

4. The computer system of claim 1, wherein the processor circuitry is arranged to operate the flow estimation logic to generate an intermediate frame at a temporal position in between the at least two input frames.

5. The computer system of claim 1, wherein the processor circuitry is arranged to operate the flow estimation logic to estimate the bidirectional optical flow using a Pyramidal processing, Warping, and Cost volume-Network (PWC-Net) mechanism.

6. The computer system of claim 1, wherein the processor circuitry is arranged to operate the warping logic to use forward warping, wherein the estimated bidirectional optical flow is used to warp each of the at least two input frames to obtain corresponding pre-warped frames.

7. The computer system of claim 1, wherein the processor circuitry is arranged to operate the FSNN logic to generate the output frame without performing pixel-wise blending.

8. The computer system of claim 1, wherein the processor circuitry is to operate the context extraction logic to extract contextual information using a response of a convolutional layer of an 18 layer residual network (ResNet-18).

9. The computer system of claim 8, wherein the FSNN comprises an extended grid network (GridNet), wherein the GridNet comprises a grid of one or more rows and one or more columns, wherein each row and each column comprise one or more Parametric Rectified Linear Units (PReLUs) and one or more convolution layers, wherein each convolution layer is disposed between the PReLUs.

10. The computer system of claim 1, wherein the processor circuitry is arranged to operate the FSNN logic to measure a difference between the output frame and a ground truth frame during a training period, and wherein the ground truth frame comprises a center frame of a set of frames from among a plurality of frame sets of a training dataset.

11. A computer-implemented method comprising: estimating a bidirectional optical flow between at least two input frames; extracting context maps based on the estimated bidirectional optical flow, wherein the context maps comprise per-pixel context information from the at least two input frames; warping the at least two input frames and corresponding context maps of the at least two input frames, wherein the warping comprises using the bidirectional optical flow as a guide for the warping; feeding the warped frames and the corresponding context maps into a frame synthesis neural network (FSNN); and operating the FSNN to generate an output frame at a desired temporal position based on the warped frames.

12. The method of claim 11, wherein the method comprises: estimating the bidirectional optical flow using a Pyramidal processing, Warping, and Cost volume-Network (PWC-Net) mechanism.

13. The method of claim 11, wherein the method comprises: operating the FSNN to generate the output frame without performing pixel-wise blending.

14. The method of claim 11, wherein the method comprises: generating an intermediate frame at the temporal position in between the at least two input frames.

15. The method of claim 11, wherein the method comprises: operating the FSNN to measure a difference between the output frame and a ground truth frame during a training period, wherein the ground truth frame comprises a center frame of a set of frames from among a plurality of frame sets of a training dataset.

16. One or more non-transitory computer-readable media (NTCRM) comprising instructions, wherein execution of the instructions by one or more processors of a computing system is operable to cause the computing system to: estimate a bidirectional optical flow between at least two input frames; extract context maps based on the estimated bidirectional optical flow; warp the at least two input frames and corresponding context maps of the at least two input frames; feed the warped frames and the corresponding context maps into a frame synthesis neural network (FSNN); and operate the FSNN to generate an output frame at a desired temporal position based on the warped frames.

17. The one or more NTCRM of claim 16, wherein execution of the instructions is further operable to cause the computing system to: extract per-pixel context information from the input frames as the context maps; and use the bidirectional optical flow as a guide to warp the input frames.

18. The one or more NTCRM of claim 17, wherein execution of the instructions is further operable to cause the computing system to: generate an intermediate frame at a temporal position in between the at least two input frames.

19. The one or more NTCRM of claim 16, wherein execution of the instructions is further operable to cause the computing system to: estimate the bidirectional optical flow using a Pyramidal processing, Warping, and Cost volume-Network (PWC-Net) mechanism.

20. The one or more NTCRM of claim 16, wherein, to warp the at least two input frames, execution of the instructions is further operable to cause the computing system to: perform forward warping on the at least two input frames, wherein, to perform forward warping, execution of the instructions is further operable to cause the computing system to: use the estimated bidirectional optical flow to warp each of the at least two input frames to obtain corresponding warped frames.

21. The one or more NTCRM of claim 16, wherein, to operate the FSNN, execution of the instructions is further operable to cause the computing system to: generate the output frame without resorting to pixel-wise blending.

22. The one or more NTCRM of claim 16, wherein execution of the instructions is further operable to cause the computing system to: extract contextual information using a response of a convolutional layer of a multi-layer residual network.

23. The one or more NTCRM of claim 22, wherein the multi-layer residual network is an 18 layer residual network (ResNet-18), the FSNN comprises an extended grid network (GridNet), and the GridNet comprises a grid of one or more rows and one or more columns.

24. The one
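The forward warping named in claims 6 and 20 can be illustrated with a small NumPy sketch. It is not the patented implementation: the function name `forward_warp`, the nearest-neighbor splatting, and the scaling of the flow by a fraction `t` are all assumptions chosen to keep the example short. The same function can be applied to a frame or to its context map, since both are per-pixel arrays.

```python
import numpy as np

def forward_warp(image, flow, t):
    """Splat each source pixel to its position at time t (nearest-neighbor).

    image: (H, W, C) array; flow: (H, W, 2) array of (dx, dy) displacements
    toward the other frame; t: fraction of the motion to apply, in [0, 1].
    Target pixels that receive no source pixel stay zero (holes). When several
    sources land on one target, one of them is kept with no depth ordering.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    tx = np.rint(xs + t * flow[..., 0]).astype(int)
    ty = np.rint(ys + t * flow[..., 1]).astype(int)
    # Drop pixels that would move outside the image.
    inside = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)
    out = np.zeros_like(image)
    out[ty[inside], tx[inside]] = image[ys[inside], xs[inside]]
    return out

# One bright pixel moving 2 pixels to the right between the two input frames.
frame = np.zeros((3, 5, 1), dtype=np.float32)
frame[1, 1, 0] = 1.0
flow = np.zeros((3, 5, 2), dtype=np.float32)
flow[..., 0] = 2.0

# Halfway between the frames the pixel has moved 1 pixel to the right.
warped = forward_warp(frame, flow, t=0.5)
```

The holes and collisions this simple splatting leaves behind are the kind of artifact a synthesis network, rather than pixel-wise blending, would be expected to resolve.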

Assignees

Inventors

Classifications

  • Activation functions · CPC title

  • Combinations of networks · CPC title

  • G06N3/08 (Primary)

    Learning methods · CPC title

  • Training; Learning · CPC title

  • using two or more images, e.g. averaging or subtraction · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11475536B2 cover?
Systems, methods, and computer-readable media for context-aware synthesis for video frame interpolation are provided. Bidirectional flow may be used in combination with flexible frame synthesis neural network to handle occlusions and the like, and to accommodate inaccuracies in motion estimation. Contextual information may be used to enable frame synthesis neural network to perform informative …
Who is the assignee on this patent?
Univ Portland State
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 18, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).