Frame interpolation via adaptive convolution and adaptive separable convolution

US11468318B2 · US · B2

Patent metadata
Publication number: US-11468318-B2
Application number: US-201816495029-A
Country: US
Kind code: B2
Filing date: Mar 16, 2018
Priority date: Mar 17, 2017
Publication date: Oct 11, 2022
Grant date: Oct 11, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and computer-readable media for context-aware synthesis for video frame interpolation are provided. A convolutional neural network (ConvNet) may, given two input video or image frames, interpolate a frame temporally in the middle of the two input frames by combining motion estimation and pixel synthesis into a single step and formulating pixel interpolation as a local convolution over patches in the input images. The ConvNet may estimate a convolution kernel based on a first receptive field patch of a first input image frame and a second receptive field patch of a second input image frame. The ConvNet may then convolve the convolutional kernel over a first pixel patch of the first input image frame and a second pixel patch of the second input image frame to obtain color data of an output pixel of the interpolation frame. Other embodiments may be described and/or claimed.
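The synthesis step described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the kernel estimation performed by the ConvNet is replaced by hand-written kernels, and the names `synthesize_pixel`, `blank_patch`, and `zero_kernel` are hypothetical. The sketch shows how a pair of per-pixel kernels, applied to co-centered patches from the two input frames, yields the color of one output pixel.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]   # RGB color
Patch = List[List[Pixel]]            # square pixel patch
Kernel = List[List[float]]           # same size as the patch


def synthesize_pixel(patch1: Patch, patch2: Patch,
                     k1: Kernel, k2: Kernel) -> Pixel:
    """Color of one output pixel: sum(k1 * patch1) + sum(k2 * patch2), per channel."""
    out = [0.0, 0.0, 0.0]
    for patch, kernel in ((patch1, k1), (patch2, k2)):
        for pixel_row, weight_row in zip(patch, kernel):
            for pixel, weight in zip(pixel_row, weight_row):
                for c in range(3):
                    out[c] += weight * pixel[c]
    return (out[0], out[1], out[2])


def blank_patch(n: int) -> Patch:
    return [[(0.0, 0.0, 0.0) for _ in range(n)] for _ in range(n)]


def zero_kernel(n: int) -> Kernel:
    return [[0.0] * n for _ in range(n)]


# Hand-written stand-ins for what the ConvNet would estimate (hypothetical values).
patch1 = blank_patch(3)
patch1[1][0] = (1.0, 0.5, 0.0)   # object sits left of center in frame 1
patch2 = blank_patch(3)
patch2[1][2] = (1.0, 0.5, 0.0)   # object sits right of center in frame 2

k1 = zero_kernel(3)
k1[1][0] = 0.5                   # non-zero location points at the object in frame 1
k2 = zero_kernel(3)
k2[1][2] = 0.5                   # non-zero location points at the object in frame 2

color = synthesize_pixel(patch1, patch2, k1, k2)
print(color)  # (1.0, 0.5, 0.0)
```

In this toy case the positions of the non-zero kernel entries encode where the moving object is found in each frame, and the values (0.5 and 0.5) act as interpolation coefficients, so the object appears at the center of the in-between frame.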

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer system comprising: processor circuitry communicatively coupled with memory circuitry, the memory circuitry to store program code of a convolutional neural network (ConvNet) and the processor circuitry is to operate the ConvNet to: obtain, as an input, a first image frame and a second image frame; estimate a pair of spatially-adaptive convolutional kernels to generate an individual output pixel based on a first receptive field patch of the first image frame and a second receptive field patch of the second image frame, wherein the estimation of the pair of spatially-adaptive convolutional kernels includes generation of a pair of kernel matrices, the pair of kernel matrices including a first kernel matrix for a first pixel patch of the first image frame and a second kernel matrix for a second pixel patch of the second image frame; convolve the pair of spatially-adaptive convolutional kernels over the first pixel patch of the first image frame and the second pixel patch of the second image frame to obtain a color of the individual output pixel; and generate and output an interpolation frame with the individual output pixel having the obtained color.

2. The computer system of claim 1, wherein the processor circuitry is to operate the ConvNet to: produce the output pixel in the interpolation frame co-centered at same locations as the first receptive field patch in the first input image and the second receptive field patch in the second input image.

3. The computer system of claim 2, wherein the first receptive field patch is centered around a pixel coordinate of the individual output pixel in the first image frame, and the second receptive field patch is centered around the pixel coordinate of the individual output pixel in the second image frame, and wherein the first pixel patch is centered within the first receptive field patch and the second pixel patch is centered within the second receptive field patch.

4. The computer system of claim 1, wherein the ConvNet comprises: an input layer comprising raw pixel data of a plurality of input image frames, wherein the first image frame and the second image frame are among the plurality of input image frames; a plurality of convolutional layers comprising a corresponding one of a plurality of estimated kernels; a plurality of down-convolutional layers instead of one or more max-pooling layers, wherein individual down-convolutional layers of the plurality of down-convolutional layers are disposed between two convolutional layers of the plurality of convolutional layers; and an output layer comprising a feature map, wherein the feature map is a data structure that is representative of output pixels and corresponding obtained colors of the output pixels.

5. The computer system of claim 1, wherein the ConvNet comprises: a contracting component comprising a first plurality of convolution layers and a plurality of pooling layers, wherein one or more convolution layers of the first plurality of convolution layers are grouped with a corresponding one of the plurality of pooling layers; an expanding component comprises a second plurality of convolution layers and a plurality of upsampling layers, wherein one or more convolution layers of the second plurality of convolution layers are grouped with a corresponding one of the plurality of upsampling layers; and a plurality of subnetworks, wherein each subnetwork of the plurality of subnetworks comprises a set of convolution layers and an upsampling layer.

6. The computer system of claim 5, wherein the processor circuitry is to operate the ConvNet to: operate each subnetwork to estimate a corresponding one dimensional kernel for each pixel in the interpolation frame, wherein each of the corresponding one dimensional kernels is part of a pair of one dimensional kernels, and each pair of one dimensional kernels is used to compute a two dimensional kernel.

7. The computer system of claim 5, wherein the processor circuitry is to operate the ConvNet to: operate the contracting component to extract features from the first and second image frames; and operate the expanding component to perform dense predictions on the extracted features.

8. The computer system of claim 5, wherein the processor circuitry is to: operate each of the plurality of upsampling layers to perform a corresponding transposed convolution operation, a sub-pixel convolution operation, a nearest-neighbor operation, or a bilinear interpolation operation; and operate each of the plurality of pooling layers to perform a downsampling operation.

9. The computer system of claim 1, wherein: each of the first kernel matrix and the second kernel matrix include a set of non-zero matrix values, locations of the non-zero matrix values indicate a motion, and the non-zero values are interpolation coefficients to combine pixel colors of the first and second pixel patches to generate the interpolation frame.

10. One or more non-transitory computer-readable media (NTCRM) including instructions of a convolutional neural network (ConvNet) wherein execution of the instructions by one or more processors is to cause a computer system to: obtain, as an input, a first image frame and a second image frame; estimate a spatially-adaptive convolutional kernel based on a first receptive field patch of the first image frame and a second receptive field patch of the second image frame, wherein, to estimate of the pair of spatially-adaptive convolutional kernels, execution of the instructions is to cause the computer system to generate a pair of kernel matrices, the pair of kernel matrices including a first kernel matrix for a first pixel patch of the first image frame and a second kernel matrix for a second pixel patch of the second image frame; convolve the pair of spatially-adaptive convolutional kernels over the first pixel patch of the first image frame and the second pixel patch of the second image frame to obtain a color of an output pixel for an interpolation frame; and generate and output the interpolation frame with the output pixel having the obtained color.

11. The one or more NTCRM of claim 10, wherein execution of the instructions is to cause the computer system to: output of the output pixel in the interpolation frame co-centered at a same location as the first receptive field patch and the second receptive field patch in the first input image and the second input image, respectively.

12. The one or more NTCRM of claim 11, wherein the first receptive field patch and the second receptive field patch are centered in the input image frame, and wherein the first pixel patch is centered within the first receptive field patch and the second pixel patch is centered within the second receptive field patch.

13. The one or more NTCRM of claim 10, wherein the ConvNet comprises: an input layer comprising raw pixel data of a plurality of input image frames, wherein the first image frame and the second image frame are among the plurality of input image frames; a plurality of layers comprising a corresponding one of a plurality of convolutional layers, pooling layers, and/or Batch Normalization layers; a plurality of down-convolutional layers instead of one or more max-pooling layers, wherein the down-convolutional layers are disposed between some convolutional layers of the plurality of convolutional layers; and an output layer comprising a feature map comprising kernels that are used to produce the color of the output pixel.

14. The one or more NTCRM of claim 10, wherein the ConvNet comprises: a contracting component comprising a first p
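Claim 6 states that each pair of one dimensional kernels is used to compute a two dimensional kernel. The claim text shown here does not say how; the sketch below assumes the usual meaning of "separable", namely that the 2D kernel is the outer product of a vertical and a horizontal 1D kernel. The function names `outer` and `apply_separable` are hypothetical, the kernel values are made up rather than estimated by a network, and a single-channel patch is used for brevity.

```python
from typing import List

Row = List[float]
Grid = List[List[float]]


def outer(vertical: Row, horizontal: Row) -> Grid:
    """Assumed combination rule: 2D kernel K[i][j] = vertical[i] * horizontal[j]."""
    return [[v * h for h in horizontal] for v in vertical]


def apply_full(patch: Grid, kernel: Grid) -> float:
    """Weighted sum of a patch with an explicit 2D kernel."""
    return sum(w * p
               for patch_row, kernel_row in zip(patch, kernel)
               for p, w in zip(patch_row, kernel_row))


def apply_separable(patch: Grid, vertical: Row, horizontal: Row) -> float:
    """Same result as apply_full(patch, outer(vertical, horizontal)),
    computed without materializing the 2D kernel."""
    return sum(v * sum(h * p for h, p in zip(horizontal, patch_row))
               for v, patch_row in zip(vertical, patch))


patch = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
vertical = [0.0, 1.0, 0.0]     # selects the middle row
horizontal = [0.5, 0.5, 0.0]   # averages the first two columns

kernel_2d = outer(vertical, horizontal)
full = apply_full(patch, kernel_2d)
separable = apply_separable(patch, vertical, horizontal)
print(full, separable)  # 4.5 4.5
```

Under this assumption an n-by-n kernel is described by 2n values instead of n squared, which is the practical appeal of estimating 1D kernel pairs per output pixel.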

Assignees

Univ Portland State

Classifications

  • Auto-encoder networks; Encoder-decoder networks

  • Supervised learning

  • Convolutional networks [CNN, ConvNet]

  • involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence

  • using neural networks

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11468318B2 cover?
Context-aware synthesis for video frame interpolation. Given two input video or image frames, a convolutional neural network (ConvNet) interpolates a frame temporally in the middle of them by combining motion estimation and pixel synthesis into a single step and formulating pixel interpolation as a local convolution over patches in the input images.
Who is the assignee on this patent?
Univ Portland State
What technology area does this patent fall under?
Primary CPC classification H04N7/0127. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Oct 11, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).