Feature pyramid warping for video frame interpolation

US12288346B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12288346-B2
Application number	US-202017422464-A
Country	US
Kind code	B2
Filing date	Jan 14, 2020
Priority date	Jan 15, 2019
Publication date	Apr 29, 2025
Grant date	Apr 29, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and storage media are described for motion estimation in video frame interpolation. Disclosed embodiments use feature pyramids as image representations for motion estimation and seamlessly integrates them into a deep neural network for frame interpolation. A feature pyramid is extracted for each of two input frames. These feature pyramids are warped together with the input frames to the target temporal position according to the inter-frame motion estimated via optical flow. A frame synthesis network is used to predict interpolation results from the pre-warped feature pyramids and input frames. The feature pyramid extractor and the frame synthesis network are jointly trained for the task of frame interpolation. An extensive quantitative and qualitative evaluation demonstrates that the described embodiments utilizing feature pyramids enables robust, high-quality video frame interpolation. Other embodiments may be described and/or claimed.
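To make the term "feature pyramid" concrete, the following is a minimal sketch of a multi-resolution representation of one frame. It is not the patented extractor: the abstract describes a learned extractor that is trained jointly with the synthesis network, whereas this sketch substitutes plain 2x2 average pooling as a stand-in. The names `downsample2x` and `build_pyramid`, the three-level depth, and the single-channel frame are all assumptions made for illustration.

```python
from typing import List

Image = List[List[float]]  # single-channel image stored as rows of floats


def downsample2x(img: Image) -> Image:
    """Halve the resolution by averaging each non-overlapping 2x2 block."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * y][2 * x] + img[2 * y][2 * x + 1]
             + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def build_pyramid(img: Image, levels: int = 3) -> List[Image]:
    """Return [full-res, half-res, quarter-res, ...].

    Assumption: average pooling stands in for the learned, convolutional
    feature extractor that the abstract describes.
    """
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid


# Hypothetical 8x8 input frame with a simple intensity ramp.
frame0 = [[float(x + y) for x in range(8)] for y in range(8)]
pyr = build_pyramid(frame0, levels=3)
shapes = [(len(level), len(level[0])) for level in pyr]
```

Each level keeps the same content at half the resolution of the previous one, which is the property the abstract relies on when it warps every level of both pyramids before frame synthesis.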

First claim

Opening claim text (preview).

The invention claimed is:
1. An apparatus configured to operate a frame interpolation neural network (FINN), the apparatus comprising: optical flow estimation (OFE) circuitry configured to estimate a forward optical flow and a backward optical flow from a first input frame and a second input frame of a video; feature pyramid extraction (FPE) circuitry configured to extract a first feature pyramid from the first input frame and a second feature pyramid from the second input frame; warping circuitry configured to warp the first feature pyramid and the first input frame to a target temporal position between the first and second input frames using the forward optical flow, and warp the second feature pyramid and the second input frame to the target temporal position using the backward optical flow; and frame synthesis neural network (FSN) circuitry configured to generate an interpolated output frame at the target temporal position guided by the warped first and second feature pyramids and the warped first and second input frames.
2. The apparatus of claim 1, wherein the FPE circuitry is further configured to apply a same configuration to the first and second input frames to extract the first and second feature pyramids, respectively.
3. The apparatus of claim 1, wherein: the first feature pyramid includes a first set of features extracted from the first input frame at each resolution of a plurality of resolutions; the second feature pyramid includes a second set of features extracted from the second input frame at each resolution of the plurality of resolutions; and at least some features in the first set of features and at least some features in the second set of features are based on a color space of the first and second input frames.
4. The apparatus of claim 1, wherein the interpolated output frame includes pixels of the first and the second input frames shifted from the first and second input frames, respectively, to replicate motion to take place from the first input frame to the target temporal position and from the target temporal position to the second input frame.
5. The apparatus of claim 1, wherein the FPE circuitry is further configured to: generate the first and second input frames at each of a plurality of resolutions based on features extracted from the first and second input frames.
6. The apparatus of claim 1, wherein, to extract the first and second feature pyramids, the FPE circuitry is further configured to: read a number of input features from the first and second input frames at each resolution of a plurality of resolutions; and produce a number of output features from the number of input features for each of the first and second input frames.
7. The apparatus of claim 6, wherein the FPE circuitry comprises: convolutional circuitry interleaved with activation function circuitry and configured to convolve one or both of the first and second input frames at each resolution of a plurality of resolutions to extract a set of features from the first and second input frames at each resolution of the plurality of resolutions.
8. The apparatus of claim 3, wherein the FPE circuitry is further configured to: use the interpolated output frame to extract new feature pyramids from respective input frames, the new feature pyramids including a set of features different than the features of the first and second feature pyramids.
9. The apparatus of claim 1, wherein the FSN circuitry comprises a grid of processing blocks, wherein each row in the grid of processing blocks corresponds to a resolution of a set of resolutions of the first and second feature pyramids.
10. The apparatus of claim 9, wherein a first processing block in each row is configured to receive a warped set of features at the corresponding resolution in the first and second feature pyramids.
11. The apparatus of claim 1, wherein the OFE circuitry, the FPE circuitry, the FSN circuitry, and the warping circuitry FW circuitry are coupled to one another via an interconnect technology, and implemented as: respective dies of a System-in-Package (SiP) or Multi-Chip Package (MCP); respective execution units or processor cores of a general purpose processor; or respective digital signal processors (DSPs), field-programmable gate arrays (FPGAs), Application Specific Integrated Circuits (ASICs), programmable logic devices (PLDs), System-on-Chips (SoCs), Graphics Processing Units (GPUs), SiPs, MCPs, or any combination of DSPs, FPGAs, ASICs, PLDs, SoCs, GPUs, SiPs, and MCPs.
12. One or more non-transitory computer-readable media (NTCRM) comprising instructions of a frame interpolation neural network (FINN), wherein execution of the instructions by one or more processors is to cause the one or more processors to: obtain a first input frame and a second input frame of a video; estimate a forward optical flow and a backward optical flow from the first and second input frames, the forward optical flow indicating how pixels in the first input frame are to be changed to produce the second input frame during a time period starting from the first input frame and ending at the second input frame, and the backward optical flow indicating how pixels in the second input frame are to be changed to produce the first input frame during a time period starting from the first input frame and ending at the second input frame; extract a first feature pyramid from the first input frame and a second feature pyramid from the second input frame, the first feature pyramid including a first set of features extracted from the first input frame at each resolution of a plurality of resolutions, and the second feature pyramid including a second set of features extracted from the second input frame at each resolution of the plurality of resolutions; warp the first feature pyramid and the first input frame toward a target temporal position between the first and second input frames using the forward optical flow; warp the second feature pyramid and the second input frame toward the target temporal position using the backward optical flow; and generate an output frame at the target temporal position based on the warped first and second feature pyramids and the warped first and second input frames.
13. The one or more NTCRM of claim 12, wherein the first and second sets of features are based on a color space of the first and second input frames, respectively.
14. The one or more NTCRM of claim 12, wherein execution of the instructions is to further cause the one or more processors to: read a number of input features from the first and second input frames at each resolution; and generate a number of output features from the number of input features at each resolution, wherein the output features at each resolution represent different octaves of the input features and vary in number.
15. The one or more NTCRM of claim 14, wherein the FINN comprises a plurality of convolutional functions interleaved with a plurality of activation functions, and execution of the instructions is to cause the one or more processors to: operate the convolutional functions to convolve the first and second input frames at each resolution; and operate the activation functions to extract individual features from the convolved first and second input frames.
16. The one or more NTCRM of claim 12, wherein the FINN includes a frame synthesis neural network comprising a grid of processing blocks, wherein each row in the grid of processing blocks corresponds to a resolution of the plurality of resolutions, and execution of the instructions is to cause the one or more
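
Claims 1 and 12 recite warping the first frame with the forward optical flow and the second frame with the backward optical flow so that both land at the same target temporal position. The sketch below illustrates one way this could work and is not taken from the patent. It assumes linear motion, so the forward flow is scaled by t and the backward flow by (1 - t), and it uses nearest-neighbour forward splatting that averages collisions and leaves holes at 0.0. The claim text shown here does not specify the warping operator or the scaling, and the function name `forward_warp` and the toy one-row frames are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]
Flow = List[List[Tuple[float, float]]]  # per-pixel (dx, dy) displacement


def forward_warp(src: Grid, flow: Flow, scale: float) -> Grid:
    """Move each source pixel by scale * flow (nearest-neighbour splatting).

    Assumptions: linear motion, collisions are averaged, and target pixels
    that receive no source pixel remain 0.0 (holes).
    """
    h, w = len(src), len(src[0])
    acc = [[0.0] * w for _ in range(h)]
    cnt = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            tx = int(round(x + scale * dx))
            ty = int(round(y + scale * dy))
            if 0 <= tx < w and 0 <= ty < h:  # drop pixels that leave the frame
                acc[ty][tx] += src[y][x]
                cnt[ty][tx] += 1
    return [
        [acc[y][x] / cnt[y][x] if cnt[y][x] else 0.0 for x in range(w)]
        for y in range(h)
    ]


# Toy example: a bright pixel moves from x=1 (frame 0) to x=5 (frame 1).
frame0 = [[0.0, 9.0, 0.0, 0.0, 0.0, 0.0]]
frame1 = [[0.0, 0.0, 0.0, 0.0, 0.0, 9.0]]
flow_fwd = [[(4.0, 0.0)] * 6]   # frame 0 -> frame 1
flow_bwd = [[(-4.0, 0.0)] * 6]  # frame 1 -> frame 0

t = 0.5  # target temporal position halfway between the two frames
warped0 = forward_warp(frame0, flow_fwd, t)
warped1 = forward_warp(frame1, flow_bwd, 1.0 - t)
```

In this toy case both warped frames place the bright pixel at x=3, so a synthesis step receives two aligned views of the same moment. The same operation would be applied to every level of each feature pyramid, with the flow resized to match that level's resolution.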

Assignees

Univ Portland State

Inventors

Classifications

G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/09
Supervised learning · CPC title
G06T2207/20084
Artificial neural networks [ANN] · CPC title
G06T2207/20081
Training; Learning · CPC title
G06T1/20
Processor architectures; Processor configuration, e.g. pipelining · CPC title
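
The CPC symbols listed above follow the layout section, class, subclass, main group, subgroup. The following sketch splits a symbol into those parts; it may help when grouping this patent with similar filings. The regular expression, the `CpcSymbol` type, and the `parse_cpc` helper are hypothetical and not part of any official CPC tooling. The one-entry `SECTION_TITLES` table covers only section G, which the CPC scheme titles Physics.

```python
import re
from typing import NamedTuple


class CpcSymbol(NamedTuple):
    section: str     # one letter, e.g. "G"
    cls: str         # two digits, e.g. "06"
    subclass: str    # one letter, e.g. "T"
    main_group: str  # one to four digits, e.g. "7" or "2207"
    subgroup: str    # digits after the slash, e.g. "269"


# Assumed pattern for symbols in the form shown on this page.
_CPC_RE = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")

# Illustrative only: just the section that appears on this page.
SECTION_TITLES = {"G": "Physics"}


def parse_cpc(symbol: str) -> CpcSymbol:
    """Split a CPC symbol such as 'G06T7/269' into its five parts."""
    match = _CPC_RE.match(symbol.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a recognised CPC symbol: {symbol!r}")
    return CpcSymbol(*match.groups())


listed = ["G06N3/0464", "G06N3/09", "G06T2207/20084", "G06T2207/20081", "G06T1/20"]
parsed = [parse_cpc(s) for s in listed]
sections = {SECTION_TITLES[p.section] for p in parsed}
```

Every symbol in the list falls in section G, which matches the single technology area this page reports.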

Patent family

Related publications grouped by family.

View patent family 71613575

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12288346B2 cover?: Methods, systems, and storage media are described for motion estimation in video frame interpolation. Disclosed embodiments use feature pyramids as image representations for motion estimation and seamlessly integrates them into a deep neural network for frame interpolation. A feature pyramid is extracted for each of two input frames. These feature pyramids are warped together with the input fr…
Who is the assignee on this patent?: Univ Portland State
What technology area does this patent fall under?: Primary CPC classification G06T7/269. Mapped technology areas include Physics.
When was this patent published?: Publication date Apr 29, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).