Method and apparatus for video super resolution using convolutional neural network with two-stage motion compensation
US-2019139205-A1 · May 9, 2019 · US
US10547823B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10547823-B2 |
| Application number | US-201816141426-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 25, 2018 |
| Priority date | Sep 25, 2018 |
| Publication date | Jan 28, 2020 |
| Grant date | Jan 28, 2020 |
Official abstract text for this publication.
Techniques related to interpolating an intermediate view image from multi-view images are discussed. Such techniques include downsampling first and second images that represent a view of a scene, generating a disparity map based on applying a first CNN to the downscaled first and second images, translating the downscaled first and second images using the disparity map, applying a second CNN to the translated downscaled first and second images and the disparity map to generate a downscaled intermediate image, and upscaling the downscaled intermediate image to an intermediate image at the resolution of the first and second images using an image super-resolution convolutional neural network.
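The data flow described in the abstract can be sketched as follows. This is only an illustration of the order of the steps, not the patented implementation: all function names are hypothetical, images are plain 2D lists of grayscale values, and each of the three neural networks is replaced by a trivial stand-in (an all-zero disparity map, a per-pixel average, and nearest-neighbour upsampling).

```python
# Illustrative sketch only. Every name here is hypothetical, and the three
# CNNs named in the abstract are replaced by trivial placeholders.

def downscale(img, factor=2):
    """Average-pool a 2D grayscale image (list of rows) by `factor`."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [
        [
            sum(img[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) / factor ** 2
            for x in range(w)
        ]
        for y in range(h)
    ]

def estimate_disparity(left, right):
    """Placeholder for the first CNN: an all-zero horizontal disparity map."""
    return [[0 for _ in row] for row in left]

def translate(img, disparity, sign):
    """Sample each output pixel from x + sign * disparity, clamped to the row."""
    w = len(img[0])
    return [
        [row[min(max(x + sign * int(d), 0), w - 1)] for x, d in enumerate(drow)]
        for row, drow in zip(img, disparity)
    ]

def synthesize(left_t, right_t, disparity):
    """Placeholder for the second CNN: average the two translated views."""
    return [[(a + b) / 2 for a, b in zip(lr, rr)]
            for lr, rr in zip(left_t, right_t)]

def upscale(img, factor=2):
    """Placeholder for the super-resolution CNN: nearest-neighbour upsampling."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def interpolate_view(left, right, factor=2):
    """Run the steps in the order the abstract lists them."""
    l_small, r_small = downscale(left, factor), downscale(right, factor)
    disp = estimate_disparity(l_small, r_small)
    l_t = translate(l_small, disp, +1)
    r_t = translate(r_small, disp, -1)
    mid_small = synthesize(l_t, r_t, disp)
    return upscale(mid_small, factor)
```

Calling `interpolate_view(left, right)` on two equally sized images returns an image at the input resolution. In a real system each placeholder would be a trained network, and the disparity map would generally be non-zero.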
Opening claim text (preview).
What is claimed is:
1. A system for implementing a convolutional neural network (CNN) comprising: a memory to store first and second images, wherein the first and second images comprise different views of a scene and are at a first resolution; and a processor coupled to the memory, the processor to: downscale the first and second images to provide first and second downscaled images; generate at least one disparity map based at least in part on applying a first convolutional neural network to a first input volume comprising the first and second downscaled images, wherein the disparity map comprises disparity values to translate the first and second downscaled images; determine first and second translated downscaled images based at least in part on the disparity map; apply a second convolutional neural network to a second input volume comprising the first and second translated downscaled images and the disparity map to generate a downscaled intermediate image comprising a view between the first and second translated downscaled images; generate the intermediate image at the first resolution based at least in part on applying an image super-resolution convolutional neural network to the downscaled intermediate image; and providing the intermediate image for presentment to a viewer.
2. The system of claim 1, wherein the first convolutional neural network comprises a first encoder-decoder convolutional neural network, and wherein the processor to generate the at least one disparity map comprises the processor to: apply the first encoder-decoder convolutional neural network to the first input volume to generate first and second disparity maps; translate the first and second downscaled images using the first and second disparity maps to generate third and fourth translated downscaled images; and apply a second encoder-decoder convolutional neural network to a third input volume comprising the third and fourth translated downscaled images to generate the at least one disparity map.
3. The system of claim 2, wherein the first and second encoder-decoder convolutional neural networks have the same architecture and implement the same neural network weights.
4. The system of claim 3, wherein the first and second encoder-decoder convolutional neural networks each comprise an encoder portion having encoder layers to extract features from the first and third input volumes at differing resolutions and a decoder portion to combine the extracted features using skip connections corresponding to ones of the encoder layers to estimate optical flow.
5. The system of claim 1, wherein the first convolutional neural network comprises an encoder-decoder convolutional neural network, the processor to generate the at least one disparity map comprises the processor to apply the encoder-decoder convolutional neural network to the first input volume to generate first and second disparity maps, and the first encoder-decoder convolutional neural network comprises an encoder portion having encoder layers to extract features from the first input volume at differing resolutions and a decoder portion to combine the extracted features using skip connections corresponding to ones of the encoder layers to estimate optical flow.
6. The system of claim 1, wherein the second convolutional neural network comprises a volumetric convolutional neural network.
7. The system of claim 1, wherein the processor to apply the image super-resolution convolutional neural network comprises the processor to: apply, to the downscaled intermediate image, a plurality of adjacent convolutional layers and a deconvolutional layer following the plurality of adjacent convolutional layers to generate a feature image at a second resolution greater than a third resolution of the downscaled intermediate image; upsample the downscaled intermediate image to generate a second intermediate image at the second resolution; and combine the feature image and the second intermediate image to generate an upsampled intermediate image.
8. The system of claim 7, wherein the plurality of adjacent convolutional layers are separated into blocks, wherein each block comprises a predetermined number of convolutional layers and each block implements the same neural network weights, and wherein residual connections are provided between each block of convolutional layers, the residual connections to combine inputs and outputs of each block.
9. The system of claim 7, wherein the processor to apply the image super-resolution convolutional neural network further comprises the processor to: apply, to the upsampled intermediate image, a plurality of second adjacent convolutional layers and a second deconvolutional layer following the plurality of second adjacent convolutional layers to generate a second feature image at the first resolution; upsample the upsampled intermediate image to generate a third intermediate image at the first resolution; and combine the second feature image and the third intermediate image to generate a final upsampled intermediate image.
10. The system of claim 1, wherein the downscaled intermediate image is in a first color space, the processor further to: convert the downscaled intermediate image to a second color space comprising a luma channel and one or more second channels; separate the luma channel and the one or more second channels, wherein the image super-resolution convolutional neural network is applied to only the luma channel of the downscaled intermediate image; upscale the one or more second channels of the downscaled intermediate images; and concatenate an output image of the image super-resolution convolutional neural network having only a luma channel with the upscaled one or more second channels of the downscaled intermediate images to generate the intermediate image.
11. The system of claim 1, the processor further to: separately train the view synthesis network and the image super-resolution convolutional neural network to determine view synthesis network parameters and image super-resolution convolutional neural network parameters, wherein the view synthesis network comprises the first convolutional neural network and the second convolutional neural network.
12. The system of claim 1, wherein the first convolutional neural network comprises an encoder-decoder convolutional neural network, the encoder-decoder convolutional neural network comprises an encoder portion having encoder layers to extract features from the first input volume at differing resolutions and a decoder portion to combine the extracted features using skip connections corresponding to ones of the encoder layers to estimate optical flow, the second convolutional neural network comprises a volumetric convolutional neural network, and the image super-resolution convolutional neural network comprises a plurality of adjacent convolutional layers and a deconvolutional layer following the plurality of adjacent convolutional layers to generate a feature image at a second resolution greater than a third resolution of the downscaled intermediate image, an upsampler to upsample the downscaled intermediate image to generate a second intermediate image at the second resolution, and an adder to combine the feature image and the second intermediate image to generate an upsampled intermediate image.
13. A computer-implemented method for generating an intermediate image from multi-view images comprising: downscaling first and second images to provide first and second downscaled images, wherein the first and second images comprise different views of a scene and are at a first resolution; generating at least one disparity ma
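The luma-only upscaling path recited in claim 10 can be illustrated with a short sketch. Several things here are assumptions rather than content from the patent: the function names are hypothetical, the first color space is taken to be RGB, the second is taken to be full-range YCbCr with the BT.601 coefficients used by JPEG, and the super-resolution network is replaced by nearest-neighbour upsampling so the example stays runnable.

```python
# Illustrative sketch of the claim 10 data flow. The color spaces, the
# coefficients, and all names are assumptions; the CNN is a placeholder.

def rgb_to_ycbcr(pixel):
    """Full-range BT.601 (JPEG-style) RGB -> YCbCr for one pixel."""
    r, g, b = pixel
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def split_channels(img):
    """Convert rows of (r, g, b) tuples into separate Y, Cb, Cr planes."""
    ycc = [[rgb_to_ycbcr(p) for p in row] for row in img]
    return tuple([[px[c] for px in row] for row in ycc] for c in range(3))

def upsample_nearest(plane, factor=2):
    """Plain nearest-neighbour upsampling of one 2D plane."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def super_resolve_luma(plane, factor=2):
    """Placeholder for the super-resolution CNN, applied to luma only."""
    return upsample_nearest(plane, factor)

def upscale_luma_only(img, factor=2):
    """Super-resolve Y, conventionally upscale Cb/Cr, then recombine."""
    y, cb, cr = split_channels(img)
    y_hi = super_resolve_luma(y, factor)
    cb_hi = upsample_nearest(cb, factor)
    cr_hi = upsample_nearest(cr, factor)
    # Concatenate the three planes back into rows of (Y, Cb, Cr) tuples.
    return [list(zip(yr, cbr, crr)) for yr, cbr, crr in zip(y_hi, cb_hi, cr_hi)]
```

Routing only the luma plane through the expensive network reduces the network input to one channel, while the chroma planes take a cheaper upscaling path before the planes are rejoined.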
Processor architectures; Processor configuration, e.g. pipelining · CPC title
using three or more two-dimensional [2D] image sensors · CPC title
Depth or disparity estimation from stereoscopic image signals · CPC title
Combinations of networks · CPC title
by using two or more images to influence resolution, frame rate or aspect ratio · CPC title