Method and apparatus for convolutional neural network-based video denoising

US11900566B1 · US · B1

Patent metadata
Field: Value
Publication number: US-11900566-B1
Application number: US-202016912395-A
Country: US
Kind code: B1
Filing date: Jun 25, 2020
Priority date: Jun 26, 2019
Publication date: Feb 13, 2024
Grant date: Feb 13, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An image capture device includes an image sensor and a processor. The image sensor is configured to capture a first plurality of frames, a second plurality of frames, and a third plurality of frames. The processor includes a first denoising layer and a second denoising layer. The first denoising layer includes a first denoiser, a second denoiser, and a third denoiser. The first denoiser is configured to denoise the first plurality of frames and output a first denoised frame. The second denoiser is configured to denoise the second plurality of frames and output a second denoised frame. The third denoiser is configured to denoise the third plurality of frames and output a third denoised frame. The second denoising layer includes a fourth denoiser. The fourth denoiser is configured to output a denoised frame based on the first denoised frame, the second denoised frame, and the third denoised frame.
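
The two-layer arrangement in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the frame grouping follows claim 1 and assumes adjacent frames (claims 2 and 3), while `mean_denoiser`, `frame_groups`, and `cascade_denoise` are hypothetical names, and the pixel-wise mean merely stands in for the convolutional denoisers.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # one frame, flattened to a list of pixel values
Denoiser = Callable[[Sequence[Frame]], Frame]


def mean_denoiser(frames: Sequence[Frame]) -> Frame:
    """Placeholder denoiser: pixel-wise temporal mean (NOT the patented CNN)."""
    return [sum(pixels) / len(pixels) for pixels in zip(*frames)]


def frame_groups(t: int) -> List[List[int]]:
    """Frame indices given to the three first-layer denoisers for central frame t.

    Assumes the grouping of claim 1 with adjacent frames (claims 2 and 3).
    """
    return [
        [t - 2, t - 1, t],  # first denoiser: two preceding frames and the central frame
        [t - 1, t, t + 1],  # second denoiser: one preceding, central, one subsequent
        [t, t + 1, t + 2],  # third denoiser: the central frame and two subsequent frames
    ]


def cascade_denoise(
    video: Sequence[Frame],
    t: int,
    first_layer: Sequence[Denoiser],
    fourth: Denoiser,
) -> Frame:
    """Run the first denoising layer on three frame groups, then fuse the results."""
    if t - 2 < 0 or t + 2 >= len(video):
        raise IndexError("central frame needs two frames on each side")
    # First denoising layer: one denoised frame per group.
    intermediate = [
        denoise([video[i] for i in indices])
        for denoise, indices in zip(first_layer, frame_groups(t))
    ]
    # Second denoising layer: the fourth denoiser sees only the three denoised frames.
    return fourth(intermediate)
```

With five frames available, `cascade_denoise(video, 2, [mean_denoiser] * 3, mean_denoiser)` produces one output frame for the central frame at index 2. In this sketch the output depends on all five captured frames even though each denoiser only ever sees three.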

First claim

Opening claim text (preview).

What is claimed is:

1. A method for denoising a video comprising multiple frames, the method comprising: obtaining, at a first denoiser, a central frame, an input frame that is temporally precedent to the central frame, and a second input frame that is temporally precedent to the input frame; obtaining, at a second denoiser, the central frame, the input frame, and a third input frame that is temporally subsequent to the central frame; obtaining, at a third denoiser, the central frame, the third input frame, and a fourth input frame that is temporally subsequent to the third input frame; obtaining, at a fourth denoiser, a first denoised frame from the first denoiser, a second denoised frame from the second denoiser, and a third denoised frame from the third denoiser; denoising the first denoised frame, the second denoised frame, and the third denoised frame, wherein the denoising is based on a multi-scale encoder-decoder architecture, and wherein the multi-scale architecture includes a plurality of skip-connections that forward an output of an encoder layer directly to an input of a corresponding decoder layer; and outputting a fourth denoised frame based on the first denoised frame, the second denoised frame, and the third denoised frame.

2. The method of claim 1, wherein the input frame and the third input frame are adjacent to the central frame.

3. The method of claim 1, wherein the second input frame is adjacent to the input frame and the fourth input frame is adjacent to the third input frame.

4. The method of claim 1, wherein each respective frame is denoised based on a noise map.

5. The method of claim 1, wherein the denoising includes applying a convolutional layer, an activation layer, and a normalization layer.

6. The method of claim 5, wherein the activation layer is a pointwise rectified linear unit (ReLU) activation layer.

7. The method of claim 6, wherein the normalization layer is a batch normalization layer that is placed between the convolutional layer and the pointwise ReLU activation layer during training.

8. The method of claim 6, wherein the normalization layer is an affine layer that applies a learned normalization.

9. An image capture device comprising: an image sensor configured to capture a central frame, an input frame that is temporally precedent to the central frame, a second input frame that is temporally precedent to the input frame, a third input frame that is temporally subsequent to the central frame, and a fourth input frame that is temporally subsequent to the third input frame; a first denoiser configured to denoise the central frame, the input frame, and the second input frame, and output a first denoised frame; a second denoiser configured to denoise the central frame, the input frame, and the third input frame, and output a second denoised frame; a third denoiser configured to denoise the central frame, the third input frame, and the fourth input frame, and output a third denoised frame; and a fourth denoiser configured to denoise the first denoised frame, the second denoised frame, and the third denoised frame, and output a fourth denoised frame, wherein the first denoiser, the second denoiser, the third denoiser, and the fourth denoiser are each further configured to receive a noise map and denoise respective frames based on the noise map.

10. The image capture device of claim 9, wherein the first denoiser, the second denoiser, the third denoiser, and the fourth denoiser are based on a multi-scale encoder-decoder architecture.

11. The image capture device of claim 10, wherein the multi-scale architecture includes a plurality of skip-connections that forward an output of an encoder layer directly to an input of a corresponding decoder layer.

12. The image capture device of claim 9, wherein the first denoiser, the second denoiser, the third denoiser, and the fourth denoiser each comprise a convolutional layer, an activation layer, and a normalization layer.

13. The image capture device of claim 12, wherein the activation layer is a pointwise rectified linear unit (ReLU) activation layer.

14. The image capture device of claim 13, wherein the normalization layer is a batch normalization layer that is placed between the convolutional layer and the pointwise ReLU activation layer during training.

15. The image capture device of claim 13, wherein the normalization layer is an affine layer.

16. An image capture device comprising: an image sensor configured to: capture a first plurality of frames comprising a central frame, an input frame that is temporally precedent to the central frame, and a second input frame that is temporally precedent to the input frame, capture a second plurality of frames comprising the central frame, the input frame, and a third input frame that is temporally subsequent to the central frame, and capture a third plurality of frames comprising the central frame, the third input frame, and a fourth input frame that is temporally subsequent to the third input frame; and a processor comprising: a first denoising layer, the first denoising layer comprising: a first denoiser configured to denoise the first plurality of frames and output a first denoised frame, a second denoiser configured to denoise the second plurality of frames and output a second denoised frame, and a third denoiser configured to denoise the third plurality of frames and output a third denoised frame; and a second denoising layer comprising a fourth denoiser configured to output a denoised frame based on the first denoised frame, the second denoised frame, and the third denoised frame, wherein the first denoising layer and the second denoising layer comprise a multi-scale architecture, and wherein the multi-scale architecture includes a plurality of skip-connections that forward an output of an encoder layer directly to an input of a corresponding decoder layer.

17. The image capture device of claim 16, wherein the input frame and the third input frame are adjacent to the central frame.

18. The image capture device of claim 16, wherein the second input frame is adjacent to the input frame and the fourth input frame is adjacent to the third input frame.

19. The image capture device of claim 16, wherein the first denoiser, the second denoiser, the third denoiser, and the fourth denoiser are each further configured to receive a noise map and output each respective frame based on the noise map.

20. The image capture device of claim 15, wherein the affine layer is configured to apply a learned normalization.
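
The claims recite two forms of the normalization layer: a batch normalization layer used during training (claims 7 and 14) and an affine layer that applies a learned normalization (claims 8, 15, and 20). The claims do not say how the two relate. One common reading, offered here only as an assumption, is that a batch normalization layer with fixed statistics reduces to a per-channel affine transform at inference time. The sketch below shows that standard equivalence; all function and parameter names are hypothetical.

```python
import math
from typing import List, Tuple


def fold_batch_norm(
    gamma: List[float],
    beta: List[float],
    running_mean: List[float],
    running_var: List[float],
    eps: float = 1e-5,
) -> Tuple[List[float], List[float]]:
    """Convert per-channel batch-norm parameters into an affine scale and shift.

    With fixed statistics, gamma * (x - mean) / sqrt(var + eps) + beta
    equals scale * x + shift.
    """
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, running_var)]
    shift = [b - m * s for b, m, s in zip(beta, running_mean, scale)]
    return scale, shift


def batch_norm_relu(
    x: List[float],
    gamma: List[float],
    beta: List[float],
    running_mean: List[float],
    running_var: List[float],
    eps: float = 1e-5,
) -> List[float]:
    """Reference path: batch norm with fixed statistics, then pointwise ReLU."""
    return [
        max(0.0, g * (xi - m) / math.sqrt(v + eps) + b)
        for xi, g, b, m, v in zip(x, gamma, beta, running_mean, running_var)
    ]


def affine_relu(x: List[float], scale: List[float], shift: List[float]) -> List[float]:
    """Folded path: per-channel affine layer, then pointwise ReLU."""
    return [max(0.0, s * xi + sh) for xi, s, sh in zip(x, scale, shift)]
```

Here `x` holds one convolution output value per channel. Calling `affine_relu(x, *fold_batch_norm(gamma, beta, running_mean, running_var))` gives the same result as `batch_norm_relu` up to floating-point rounding, while replacing the subtraction, division, and square root per value with one multiply and one add.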

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11900566B1 cover?
An image capture device includes an image sensor and a processor. The image sensor is configured to capture a first plurality of frames, a second plurality of frames, and a third plurality of frames. The processor includes a first denoising layer and a second denoising layer. The first denoising layer includes a first denoiser, a second denoiser, and a third denoiser. The first denoiser is conf…
Who is the assignee on this patent?
GoPro Inc
What technology area does this patent fall under?
Primary CPC classification G06T5/002. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 13, 2024 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).