Method and apparatus for semantic segmentation and depth completion using a convolutional neural network

US11263756B2 · US · B2

Patent metadata
Publication number: US-11263756-B2
Application number: US-201916707404-A
Country: US
Kind code: B2
Filing date: Dec 9, 2019
Priority date: Dec 9, 2019
Publication date: Mar 1, 2022
Grant date: Mar 1, 2022

Abstract

Official abstract text for this publication.

A computer-implemented method for generating a semantically segmented image and a depth completion image using a convolutional neural network (CNN) from an input visible image and/or an input depth image. A central component of the CNN for semantic segmentation and depth completion is a common representation that allows both tasks to be performed when given any of these combinations of input images: (i) both an input visible image and an input depth image, (ii) only an input visible image, or (iii) only an input depth image.
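The abstract's central idea, a shared representation from which both views can be decoded whatever inputs are present, can be illustrated with a toy multi-view autoencoder. This is a minimal sketch under stated assumptions, not the patented network: feature maps are flattened to short vectors, every layer is a plain linear map with hand-picked weights, and when both views are present their codes are simply averaged. The class name `MultiViewAutoencoder` and the averaging rule are hypothetical choices made for this example.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class MultiViewAutoencoder:
    """Toy linear multi-view autoencoder (hypothetical, for illustration only).

    Each view (visible-image features, depth features) has its own encoder
    into a shared hidden code and its own decoder out of that code.
    """

    def __init__(self, enc_rgb: Matrix, enc_depth: Matrix,
                 dec_rgb: Matrix, dec_depth: Matrix) -> None:
        self.enc_rgb = enc_rgb
        self.enc_depth = enc_depth
        self.dec_rgb = dec_rgb
        self.dec_depth = dec_depth

    def encode(self, f_rgb: Optional[Vector] = None,
               f_depth: Optional[Vector] = None) -> Vector:
        """Map whichever views are present to one common code.

        Assumption: when both views are given, their codes are averaged.
        """
        codes = []
        if f_rgb is not None:
            codes.append(matvec(self.enc_rgb, f_rgb))
        if f_depth is not None:
            codes.append(matvec(self.enc_depth, f_depth))
        if not codes:
            raise ValueError("at least one view (visible or depth) is required")
        return [sum(vals) / len(codes) for vals in zip(*codes)]

    def reconstruct(self, f_rgb: Optional[Vector] = None,
                    f_depth: Optional[Vector] = None) -> Tuple[Vector, Vector]:
        """Decode BOTH views from the common code, whatever was input."""
        h = self.encode(f_rgb, f_depth)
        return matvec(self.dec_rgb, h), matvec(self.dec_depth, h)


# Hand-picked weights: here the depth features are twice the visible features.
model = MultiViewAutoencoder(
    enc_rgb=[[1.0, 0.0], [0.0, 1.0]],
    enc_depth=[[0.5, 0.0], [0.0, 0.5]],
    dec_rgb=[[1.0, 0.0], [0.0, 1.0]],
    dec_depth=[[2.0, 0.0], [0.0, 2.0]],
)
both = model.reconstruct(f_rgb=[1.0, 2.0], f_depth=[2.0, 4.0])
rgb_only = model.reconstruct(f_rgb=[1.0, 2.0])
depth_only = model.reconstruct(f_depth=[2.0, 4.0])
print(both, rgb_only, depth_only)
```

With these weights all three input combinations yield the same pair `([1.0, 2.0], [2.0, 4.0])`: a visible-only or depth-only input still produces an output for both branches, which is the property the abstract attributes to the common representation. In a trained network the weights would be learned and the codes from the two views would only approximately agree.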

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method for training a convolutional neural network for semantic segmentation and depth completion of images using a training set of image pairs, with each image pair in the training set having a visible image and a corresponding depth image, where at least one of the visible image and the corresponding depth image is semantically segmented, comprising:
(a) training a first subnetwork of the neural network, the first subnetwork being a first encoder-decoder network with: (i) an encoder branch that generates an outputted first feature map from a visible image; and (ii) a decoder branch that generates a semantically segmented image from an inputted first feature map; wherein during training the encoder branch and the decoder branch of the first subnetwork are directly connected for setting the inputted first feature map as the outputted first feature map;
(b) training a second subnetwork of the neural network, the second subnetwork being a second encoder-decoder network with: (i) an encoder branch of the second subnetwork generating an outputted second feature map from a depth image; and (ii) a decoder branch of the second subnetwork either generating a semantically segmented image or a depth image from an inputted second feature map; wherein during training the encoder branch and the decoder branch of the second subnetwork are directly connected for setting the inputted second feature map as the outputted second feature map; and
(c) training a third subnetwork of the neural network arranged between the encoder branch and the decoder branch of the first subnetwork and the second subnetwork to reconstruct (i) the inputted first feature map from the outputted first feature map and (ii) the inputted second feature map from the outputted second feature map, wherein the third subnetwork is trained to generate a common representation of the outputted reconstructed feature maps for visible images and for depth images, allowing either the view from the first subnetwork or the view from the second subnetwork to be reconstructed therefrom.

2. The method of claim 1, wherein the third subnetwork comprises a multi-view autoencoder network with at least one hidden layer.

3. The method of claim 2, wherein the encoder branch of the first subnetwork and the second subnetwork comprise convolution layers of one of atrous and dilated.

4. The method of claim 3, wherein the training set of image pairs comprises a set of RGB-D images.

5. The method of claim 1, wherein said training (a) further comprises learning parameters of the first subnetwork.

6. The method of claim 5, wherein said training (b) further comprises learning parameters of the second subnetwork.

7. The method of claim 6, wherein said training (c) further comprises refining parameters of the first subnetwork and parameters of the second subnetwork.

8. The method of claim 1, wherein a first loss function is associated with the first subnetwork, a second loss function is associated with the second subnetwork, and a third loss function is associated with the third subnetwork, the first subnetwork being trained at said training (a) by minimizing the first loss function, the second subnetwork being trained at said training (b) by minimizing the first loss function, and the first subnetwork, second subnetwork and third subnetwork being trained at said training (c) by minimizing a weighted sum of the first loss function, the second loss function and a third loss function.

9. A computer-implemented method for semantic segmentation and/or depth completion of an inputted visible image and/or an inputted depth image, comprising: accessing a convolutional neural network (CNN) for semantic segmentation and/or depth completion of images; and performing semantic segmentation and/or depth completion of the inputted visible image and/or the inputted depth image using the CNN, comprising inputting a first subnetwork of the CNN with the inputted visible image and/or inputting a second subnetwork of the CNN with the inputted depth image, wherein the CNN is trained using a training set of image pairs, with each image pair in the training set having a visible image and a corresponding depth image, where at least one of the visible image and the corresponding depth image is semantically segmented, by at least one of:
(a) training the first subnetwork of the neural network, the first subnetwork being a first encoder-decoder network with: (i) an encoder branch that generates an outputted first feature map from a visible image; and (ii) a decoder branch that generates a semantically segmented image from an inputted first feature map; wherein during training the encoder branch and the decoder branch of the first subnetwork are directly connected for setting the inputted first feature map as the outputted first feature map;
(b) training the second subnetwork of the neural network, the second subnetwork being a second encoder-decoder network with: (i) an encoder branch of the second subnetwork generating an outputted second feature map from a depth image; and (ii) a decoder branch of the second subnetwork either generating a semantically segmented image or a depth image from an inputted second feature map; wherein during training the encoder branch and the decoder branch of the second subnetwork are directly connected for setting the inputted second feature map as the outputted second feature map; and
(c) training a third subnetwork of the neural network arranged between the encoder branch and the decoder branch of the first subnetwork and the second subnetwork to reconstruct (i) the inputted first feature map from the outputted first feature map and (ii) the inputted second feature map from the outputted second feature map; wherein third subnetwork is trained to generate a common representation of the outputted reconstructed feature maps for visible images and for depth images, allowing either the view from the first subnetwork or the view from the second subnetwork to be reconstructed therefrom.

10. The method of claim 9, wherein the decoder branch of the second subnetwork generates a semantically segmented image from an inputted second feature map, and two semantically segmented images are produced at the respective outputs of the first subnetwork and the second subnetwork.

11. The method of claim 9, wherein the decoder branch of the second subnetwork generates a depth completion image from an inputted second feature map, and a semantically segmented image and a depth completion image are produced at the respective outputs of the first subnetwork the second subnetwork.

12. The method of claim 9, wherein the third subnetwork comprises a multi-view autoencoder network with at least one hidden layer.

13. The method of claim 12, wherein the encoder branch of the first subnetwork the second subnetwork comprise convolution layers of one of atrous and dilated.

14. The method of claim 13, wherein the training set of image pairs comprises a set of RGB-D images.

15. The method of claim 9, wherein said training (a) further comprises learning parameters of the first subnetwork.

16. The method of claim 15, wherein said training (b) further comprises learning parameters of the second subnetwork.

17. The method of claim 16, wherein said training (c) further comprises refining parameters of the first subnetwork and parameters of the second subnetwork.

18. The method of claim 9, wherein a first loss function is associated with the first subnetwork, a second loss function is associated with the second sub…
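Claims 1 and 5 to 8 describe a staged schedule: the visible-image subnetwork is trained first with its encoder wired directly to its decoder, then the depth subnetwork the same way, and finally the third subnetwork is placed between encoders and decoders and all three are refined together on a weighted sum of three losses. The sketch below shows only the bookkeeping of that schedule, under assumptions the claims do not state: per-pixel cross-entropy as the segmentation loss, a masked L1 error as the depth loss (assuming the second decoder outputs a depth image), mean squared error as the feature-reconstruction loss of the third subnetwork, and the second loss being used in stage (b). The parameter-group names, the `stage_loss` helper, and the default weights are hypothetical.

```python
import math
from typing import Dict, Sequence, Set, Tuple


def cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean negative log-probability of the true class over pixels (segmentation loss)."""
    return -sum(math.log(p[c]) for p, c in zip(probs, labels)) / len(labels)


def masked_l1(pred: Sequence[float], target: Sequence[float],
              valid: Sequence[bool]) -> float:
    """Mean absolute depth error over pixels that have ground-truth depth."""
    errors = [abs(p - t) for p, t, v in zip(pred, target, valid) if v]
    if not errors:
        raise ValueError("no valid depth pixels")
    return sum(errors) / len(errors)


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Feature-map reconstruction error for the third (autoencoder) subnetwork."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical parameter groups updated in each stage (claims 5-7).
TRAINABLE: Dict[str, Set[str]] = {
    "a": {"rgb_encoder", "rgb_decoder"},
    "b": {"depth_encoder", "depth_decoder"},
    "c": {"rgb_encoder", "rgb_decoder", "depth_encoder", "depth_decoder",
          "shared_autoencoder"},
}


def stage_loss(stage: str, loss1: float, loss2: float, loss3: float,
               weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Objective minimized in each stage; stage (c) uses a weighted sum as in claim 8."""
    if stage == "a":
        return loss1
    if stage == "b":
        return loss2  # assumption: the depth subnetwork minimizes its own loss
    if stage == "c":
        w1, w2, w3 = weights
        return w1 * loss1 + w2 * loss2 + w3 * loss3
    raise ValueError(f"unknown stage {stage!r}")


# Toy values on a handful of pixels / feature entries.
l_seg = cross_entropy([[0.5, 0.5], [1.0, 0.0]], [0, 0])  # ln(2) / 2
l_depth = masked_l1([1.0, 2.0, 5.0], [1.5, 2.0, 0.0], [True, True, False])  # 0.25
l_rec = mse([1.0, 2.0], [1.0, 3.0])  # 0.5
for stage in ("a", "b", "c"):
    print(stage, sorted(TRAINABLE[stage]), stage_loss(stage, l_seg, l_depth, l_rec))
```

The masked depth loss ignores pixels without ground truth, a common choice when depth maps are sparse; in a real framework the `TRAINABLE` sets would correspond to which parameters an optimizer updates in each stage.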

Assignees

Naver Corp

Classifications

  • Combinations of networks (CPC)

  • Auto-encoder networks; Encoder-decoder networks (CPC)

  • Convolutional networks [CNN, ConvNet] (CPC)

  • Supervised learning (CPC)

  • G06T7/174 (primary): segmentation or edge detection involving the use of two or more images

Frequently asked questions

What does patent US11263756B2 cover?
A computer-implemented method for generating a semantically segmented image and a depth completion image using a convolutional neural network (CNN) from an input visible image and/or an input depth image. A central component of the CNN for semantic segmentation and depth completion is a common representation that allows both tasks to be performed when given any of these combinations of input im…
Who is the assignee on this patent?
Naver Corp
What technology area does this patent fall under?
Primary CPC classification G06T7/174. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 1, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).