Deep neural network architecture for image segmentation

US11600006B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11600006-B2
Application number  US-201816171814-A
Country             US
Kind code           B2
Filing date         Oct 26, 2018
Priority date       Oct 26, 2018
Publication date    Mar 7, 2023
Grant date          Mar 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus and method for encoding objects in a camera-captured image with a deep neural network pipeline including multiple convolutional neural networks or convolutional layers. After identifying at least a portion of the camera-captured image, a first convolutional layer is applied to the at least the portion of the camera-captured image and multiple subregion representations are pooled from the output of the first convolutional layer. One or more additional convolutions are performed. At least one deconvolution is performed and concatenated with the output of one or more convolutions. One or more final convolutions are performed. The at least the portion of the camera-captured image is classified as an object category in response to an output of the one or more final convolutions.
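The pooling of multiple subregion representations described in the abstract can be illustrated with a small sketch. This is not the patented implementation: it assumes average pooling over a single-channel feature map, and the names `pool_subregions` and `pyramid_pool` and the grid sizes (1 and 2) are hypothetical choices made only for illustration.

```python
def pool_subregions(feature_map, bins):
    """Average-pool a 2D feature map into a bins x bins grid of subregions.

    Assumption: average pooling; the abstract does not say which pooling
    operator is used.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    if bins < 1 or bins > rows or bins > cols:
        raise ValueError("bins must be between 1 and the feature map size")
    pooled = []
    for i in range(bins):
        r0, r1 = i * rows // bins, (i + 1) * rows // bins
        row_out = []
        for j in range(bins):
            c0, c1 = j * cols // bins, (j + 1) * cols // bins
            block = [feature_map[r][c]
                     for r in range(r0, r1)
                     for c in range(c0, c1)]
            row_out.append(sum(block) / len(block))
        pooled.append(row_out)
    return pooled


def pyramid_pool(feature_map, bin_sizes=(1, 2)):
    """Pool the same map at several coarseness levels (hypothetical sizes)."""
    return {bins: pool_subregions(feature_map, bins) for bins in bin_sizes}


# Stand-in for the output of the first convolutional layer.
feature_map = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
pyramid = pyramid_pool(feature_map)
print(pyramid[1])  # one large block covering the whole map
print(pyramid[2])  # four smaller blocks
```

Here the 1 x 1 grid plays the role of a large image block at a coarse level and the 2 x 2 grid the role of smaller blocks at a finer level; in a real network each level would be computed per channel and typically resized before being combined with other features.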

First claim

Opening claim text (preview).

We claim:

1. A method for encoding objects in a camera-captured image with a deep neural network pipeline, the method comprising: identifying at least a portion of the camera-captured image corresponding to a surrounding of a vehicle; applying a first convolutional neural network to the at least the portion of the camera-captured image at a first stage; pooling, at a second stage, a plurality of subregion representations from an output of the first convolutional neural network for the first stage; performing, at a third stage, at least one convolution of an output of the second stage; performing, at a fourth stage, a first deconvolution from the output of the first stage and a second deconvolution from the output of the second stage; performing, at the fourth stage, a third deconvolution from an output of the third stage; concatenating, at a fifth stage, the output of the fourth stage and the output of the third stage; applying a second convolutional neural network to the output of the fifth stage; and classifying the at least the portion of the camera-captured image as an object category from a predetermined list of road objects in response to an output of the second convolutional neural network.
2. The method of claim 1, further comprising: concatenating an output of the second deconvolution with the output of the first stage to provide a concatenated input for the first deconvolution.
3. The method of claim 1, further comprising: concatenating an output of the third deconvolution with the output of the second stage to provide a concatenated input for the second deconvolution.
4. The method of claim 1, wherein pooling the plurality of subregion representations comprises: calculating a large image block at a first level of coarseness; and calculating a small image block at a second level of coarseness.
5. The method of claim 4, wherein the plurality of subregion representations comprises a pyramid of blocks having varying objects or varying detail levels.
6. The method of claim 1, further comprising: training the second convolutional neural network using the output of the fifth stage and a ground truth data set.
7. The method of claim 6, wherein the ground truth data set includes a plurality of predetermined object categories.
8. The method of claim 1, further comprising: sending the object category to a vehicle system.
9. The method of claim 8, wherein the vehicle system provides navigation in response to the object category.
10. The method of claim 8, wherein the vehicle system provides assisted or autonomous driving in response to the object category.
11. The method of claim 1, further comprising: upsampling the output of the fifth stage to match a resolution of the camera-captured image.
12. The method of claim 1, further comprising: inserting padding values in between at least row or at least one column in the output of the first stage or the output of the second stage comprises, wherein the padding values and the output of the first stage or the output of the second stage are applied to the first deconvolution or the second deconvolution.
13. The method of claim 12, wherein the performing the at least one convolution of an output of the second stage further comprises: performing, at the third stage, a first third stage convolution including a set of weights; performing, at the third stage, a second third stage convolution from an output of the first third stage convolution and initialized using the set of weights from the first third stage convolution and defined before the second third stage convolution is performed.
14. The method of claim 1, further comprising: expanding dimensions of a filter from the fifth stage; and performing a final stage convolution on the output of the fifth stage using the expanded dimensions of the filter from the fifth stage.
15. The method of claim 1, wherein the deep neural network pipeline includes a plurality of paths including: a first prong from the first stage through the fourth stage for low level features and shallow layers; and a second prong from the first stage through the second stage, the third stage, and the fourth stage for upsampling.
16. The method of claim 15, wherein the plurality of paths includes: a third prong from the first stage through the second stage, the third stage, and the fifth stage for pyramidal pooling.
17. A non-transitory computer readable medium including instructions that when executed by a process are configured to: identify at least a portion of an image collected at a vehicle; applying a first convolutional neural network to the at least the portion of the image at a first stage; pooling, at a second stage, a plurality of subregion representations from an output of the first convolutional neural network for the first stage; performing, at a third stage, at least one convolution of an output of the second stage; performing, at a fourth stage, a first deconvolution from the output of the first stage and a second deconvolution from the output of the second stage; performing, at the fourth stage, a third deconvolution from an output of the third stage; concatenating, at a fifth stage, the output of the fourth stage and the output of the third stage; applying a second convolutional neural network to the output of the fifth stage; and classifying the at least the portion of the image as a road object category in response to an output of the second convolutional neural network.
18. An apparatus for encoding objects in a camera-captured image with a deep neural network pipeline, the method comprising: at least one processor; and at least one memory including computer program code for one or more programs, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: identifying at least a portion of the camera-captured image; applying a first convolutional neural network to the at least the portion of the camera-captured image at a first stage; pooling, at a second stage, a plurality of subregion representations from an output of the first convolutional neural network for the first stage; performing, at a third stage, at least one convolution of an output of the second stage; performing, at a fourth stage, a first deconvolution from the output of the first stage, a second deconvolution from the output of the second stage, and a third deconvolution from an output of the third stage; concatenating, at a fifth stage, the output of the fourth stage and the output of the third stage; applying a second convolutional neural network to the output of the fifth stage; and classifying the at least the portion of the camera-captured image as an object category in response to an output of the second convolutional neural network.
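Claim 12 describes inserting padding values between rows or columns of a stage output before it is applied to a deconvolution. One common way to realize this, shown here only as a sketch under assumptions, is to place zeros between the existing values and then slide a kernel over the enlarged map. The function names, the stride of 2, the zero padding value, and the fixed bilinear-style kernel are all hypothetical; in a trained network the kernel weights would be learned rather than fixed.

```python
def insert_padding(feature_map, stride=2, pad_value=0.0):
    """Place pad_value between the rows and columns of a 2D feature map."""
    rows, cols = len(feature_map), len(feature_map[0])
    out_rows = (rows - 1) * stride + 1
    out_cols = (cols - 1) * stride + 1
    out = [[pad_value] * out_cols for _ in range(out_rows)]
    for r in range(rows):
        for c in range(cols):
            out[r * stride][c * stride] = feature_map[r][c]
    return out


def convolve_same(feature_map, kernel):
    """Slide an odd-sized square kernel over the map, treating values
    outside the border as zero, and keep the input size."""
    rows, cols = len(feature_map), len(feature_map[0])
    k = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += feature_map[rr][cc] * kernel[dr + k][dc + k]
            out[r][c] = total
    return out


# Stand-in for a low-resolution stage output.
small = [[1.0, 2.0], [3.0, 4.0]]

# Step 1: insert padding values between rows and columns.
dilated = insert_padding(small)

# Step 2: apply a kernel to the padded map (hypothetical fixed weights).
bilinear = [
    [0.25, 0.5, 0.25],
    [0.5, 1.0, 0.5],
    [0.25, 0.5, 0.25],
]
upsampled = convolve_same(dilated, bilinear)

for row in dilated:
    print(row)
for row in upsampled:
    print(row)
```

With this particular kernel the inserted positions end up holding averages of their neighbors, so the 2 x 2 input becomes a smoothly interpolated 3 x 3 map. A learned kernel would instead let the network decide how the higher-resolution values are filled in.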

Assignees

Here Global Bv

Inventors

Classifications

  • using a video camera in combination with image processing means · CPC title

  • Transfer learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • G06T7/11 · Primary

    Region-based segmentation · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11600006B2 cover?
An apparatus and method for encoding objects in a camera-captured image with a deep neural network pipeline including multiple convolutional neural networks or convolutional layers. After identifying at least a portion of the camera-captured image, a first convolutional layer is applied to the at least the portion of the camera-captured image and multiple subregion representations are pooled fro…
Who is the assignee on this patent?
Here Global Bv
What technology area does this patent fall under?
Primary CPC classification G06T7/11. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 7, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).