Populating catalog data with item properties based on segmentation and classification models
US-2020034782-A1 · Jan 30, 2020 · US
US11600006B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11600006-B2 |
| Application number | US-201816171814-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 26, 2018 |
| Priority date | Oct 26, 2018 |
| Publication date | Mar 7, 2023 |
| Grant date | Mar 7, 2023 |
Official abstract text for this publication.
An apparatus and method for encoding objects in a camera-captured image with a deep neural network pipeline including multiple convolutional neural networks or convolutional layers. After identifying at least a portion of the camera-captured image, a first convolutional layer is applied to the at least the portion of the camera-captured image and multiple subregion representations are pooled from the output of the first convolutional layer. One or more additional convolutions are performed. At least one deconvolution is performed and concatenated with the output of one or more convolutions. One or more final convolutions are performed. The at least the portion of the camera-captured image is classified as an object category in response to an output of the one or more final convolutions.
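The pooling of multiple subregion representations described in the abstract can be illustrated with a small sketch. This is not the patented implementation: average pooling, the bin counts `(1, 2, 4)`, and the names `pool_to_bins` and `pyramid_pool` are assumptions chosen for illustration, and a single 2D list stands in for one channel of the first convolutional output.

```python
from typing import Dict, List, Sequence

Grid = List[List[float]]


def pool_to_bins(feature_map: Grid, bins: int) -> Grid:
    """Average-pool a 2D feature map into a bins x bins grid of subregions."""
    height, width = len(feature_map), len(feature_map[0])
    if bins < 1 or bins > height or bins > width:
        raise ValueError("bins must be between 1 and the map's smaller side")
    pooled: Grid = []
    for i in range(bins):
        # Integer bounds so the bins tile the map without gaps or overlap.
        r0, r1 = i * height // bins, (i + 1) * height // bins
        row: List[float] = []
        for j in range(bins):
            c0, c1 = j * width // bins, (j + 1) * width // bins
            cells = [feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def pyramid_pool(feature_map: Grid, levels: Sequence[int] = (1, 2, 4)) -> Dict[int, Grid]:
    """Pool the same map at several coarseness levels (assumed levels, not from the patent)."""
    return {bins: pool_to_bins(feature_map, bins) for bins in levels}


# Toy 4x4 feature map holding the values 0..15 in row-major order.
feature_map = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = pyramid_pool(feature_map)
print(pyramid[1])  # one large block covering the whole map
print(pyramid[2])  # four smaller blocks
```

Each level summarizes the same feature map with blocks of a different size, so coarse levels carry broad context while fine levels keep local detail.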
Opening claim text (preview).
We claim:

1. A method for encoding objects in a camera-captured image with a deep neural network pipeline, the method comprising: identifying at least a portion of the camera-captured image corresponding to a surrounding of a vehicle; applying a first convolutional neural network to the at least the portion of the camera-captured image at a first stage; pooling, at a second stage, a plurality of subregion representations from an output of the first convolutional neural network for the first stage; performing, at a third stage, at least one convolution of an output of the second stage; performing, at a fourth stage, a first deconvolution from the output of the first stage and a second deconvolution from the output of the second stage; performing, at the fourth stage, a third deconvolution from an output of the third stage; concatenating, at a fifth stage, the output of the fourth stage and the output of the third stage; applying a second convolutional neural network to the output of the fifth stage; and classifying the at least the portion of the camera-captured image as an object category from a predetermined list of road objects in response to an output of the second convolutional neural network.

2. The method of claim 1, further comprising: concatenating an output of the second deconvolution with the output of the first stage to provide a concatenated input for the first deconvolution.

3. The method of claim 1, further comprising: concatenating an output of the third deconvolution with the output of the second stage to provide a concatenated input for the second deconvolution.

4. The method of claim 1, wherein pooling the plurality of subregion representations comprises: calculating a large image block at a first level of coarseness; and calculating a small image block at a second level of coarseness.

5. The method of claim 4, wherein the plurality of subregion representations comprises a pyramid of blocks having varying objects or varying detail levels.

6. The method of claim 1, further comprising: training the second convolutional neural network using the output of the fifth stage and a ground truth data set.

7. The method of claim 6, wherein the ground truth data set includes a plurality of predetermined object categories.

8. The method of claim 1, further comprising: sending the object category to a vehicle system.

9. The method of claim 8, wherein the vehicle system provides navigation in response to the object category.

10. The method of claim 8, wherein the vehicle system provides assisted or autonomous driving in response to the object category.

11. The method of claim 1, further comprising: upsampling the output of the fifth stage to match a resolution of the camera-captured image.

12. The method of claim 1, further comprising: inserting padding values in between at least row or at least one column in the output of the first stage or the output of the second stage comprises, wherein the padding values and the output of the first stage or the output of the second stage are applied to the first deconvolution or the second deconvolution.

13. The method of claim 12, wherein the performing the at least one convolution of an output of the second stage further comprises: performing, at the third stage, a first third stage convolution including a set of weights; performing, at the third stage, a second third stage convolution from an output of the first third stage convolution and initialized using the set of weights from the first third stage convolution and defined before the second third stage convolution is performed.

14. The method of claim 1, further comprising: expanding dimensions of a filter from the fifth stage; and performing a final stage convolution on the output of the fifth stage using the expanded dimensions of the filter from the fifth stage.

15. The method of claim 1, wherein the deep neural network pipeline includes a plurality of paths including: a first prong from the first stage through the fourth stage for low level features and shallow layers; and a second prong from the first stage through the second stage, the third stage, and the fourth stage for upsampling.

16. The method of claim 15, wherein the plurality of paths includes: a third prong from the first stage through the second stage, the third stage, and the fifth stage for pyramidal pooling.

17. A non-transitory computer readable medium including instructions that when executed by a process are configured to: identify at least a portion of an image collected at a vehicle; applying a first convolutional neural network to the at least the portion of the image at a first stage; pooling, at a second stage, a plurality of subregion representations from an output of the first convolutional neural network for the first stage; performing, at a third stage, at least one convolution of an output of the second stage; performing, at a fourth stage, a first deconvolution from the output of the first stage and a second deconvolution from the output of the second stage; performing, at the fourth stage, a third deconvolution from an output of the third stage; concatenating, at a fifth stage, the output of the fourth stage and the output of the third stage; applying a second convolutional neural network to the output of the fifth stage; and classifying the at least the portion of the image as a road object category in response to an output of the second convolutional neural network.

18. An apparatus for encoding objects in a camera-captured image with a deep neural network pipeline, the method comprising: at least one processor; and at least one memory including computer program code for one or more programs, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: identifying at least a portion of the camera-captured image; applying a first convolutional neural network to the at least the portion of the camera-captured image at a first stage; pooling, at a second stage, a plurality of subregion representations from an output of the first convolutional neural network for the first stage; performing, at a third stage, at least one convolution of an output of the second stage; performing, at a fourth stage, a first deconvolution from the output of the first stage, a second deconvolution from the output of the second stage, and a third deconvolution from an output of the third stage; concatenating, at a fifth stage, the output of the fourth stage and the output of the third stage; applying a second convolutional neural network to the output of the fifth stage; and classifying the at least the portion of the camera-captured image as an object category in response to an output of the second convolutional neural network.
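The padding insertion recited in claim 12 and the concatenation recited in claim 1 can be pictured with a minimal sketch. Everything here is an assumption made for illustration: one zero is inserted between adjacent rows and columns, the "deconvolution" is modeled as that zero insertion followed by an ordinary convolution, and the fixed `BILINEAR` kernel stands in for weights that a trained network would learn. The helper names `insert_padding`, `convolve_same`, `deconvolve`, and `concatenate_channels` are hypothetical and do not come from the patent.

```python
from typing import List

Grid = List[List[float]]


def insert_padding(feature_map: Grid, pad_value: float = 0.0) -> Grid:
    """Insert one padding value between every pair of adjacent rows and columns."""
    height, width = len(feature_map), len(feature_map[0])
    out = [[pad_value] * (2 * width - 1) for _ in range(2 * height - 1)]
    for r in range(height):
        for c in range(width):
            out[2 * r][2 * c] = feature_map[r][c]
    return out


def convolve_same(feature_map: Grid, kernel: Grid) -> Grid:
    """Slide an odd-sized kernel over the map, treating out-of-bounds cells as zero."""
    height, width = len(feature_map), len(feature_map[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - kh // 2, c + j - kw // 2
                    if 0 <= rr < height and 0 <= cc < width:
                        total += kernel[i][j] * feature_map[rr][cc]
            out[r][c] = total
    return out


def deconvolve(feature_map: Grid, kernel: Grid) -> Grid:
    """Model a deconvolution as padding insertion followed by a convolution."""
    return convolve_same(insert_padding(feature_map), kernel)


def concatenate_channels(*stacks: List[Grid]) -> List[Grid]:
    """Join groups of same-sized 2D channels along the channel axis."""
    channels = [channel for stack in stacks for channel in stack]
    shape = (len(channels[0]), len(channels[0][0]))
    if any((len(ch), len(ch[0])) != shape for ch in channels):
        raise ValueError("all channels must share the same height and width")
    return channels


# Assumed fixed kernel; with it, the inserted zeros become interpolated values.
BILINEAR = [[0.25, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 0.25]]

coarse = [[1.0, 3.0], [5.0, 7.0]]            # stand-in for a low-resolution stage output
upsampled = deconvolve(coarse, BILINEAR)      # 3x3 map
skip = [[0.0] * 3 for _ in range(3)]          # stand-in for another stage's 3x3 output
fused = concatenate_channels([upsampled], [skip])
print(upsampled)
print(len(fused))
```

With this kernel the deconvolution reduces to interpolation between the original samples, which makes the effect of the inserted padding easy to inspect by hand. The concatenation only stacks channels, so both inputs must already share the same spatial size.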
using a video camera in combination with image processing means · CPC title
Transfer learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Region-based segmentation · CPC title