Combining convolution and deconvolution for object detection
US-2019303715-A1 · Oct 3, 2019 · US
US2023245423A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2023245423-A1 |
| Application number | US-202118002690-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 18, 2021 |
| Priority date | Jul 2, 2020 |
| Publication date | Aug 3, 2023 |
| Grant date | — |
Official abstract text for this publication.
The present technique relates to an information processing apparatus, an information processing method, and a program that enable recognition accuracy to be improved while suppressing an increase in load in object recognition using a CNN. An information processing apparatus: performs, a plurality of times, convolution of an image feature map representing a feature amount of an image of a first frame and generates a convolutional feature map of a plurality of layers; performs deconvolution of a feature map based on the convolutional feature map based on an image of a second frame preceding the first frame and generates a deconvolutional feature map; and performs object recognition based on the convolutional feature map based on an image of the first frame and on the deconvolutional feature map based on an image of the second frame. The present technique can be applied to, for example, a system which performs object recognition.
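The data flow in the abstract can be sketched with shapes alone. The following is a minimal illustration, not the patent's implementation. It assumes single-channel feature maps, a 2x2 stride-2 convolution, a 2x2 stride-2 transposed convolution as the deconvolution, and channel stacking as the way the two maps are handed to the recognizer. All function names, kernel choices, and sizes are hypothetical.

```python
import numpy as np

def conv_down(x, k):
    """2x2 convolution with stride 2 on a single-channel map (halves H and W)."""
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2)      # blocks[i, a, j, b] = x[2i+a, 2j+b]
    return np.einsum("iajb,ab->ij", blocks, k)

def deconv_up(x, k):
    """2x2 transposed convolution with stride 2 (doubles H and W)."""
    return np.kron(x, k)                          # out[2i+a, 2j+b] = x[i, j] * k[a, b]

def conv_pyramid(image_feature_map, k, num_layers):
    """Apply the convolution repeatedly; return one map per layer."""
    maps, x = [], image_feature_map
    for _ in range(num_layers):
        x = conv_down(x, k)
        maps.append(x)
    return maps

def recognizer_inputs(curr_maps, prev_maps, k_up):
    """Pair each current-frame map with the deconvolved, one-layer-deeper
    map of the preceding frame (stacking is an assumed combination rule)."""
    combined = []
    for layer in range(len(curr_maps) - 1):
        deconv = deconv_up(prev_maps[layer + 1], k_up)
        combined.append(np.stack([curr_maps[layer], deconv]))
    combined.append(curr_maps[-1][None])          # deepest layer has no deeper source
    return combined

rng = np.random.default_rng(0)
k_down = np.full((2, 2), 0.25)                    # hypothetical averaging kernel
k_up = np.ones((2, 2))                            # hypothetical upsampling kernel
prev_maps = conv_pyramid(rng.random((16, 16)), k_down, 3)   # second (earlier) frame
curr_maps = conv_pyramid(rng.random((16, 16)), k_down, 3)   # first (current) frame
print([m.shape for m in recognizer_inputs(curr_maps, prev_maps, k_up)])
# prints [(2, 8, 8), (2, 4, 4), (1, 2, 2)]
```

Because the deconvolution operates only on maps from the preceding frame, it does not have to wait for the current frame's convolution to finish, which is consistent with the parallel processing recited in claim 8.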
Opening claim text (preview).
1. An information processing apparatus, comprising: a convoluting portion configured to perform, a plurality of times, convolution of an image feature map representing a feature amount of an image and to generate a convolutional feature map of a plurality of layers; a deconvoluting portion configured to perform deconvolution of a feature map based on the convolutional feature map and to generate a deconvolutional feature map; and a recognizing portion configured to perform object recognition based on the convolutional feature map and the deconvolutional feature map, wherein the convoluting portion is configured to perform, a plurality of times, convolution of the image feature map representing a feature amount of an image of a first frame and to generate the convolutional feature map of a plurality of layers; the deconvoluting portion is configured to perform deconvolution of a feature map based on the convolutional feature map based on an image of a second frame preceding the first frame and to generate the deconvolutional feature map, and the recognizing portion is configured to perform object recognition based on the convolutional feature map based on an image of the first frame and on the deconvolutional feature map based on an image of the second frame.
2. The information processing apparatus according to claim 1, wherein the recognizing portion is configured to perform object recognition by combining a first convolutional feature map based on an image of the first frame and a first deconvolutional feature map which is based on an image of the second frame and of which a layer is the same as the first convolutional feature map.
3. The information processing apparatus according to claim 2, wherein the deconvoluting portion is configured to generate, based on an image of the second frame, the first deconvolutional feature map by performing deconvolution of a feature map based on a second convolutional feature map which is deeper by n-number (n ≥ 1) of layers than the first convolutional feature map n-number of times.
4. The information processing apparatus according to claim 3, wherein the deconvoluting portion is configured to further generate, based on an image of the second frame, a second deconvolutional feature map by performing deconvolution of a feature map based on a third convolutional feature map which is deeper by m-number (m ≥ 1, m ≠ n) of layers than the first convolutional feature map m-number of times, and the recognizing portion is configured to perform object recognition by further combining the second deconvolutional feature map.
5. The information processing apparatus according to claim 3, wherein the second frame is a frame immediately preceding the first frame, n = 1 is satisfied, the deconvoluting portion is configured to further generate a third deconvolutional feature map by performing deconvolution, once, of a second deconvolutional feature map which is one layer deeper than the first convolutional feature map and which is used in object recognition of an image of the second frame, and the recognizing portion is configured to perform object recognition by further combining the third deconvolutional feature map.
6. The information processing apparatus according to claim 2, wherein the recognizing portion is configured to perform object recognition based on a synthesized feature map obtained by synthesizing the first convolutional feature map and the first deconvolutional feature map.
7. The information processing apparatus according to claim 6, wherein the deconvoluting portion is configured to generate the first deconvolutional feature map by performing deconvolution of the synthesized feature map which is used in object recognition of an image of the second frame and which is one layer deeper than the first deconvolutional feature map.
8. The information processing apparatus according to claim 1, wherein the convoluting portion and the deconvoluting portion are configured to perform processing in parallel.
9. The information processing apparatus according to claim 1, wherein the recognizing portion is configured to perform object recognition further based on the image feature map.
10. The information processing apparatus according to claim 1, further comprising a feature amount extracting portion configured to generate the image feature map.
11. The information processing apparatus according to claim 1, further comprising: a first feature amount extracting portion configured to extract a feature amount of a photographed image obtained by a camera and to generate a first image feature map; a second feature amount extracting portion configured to extract a feature amount of a sensor image representing a sensing result of a sensor of which a sensing range at least partially overlaps with a photographing range of the camera and to generate a second image feature map; and a synthesizing portion configured to generate a synthesized image feature map being the image feature map obtained by synthesizing the first image feature map and the second image feature map, wherein the convoluting portion is configured to perform convolution of the synthesized image feature map.
12. The information processing apparatus according to claim 11, further comprising: a geometric transformation portion configured to transform a first sensor image representing the sensing result according to a first coordinate system into a second sensor image representing the sensing result according to a second coordinate system, wherein the second feature amount extracting portion is configured to extract a feature amount of the second sensor image and to generate the second image feature map.
13. The information processing apparatus according to claim 11, wherein the sensor is a milliwave radar or LiDAR (Light Detection and Ranging).
14. The information processing apparatus according to claim 1, further comprising: a first feature amount extracting portion configured to extract a feature amount of a photographed image obtained by a camera and to generate a first image feature map; a second feature amount extracting portion configured to extract a feature amount of a sensor image representing a sensing result of a sensor of which a sensing range at least partially overlaps with a photographing range of the camera and to generate a second image feature map; a first recognizing portion which includes the convoluting portion, the deconvoluting portion, and the recognizing portion and which is configured to perform object recognition based on the first image feature map; a second recognizing portion which includes the convoluting portion, the deconvoluting portion, and the recognizing portion and which is configured to perform object recognition based on the second image feature map; and an integrating portion configured to integrate a recognition result of an object by the first recognizing portion and a recognition result of an object by the second recognizing portion.
15. The information processing apparatus according to claim 14, wherein the sensor is a milliwave radar or LiDAR (Light Detection and Ranging).
16. The information processing apparatus according to claim 1, wherein a feature map based on the convolutional feature map is the convolutional feature map itself.
17. The information processing apparatus according to claim 1, wherein the first frame and the second frame are adjacent frames.
18. An information processing method, comprising the steps of: pe
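The layer arithmetic in claims 3 and 4 may be easier to follow as a small sketch: a map that is n layers deeper than the target layer is deconvolved n times so that it returns to the target layer's resolution, and a second map that is m layers deeper (m ≠ n) is deconvolved m times. The code below is an illustration under assumptions, not the claimed apparatus: it uses single-channel maps, a 2x2 stride-2 transposed convolution with a hypothetical all-ones kernel, and channel stacking as the combination. The helper names and map sizes are invented for the example.

```python
import numpy as np

def deconv_up(x, k):
    """2x2 transposed convolution with stride 2 (doubles H and W)."""
    return np.kron(x, k)

def deconv_n_times(deep_map, k, n):
    """Deconvolve a map n times, bringing it up by n layers."""
    x = deep_map
    for _ in range(n):
        x = deconv_up(x, k)
    return x

def combine_for_layer(curr_maps, prev_maps, layer, depths, k):
    """Stack the current frame's map at `layer` with previous-frame maps
    taken from `layer + n` and deconvolved n times, for each n in `depths`."""
    parts = [curr_maps[layer]]
    for n in depths:                      # e.g. (1, 2) plays the role of n and m
        parts.append(deconv_n_times(prev_maps[layer + n], k, n))
    return np.stack(parts)

k_up = np.ones((2, 2))                    # hypothetical kernel
sizes = [8, 4, 2]                         # assumed resolutions of layers 0, 1, 2
prev_maps = [np.full((s, s), float(i + 1)) for i, s in enumerate(sizes)]
curr_maps = [np.zeros((s, s)) for s in sizes]

combined = combine_for_layer(curr_maps, prev_maps, layer=0, depths=(1, 2), k=k_up)
print(combined.shape)                     # prints (3, 8, 8)
```

With `depths=(1,)` the sketch reduces to the n = 1 case of claim 3, where only the map one layer deeper is deconvolved once.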
using neural networks · CPC title
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title
the classifiers operating on different input data, e.g. multi-modal recognition · CPC title
using classification, e.g. of video objects · CPC title
by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation · CPC title