Methods and Apparatuses for Object Detection in a Scene Based on Lidar Data and Radar Data of the Scene
US-2020301013-A1 · Sep 24, 2020 · US
US11195038B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11195038-B2 |
| Application number | US-201916374138-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 3, 2019 |
| Priority date | Apr 23, 2018 |
| Publication date | Dec 7, 2021 |
| Grant date | Dec 7, 2021 |
Abstract
A device for extracting dynamic information comprises a convolutional neural network, wherein the device is configured to receive a sequence of data blocks acquired over time, each of said data blocks comprising a multi-dimensional representation of a scene. The convolutional neural network is configured to receive the sequence as input and to output dynamic information on the scene in response, wherein the convolutional neural network comprises a plurality of modules, and wherein each of said modules is configured to carry out a specific processing task for extracting the dynamic information.
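The modular structure in the abstract can be made concrete with a small sketch. The code below is a minimal pure-Python toy of the three-stage chain that the claims name: data reduction, then classification, then temporal fusion. It is not the patented implementation. The function names (`data_reduction`, `classify`, `temporal_fusion`, `extract_dynamic_info`), the two-channel layout (channel 0 as an occupancy score, channel 1 as a radar motion value), and the fixed-gate recurrent blend are all hypothetical stand-ins for trained sub-networks.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[List[float]]]  # H x W x C: channels per grid element
Map = List[List[float]]         # H x W: one value per grid element


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def data_reduction(block: Grid, weights: Sequence[Sequence[float]]) -> Grid:
    """Module 1 (toy): one fully-connected layer shared by every grid cell.

    Maps the C raw values of a cell to len(weights) output channels. Assumption:
    output channel 0 is an occupancy score, channel 1 a radar motion value.
    """
    return [[[sum(w * x for w, x in zip(row, cell)) for row in weights]
             for cell in line] for line in block]


def classify(reduced: Grid) -> Map:
    """Module 2 (toy): per-cell object-vs-background probability from channel 0."""
    return [[sigmoid(cell[0]) for cell in line] for line in reduced]


def _blend(old: Map, new: Map, z: float) -> Map:
    """Element-wise (1 - z) * old + z * new."""
    return [[(1.0 - z) * a + z * b for a, b in zip(ra, rb)] for ra, rb in zip(old, new)]


def temporal_fusion(segs: List[Map], motions: List[Map],
                    z: float = 0.5) -> Tuple[Map, Map]:
    """Module 3 (toy): recurrent update h_t = (1 - z) * h_{t-1} + z * x_t.

    A fixed gate z stands in for a trained recurrent network.
    """
    seg, mot = segs[0], motions[0]
    for s, m in zip(segs[1:], motions[1:]):
        seg, mot = _blend(seg, s, z), _blend(mot, m, z)
    return seg, mot


def extract_dynamic_info(blocks: List[Grid],
                         weights: Sequence[Sequence[float]]) -> Tuple[Map, Map]:
    """Chain the modules over a time-ordered sequence of data blocks."""
    segs, motions = [], []
    for block in blocks:
        reduced = data_reduction(block, weights)
        segs.append(classify(reduced))                                    # first segmentation
        motions.append([[cell[1] for cell in line] for line in reduced])  # radar motion channel
    return temporal_fusion(segs, motions)  # second segmentation + motion data
```

For example, two 1 x 2 blocks passed through identity weights `[[1.0, 0.0], [0.0, 1.0]]` yield a fused motion map that averages the per-frame motion values. Keeping each stage as a separate function mirrors the claims' point that each module carries out one task and is trained individually.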
Claims (preview)
We claim:
1. A device for extracting dynamic information comprising: at least one processor configured to train a global convolutional neural network including multiple convolutional neural sub-networks, the processor further configured to execute the global convolutional neural network to: receive, as input, a sequence of data blocks acquired over time from at least one sensor that comprises a radar sensor, each of said data blocks comprising a multi-dimensional representation of a scene; and responsive to the input, output dynamic information on the scene, wherein the global convolutional neural network comprises a plurality of modules representative of the multiple neural sub-networks including at least a first module, a second module, and a third module, each of the plurality of modules being individually trained to carry out a specific processing task for extracting the dynamic information from the sequence of data blocks received as the input, wherein the first module is a data reduction module configured to extract, from a data block of the sequence, the sensor data of the scene being formed by a multi-dimensional grid of elements, each of the multi-dimensional grid of elements comprising one or more channels including at least one radar channel comprising motion data representing a motion of objects captured in the sensor data, wherein the second module is a classification module configured to extract, from the sensor data of the scene, first semantic segmentation data of the scene, the first semantic segmentation data comprising a classification of the sensor data for distinguishing between background and the objects captured in the sensor data, and wherein the third module is a temporal fusion module configured to extract, from the first semantic segmentation data extracted from the sensor data at a plurality of different time instances, second semantic segmentation data of the scene and the motion data of the scene as the dynamic information on the scene that is output in response to the input.
2. The device according to claim 1, wherein: the first module is formed by a fully-connected layer neural network; the second module is formed by a U-net neural network; and the third module is formed by a recurrent neural network.
3. The device according to claim 1, wherein the plurality of modules includes a fourth module configured to extract object data from the second semantic segmentation data and the motion data, wherein the object data represents a spatial occupancy of objects in the scene, wherein the object data additionally represents a velocity of objects in the scene.
4. The device according to claim 3, wherein for a given object in the scene, the object data comprises a bounding box around the object, and wherein the object data additionally comprises the velocity of the object.
5. The device according to claim 3, wherein the plurality of modules includes a fifth module configured to extract free-space data from the second semantic segmentation data and the motion data, wherein the free-space data represents a spatial occupancy of free space in the scene.
6. The device according to claim 5, wherein the dynamic information comprises the object data, the free-space data or the motion data.
7. The device according to claim 5, wherein the fifth module is formed by a fully convolutional network for semantic segmentation.
8. The device according to claim 3, wherein the fourth module is formed by a region-proposal network.
9. The device according to claim 1, wherein the third module includes at least one convolutional gated recurrent unit.
10. The device according to claim 1, wherein the third module includes at least one convolutional Long Short-Term Memory neural network.
11. A method, comprising: training, by at least one processor of a system, a global convolutional neural network including a plurality of modules representative of multiple neural sub-networks including at least a first module trained as a data reduction module, a second module trained as a classification module, and a third module trained as a temporal fusion module, the training comprising individually training each of the plurality of modules to carry out a specific processing task for outputting dynamic information extracted from a sequence of data blocks received as an input each of the data blocks comprising a multi-dimensional representation of a scene; and executing, by the at least one processor of the system, the global convolution neural network by at least: receiving, as the input and over time from at least one sensor that comprises a radar sensor, the sequence of data blocks; and responsive to receiving the input, outputting dynamic information on the scene that is extracted from the input, the dynamic information being extracted by at least: extracting, by the first module, from a data block of the sequence, sensor data of the scene being formed by a multi-dimensional grid of elements, each of the multi-dimensional grid of elements comprising one or more channels including at least one radar channel comprising motion data representing a motion of objects captured in the sensor data; extracting, by the second module, from the sensor data of the scene, first semantic segmentation data of the scene, the first semantic segmentation data comprising a classification of the sensor data for distinguishing between background and the objects captured in the sensor data; and extracting, from the first semantic segmentation data extracted from the sensor data at a plurality of different time instances, second semantic segmentation data of the scene and the motion data of the scene as the dynamic information on the scene that is output in response to the input.
12. The method according to claim 11, wherein: the first module is formed by a fully-connected layer neural network; the second module is formed by a U-net neural network; and the third module is formed by a recurrent neural network.
13. The method according to claim 11, including extracting, with a fourth module, object data from the second semantic segmentation data and the motion data, wherein the object data represents a spatial occupancy of objects in the scene, and wherein the object data additionally represents a velocity of objects in the scene.
14. The method according to claim 13, wherein for a given object in the scene, the object data comprises a bounding box around the object, and wherein the object data additionally comprises the velocity of the object.
15. The method according to claim 13, including extracting, with a fifth module, free-space data from the second semantic segmentation data and the motion data, wherein the free-space data represents a spatial occupancy of free space in the scene.
16. The method according to claim 15, wherein the dynamic information comprises the object data, the free-space data or the motion data.
17. A system comprising: at least one sensor including a radar sensor; and a device, the device comprising: at least one processor configured to train a global convolutional neural network including multiple convolutional neural sub-networks, the processor further configured to execute the global convolution neural network to: receive, as input, a sequence of data blocks acquired over time from at least one sensor that comprises a radar sensor, each of said data blocks comprising a multi-dimensional representation of a scene; and responsive to the input, output dynamic information on the scene; wherein the global convolutional neural network comprises a plu
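Claim 9 states that the temporal fusion module may include a convolutional gated recurrent unit. The sketch below is a hedged illustration only. It implements a single-channel ConvGRU step with the commonly used gating equations: an update gate z, a reset gate r, and a tanh candidate state, each computed from 3x3 zero-padded convolutions of the input and the previous state. The kernel size, the single channel, the hand-set weights, and the names `conv3x3`, `ConvGRUCell`, and `run_sequence` are assumptions. The claim text shown here does not specify them.

```python
import math
from typing import Callable, List, Sequence

Map = List[List[float]]     # H x W, single channel
Kernel = List[List[float]]  # 3 x 3 weights (assumed kernel size)


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def conv3x3(x: Map, k: Kernel) -> Map:
    """Single-channel 3x3 cross-correlation with zero padding (output keeps H x W)."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += k[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = acc
    return out


def _combine(a: Map, b: Map, bias: float, fn: Callable[[float], float]) -> Map:
    """Element-wise fn(a + b + bias)."""
    return [[fn(p + q + bias) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


class ConvGRUCell:
    """One convolutional GRU cell: W* kernels act on the input x, U* on the state h."""

    def __init__(self, wz: Kernel, uz: Kernel, wr: Kernel, ur: Kernel,
                 wh: Kernel, uh: Kernel, bz: float = 0.0, br: float = 0.0,
                 bh: float = 0.0) -> None:
        self.wz, self.uz, self.wr, self.ur = wz, uz, wr, ur
        self.wh, self.uh = wh, uh
        self.bz, self.br, self.bh = bz, br, bh

    def step(self, x: Map, h: Map) -> Map:
        z = _combine(conv3x3(x, self.wz), conv3x3(h, self.uz), self.bz, sigmoid)  # update gate
        r = _combine(conv3x3(x, self.wr), conv3x3(h, self.ur), self.br, sigmoid)  # reset gate
        rh = [[rv * hv for rv, hv in zip(rr, hr)] for rr, hr in zip(r, h)]
        cand = _combine(conv3x3(x, self.wh), conv3x3(rh, self.uh), self.bh, math.tanh)
        # New state keeps (1 - z) of the old state and takes z of the candidate.
        return [[(1.0 - zv) * hv + zv * cv for zv, hv, cv in zip(zr, hr, cr)]
                for zr, hr, cr in zip(z, h, cand)]


def run_sequence(cell: ConvGRUCell, xs: Sequence[Map]) -> Map:
    """Fuse per-time-step maps (e.g. segmentation scores) into one hidden state."""
    h = [[0.0] * len(xs[0][0]) for _ in xs[0]]  # zero initial state
    for x in xs:
        h = cell.step(x, h)
    return h
```

Because the gates are computed by convolutions, each grid cell's update depends on its 3x3 neighbourhood. This lets a recurrent module relate an object's evidence across neighbouring cells from one time instance to the next, unlike a per-cell recurrent blend.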
Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads · CPC title
using neural networks · CPC title
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title
Region-based segmentation · CPC title
using classification, e.g. of video objects · CPC title