Device and a method for extracting dynamic information on a scene using a convolutional neural network

US11195038B2 · US · B2

Patent metadata
Publication number: US-11195038-B2
Application number: US-201916374138-A
Country: US
Kind code: B2
Filing date: Apr 3, 2019
Priority date: Apr 23, 2018
Publication date: Dec 7, 2021
Grant date: Dec 7, 2021

Abstract

A device for extracting dynamic information comprises a convolutional neural network, wherein the device is configured to receive a sequence of data blocks acquired over time, each of said data blocks comprising a multi-dimensional representation of a scene. The convolutional neural network is configured to receive the sequence as input and to output dynamic information on the scene in response, wherein the convolutional neural network comprises a plurality of modules, and wherein each of said modules is configured to carry out a specific processing task for extracting the dynamic information.
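The abstract describes a network assembled from modules that each carry out one processing task, fed with a time-ordered sequence of data blocks. The following is a minimal pure-Python sketch of that composition pattern only, assuming each module can be modeled as a plain callable whose output feeds the next one. The class `DynamicInfoExtractor`, the helper `per_block` and the toy tasks are hypothetical illustrations and contain no neural network.

```python
from typing import Any, Callable, List, Sequence

# Hypothetical: a "module" is any callable that performs one processing task.
Module = Callable[[Any], Any]


class DynamicInfoExtractor:
    """Hypothetical chain of modules; each consumes the previous module's output."""

    def __init__(self, modules: List[Module]) -> None:
        self.modules = modules

    def __call__(self, blocks: Sequence[Any]) -> Any:
        x: Any = list(blocks)  # the time-ordered sequence of data blocks is the input
        for module in self.modules:
            x = module(x)
        return x


def per_block(task: Callable[[Any], Any]) -> Module:
    """Lift a single-block task so it is applied to every block of the sequence."""
    return lambda seq: [task(block) for block in seq]


# Toy tasks on 1-D "scenes" (lists of cell values), for illustration only.
def scale(block: List[float]) -> List[float]:
    return [0.5 * v for v in block]


def occupancy(block: List[float]) -> List[int]:
    return [1 if v > 1.0 else 0 for v in block]


def changed_cells(seq: List[List[int]]) -> List[int]:
    """Compare the last two time steps cell by cell: 1 where occupancy changed."""
    prev, curr = seq[-2], seq[-1]
    return [int(a != b) for a, b in zip(prev, curr)]


extractor = DynamicInfoExtractor([per_block(scale), per_block(occupancy), changed_cells])
print(extractor([[0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]))  # [0, 1, 1]
```

Here the first two modules act on every block independently, while the last one consumes the whole sequence; that final step is where information about change over time (the "dynamic" part) enters.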

First claim

We claim:

1. A device for extracting dynamic information comprising: at least one processor configured to train a global convolutional neural network including multiple convolutional neural sub-networks, the processor further configured to execute the global convolutional neural network to: receive, as input, a sequence of data blocks acquired over time from at least one sensor that comprises a radar sensor, each of said data blocks comprising a multi-dimensional representation of a scene; and responsive to the input, output dynamic information on the scene, wherein the global convolutional neural network comprises a plurality of modules representative of the multiple neural sub-networks including at least a first module, a second module, and a third module, each of the plurality of modules being individually trained to carry out a specific processing task for extracting the dynamic information from the sequence of data blocks received as the input, wherein the first module is a data reduction module configured to extract, from a data block of the sequence, the sensor data of the scene being formed by a multi-dimensional grid of elements, each of the multi-dimensional grid of elements comprising one or more channels including at least one radar channel comprising motion data representing a motion of objects captured in the sensor data, wherein the second module is a classification module configured to extract, from the sensor data of the scene, first semantic segmentation data of the scene, the first semantic segmentation data comprising a classification of the sensor data for distinguishing between background and the objects captured in the sensor data, and wherein the third module is a temporal fusion module configured to extract, from the first semantic segmentation data extracted from the sensor data at a plurality of different time instances, second semantic segmentation data of the scene and the motion data of the scene as the dynamic information on the scene that is output in response to the input.

2. The device according to claim 1, wherein: the first module is formed by a fully-connected layer neural network; the second module is formed by a U-net neural network; and the third module is formed by a recurrent neural network.

3. The device according to claim 1, wherein the plurality of modules includes a fourth module configured to extract object data from the second semantic segmentation data and the motion data, wherein the object data represents a spatial occupancy of objects in the scene, wherein the object data additionally represents a velocity of objects in the scene.

4. The device according to claim 3, wherein for a given object in the scene, the object data comprises a bounding box around the object, and wherein the object data additionally comprises the velocity of the object.

5. The device according to claim 3, wherein the plurality of modules includes a fifth module configured to extract free-space data from the second semantic segmentation data and the motion data, wherein the free-space data represents a spatial occupancy of free space in the scene.

6. The device according to claim 5, wherein the dynamic information comprises the object data, the free-space data or the motion data.

7. The device according to claim 5, wherein the fifth module is formed by a fully convolutional network for semantic segmentation.

8. The device according to claim 3, wherein the fourth module is formed by a region-proposal network.

9. The device according to claim 1, wherein the third module includes at least one convolutional gated recurrent unit.

10. The device according to claim 1, wherein the third module includes at least one convolutional Long Short-Term Memory neural network.

11. A method, comprising: training, by at least one processor of a system, a global convolutional neural network including a plurality of modules representative of multiple neural sub-networks including at least a first module trained as a data reduction module, a second module trained as a classification module, and a third module trained as a temporal fusion module, the training comprising individually training each of the plurality of modules to carry out a specific processing task for outputting dynamic information extracted from a sequence of data blocks received as an input each of the data blocks comprising a multi-dimensional representation of a scene; and executing, by the at least one processor of the system, the global convolution neural network by at least: receiving, as the input and over time from at least one sensor that comprises a radar sensor, the sequence of data blocks; and responsive to receiving the input, outputting dynamic information on the scene that is extracted from the input, the dynamic information being extracted by at least: extracting, by the first module, from a data block of the sequence, sensor data of the scene being formed by a multi-dimensional grid of elements, each of the multi-dimensional grid of elements comprising one or more channels including at least one radar channel comprising motion data representing a motion of objects captured in the sensor data; extracting, by the second module, from the sensor data of the scene, first semantic segmentation data of the scene, the first semantic segmentation data comprising a classification of the sensor data for distinguishing between background and the objects captured in the sensor data; and extracting, from the first semantic segmentation data extracted from the sensor data at a plurality of different time instances, second semantic segmentation data of the scene and the motion data of the scene as the dynamic information on the scene that is output in response to the input.

12. The method according to claim 11, wherein: the first module is formed by a fully-connected layer neural network; the second module is formed by a U-net neural network; and the third module is formed by a recurrent neural network.

13. The method according to claim 11, including extracting, with a fourth module, object data from the second semantic segmentation data and the motion data, wherein the object data represents a spatial occupancy of objects in the scene, and wherein the object data additionally represents a velocity of objects in the scene.

14. The method according to claim 13, wherein for a given object in the scene, the object data comprises a bounding box around the object, and wherein the object data additionally comprises the velocity of the object.

15. The method according to claim 13, including extracting, with a fifth module, free-space data from the second semantic segmentation data and the motion data, wherein the free-space data represents a spatial occupancy of free space in the scene.

16. The method according to claim 15, wherein the dynamic information comprises the object data, the free-space data or the motion data.

17. A system comprising: at least one sensor including a radar sensor; and a device, the device comprising: at least one processor configured to train a global convolutional neural network including multiple convolutional neural sub-networks, the processor further configured to execute the global convolution neural network to: receive, as input, a sequence of data blocks acquired over time from at least one sensor that comprises a radar sensor, each of said data blocks comprising a multi-dimensional representation of a scene; and responsive to the input, output dynamic information on the scene; wherein the global convolutional neural network comprises a plu
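Claims 1, 2 and 9 describe a three-stage chain: a data reduction module, a classification module producing first semantic segmentation data, and a temporal fusion module that combines segmentations from several time instances, for example with a convolutional gated recurrent unit. The following is a toy pure-Python sketch of that chain under loudly stated assumptions: a 1-D grid of cells stands in for the multi-dimensional grid, one dense layer per cell stands in for the fully-connected data reduction module, a per-cell sigmoid stands in for the U-net, and a simplified convolutional GRU with arbitrary, untrained kernels stands in for the recurrent network. All function names, kernels and weights are hypothetical; training and the motion-data output are omitted.

```python
import math
from typing import List

Grid = List[float]  # toy 1-D grid: one value per cell (the claims use multi-dimensional grids)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def conv1d(x: Grid, kernel: List[float]) -> Grid:
    """Same-size 1-D cross-correlation with zero padding (kernel length must be odd)."""
    r = len(kernel) // 2
    return [
        sum(w * x[i + k - r] for k, w in enumerate(kernel) if 0 <= i + k - r < len(x))
        for i in range(len(x))
    ]


def data_reduction(cells: List[List[float]], weights: List[float], bias: float) -> Grid:
    """Stand-in for module 1: one dense layer applied to each cell's channel vector."""
    return [sum(w * c for w, c in zip(weights, ch)) + bias for ch in cells]


def classify(features: Grid) -> Grid:
    """Stand-in for module 2 (a U-net in claim 2): per-cell object probability."""
    return [sigmoid(f) for f in features]


# Arbitrary, untrained kernels for a simplified convolutional GRU (hypothetical values).
KZ_X, KZ_H = [0.2, 0.6, 0.2], [0.1, 0.3, 0.1]  # update gate
KR_X, KR_H = [0.1, 0.4, 0.1], [0.2, 0.2, 0.2]  # reset gate
KC_X, KC_H = [0.3, 1.0, 0.3], [0.1, 0.5, 0.1]  # candidate state


def conv_gru_step(h: Grid, x: Grid) -> Grid:
    """One recurrent update in which gates and candidate are computed by convolutions."""
    z = [sigmoid(a + b) for a, b in zip(conv1d(x, KZ_X), conv1d(h, KZ_H))]
    r = [sigmoid(a + b) for a, b in zip(conv1d(x, KR_X), conv1d(h, KR_H))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(conv1d(x, KC_X), conv1d(rh, KC_H))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def temporal_fusion(prob_grids: List[Grid]) -> Grid:
    """Stand-in for module 3: fold per-time-step segmentations into one fused state."""
    h = [0.0] * len(prob_grids[0])
    for x in prob_grids:
        h = conv_gru_step(h, x)
    return h


# Three frames over 5 cells with 2 channels per cell; an object sits in cell 2.
frame = [[0.0, 0.0], [0.0, 0.0], [3.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
frames = [frame, frame, frame]
probs = [classify(data_reduction(f, weights=[1.0, 1.0], bias=-2.0)) for f in frames]
fused = temporal_fusion(probs)
print([round(v, 3) for v in fused])
```

Because the hidden state is carried from frame to frame, evidence for the object in cell 2 accumulates over time instead of being judged from a single frame. A convolutional LSTM (claim 10) could take the place of `conv_gru_step` without changing the surrounding chain; in the patent, each module is an individually trained sub-network rather than fixed arithmetic.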

Assignees

  • Aptiv Tech Ltd

Classifications

  • Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads

  • using neural networks

  • Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]

  • G06T7/11 (primary): Region-based segmentation

  • using classification, e.g. of video objects

Frequently asked questions

What does patent US11195038B2 cover?
A device for extracting dynamic information comprises a convolutional neural network, wherein the device is configured to receive a sequence of data blocks acquired over time, each of said data blocks comprising a multi-dimensional representation of a scene. The convolutional neural network is configured to receive the sequence as input and to output dynamic information on the scene in response…
Who is the assignee on this patent?
Aptiv Tech Ltd
What technology area does this patent fall under?
Primary CPC classification G06T7/11. Mapped technology areas include Physics.
When was this patent published?
It was published on Dec 7, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).