Identifying complex events from hierarchical representation of data set features
US-2020394499-A1 · Dec 17, 2020 · US
US11657291B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11657291-B2 |
| Application number | US-202017063553-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 5, 2020 |
| Priority date | Oct 4, 2019 |
| Publication date | May 23, 2023 |
| Grant date | May 23, 2023 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a spatio-temporal embedding of a sequence of point clouds. One of the methods includes obtaining a temporal sequence comprising a respective point cloud input corresponding to each of a plurality of time points, each point cloud input comprising point cloud data generated from sensor data captured by one or more sensors of a vehicle at the respective time point; processing each point cloud input using a first neural network to generate a respective spatial embedding that characterizes the point cloud input; and processing the spatial embeddings of the point cloud inputs using a second neural network to generate a spatio-temporal embedding that characterizes the point cloud inputs in the temporal sequence.
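The two-stage flow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: both "networks" below are hypothetical hand-written stand-ins (simple statistics rather than learned models), and the names `spatial_embedding` and `spatio_temporal_embedding` are assumptions introduced only to show how data moves from per-time-point point clouds to a single embedding for the whole sequence.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def spatial_embedding(points: Sequence[Point]) -> List[float]:
    """Stand-in for the first neural network: one vector per point cloud.

    Hypothetical choice: point count followed by the per-axis means,
    instead of a learned model.
    """
    n = len(points)
    if n == 0:
        return [0.0, 0.0, 0.0, 0.0]
    means = [sum(p[axis] for p in points) / n for axis in range(3)]
    return [float(n)] + means


def spatio_temporal_embedding(spatial: Sequence[Sequence[float]]) -> List[float]:
    """Stand-in for the second neural network: combines embeddings over time.

    Hypothetical choice: the time-average concatenated with the
    last-minus-first difference of the spatial embeddings.
    """
    steps = len(spatial)
    dim = len(spatial[0])
    mean = [sum(e[i] for e in spatial) / steps for i in range(dim)]
    delta = [spatial[-1][i] - spatial[0][i] for i in range(dim)]
    return mean + delta


# A toy temporal sequence: one small point cloud per time point,
# drifting along the x axis.
sequence = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
]

# Stage 1: one spatial embedding per time point.
per_frame = [spatial_embedding(cloud) for cloud in sequence]
# Stage 2: one spatio-temporal embedding for the whole sequence.
embedding = spatio_temporal_embedding(per_frame)
print(embedding)
```

In this toy example the second half of `embedding` captures the drift along x, which is the kind of temporal information a per-frame embedding alone cannot carry.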
Opening claim text (preview).
What is claimed is:
1. A method comprising: obtaining a temporal sequence comprising a respective point cloud input corresponding to each of a plurality of time points, each point cloud input comprising point cloud data generated from sensor data captured by one or more sensors of a vehicle at the respective time point; processing each point cloud input using a first neural network to generate a respective spatial embedding that characterizes the point cloud input, comprising, for each point cloud input: dividing the point cloud data into a plurality of voxels, generating a feature representation that includes features for each voxel, and processing the feature representation using the first neural network to generate the spatial embedding; processing the spatial embeddings of the point cloud inputs using a second neural network to generate a spatio-temporal embedding that characterizes the point cloud inputs in the temporal sequence; and processing the spatio-temporal embedding that characterizes the point cloud inputs in the temporal sequence using a task-specific neural network, wherein the task-specific neural network is configured to process the spatio-temporal embedding to generate a predicted output for a prediction task.
2. The method of claim 1, further comprising: processing the spatio-temporal embedding using one or more additional task-specific neural networks, wherein each additional task-specific neural network is configured to generate a respective predicted output for a corresponding additional prediction task that is different from the prediction task.
3. The method of claim 1, wherein the first neural network and the second neural network have been trained jointly on a first prediction task, and wherein the first prediction task is not the same as the prediction task.
4. The method of claim 1, wherein generating a feature representation comprises: processing the point cloud data using one or more view neural networks, wherein a view neural network extracts features from the point cloud data with respect to a certain point of view; and combining the outputs of the one or more view neural networks to generate the feature representation.
5. The method of claim 4, wherein the one or more view neural networks includes a birds-eye view neural network that extracts features with respect to a birds-eye view and a perspective view neural network that extracts features with respect to a perspective view.
6. The method of claim 4, wherein processing the point cloud data using one or more view neural networks comprises processing each point in the point cloud data with a fully-connected layer that is shared by the one or more view neural networks to embed the points in a high-dimensional feature space.
7. The method of claim 4, wherein combining the outputs of the one or more view neural networks comprises concatenating the outputs of the one or more view neural networks.
8. The method of claim 1, wherein processing the spatial embeddings using the second neural network comprises processing the spatial embeddings with a one-dimensional convolutional neural network layer.
9. A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: obtaining a temporal sequence comprising a respective point cloud input corresponding to each of a plurality of time points, each point cloud input comprising point cloud data generated from sensor data captured by one or more sensors of a vehicle at the respective time point; processing each point cloud input using a first neural network to generate a respective spatial embedding that characterizes the point cloud input, comprising, for each point cloud input: dividing the point cloud data into a plurality of voxels, generating a feature representation that includes features for each voxel, and processing the feature representation using the first neural network to generate the spatial embedding; processing the spatial embeddings of the point cloud inputs using a second neural network to generate a spatio-temporal embedding that characterizes the point cloud inputs in the temporal sequence; and processing the spatio-temporal embedding that characterizes the point cloud inputs using a task-specific neural network, wherein the task-specific neural network is configured to process the spatio-temporal embedding to generate a predicted output for a prediction task.
10. The system of claim 9, wherein the operations further comprise: processing the spatio-temporal embedding using one or more additional task-specific neural networks, wherein each additional task-specific neural network is configured to generate a respective predicted output for a corresponding prediction task that is different from the prediction task.
11. The system of claim 9, wherein the first neural network and the second neural network have been trained jointly on a first prediction task, and wherein the first prediction task is not one of the different prediction tasks corresponding to the plurality of task-specific neural networks.
12. The system of claim 9, wherein generating feature representation comprises: processing the point cloud data using one or more view neural networks, wherein a view neural network extracts features from the point cloud data with respect to a certain point of view; and combining the outputs of the one or more view neural networks to generate the feature representation.
13. The system of claim 9, wherein processing the spatial embeddings using the second neural network comprises processing the spatial embeddings with a one-dimensional convolutional neural network layer.
14. One or more non-transitory computer storage media encoded with computer program instructions that when executed by a plurality of computers cause the plurality of computers to perform operations comprising: obtaining a temporal sequence comprising a respective point cloud input corresponding to each of a plurality of time points, each point cloud input comprising point cloud data generated from sensor data captured by one or more sensors of a vehicle at the respective time point; processing each point cloud input using a first neural network to generate a respective spatial embedding that characterizes the point cloud input, comprising, for each point cloud input: dividing the point cloud data into a plurality of voxels, generating a feature representation that includes features for each voxel, and processing the feature representation using the first neural network to generate the spatial embedding; processing the spatial embeddings of the point cloud inputs using a second neural network to generate a spatio-temporal embedding that characterizes the point cloud inputs in the temporal sequence; and processing the spatio-temporal embedding that characterizes the point cloud inputs using a task-specific neural network, wherein the task-specific neural network is configured to process the spatio-temporal embedding to generate a predicted output for a prediction task.
15. The non-transitory computer storage media of claim 14, wherein the operations further comprise: processing the spatio-temporal embedding using one or more additional task-specific neural networks, wherein each additional task-specific neural network is configured to generate a respective predicted output for a corresponding additional prediction task that is different from the prediction task.
16. The non-transitory computer
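Two steps recited in the claims lend themselves to a small illustration: dividing point cloud data into voxels with per-voxel features (claims 1, 9, and 14) and applying a one-dimensional convolution over the sequence of spatial embeddings (claims 8 and 13). The sketch below is an assumption-laden toy, not the claimed system: the voxel size, the choice of count-plus-centroid features, the fixed (unlearned) kernel, and the function names `voxelize`, `voxel_features`, and `temporal_conv1d` are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
VoxelIndex = Tuple[int, ...]


def voxelize(points: Sequence[Point], voxel_size: float) -> Dict[VoxelIndex, List[Point]]:
    """Group points by the integer index of the cubic voxel containing them."""
    voxels: Dict[VoxelIndex, List[Point]] = defaultdict(list)
    for p in points:
        index = tuple(math.floor(c / voxel_size) for c in p)
        voxels[index].append(p)
    return dict(voxels)


def voxel_features(voxels: Dict[VoxelIndex, List[Point]]) -> Dict[VoxelIndex, List[float]]:
    """Per-voxel features (hypothetical choice): point count, then centroid."""
    features = {}
    for index, pts in voxels.items():
        n = len(pts)
        centroid = [sum(p[axis] for p in pts) / n for axis in range(3)]
        features[index] = [float(n)] + centroid
    return features


def temporal_conv1d(embeddings: Sequence[Sequence[float]],
                    kernel: Sequence[float]) -> List[List[float]]:
    """Slide one kernel along the time axis, independently per channel.

    No padding is used, so the output has len(embeddings) - len(kernel) + 1
    time steps. A real layer would learn its kernels; this one is fixed.
    """
    k = len(kernel)
    dim = len(embeddings[0])
    out = []
    for t in range(len(embeddings) - k + 1):
        out.append([
            sum(kernel[j] * embeddings[t + j][i] for j in range(k))
            for i in range(dim)
        ])
    return out


cloud = [
    (0.25, 0.25, 0.25),
    (0.75, 0.75, 0.75),
    (1.5, 0.5, 0.5),
    (-0.5, 0.5, 0.5),
]
voxels = voxelize(cloud, voxel_size=1.0)
features = voxel_features(voxels)

# A difference kernel highlights change between consecutive time points.
smoothed = temporal_conv1d([[1.0], [2.0], [4.0], [8.0]], kernel=[-1.0, 1.0])
```

`math.floor` is used instead of `int` so that points with negative coordinates fall into the correct voxel rather than being rounded toward zero.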
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Supervised learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads · CPC title
Backpropagation, e.g. using gradient descent · CPC title