Spatio-temporal embeddings

US11657291B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11657291-B2
Application number: US-202017063553-A
Country: US
Kind code: B2
Filing date: Oct 5, 2020
Priority date: Oct 4, 2019
Publication date: May 23, 2023
Grant date: May 23, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a spatio-temporal embedding of a sequence of point clouds. One of the methods includes obtaining a temporal sequence comprising a respective point cloud input corresponding to each of a plurality of time points, each point cloud input comprising point cloud data generated from sensor data captured by one or more sensors of a vehicle at the respective time point; processing each point cloud input using a first neural network to generate a respective spatial embedding that characterizes the point cloud input; and processing the spatial embeddings of the point cloud inputs using a second neural network to generate a spatio-temporal embedding that characterizes the point cloud inputs in the temporal sequence.
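The two-stage structure in the abstract can be illustrated with a small NumPy sketch. This is not the patented implementation. The per-point linear layer with max pooling (standing in for the first neural network), the temporal convolution with mean pooling (standing in for the second), the random weights, and all names and sizes are assumptions chosen only to show how data flows from a sequence of point clouds to one embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

def spatial_embedding(points, w_point):
    """Hypothetical stand-in for the first neural network.

    Applies one shared linear layer + ReLU to every point, then max-pools
    over points so the result does not depend on the number of points.
    """
    hidden = np.maximum(points @ w_point, 0.0)  # (num_points, embed_dim)
    return hidden.max(axis=0)                   # (embed_dim,)

def spatio_temporal_embedding(spatial, kernel):
    """Hypothetical stand-in for the second neural network.

    Slides a 1D convolution kernel along the time axis of the stacked
    spatial embeddings, applies ReLU, then averages over time.
    spatial: (T, D), kernel: (K, D, D_out) -> returns (D_out,)
    """
    k = kernel.shape[0]
    t = spatial.shape[0]
    windows = [
        np.einsum("kd,kde->e", spatial[i:i + k], kernel)
        for i in range(t - k + 1)
    ]
    activated = np.maximum(np.stack(windows), 0.0)  # (T - K + 1, D_out)
    return activated.mean(axis=0)

# Illustrative sizes only: 5 time points, 16-dim spatial embeddings,
# 8-dim spatio-temporal embedding, temporal kernel width 3.
T, D, D_OUT, K = 5, 16, 8, 3

# One synthetic point cloud (x, y, z) per time point; point counts vary.
sequence = [rng.normal(size=(int(rng.integers(50, 100)), 3)) for _ in range(T)]
w_point = rng.normal(size=(3, D))
kernel = rng.normal(size=(K, D, D_OUT))

spatial = np.stack([spatial_embedding(pc, w_point) for pc in sequence])
embedding = spatio_temporal_embedding(spatial, kernel)
print(spatial.shape, embedding.shape)  # (5, 16) (8,)
```

In a trained system the weights would be learned and the resulting embedding passed to a downstream prediction network. Here the weights are random, so only the shapes are meaningful.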

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: obtaining a temporal sequence comprising a respective point cloud input corresponding to each of a plurality of time points, each point cloud input comprising point cloud data generated from sensor data captured by one or more sensors of a vehicle at the respective time point; processing each point cloud input using a first neural network to generate a respective spatial embedding that characterizes the point cloud input, comprising, for each point cloud input: dividing the point cloud data into a plurality of voxels, generating a feature representation that includes features for each voxel, and processing the feature representation using the first neural network to generate the spatial embedding; processing the spatial embeddings of the point cloud inputs using a second neural network to generate a spatio-temporal embedding that characterizes the point cloud inputs in the temporal sequence; and processing the spatio-temporal embedding that characterizes the point cloud inputs in the temporal sequence using a task-specific neural network, wherein the task-specific neural network is configured to process the spatio-temporal embedding to generate a predicted output for a prediction task.

2. The method of claim 1, further comprising: processing the spatio-temporal embedding using one or more additional task-specific neural networks, wherein each additional task-specific neural network is configured to generate a respective predicted output for a corresponding additional prediction task that is different from the prediction task.

3. The method of claim 1, wherein the first neural network and the second neural network have been trained jointly on a first prediction task, and wherein the first prediction task is not the same as the prediction task.

4. The method of claim 1, wherein generating a feature representation comprises: processing the point cloud data using one or more view neural networks, wherein a view neural network extracts features from the point cloud data with respect to a certain point of view; and combining the outputs of the one or more view neural networks to generate the feature representation.

5. The method of claim 4, wherein the one or more view neural networks includes a birds-eye view neural network that extracts features with respect to a birds-eye view and a perspective view neural network that extracts features with respect to a perspective view.

6. The method of claim 4, wherein processing the point cloud data using one or more view neural networks comprises processing each point in the point cloud data with a fully-connected layer that is shared by the one or more view neural networks to embed the points in a high-dimensional feature space.

7. The method of claim 4, wherein combining the outputs of the one or more view neural networks comprises concatenating the outputs of the one or more view neural networks.

8. The method of claim 1, wherein processing the spatial embeddings using the second neural network comprises processing the spatial embeddings with a one-dimensional convolutional neural network layer.

9. A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: obtaining a temporal sequence comprising a respective point cloud input corresponding to each of a plurality of time points, each point cloud input comprising point cloud data generated from sensor data captured by one or more sensors of a vehicle at the respective time point; processing each point cloud input using a first neural network to generate a respective spatial embedding that characterizes the point cloud input, comprising, for each point cloud input: dividing the point cloud data into a plurality of voxels, generating a feature representation that includes features for each voxel, and processing the feature representation using the first neural network to generate the spatial embedding; processing the spatial embeddings of the point cloud inputs using a second neural network to generate a spatio-temporal embedding that characterizes the point cloud inputs in the temporal sequence; and processing the spatio-temporal embedding that characterizes the point cloud inputs using a task-specific neural network, wherein the task-specific neural network is configured to process the spatio-temporal embedding to generate a predicted output for a prediction task.

10. The system of claim 9, wherein the operations further comprise: processing the spatio-temporal embedding using one or more additional task-specific neural networks, wherein each additional task-specific neural network is configured to generate a respective predicted output for a corresponding prediction task that is different from the prediction task.

11. The system of claim 9, wherein the first neural network and the second neural network have been trained jointly on a first prediction task, and wherein the first prediction task is not one of the different prediction tasks corresponding to the plurality of task-specific neural networks.

12. The system of claim 9, wherein generating feature representation comprises: processing the point cloud data using one or more view neural networks, wherein a view neural network extracts features from the point cloud data with respect to a certain point of view; and combining the outputs of the one or more view neural networks to generate the feature representation.

13. The system of claim 9, wherein processing the spatial embeddings using the second neural network comprises processing the spatial embeddings with a one-dimensional convolutional neural network layer.

14. One or more non-transitory computer storage media encoded with computer program instructions that when executed by a plurality of computers cause the plurality of computers to perform operations comprising: obtaining a temporal sequence comprising a respective point cloud input corresponding to each of a plurality of time points, each point cloud input comprising point cloud data generated from sensor data captured by one or more sensors of a vehicle at the respective time point; processing each point cloud input using a first neural network to generate a respective spatial embedding that characterizes the point cloud input, comprising, for each point cloud input: dividing the point cloud data into a plurality of voxels, generating a feature representation that includes features for each voxel, and processing the feature representation using the first neural network to generate the spatial embedding; processing the spatial embeddings of the point cloud inputs using a second neural network to generate a spatio-temporal embedding that characterizes the point cloud inputs in the temporal sequence; and processing the spatio-temporal embedding that characterizes the point cloud inputs using a task-specific neural network, wherein the task-specific neural network is configured to process the spatio-temporal embedding to generate a predicted output for a prediction task.

15. The non-transitory computer storage media of claim 14, wherein the operations further comprise: processing the spatio-temporal embedding using one or more additional task-specific neural networks, wherein each additional task-specific neural network is configured to generate a respective predicted output for a corresponding additional prediction task that is different from the prediction task.

16. The non-transitory computer
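The voxel step recited in claim 1 ("dividing the point cloud data into a plurality of voxels, generating a feature representation that includes features for each voxel") can be sketched in NumPy as follows. The claims do not say which per-voxel features are used, so the point count and mean coordinates here are assumptions. The function name `voxelize` and its parameters are hypothetical.

```python
import numpy as np

def voxelize(points, voxel_size, grid_min, grid_shape):
    """Bin 3D points into a regular voxel grid and build per-voxel features.

    Assumed features (not specified by the patent text): the number of
    points in the voxel and the mean x, y, z of those points.
    Returns an array of shape grid_shape + (4,).
    """
    points = np.asarray(points, dtype=float)
    grid_min = np.asarray(grid_min, dtype=float)
    shape = np.asarray(grid_shape)

    # Integer voxel coordinates for every point.
    idx = np.floor((points - grid_min) / voxel_size).astype(int)

    # Drop points that fall outside the grid.
    inside = np.all((idx >= 0) & (idx < shape), axis=1)
    idx, pts = idx[inside], points[inside]

    # Flatten (i, j, k) to a single voxel id in C order.
    flat = np.ravel_multi_index(tuple(idx.T), tuple(grid_shape))
    n_vox = int(np.prod(shape))

    # Per-voxel point count and coordinate sums.
    counts = np.bincount(flat, minlength=n_vox).astype(float)
    sums = np.zeros((n_vox, 3))
    np.add.at(sums, flat, pts)

    # Mean coordinates; empty voxels keep zeros.
    means = np.divide(
        sums, counts[:, None],
        out=np.zeros_like(sums), where=counts[:, None] > 0,
    )
    features = np.concatenate([counts[:, None], means], axis=1)
    return features.reshape(*grid_shape, 4)

# Small usage example: a 2 x 2 x 2 grid of 1-unit voxels starting at the origin.
cloud = np.array([
    [0.1, 0.1, 0.1],
    [0.2, 0.2, 0.2],
    [1.5, 0.5, 0.5],
    [9.0, 9.0, 9.0],  # outside the grid, ignored
])
grid = voxelize(cloud, voxel_size=1.0, grid_min=(0.0, 0.0, 0.0), grid_shape=(2, 2, 2))
print(grid.shape)
```

A feature grid of this kind is the sort of input a first neural network could turn into a spatial embedding. The view-specific networks of claims 4 to 7 are not modelled here.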

Assignees

Waymo LLC

Inventors

Classifications

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • Supervised learning

  • Convolutional networks [CNN, ConvNet]

  • G06V20/58 (Primary)

    Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads

  • G06N3/084 (Primary)

    Backpropagation, e.g. using gradient descent

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11657291B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a spatio-temporal embedding of a sequence of point clouds. One of the methods includes obtaining a temporal sequence comprising a respective point cloud input corresponding to each of a plurality of time points, each point cloud input comprising point cloud data generated from sensor d…
Who is the assignee on this patent?
Waymo LLC
What technology area does this patent fall under?
Primary CPC classification G06V20/58. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 23, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).