Trajectory prediction on top-down scenes and associated model
US-2022092983-A1 · Mar 24, 2022 · US
US12154347B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12154347-B2 |
| Application number | US-202217691103-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 9, 2022 |
| Priority date | Mar 9, 2021 |
| Publication date | Nov 26, 2024 |
| Grant date | Nov 26, 2024 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting regions of an environment. One of the methods includes receiving a representation of a scene in an environment; processing the representation using a center prediction neural network to generate: (i) features of the scene in the environment, and (ii) a respective center score corresponding to each of a plurality of locations in the environment; selecting, based on the respective center scores, one or more of the plurality of locations; and for each selected location: processing an input comprising the features of the scene in the environment and data specifying the selected location using a geometry prediction neural network to generate a geometry prediction that represents a geometry of the region that is centered at the selected location.
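The selection step in the abstract (picking locations based on their center scores) can be illustrated with a small sketch. The abstract does not say how locations are selected, so the rule here, a score threshold combined with a local-maximum test over the 8 neighbouring cells, is an assumption made for illustration. The names `select_centers`, `location_map`, and the `threshold` parameter are hypothetical. The one-hot map mirrors the idea of a feature map that identifies the selected location and has the same spatial size as the scene features.

```python
from typing import List, Tuple

Grid = List[List[float]]


def select_centers(scores: Grid, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (row, col) cells whose score is at least `threshold` and is
    not exceeded by any of the 8 neighbouring cells.

    Assumed selection rule; the source only says locations are selected
    "based on the respective center scores".
    """
    rows, cols = len(scores), len(scores[0])
    selected = []
    for r in range(rows):
        for c in range(cols):
            s = scores[r][c]
            if s < threshold:
                continue
            neighbours = [
                scores[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(s >= n for n in neighbours):
                selected.append((r, c))
    return selected


def location_map(shape: Tuple[int, int], center: Tuple[int, int]) -> Grid:
    """Build a one-hot grid marking `center`, with the same spatial shape
    as the score grid (hypothetical encoding of the selected location)."""
    rows, cols = shape
    return [[1.0 if (r, c) == center else 0.0 for c in range(cols)]
            for r in range(rows)]


# Toy 4x4 center-score grid standing in for the network output.
scores = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.1, 0.4, 0.7],
]
centers = select_centers(scores)
maps = [location_map((4, 4), c) for c in centers]
```

In a full pipeline, each map in `maps` would be combined with the scene features and passed to the geometry prediction network, one pass per selected location.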
Opening claim text (preview).
What is claimed is:
1. A method performed by one or more computers, the method comprising: receiving a representation of a scene in an environment; processing the representation using a center prediction neural network to generate: (i) features of the scene in the environment, and (ii) a respective center score corresponding to each of a plurality of locations in the environment, wherein each respective center score represents a predicted likelihood that a center of a region is located at the corresponding location in the environment; selecting, based on the respective center scores, one or more of the plurality of locations in the environment; and for each selected location: processing an input comprising the features of the scene in the environment and data specifying the selected location using a geometry prediction neural network to generate a geometry prediction that represents a geometry of the region that is centered at the selected location as a collection of one or more convexes by specifying, for each of the one or more convexes, a respective plurality of hyperplanes that define the convex.
2. The method of claim 1, further comprising: for each selected location, generating a polygonal representation that represents the geometry of the region that is centered at the selected location from the respective plurality of hyperplanes for each of the one or more convexes.
3. The method of claim 1, wherein the representation is a top-down representation of the scene in the environment.
4. The method of claim 3, wherein the representation is generated from raw laser data collected by one or more laser sensors of a vehicle navigating through the environment.
5. The method of claim 3, wherein each of the plurality of locations corresponds to a respective portion of the top-down representation.
6. The method of claim 5, wherein each of the plurality of locations corresponds to a respective pixel in the top-down representation.
7. The method of claim 1, wherein the center prediction neural network is configured to generate a respective pixel prediction score for each of a plurality of pixels in the representation that represents a likelihood that a region instance is depicted at the pixel.
8. The method of claim 7, wherein the features of the scene comprise the respective per pixel prediction scores for the plurality of pixels.
9. The method of claim 1, wherein the features of the scene comprise outputs of one or more hidden layers of the center prediction neural network.
10. The method of claim 1, wherein the data specifying the selected location is a feature map that has a same spatial dimensionality as the features and that identifies the selected location.
11. The method of claim 1, wherein the geometry prediction generated by the geometry prediction neural network includes, for each hyperplane of each convex, parameters of a signed distance function that measures a signed distance of any given point in the environment from the hyperplane.
12. The method of claim 11, wherein the parameters of the signed distance function include a normal corresponding to the hyperplane.
13. The method of claim 11, wherein the parameters of the signed distance function include an offset of the hyperplane from the origin.
14. The method of claim 1, wherein the geometry prediction neural network comprises an encoder neural network configured to process the input to generate a set of hyperplane parameters and a decoder neural network configured to process the set of hyperplane parameters to generate the geometry prediction.
15. The method of claim 1, wherein the center prediction neural network and the geometry prediction neural network have been trained jointly on a set of training data that includes a plurality of training representations and for each training representation a set of ground truth region geometries.
16. The method of claim 15, wherein the center prediction neural network and the geometry prediction neural network have been trained jointly to minimize a loss function that includes a (i) a reconstruction loss that measures errors in geometry predictions relative to the ground truth region geometries and (ii) a center prediction loss that measures errors in center predictions generated by the center prediction neural network relative to region centers specified by the ground truth region geometries.
17. The method of claim 16, wherein the center prediction neural network is configured to generate a respective pixel prediction score for each of a plurality of pixels in the representation that represents a likelihood that a region instance is depicted at the pixel, and wherein the loss function also includes (iii) a per pixel prediction loss that measures errors in the per pixel predictions relative to region locations specified by the ground truth region geometries.
18. The method of claim 17, wherein the loss function also includes (iv) a localization loss.
19. The method of claim 16, wherein during the joint training the geometry prediction neural network receives as input locations of region centers specified by the ground truth region geometries rather than locations selected based on center predictions generated by the center prediction neural network.
20. A system comprising: one or more computers; and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising: receiving a representation of a scene in an environment; processing the representation using a center prediction neural network to generate: (i) features of the scene in the environment, and (ii) a respective center score corresponding to each of a plurality of locations in the environment, wherein each respective center score represents a predicted likelihood that a center of a region is located at the corresponding location in the environment; selecting, based on the respective center scores, one or more of the plurality of locations in the environment; and for each selected location: processing an input comprising the features of the scene in the environment and data specifying the selected location using a geometry prediction neural network to generate a geometry prediction that represents a geometry of the region that is centered at the selected location as a collection of one or more convexes by specifying, for each of the one or more convexes, a respective plurality of hyperplanes that define the convex.
21. One or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: receiving a representation of a scene in an environment; processing the representation using a center prediction neural network to generate: (i) features of the scene in the environment, and (ii) a respective center score corresponding to each of a plurality of locations in the environment, wherein each respective center score represents a predicted likelihood that a center of a region is located at the corresponding location in the environment; selecting, based on the respective center scores, one or more of the plurality of locations in the environment; and for each selected location: processing an input comprising the features of the scene in the environment and data specifying the selected location using a geometry prediction neural network to generate a geometry prediction that represents a g
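The claims describe a region as a collection of convexes, each defined by several hyperplanes, with each hyperplane parameterised by a normal and an offset for a signed distance function. A minimal 2D sketch of that representation follows. The sign convention (a point is inside a half-plane when `n·x + d <= 0`), the use of unit normals, and the names `signed_distance`, `in_convex`, `in_region`, and `box` are assumptions for illustration; the claim text does not fix them.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# A hyperplane in 2D: (unit normal (nx, ny), offset d). Assumed convention:
# signed distance = n . x + d, negative or zero on the inside.
Hyperplane = Tuple[Tuple[float, float], float]


def signed_distance(point: Point, plane: Hyperplane) -> float:
    """Signed distance of `point` from the line n . x + d = 0."""
    (nx, ny), d = plane
    return nx * point[0] + ny * point[1] + d


def in_convex(point: Point, planes: Sequence[Hyperplane]) -> bool:
    """A convex is the intersection of the half-planes of its hyperplanes."""
    return all(signed_distance(point, p) <= 0.0 for p in planes)


def in_region(point: Point, convexes: Sequence[Sequence[Hyperplane]]) -> bool:
    """A region is the union of its convexes."""
    return any(in_convex(point, c) for c in convexes)


def box(cx: float, cy: float, half_w: float, half_h: float) -> List[Hyperplane]:
    """Hypothetical helper: an axis-aligned box as four half-planes."""
    return [
        ((1.0, 0.0), -(cx + half_w)),   # x <= cx + half_w
        ((-1.0, 0.0), cx - half_w),     # x >= cx - half_w
        ((0.0, 1.0), -(cy + half_h)),   # y <= cy + half_h
        ((0.0, -1.0), cy - half_h),     # y >= cy - half_h
    ]


# An L-shaped (non-convex) region built from two convexes.
region = [box(0.0, 0.0, 2.0, 1.0), box(1.5, 2.0, 0.5, 1.0)]

inside_a = in_region((0.0, 0.0), region)    # inside the first box
inside_b = in_region((1.5, 2.5), region)    # inside the second box
outside = in_region((0.0, 2.5), region)     # in neither box
dist = signed_distance((3.0, 0.0), region[0][0])  # 1 unit past x = 2
```

Using a union of convexes lets a small set of hyperplane parameters describe non-convex shapes such as the L-shape above, which a single convex could not represent.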
for mapping or imaging · CPC title
Range image; Depth image; 3D point clouds · CPC title
Vehicle exterior; Vicinity of vehicle · CPC title
Training; Learning · CPC title
Artificial neural networks [ANN] · CPC title