System and method for trajectory prediction using a predicted endpoint conditioned network
US-2021295531-A1 · Sep 23, 2021 · US
US12565240B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12565240-B2 |
| Application number | US-202318309150-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 28, 2023 |
| Priority date | Oct 31, 2020 |
| Publication date | Mar 3, 2026 |
| Grant date | Mar 3, 2026 |
Official abstract text for this publication.
The present disclosure relates to methods and systems for spatiotemporal graph modelling of road users in observed frames of an environment in which an autonomous vehicle operates (i.e., a traffic scene), clustering the road users into categories, and providing the spatiotemporal graph to a trained graph convolutional neural network (GNN) to predict a future pedestrian action. The future pedestrian action can be one of: the pedestrian will cross a road, or the pedestrian will not cross the road. The spatiotemporal graph provides a better understanding of the observed frames (i.e., the traffic scene).
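To make the data flow in the abstract concrete, the following is a minimal sketch, not the patented network. It assumes each observed frame is given as an adjacency matrix plus a node feature matrix with the target pedestrian at node 0, applies one graph-convolution step per frame, and replaces the temporal part of the model with a simple mean over frames. The function names (`normalize_adjacency`, `graph_conv`, `predict_crossing`), the weights, and the readout vector are all hypothetical and untrained.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Add self-loops, then row-normalize so each node averages over its neighbours."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in with_loops]

def graph_conv(adj, feats, weight):
    """One graph-convolution layer: ReLU(A_norm @ X @ W)."""
    hidden = matmul(matmul(normalize_adjacency(adj), feats), weight)
    return [[max(0.0, v) for v in row] for row in hidden]

def predict_crossing(frames, weight, readout):
    """Return a crossing probability from a list of (adjacency, features) frames.

    Assumption: node 0 is the target pedestrian, and temporal modelling is
    reduced to a mean over frames (a stand-in for a temporal convolution).
    """
    ped_embeddings = [graph_conv(adj, feats, weight)[0] for adj, feats in frames]
    pooled = [sum(col) / len(ped_embeddings) for col in zip(*ped_embeddings)]
    logit = sum(p * r for p, r in zip(pooled, readout))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid: cross vs. not cross

# Two toy frames: 3 nodes (pedestrian + 2 road users), 2 features per node.
frames = [
    ([[0, 0.5, 0.1], [0.5, 0, 1], [0.1, 1, 0]], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),
    ([[0, 0.8, 0.1], [0.8, 0, 1], [0.1, 1, 0]], [[1.0, 0.2], [0.0, 1.0], [1.0, 1.0]]),
]
weight = [[0.5, -0.3], [0.2, 0.4]]   # illustrative, untrained layer weights
readout = [1.0, -1.0]                # illustrative, untrained readout vector
p_cross = predict_crossing(frames, weight, readout)
```

In a trained model the layer weights and readout would be learned from labelled crossing and non-crossing sequences; here they only show the shape of the computation.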
Opening claim text (preview).
The invention claimed is:

1. A computer implemented method for predicting a pedestrian action, the method comprising: receiving a temporal sequence of observed frames, each observed frame including spatial information for a target pedestrian and a plurality of road users; for each observed frame in the sequence of observed frames: encoding, based at least on the spatial information included in the observed frame, a set of target pedestrian features for the target pedestrian and a respective set of road user features for each of the plurality of road users; generating, based at least on the spatial information included in the observed frame, a set of relative importance weights that includes, for each of the road users, a respective relative importance weight that indicates a relative importance of the road user to the target pedestrian, the respective relative importance weight for each road user being based both on a distance between the road user and the target pedestrian and a relative location importance of the road user to target pedestrian; clustering, based on the spatial information included in multiple observed frames in the sequence including the observed frame, groups of road users from the plurality of road users into respective clusters based on behavioral similarities, wherein each of the respective clusters identifies a group of similar behaved road users; predicting, based on the set of target pedestrian features encoded for each of a plurality of the observed frames, the respective sets of road user features encoded for each of the plurality of the observed frames, the set of relative importance weights generated for each of the plurality of the observed frames, and the respective clusters, a future action of the target pedestrian; and automatically controlling an action of an autonomous vehicle based on the predicted future action of the target pedestrian.

2. The method of claim 1, wherein the relative location importance for each road user is based on a direction of movement of the road user relative to the target pedestrian.

3. The method of claim 2, wherein the relative location importance for each road user is given a greater importance if the road user is moving towards the target pedestrian than if the road user is moving away from the target pedestrian.

4. The method of claim 2, wherein the relative location importance for each road user is further based on a travel distance of the road user along a road relative to a position of the target pedestrian.

5. The method of claim 2, wherein relative location importance for each road user is based on a distance of the road user from a reference line that extends from the position of the target pedestrian and is perpendicular to a roadway direction of travel.

6. The method of claim 1, wherein, for each road user, the distance between the road user and the target pedestrian is a Euclidian distance.

7. The method of claim 1, wherein for each observed frame in the sequence of observed frames: encoding the set of target pedestrian features for the target pedestrian and a respective set of road user features for each of the plurality of road users is based on the spatial information included in multiple observed frames in the sequence including the observed frame; and generating the set of relative importance weights for each road user is based on the spatial information included in multiple observed frames in the sequence including the observed frame.

8. The method of claim 1, wherein a respective spatial graph is generated for each of the observed frames, wherein for each observed frame: the respective spatial graph has a target pedestrian node representing the target pedestrian, and a plurality of road user nodes each representing a respective one of the plurality of road users, the respective spatial graph being defined by: (i) a feature matrix that includes the encoded target pedestrian features as features of the target pedestrian node, and includes the set of road user features encoded for the respective road users as features of the respective road user nodes; and (ii) an adjacency matrix that specifies: (a) respective weighted connecting edges between the target pedestrian node and each of the respective road user nodes corresponding to the set of relative importance weights generated for the observed frames; and (b) connecting edges between each of the road user nodes that are included in a respective cluster.

9. The method of claim 8, wherein predicting the future action of the target pedestrian is performed using a spatiotemporal convolutional graph neural network that receives the spatial graphs generated for the observed frames.

10. The method of claim 1, wherein the predicted pedestrian action is one of the pedestrian will cross in front of the autonomous vehicle or the pedestrian will not cross in front of the autonomous vehicle.

11. The method of claim 1, wherein for each observed frame in the sequence of observed frames: the set respective set of road user features encoded for each of the plurality of road users includes one or more of: a type of the road user; a location of the road user relative to the target pedestrian, a size of the road user, a velocity of the road user, and a direction of movement of the road user.

12. A processing system comprising: one or more processor systems; one or more non-transitory memories storing instructions which when executed by the one or more processor systems cause the one or more processing systems to perform a method for predicting a pedestrian action comprising: receiving a temporal sequence of observed frames, each observed frame including spatial information for a target pedestrian and a plurality of road users; for each observed frame in the sequence of observed frames: encoding, based at least on the spatial information included in the observed frame, a set of target pedestrian features for the target pedestrian and a respective set of road user features for each of the plurality of road users; generating, based at least on the spatial information included in the observed frame, a set of relative importance weights that includes, for each of the road users, a respective relative importance weight that indicates a relative importance of the road user to the target pedestrian, the respective relative importance weight for each road user being based both on a distance between the road user and the target pedestrian and a relative location importance of the road user to target pedestrian; clustering, based on the spatial information included in multiple observed frames in the sequence including the observed frame, groups of road users from the plurality of road users into respective clusters based on behavioral similarities, wherein each of the respective clusters identifies a group of similar behaved road users; predicting, based on the set of target pedestrian features encoded for each of a plurality of the observed frames, the respective sets of road user features encoded for each of the plurality of the observed frames, the set of relative importance weights generated for each of the plurality of the observed frames, and the respective clusters, a future action of the target pedestrian; and automatically controlling an action of an autonomous vehicle based on the predicted future action of the target pedestrian.

13. The system of claim 12, wherein the relative location importance for each road user is based on a direction of movement of the road user relative to the target pedestrian, and the relative location importance for each road user is given a greater importanc
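Claims 1, 3, 6 and 8 above describe a per-frame relative importance weight (distance combined with direction of movement) and an adjacency matrix built from those weights and from cluster membership. The claims do not give formulas, so the sketch below is only one possible reading: the inverse-distance term, the cosine-based approach score, the way the two are multiplied, and the function names `relative_importance` and `build_adjacency` are all assumptions made for illustration. Cluster labels are taken as given rather than computed.

```python
import math

def relative_importance(ped_xy, user_xy, user_velocity, eps=1e-6):
    """Hypothetical weight: nearer road users, and road users moving toward
    the pedestrian, score higher. Not the formula used in the patent."""
    dx, dy = ped_xy[0] - user_xy[0], ped_xy[1] - user_xy[1]
    dist = math.hypot(dx, dy)                      # Euclidean distance (claim 6)
    speed = math.hypot(user_velocity[0], user_velocity[1])
    if dist < eps or speed < eps:
        approach = 0.0                             # no usable direction information
    else:
        # Cosine between velocity and the direction to the pedestrian:
        # +1 when moving straight toward, -1 when moving straight away.
        approach = (user_velocity[0] * dx + user_velocity[1] * dy) / (speed * dist)
        approach = max(-1.0, min(1.0, approach))   # guard against rounding drift
    location_importance = 0.5 * (1.0 + approach)   # mapped to [0, 1] (claim 3)
    distance_term = 1.0 / (1.0 + dist)             # assumed inverse-distance decay
    return distance_term * location_importance

def build_adjacency(weights, cluster_ids):
    """Adjacency in the spirit of claim 8: node 0 is the target pedestrian,
    nodes 1..N are road users. Pedestrian-to-user edges carry the importance
    weights; road users in the same cluster are linked with unit edges."""
    n = len(weights) + 1
    adj = [[0.0] * n for _ in range(n)]
    for i, w in enumerate(weights, start=1):
        adj[0][i] = adj[i][0] = w
    for i in range(1, n):
        for j in range(i + 1, n):
            if cluster_ids[i - 1] == cluster_ids[j - 1]:
                adj[i][j] = adj[j][i] = 1.0
    return adj

# Toy frame: pedestrian at the origin and three road users (position, velocity).
pedestrian = (0.0, 0.0)
road_users = [
    ((3.0, 4.0), (-0.6, -0.8)),   # 5 units away, moving toward the pedestrian
    ((3.0, 4.0), (0.6, 0.8)),     # 5 units away, moving away from the pedestrian
    ((-6.0, 8.0), (0.0, 0.0)),    # 10 units away, stationary
]
weights = [relative_importance(pedestrian, pos, vel) for pos, vel in road_users]
adjacency = build_adjacency(weights, cluster_ids=[0, 0, 1])  # labels assumed given
```

With these toy inputs the approaching road user receives the largest weight, the stationary one a smaller weight, and the one moving away a weight of approximately zero. The first two road users share a cluster, so they are linked to each other in the adjacency matrix.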
based on graphs, e.g. graph cuts or spectral clustering · CPC title
Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation · CPC title
Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads · CPC title
Relationship among other objects, e.g. converging dynamic objects · CPC title
Direction of movement, e.g. backwards · CPC title