Method and system for graph neural network based pedestrian action prediction in autonomous driving systems

US12565240B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12565240-B2
Application number  US-202318309150-A
Country             US
Kind code           B2
Filing date         Apr 28, 2023
Priority date       Oct 31, 2020
Publication date    Mar 3, 2026
Grant date          Mar 3, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to methods and systems for spatiotemporal graph modelling of road users in observed frames of an environment in which an autonomous vehicle operates (i.e. a traffic scene), clustering of the road users into categories, and providing the spatiotemporal graph to a trained graphical convolutional neural network (GNN) to predict a future pedestrian action. The future pedestrian action can be: one of the pedestrian will cross a road and the pedestrian will not cross the road. The spatiotemporal graph includes a better understanding of the observed frames (i.e. traffic scene).
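As a rough illustration of the data flow the abstract describes (per-frame graphs, graph-based aggregation, a binary cross / not-cross output), the sketch below runs one weighted neighbourhood-aggregation step per frame, averages over time, and applies a logistic readout. It is a simplified stand-in, not the trained network from the patent: the function names, the mean-pooling over frames, and the `readout_weights` and `bias` values are all assumptions made for illustration.

```python
import math


def aggregate_node(features, adjacency, node):
    """Weighted mean of a node's own features and its neighbours' features."""
    dim = len(features[node])
    total = list(features[node])  # self-loop with weight 1 (assumption)
    weight_sum = 1.0
    for other, w in enumerate(adjacency[node]):
        if other != node and w > 0.0:
            for k in range(dim):
                total[k] += w * features[other][k]
            weight_sum += w
    return [value / weight_sum for value in total]


def predict_crossing(frames, readout_weights, bias):
    """Score in (0, 1) that the target pedestrian (node 0) will cross.

    frames: list of (features, adjacency) pairs, one per observed frame.
    readout_weights, bias: placeholders for parameters that a trained
    network would supply; the values used below are made up.
    """
    dim = len(readout_weights)
    pooled = [0.0] * dim
    for features, adjacency in frames:
        embedded = aggregate_node(features, adjacency, node=0)
        for k in range(dim):
            pooled[k] += embedded[k] / len(frames)  # temporal mean pooling
    logit = bias + sum(w * x for w, x in zip(readout_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


# Two observed frames, each with the pedestrian (node 0) and one road user.
frames = [
    ([[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.5], [0.5, 0.0]]),
    ([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]),
]
score = predict_crossing(frames, readout_weights=[1.0, -1.0], bias=0.0)
label = "will cross" if score >= 0.5 else "will not cross"
```

A real spatiotemporal graph convolutional network would learn its layer weights from labelled data and would typically convolve over the time axis rather than take a plain mean.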

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer implemented method for predicting a pedestrian action, the method comprising: receiving a temporal sequence of observed frames, each observed frame including spatial information for a target pedestrian and a plurality of road users; for each observed frame in the sequence of observed frames: encoding, based at least on the spatial information included in the observed frame, a set of target pedestrian features for the target pedestrian and a respective set of road user features for each of the plurality of road users; generating, based at least on the spatial information included in the observed frame, a set of relative importance weights that includes, for each of the road users, a respective relative importance weight that indicates a relative importance of the road user to the target pedestrian, the respective relative importance weight for each road user being based both on a distance between the road user and the target pedestrian and a relative location importance of the road user to target pedestrian; clustering, based on the spatial information included in multiple observed frames in the sequence including the observed frame, groups of road users from the plurality of road users into respective clusters based on behavioral similarities, wherein each of the respective clusters identifies a group of similar behaved road users; predicting, based on the set of target pedestrian features encoded for each of a plurality of the observed frames, the respective sets of road user features encoded for each of the plurality of the observed frames, the set of relative importance weights generated for each of the plurality of the observed frames, and the respective clusters, a future action of the target pedestrian; and automatically controlling an action of an autonomous vehicle based on the predicted future action of the target pedestrian.

2. The method of claim 1, wherein the relative location importance for each road user is based on a direction of movement of the road user relative to the target pedestrian.

3. The method of claim 2, wherein the relative location importance for each road user is given a greater importance if the road user is moving towards the target pedestrian than if the road user is moving away from the target pedestrian.

4. The method of claim 2, wherein the relative location importance for each road user is further based on a travel distance of the road user along a road relative to a position of the target pedestrian.

5. The method of claim 2, wherein relative location importance for each road user is based on a distance of the road user from a reference line that extends from the position of the target pedestrian and is perpendicular to a roadway direction of travel.

6. The method of claim 1, wherein, for each road user, the distance between the road user and the target pedestrian is a Euclidian distance.

7. The method of claim 1, wherein for each observed frame in the sequence of observed frames: encoding the set of target pedestrian features for the target pedestrian and a respective set of road user features for each of the plurality of road users is based on the spatial information included in multiple observed frames in the sequence including the observed frame; and generating the set of relative importance weights for each road user is based on the spatial information included in multiple observed frames in the sequence including the observed frame.

8. The method of claim 1, wherein a respective spatial graph is generated for each of the observed frames, wherein for each observed frame: the respective spatial graph has a target pedestrian node representing the target pedestrian, and a plurality of road user nodes each representing a respective one of the plurality of road users, the respective spatial graph being defined by: (i) a feature matrix that includes the encoded target pedestrian features as features of the target pedestrian node, and includes the set of road user features encoded for the respective road users as features of the respective road user nodes; and (ii) an adjacency matrix that specifies: (a) respective weighted connecting edges between the target pedestrian node and each of the respective road user nodes corresponding to the set of relative importance weights generated for the observed frames; and (b) connecting edges between each of the road user nodes that are included in a respective cluster.

9. The method of claim 8, wherein predicting the future action of the target pedestrian is performed using a spatiotemporal convolutional graph neural network that receives the spatial graphs generated for the observed frames.

10. The method of claim 1, wherein the predicted pedestrian action is one of the pedestrian will cross in front of the autonomous vehicle or the pedestrian will not cross in front of the autonomous vehicle.

11. The method of claim 1, wherein for each observed frame in the sequence of observed frames: the set respective set of road user features encoded for each of the plurality of road users includes one or more of: a type of the road user; a location of the road user relative to the target pedestrian, a size of the road user, a velocity of the road user, and a direction of movement of the road user.

12. A processing system comprising: one or more processor systems; one or more non-transitory memories storing instructions which when executed by the one or more processor systems cause the one or more processing systems to perform a method for predicting a pedestrian action comprising: receiving a temporal sequence of observed frames, each observed frame including spatial information for a target pedestrian and a plurality of road users; for each observed frame in the sequence of observed frames: encoding, based at least on the spatial information included in the observed frame, a set of target pedestrian features for the target pedestrian and a respective set of road user features for each of the plurality of road users; generating, based at least on the spatial information included in the observed frame, a set of relative importance weights that includes, for each of the road users, a respective relative importance weight that indicates a relative importance of the road user to the target pedestrian, the respective relative importance weight for each road user being based both on a distance between the road user and the target pedestrian and a relative location importance of the road user to target pedestrian; clustering, based on the spatial information included in multiple observed frames in the sequence including the observed frame, groups of road users from the plurality of road users into respective clusters based on behavioral similarities, wherein each of the respective clusters identifies a group of similar behaved road users; predicting, based on the set of target pedestrian features encoded for each of a plurality of the observed frames, the respective sets of road user features encoded for each of the plurality of the observed frames, the set of relative importance weights generated for each of the plurality of the observed frames, and the respective clusters, a future action of the target pedestrian; and automatically controlling an action of an autonomous vehicle based on the predicted future action of the target pedestrian.

13. The system of claim 12, wherein the relative location importance for each road user is based on a direction of movement of the road user relative to the target pedestrian, and the relative location importance for each road user is given a greater importanc
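
The per-frame graph construction recited in claims 1, 3, 6 and 8 can be sketched roughly as follows. The claims say only that each weight depends on a distance and on a relative location importance, and that approaching road users count for more than receding ones; the specific formula `location_importance / (1 + distance)`, the 1.0 / 0.5 importance values, the unit weight on intra-cluster edges, and all function names are assumptions chosen for illustration. Clustering itself is taken as given here.

```python
import math


def importance_weight(ped_xy, user_xy, user_velocity):
    """Hypothetical relative importance of one road user to the pedestrian."""
    dx = ped_xy[0] - user_xy[0]
    dy = ped_xy[1] - user_xy[1]
    distance = math.hypot(dx, dy)  # Euclidean distance (claim 6)
    # The road user is approaching if its velocity has a positive component
    # along the vector pointing from the road user to the pedestrian (claim 3).
    approaching = (user_velocity[0] * dx + user_velocity[1] * dy) > 0
    location_importance = 1.0 if approaching else 0.5  # assumed values
    return location_importance / (1.0 + distance)  # assumed combination


def build_adjacency(weights, clusters):
    """Adjacency matrix for one frame; node 0 is the target pedestrian.

    weights[i] is the importance weight of road user i (stored as node i + 1).
    clusters is a list of lists of road-user indices that behave similarly.
    """
    n = len(weights) + 1
    adj = [[0.0] * n for _ in range(n)]
    # (a) weighted edges between the pedestrian and every road user
    for i, w in enumerate(weights):
        adj[0][i + 1] = w
        adj[i + 1][0] = w
    # (b) edges between road users that share a cluster
    for members in clusters:
        for a in members:
            for b in members:
                if a != b:
                    adj[a + 1][b + 1] = 1.0  # assumed unit weight
    return adj


pedestrian = (0.0, 0.0)
road_users = [
    ((3.0, 4.0), (-1.0, -1.0)),  # distance 5, moving towards the pedestrian
    ((3.0, 4.0), (1.0, 1.0)),    # distance 5, moving away
    ((0.0, 9.0), (0.0, -2.0)),   # distance 9, moving towards the pedestrian
]
weights = [importance_weight(pedestrian, pos, vel) for pos, vel in road_users]
adjacency = build_adjacency(weights, clusters=[[0, 2]])
```

With these assumed values, the first road user outweighs the second even though both are equally far away, because only the first is approaching. Road users 0 and 2 share a cluster and are therefore linked to each other as well as to the pedestrian node.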

Assignees

Inventors

Classifications

  • based on graphs, e.g. graph cuts or spectral clustering

  • Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation

  • Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads

  • Relationship among other objects, e.g. converging dynamic objects

  • Direction of movement, e.g. backwards

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Malekmohammadi Saber, Yau Tiffany Yee Kay, Rasouli Amir, and 3 more
What technology area does this patent fall under?
Primary CPC classification B60W60/0027. Mapped technology areas include Operations & Transport.
When was this patent published?
Publication date: Mar 3, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).