What technology area does this patent fall under?

Primary CPC classification G06V10/82. Mapped technology areas include Physics.

When was this patent published?

Publication date: January 20, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (publications cited in our corpus, or other publications sharing the same primary CPC classification).

Perception and understanding of vulnerable road users

US12528512B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12528512-B2
Application number	US-202318326780-A
Country	US
Kind code	B2
Filing date	May 31, 2023
Priority date	May 31, 2023
Publication date	Jan 20, 2026
Grant date	Jan 20, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Autonomous vehicles utilize perception and understanding of road users and road objects to predict behaviors of the road users and road objects, and to plan a trajectory for the vehicle. Understanding subtypes and attributes of vulnerable road users may help autonomous vehicles better predict behaviors of and react to vulnerable road users. To offer additional understanding capabilities, an additional understanding model is added to the perception and understanding pipeline to improve classification of vulnerable road users and extraction of attributes of the vulnerable road users. The exemplary architectures of the understanding model balance recall and precision performance metrics and computational complexity.
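The abstract describes a two-stage flow. A main understanding model classifies every tracked object, and only objects classified as vulnerable road users are passed to the additional understanding model, whose inferences then feed trajectory planning. Below is a minimal Python sketch of that gating, assuming hypothetical names (`TrackedObject`, `understand`, `VRU`) and caller-supplied model functions. It illustrates the routing only and is not the assignee's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

VRU = "vulnerable_road_user"  # hypothetical label string for the VRU class


@dataclass
class TrackedObject:
    track_id: int
    frames: List[Sequence[float]]  # per-frame sensor crops for this track (placeholder)
    classification: Optional[str] = None
    inferences: Dict[str, str] = field(default_factory=dict)


def understand(
    objects: List[TrackedObject],
    main_model: Callable[[TrackedObject], str],
    sub_model: Callable[[List[Sequence[float]]], Dict[str, str]],
) -> List[TrackedObject]:
    """Classify every track; run the extra VRU sub-model only on VRU tracks."""
    for obj in objects:
        obj.classification = main_model(obj)
        if obj.classification == VRU:
            # Only VRU tracks pay the cost of the additional understanding model.
            obj.inferences = sub_model(obj.frames)
    return objects
```

A downstream planner would then read `classification` and `inferences` from each track. Gating the sub-model this way keeps its compute cost proportional to the number of vulnerable road users in the scene rather than to the number of tracked objects.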

First claim

Opening claim text (preview).

What is claimed is:

1. A vehicle comprising: sensors; one or more processors; and one or more storage media encoding instructions executable by the one or more processors to implement an understanding part, wherein the understanding part includes: a main understanding model to classify a tracked object into at least one of: one or more road user classifications and a vulnerable road user classification; and a sub-model to output inferences for a plurality of task groups, the sub-model including: a shared backbone to receive and process sensor data generated from the sensors corresponding to tracked objects having the vulnerable road user classification; temporal networks dedicated to respective task groups; and heads to output inferences for the respective task groups, wherein the inferences include one or more vulnerable road user subtype classifications and one or more vulnerable road user attributes, wherein the instructions are executable by the one or more processors to implement a planning part configured to plan a trajectory of the vehicle based on the inferences.

2. The vehicle of claim 1, wherein the task groups comprise: a first task group to extract vulnerable road user subtype classifications, extract whether a pedestrian has fallen, and extract gaze attributes.

3. The vehicle of claim 1, wherein the task groups comprise: a second task group to extract human controlling traffic subtype classifications and/or attributes, and extract human controlling traffic gesture attributes.

4. The vehicle of claim 1, wherein the task groups comprise: a third task group to extract vulnerable road user intent attributes.

5. The vehicle of claim 1, wherein the shared backbone comprises a deep neural network.

6. The vehicle of claim 1, wherein the temporal networks comprise long short-term memory neural networks.

7. The vehicle of claim 1, wherein the temporal networks comprise a first temporal network having a first sequence length, and a second temporal network having a second sequence length that is different from the first sequence length.

8. The vehicle of claim 7, wherein the first temporal network is dedicated to a first task group to extract vulnerable road user subtype classifications, and the second temporal network is dedicated to a second task group to extract human controlling traffic gesture attributes.

9. The vehicle of claim 1, wherein the temporal networks receive vectorized spatial maps corresponding to a sequence of image frames provided as the sensor data to the shared backbone.

10. The vehicle of claim 1, wherein the shared backbone comprises: first layers to generate local spatial maps; and second layers downstream of first layers to generate global spatial maps.

11. The vehicle of claim 10, wherein the sub-model further includes one or more spatial networks dedicated to one or more respective task groups to receive the local spatial maps from first layers of the shared backbone and to generate task group specific spatial maps.

12. The vehicle of claim 10, wherein at least one or more of the temporal networks receive task group specific spatial maps and global spatial maps corresponding to a sequence of image frames provided as the sensor data to the shared backbone.

13. The vehicle of claim 1, wherein the heads comprise fully connected neural network layers for the respective task groups.

14. The vehicle of claim 1, wherein the understanding part is configured to correct confidence estimates of inferences output by the heads.

15. A computer-implemented method for understanding vulnerable road users and controlling a vehicle based on the understanding, the method comprising: determining, by a main understanding model, that a tracked object has a vulnerable road user classification; providing sensor data corresponding to the tracked object having the vulnerable road user classification to a sub-model; determining, by the sub-model, a plurality of inferences based on the sensor data, wherein: determining the plurality of inferences comprises: processing the sensor data using a shared backbone; processing global spatial maps by temporal networks dedicated to respective task groups; and generating inferences based on respective outputs of the temporal networks by heads that are dedicated to the respective tasks groups; and the inferences include one or more vulnerable road user subtype classifications and one or more vulnerable road user attributes; and planning a trajectory of the vehicle based on the inferences.

16. The computer-implemented method of claim 15, wherein determining the plurality of inferences further comprises: processing local spatial maps by one or more spatial networks dedicated to at least one or more respective task groups; and generating task group specific spatial maps by the one or more spatial networks.

17. The computer-implemented method of claim 16, wherein the temporal networks dedicated to respective task groups further process the task group specific spatial maps along with the global spatial maps.

18. The computer-implemented method of claim 15, wherein the temporal networks are configured to process different number of global spatial maps as input.

19. One or more non-transient storage media encoding instructions executable by one or more processors to implement an understanding part, wherein the understanding part includes: a shared backbone to receive and process sensor data generated from sensors of a vehicle corresponding to tracked objects having a vulnerable road user classification; temporal networks dedicated to respective task groups; and heads to output inferences for the respective task groups, wherein the inferences include one or more vulnerable road user subtype classifications and one or more vulnerable road user attributes, wherein the instructions are executable by the one or more processors to implement a planning part configured to plan a trajectory of the vehicle based on the inferences.

20. The one or more non-transient storage media of claim 19, wherein the temporal networks comprise a first temporal network having a first sequence length, and a second temporal network having a second sequence length that is different from the first sequence length.
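Claims 1, 5 to 7, 13, 14 and 18 together describe a multi-task sub-model: a shared backbone run once per frame, one temporal network per task group with its own sequence length, and one head per group. The Python sketch below illustrates only that data flow, under loudly stated assumptions. The backbone is a toy function returning [mean, max, min] per frame rather than a deep neural network. The temporal network is an exponentially weighted average standing in for the LSTMs of claim 6. Each head is a single linear layer. Confidence correction is done by temperature scaling, a common calibration technique that the claims do not name. All function, class, label, and task-group names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def shared_backbone(frame: Sequence[float]) -> Vector:
    """Toy stand-in for the shared backbone: one global feature vector per frame.

    Returns [mean, max, min] of the frame values; a real system would use a DNN.
    """
    return [sum(frame) / len(frame), max(frame), min(frame)]


def temporal_network(features: List[Vector], decay: float = 0.5) -> Vector:
    """Stand-in for an LSTM: recursively blends each newer frame into a running state.

    With decay=0.5 and a window of two or more frames, the newest frame carries half the weight.
    """
    state = list(features[0])
    for feature in features[1:]:
        state = [decay * s + (1.0 - decay) * x for s, x in zip(state, feature)]
    return state


def softmax(logits: Vector, temperature: float = 1.0) -> Vector:
    """Softmax with temperature scaling; temperature > 1 softens overconfident scores."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


class TaskGroup:
    """One task group: its own temporal window length and its own linear head."""

    def __init__(self, name: str, seq_len: int, labels: List[str],
                 weights: List[Vector], temperature: float = 1.0) -> None:
        self.name = name
        self.seq_len = seq_len
        self.labels = labels
        self.weights = weights  # one weight row per output label
        self.temperature = temperature

    def infer(self, global_maps: List[Vector]) -> Dict[str, float]:
        window = global_maps[-self.seq_len:]  # only the last seq_len frames
        state = temporal_network(window)
        logits = [sum(w * s for w, s in zip(row, state)) for row in self.weights]
        return dict(zip(self.labels, softmax(logits, self.temperature)))


def run_submodel(frames: List[Sequence[float]],
                 groups: List[TaskGroup]) -> Dict[str, Dict[str, float]]:
    # The backbone runs once per frame and its output is shared by every group.
    global_maps = [shared_backbone(frame) for frame in frames]
    return {group.name: group.infer(global_maps) for group in groups}
```

With these stand-ins, a group configured with `seq_len=1` reacts only to the newest frame, while a group with `seq_len=5` aggregates a longer history. This mirrors the idea in claims 7 and 18 that temporal networks may consume different numbers of frames; the specific lengths used here are arbitrary.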

Assignees

GM Cruise Holdings LLC

Inventors

Classifications

G06V10/82 (primary): using neural networks
B60W2556/40: High definition maps
B60W2554/4029: Pedestrians
B60W2554/80: Spatial relation or speed relative to objects
G06V40/20: Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16)

Patent family

Related publications grouped by family.

Patent family ID: 93653360

Related patents

Gradient split system for rich human analysis

Systems and methods for determining the presence of traffic control personnel and traffic control signage

Image classification utilizing semantic relationships in a classification hierarchy