Gradient split system for rich human analysis
US-2024233314-A1 · Jul 11, 2024 · US
US12528512B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12528512-B2 |
| Application number | US-202318326780-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 31, 2023 |
| Priority date | May 31, 2023 |
| Publication date | Jan 20, 2026 |
| Grant date | Jan 20, 2026 |
Abstract
Autonomous vehicles utilize perception and understanding of road users and road objects to predict behaviors of the road users and road objects, and to plan a trajectory for the vehicle. Understanding subtypes and attributes of vulnerable road users may help autonomous vehicles better predict behaviors of and react to vulnerable road users. To offer additional understanding capabilities, an additional understanding model is added to the perception and understanding pipeline to improve classification of vulnerable road users and extraction of attributes of the vulnerable road users. The exemplary architectures of the understanding model balance recall and precision performance metrics and computational complexity.
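The abstract describes a gated pipeline. A main understanding model classifies every tracked object. The additional understanding model runs only on objects classified as vulnerable road users (VRUs), and its subtype and attribute inferences are passed to planning. The minimal Python sketch below shows that routing under stated assumptions. The names `TrackedObject`, `understand`, `plan` and the `VRU` label string are hypothetical. The toy planner, which simply yields when any inferred intent is "crossing", is a placeholder, since the publication does not specify the models or the planner at this level.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

VRU = "vulnerable_road_user"  # hypothetical label for the VRU classification


@dataclass
class TrackedObject:
    track_id: int
    frames: List[str]  # placeholder for per-frame sensor crops of this track
    classification: str = ""
    inferences: Dict[str, str] = field(default_factory=dict)


def understand(
    objects: List[TrackedObject],
    main_model: Callable[[TrackedObject], str],
    sub_model: Callable[[List[str]], Dict[str, str]],
) -> List[TrackedObject]:
    """Classify every track, then run the sub-model only on VRU tracks."""
    for obj in objects:
        obj.classification = main_model(obj)
        if obj.classification == VRU:
            # Gating: the extra model's cost is paid only for VRU tracks.
            obj.inferences = sub_model(obj.frames)
    return objects


def plan(objects: List[TrackedObject]) -> str:
    """Toy planner: yield if any track carries an inferred 'crossing' intent."""
    if any(obj.inferences.get("intent") == "crossing" for obj in objects):
        return "yield"
    return "proceed"


if __name__ == "__main__":
    def main_model(obj: TrackedObject) -> str:
        # Toy rule: even track ids are classified as VRUs.
        return VRU if obj.track_id % 2 == 0 else "vehicle"

    def sub_model(frames: List[str]) -> Dict[str, str]:
        # Toy output: fixed subtype and intent inferences.
        return {"subtype": "pedestrian", "intent": "crossing"}

    tracks = [TrackedObject(0, ["f0", "f1"]), TrackedObject(1, ["f0"])]
    print(plan(understand(tracks, main_model, sub_model)))  # yield
```

Gating on the main model's output is one plausible way to read the abstract's point about balancing accuracy against computational complexity: non-VRU tracks never reach the heavier sub-model.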
Claims
What is claimed is:
1. A vehicle comprising: sensors; one or more processors; and one or more storage media encoding instructions executable by the one or more processors to implement an understanding part, wherein the understanding part includes: a main understanding model to classify a tracked object into at least one of: one or more road user classifications and a vulnerable road user classification; and a sub-model to output inferences for a plurality of task groups, the sub-model including: a shared backbone to receive and process sensor data generated from the sensors corresponding to tracked objects having the vulnerable road user classification; temporal networks dedicated to respective task groups; and heads to output inferences for the respective task groups, wherein the inferences include one or more vulnerable road user subtype classifications and one or more vulnerable road user attributes, wherein the instructions are executable by the one or more processors to implement a planning part configured to plan a trajectory of the vehicle based on the inferences.
2. The vehicle of claim 1, wherein the task groups comprise: a first task group to extract vulnerable road user subtype classifications, extract whether a pedestrian has fallen, and extract gaze attributes.
3. The vehicle of claim 1, wherein the task groups comprise: a second task group to extract human controlling traffic subtype classifications and/or attributes, and extract human controlling traffic gesture attributes.
4. The vehicle of claim 1, wherein the task groups comprise: a third task group to extract vulnerable road user intent attributes.
5. The vehicle of claim 1, wherein the shared backbone comprises a deep neural network.
6. The vehicle of claim 1, wherein the temporal networks comprise long short-term memory neural networks.
7. The vehicle of claim 1, wherein the temporal networks comprise a first temporal network having a first sequence length, and a second temporal network having a second sequence length that is different from the first sequence length.
8. The vehicle of claim 7, wherein the first temporal network is dedicated to a first task group to extract vulnerable road user subtype classifications, and the second temporal network is dedicated to a second task group to extract human controlling traffic gesture attributes.
9. The vehicle of claim 1, wherein the temporal networks receive vectorized spatial maps corresponding to a sequence of image frames provided as the sensor data to the shared backbone.
10. The vehicle of claim 1, wherein the shared backbone comprises: first layers to generate local spatial maps; and second layers downstream of first layers to generate global spatial maps.
11. The vehicle of claim 10, wherein the sub-model further includes one or more spatial networks dedicated to one or more respective task groups to receive the local spatial maps from first layers of the shared backbone and to generate task group specific spatial maps.
12. The vehicle of claim 10, wherein at least one or more of the temporal networks receive task group specific spatial maps and global spatial maps corresponding to a sequence of image frames provided as the sensor data to the shared backbone.
13. The vehicle of claim 1, wherein the heads comprise fully connected neural network layers for the respective task groups.
14. The vehicle of claim 1, wherein the understanding part is configured to correct confidence estimates of inferences output by the heads.
15. A computer-implemented method for understanding vulnerable road users and controlling a vehicle based on the understanding, the method comprising: determining, by a main understanding model, that a tracked object has a vulnerable road user classification; providing sensor data corresponding to the tracked object having the vulnerable road user classification to a sub-model; determining, by the sub-model, a plurality of inferences based on the sensor data, wherein: determining the plurality of inferences comprises: processing the sensor data using a shared backbone; processing global spatial maps by temporal networks dedicated to respective task groups; and generating inferences based on respective outputs of the temporal networks by heads that are dedicated to the respective tasks groups; and the inferences include one or more vulnerable road user subtype classifications and one or more vulnerable road user attributes; and planning a trajectory of the vehicle based on the inferences.
16. The computer-implemented method of claim 15, wherein determining the plurality of inferences further comprises: processing local spatial maps by one or more spatial networks dedicated to at least one or more respective task groups; and generating task group specific spatial maps by the one or more spatial networks.
17. The computer-implemented method of claim 16, wherein the temporal networks dedicated to respective task groups further process the task group specific spatial maps along with the global spatial maps.
18. The computer-implemented method of claim 15, wherein the temporal networks are configured to process different number of global spatial maps as input.
19. One or more non-transient storage media encoding instructions executable by one or more processors to implement an understanding part, wherein the understanding part includes: a shared backbone to receive and process sensor data generated from sensors of a vehicle corresponding to tracked objects having a vulnerable road user classification; temporal networks dedicated to respective task groups; and heads to output inferences for the respective task groups, wherein the inferences include one or more vulnerable road user subtype classifications and one or more vulnerable road user attributes, wherein the instructions are executable by the one or more processors to implement a planning part configured to plan a trajectory of the vehicle based on the inferences.
20. The one or more non-transient storage media of claim 19, wherein the temporal networks comprise a first temporal network having a first sequence length, and a second temporal network having a second sequence length that is different from the first sequence length.
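The claims recite the sub-model's data flow. A shared backbone produces local and global spatial maps per frame (claims 5, 10). Optional task-group-specific spatial networks read the local maps (claims 11, 16, 17). Per-task-group temporal networks, possibly with different sequence lengths, consume the map sequence (claims 6, 7, 18, 20). Fully connected heads then emit each group's inferences (claim 13), and confidence estimates may be corrected (claim 14). The sketch below traces that flow in plain Python so it runs without a deep learning framework. It is illustrative, not the patented implementation. `backbone`, `TemporalNet`, `head`, and `sub_model` are hypothetical names. The ReLU-and-pooling backbone, the exponential moving average standing in for an LSTM, and temperature scaling as the confidence correction are assumptions swapped in for trained networks. All weights and window lengths are made up.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def backbone(frame: Vector) -> Dict[str, Vector]:
    """Stand-in for the shared backbone (claims 5 and 10).

    'local' plays the role of early-layer spatial maps and 'global' the pooled
    output of downstream layers; a real backbone would be a deep network.
    """
    local = [max(0.0, x) for x in frame]            # ReLU-like early features
    pooled = [sum(local) / len(local), max(local)]  # downstream global summary
    return {"local": local, "global": pooled}


class TemporalNet:
    """Stand-in for a per-task-group LSTM (claim 6).

    Looks only at the most recent `seq_len` vectors (claims 7, 18, 20) and folds
    them with an exponential moving average instead of learned LSTM gates.
    """

    def __init__(self, seq_len: int, decay: float = 0.5) -> None:
        self.seq_len = seq_len
        self.decay = decay

    def __call__(self, seq: Sequence[Vector]) -> Vector:
        window = seq[-self.seq_len:]
        state = [0.0] * len(window[0])
        for vec in window:
            state = [self.decay * s + (1 - self.decay) * v for s, v in zip(state, vec)]
        return state


def head(features: Vector, weights: List[Vector], labels: List[str],
         temperature: float = 1.0) -> Dict[str, float]:
    """Fully connected head (claim 13) with a temperature-scaled softmax,
    assumed here as one possible confidence correction (claim 14)."""
    logits = [sum(w * f for w, f in zip(row, features)) / temperature for row in weights]
    top = max(logits)                               # subtract max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


def sub_model(frames: List[Vector], groups: Dict[str, dict]) -> Dict[str, Dict[str, float]]:
    """Run the backbone once per frame, then one temporal net and head per task group."""
    feats = [backbone(f) for f in frames]           # shared across all task groups
    out = {}
    for name, cfg in groups.items():
        seq = [f["global"] for f in feats]
        if "spatial_net" in cfg:
            # Task-group-specific spatial network on the local maps (claims 11, 17),
            # concatenated with the global maps before the temporal network.
            seq = [g + cfg["spatial_net"](f["local"]) for g, f in zip(seq, feats)]
        state = TemporalNet(cfg["seq_len"])(seq)
        out[name] = head(state, cfg["weights"], cfg["labels"], cfg.get("temperature", 1.0))
    return out


if __name__ == "__main__":
    frames = [[0.2, -1.0, 0.8], [0.5, 0.1, 0.9], [1.0, 0.3, -0.2], [0.7, 0.6, 0.4]]
    groups = {
        # Short window for subtype classification (hypothetical values).
        "subtype": {"seq_len": 2, "labels": ["pedestrian", "cyclist"],
                    "weights": [[1.0, 0.5], [-1.0, 0.2]]},
        # Longer window plus a spatial branch for traffic-controller gestures.
        "gesture": {"seq_len": 4, "labels": ["stop", "go"],
                    "weights": [[0.3, 0.3, 1.0], [0.1, -0.2, 0.0]],
                    "spatial_net": lambda local: [min(local)], "temperature": 2.0},
    }
    for group, probs in sub_model(frames, groups).items():
        print(group, {k: round(v, 3) for k, v in probs.items()})
```

Running it prints one probability dictionary per task group. The structure the claims emphasize shows up in `sub_model`: the backbone runs once per frame and is shared, while each task group has its own temporal window and head. A gesture group can therefore look further back in time than a subtype group without duplicating the backbone.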
using neural networks · CPC title
High definition maps · CPC title
Pedestrians · CPC title
Spatial relation or speed relative to objects · CPC title
Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title