Trajectory prediction using efficient attention neural networks

US12497079B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12497079-B2
Application number	US-202318335915-A
Country	US
Kind code	B2
Filing date	Jun 15, 2023
Priority date	Jun 15, 2022
Publication date	Dec 16, 2025
Grant date	Dec 16, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus for generating trajectory predictions for one or more target agents. In one aspect, a system comprises one or more computers configured to obtain scene context data characterizing a scene in an environment at a current time point, where the scene includes multiple agents that include a target agent and one or more context agents, and the scene context data includes respective context data for each of multiple different modalities of context data. The one or more computers then generate an encoded representation of the scene in the environment that includes one or more embeddings and process the encoded representation of the scene context data using a decoder neural network to generate a trajectory prediction output for the target agent that predicts a future trajectory of the target after the current time point.
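The flow described in the abstract (per-modality context data, one combined sequence, an attention-based encoder, then a decoder that outputs a future trajectory) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the patented implementation: the width `D`, the horizon `T_FUTURE`, the random weights, the single-head attention with identity query/key/value maps, and the mean-pooling decoder are all assumptions chosen to keep the example short.

```python
import math
import random

random.seed(0)

D = 4         # shared embedding width (hypothetical value)
T_FUTURE = 3  # number of future time steps to predict (hypothetical value)


def rand_matrix(rows, cols):
    """Random weights standing in for learned parameters (assumption)."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(vec, mat):
    """Multiply a row vector of length r by an r x c matrix."""
    return [sum(v * mat[i][j] for i, v in enumerate(vec)) for j in range(len(mat[0]))]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Single-head scaled dot-product self-attention over the whole sequence."""
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(len(q))])
    return out


def predict_trajectory(context):
    # 1. Project each modality's elements into the shared width D.
    sequences = []
    for elements in context.values():
        proj = rand_matrix(len(elements[0]), D)
        sequences.append([matvec(e, proj) for e in elements])
    # 2. Concatenate the per-modality sequences into one combined sequence.
    combined = [e for seq in sequences for e in seq]
    # 3. Encode: every element attends over elements from every modality.
    encoded = self_attention(combined)
    # 4. Decode: mean-pool the embeddings and map them to T_FUTURE (x, y) points.
    pooled = [sum(col) / len(encoded) for col in zip(*encoded)]
    flat = matvec(pooled, rand_matrix(D, 2 * T_FUTURE))
    trajectory = [(flat[2 * t], flat[2 * t + 1]) for t in range(T_FUTURE)]
    return trajectory, len(combined)


# Toy scene: modality names follow the claims; the feature values are made up.
scene = {
    "target_history": [[0.0, 0.0, 1.0], [0.5, 0.1, 1.0]],
    "context_agents": [[3.0, 1.0, 0.8]],
    "road_graph": [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]],
    "traffic_signals": [[1.0]],
}
trajectory, n_tokens = predict_trajectory(scene)
```

Because each modality is projected to the same width before concatenation, the encoder needs no modality-specific handling; the weights here are untrained, so the predicted points are meaningless beyond their shape.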

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed by one or more computers, the method comprising: obtaining scene context data characterizing a scene in an environment at a current time point, wherein the scene includes a plurality of agents comprising a target agent and one or more context agents, and the scene context data comprises respective context data for each of multiple different modalities of context data, wherein the scene context data comprises data generated from data captured by one or more sensors of an autonomous vehicle, and wherein the target agent in the set is an agent in a vicinity of the autonomous vehicle in the environment; generating an encoded representation of the scene in the environment that comprises one or more embeddings, comprising: generating, for each of the multiple different modalities, a respective sequence of input elements for the modality from the context data for the modality; generating a combined sequence by concatenating the respective sequences of input elements for each of the different modalities; and processing the combined sequence using an attention-based encoder neural network to generate the one or more embeddings, wherein the attention-based encoder neural network comprises at least one cross-modal attention layer block that attends over input elements corresponding to each of the multiple different modalities; processing the encoded representation of the scene context data using a decoder neural network to generate a trajectory prediction output for the target agent that predicts a future trajectory of the target agent after the current time point; and providing at least one of the trajectory prediction output for the target agent or data derived from the trajectory prediction output to a planning system to control navigation of the autonomous vehicle.

2. The method of claim 1, wherein the trajectory prediction output defines a probability distribution over possible future trajectories of the target agent after the current time point.

3. The method of claim 1, wherein the trajectory prediction output is generated on-board the autonomous vehicle.

4. The method of claim 1, wherein the scene context data comprises target agent history context data characterizing current and previous states of the target agent.

5. The method of claim 1, wherein the scene context data comprises context agent history context data characterizing current and previous states of each of the one or more context agents.

6. The method of claim 1, wherein the scene context data comprises road graph context data characterizing road features in the scene.

7. The method of claim 1, wherein the scene context data comprises traffic signal context data characterizing at least respective current states of one or more traffic signals in the scene.

8. The method of claim 1, wherein generating, for each of the multiple different modalities, a respective sequence of input elements for the modality from the context data for the modality comprises, for each of the modalities: generating an initial sequence of input elements for the modality from the context data for the modality; and processing the initial sequence using an attention neural network that is specific to the modality to generate the sequence of input elements.

9. The method of claim 1, wherein generating, for each of the multiple different modalities, a respective sequence of input elements for the modality from the context data for the modality comprises, for each of the modalities: projecting the context data for the modality into a sequence of input elements that each have a dimensionality that is shared across the modalities.

10. The method of claim 9, wherein projecting the context data for the modality into a sequence of input elements that each have a dimensionality that is shared across the modalities comprises: projecting the context data for the modality into a sequence of input elements that each have a dimensionality that is shared across the modalities without applying attention over the context data.

11. The method of claim 9, wherein generating, for each of the multiple different modalities, a respective sequence of input elements for the modality from the context data for the modality comprises, for each of the modalities: applying positional embedding to each of the input elements.

12. The method of claim 11, wherein the context data for each modality is represented as a tensor having a feature dimension, and wherein projecting the context data comprises projecting the feature dimension to have the shared dimensionality.

13. The method of claim 1, wherein each input element corresponds to a respective time point along a temporal dimension, and wherein the attention-based encoder neural network comprises one or more temporal cross-modal attention layer blocks that self-attend over input elements corresponding to each of the multiple different modalities along the temporal dimension.

14. The method of claim 13, wherein, for each index along the temporal dimension, each temporal cross-modal attention layer block updates the input elements having the index by attending over the input elements having the index.

15. The method of claim 14, wherein each input element corresponds to a respective spatial entity along a spatial dimension and wherein the attention-based encoder neural network comprises one or more spatial attention layer blocks that self-attend over input elements along the spatial dimension.

16. The method of claim 15, wherein, for each index along the spatial dimension, each spatial cross-modal attention layer block updates the input elements having the index by attending over the input elements having the index.

17. The method of claim 13, wherein the encoded representation of the scene in the environment that comprises a respective embedding for each input element in the combined sequence.

18. The method of claim 1, wherein the attention-based encoder neural network also receives as input a set of learned queries and comprises: (i) one or more self-attention layer blocks that update the learned queries by applying self-attention over the learned queries, and (ii) one or more cross-attention cross-modal layer blocks that update the learned queries by applying cross-attention between the learned queries and the combined sequence.

19. The method of claim 18, wherein the encoded representation of the scene in the environment comprises a respective embedding for each learned query.

20. The method of claim 1, further comprising: controlling, by the planning system of the autonomous vehicle, the autonomous vehicle to navigate in the environment based on (i) the trajectory prediction output for the target agent, (ii) data derived from the trajectory prediction output, or (iii) both.

21. A system comprising: one or more computers; and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising: obtaining scene context data characterizing a scene in an environment at a current time point, wherein the scene includes a plurality of agents comprising a target agent and one or more context agents, and the scene context data comprises respective context data for each of multiple different modalities of context data, wherein the scene context data comprises data generated from data captured by one or more sensors of an a…
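Claims 18 and 19 describe an encoder variant in which a set of learned queries is updated by self-attention and then by cross-attention over the combined sequence, with one output embedding per query. A rough sketch of one such block in plain Python follows. The function names, the residual additions, and the use of unprojected dot-product attention are illustrative assumptions, not details taken from the patent.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Scaled dot-product attention: each query becomes a weighted mean of keys_values."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values)) for j in range(d)])
    return out


def latent_query_block(queries, combined):
    # (i) Self-attention over the learned queries (residual add is an assumption).
    mixed = attend(queries, queries)
    queries = [[a + b for a, b in zip(q, m)] for q, m in zip(queries, mixed)]
    # (ii) Cross-attention: the queries read from the combined multi-modal sequence.
    read = attend(queries, combined)
    return [[a + b for a, b in zip(q, r)] for q, r in zip(queries, read)]


# Two hypothetical learned queries of width 2, applied to short and long sequences.
queries = [[0.1, 0.2], [0.3, -0.1]]
short_seq = [[1.0, 0.0]] * 5
long_seq = [[1.0, 0.0], [0.0, 1.0]] * 25
out_short = latent_query_block(queries, short_seq)
out_long = latent_query_block(queries, long_seq)
```

In this sketch the number of output embeddings equals the number of queries regardless of how long the combined sequence is, which is consistent with claim 19's "a respective embedding for each learned query".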

Assignees

Waymo LLC

Inventors

Classifications

Code	CPC title
B60W40/04	Traffic conditions
G06N3/0455 (primary)	Auto-encoder networks; Encoder-decoder networks
B60W2556/10	Historical data
B60W40/06	Road conditions
B60W2554/4041	Position

Patent family

Related publications grouped by family.

View patent family 89170232

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12497079B2 cover?

Methods, systems, and apparatus for generating trajectory predictions for one or more target agents. In one aspect, a system comprises one or more computers configured to obtain scene context data characterizing a scene in an environment at a current time point, where the scene includes multiple agents that include a target agent and one or more context agents, and the scene context data includ…

Who is the assignee on this patent?

Waymo LLC

What technology area does this patent fall under?

The primary CPC classification is G06N3/0455. Mapped technology areas include Physics.

When was this patent published?

It was published on Dec 16, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).