Conditional trajectory determination by a machine learned model
US-12311972-B2 · May 27, 2025 · US
US2024300542A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2024300542-A1 |
| Application number | US-202418600159-A |
| Country | US |
| Kind code | A1 |
| Filing date | Mar 8, 2024 |
| Priority date | Mar 8, 2023 |
| Publication date | Sep 12, 2024 |
| Grant date | — |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating trajectory predictions for one or more agents in an environment. In one aspect, a method comprises: obtaining scene context data characterizing a scene in an environment at a current time point and generating a respective predicted future trajectory for each of a plurality of agents in the scene at the current time point by sampling a sequence of discrete motion tokens that defines a joint future trajectory for the plurality of agents using a trajectory prediction neural network that is conditioned on the scene context data.
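The claims describe each discrete motion token as a delta applied to an agent's preceding state, with one delta value per dimension of a two-dimensional waypoint. A minimal sketch of how such a token sequence could be decoded into a joint trajectory follows. The 3x3 delta grid, the bin width, the time-major interleaving of agents, and the names `DELTA_BINS`, `VOCAB`, and `decode_joint_trajectory` are illustrative assumptions, not details taken from the publication.

```python
from itertools import product

# Assumption: three delta bins per axis, giving a 9-token vocabulary.
# The publication does not state a vocabulary size or bin width.
DELTA_BINS = (-1.0, 0.0, 1.0)
VOCAB = list(product(DELTA_BINS, DELTA_BINS))  # token id -> (dx, dy)


def decode_joint_trajectory(start_states, tokens, num_agents):
    """Decode a flat token sequence into one waypoint list per agent.

    Assumed ordering: step k belongs to agent k % num_agents at future
    time point k // num_agents (time-major interleaving).
    """
    trajectories = [[] for _ in range(num_agents)]
    current = [tuple(state) for state in start_states]
    for k, token in enumerate(tokens):
        agent = k % num_agents
        dx, dy = VOCAB[token]
        x, y = current[agent]
        # Apply the delta to the agent's immediately preceding state.
        current[agent] = (x + dx, y + dy)
        trajectories[agent].append(current[agent])
    return trajectories


if __name__ == "__main__":
    # Two agents, two future time points each.
    print(decode_joint_trajectory([(0.0, 0.0), (10.0, 0.0)], [7, 4, 8, 1], 2))
```

Because every step is a delta rather than an absolute position, a single shared vocabulary can describe agents anywhere in the scene.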
Opening claim text (preview).
1. A method performed by one or more computers, the method comprising: obtaining scene context data characterizing a scene in an environment at a current time point; and generating a respective predicted future trajectory for each of a plurality of agents in the scene in the environment at the current time point, the generating comprising: sampling a sequence of discrete motion tokens that defines a joint future trajectory for the plurality of agents using a trajectory prediction neural network that is conditioned on the scene context data.

2. The method of claim 1, wherein: the scene context data comprises data generated from data captured by one or more sensors of an autonomous vehicle, and the plurality of agents are agents in a vicinity of the autonomous vehicle in the environment.

3. The method of claim 2, further comprising: controlling the autonomous vehicle based on the respective predicted future trajectories of the plurality of agents.

4. The method of claim 1, wherein: each respective future trajectory comprises a respective predicted agent state for the agent at each of a plurality of future time points, the sequence of discrete motion tokens comprises a respective discrete motion token at each of a plurality of time steps, and each time step corresponds to a respective one of the plurality of agents and a respective future time point and the discrete motion token at the time step defines the predicted agent state for the corresponding agent at the corresponding future time point.

5. The method of claim 4, wherein each discrete motion token is selected from a vocabulary of motion tokens that each correspond to a different delta to be applied to a preceding agent state, and wherein the discrete motion token at the time step specifies a delta to be applied to a preceding agent state of the corresponding agent at a preceding time point that immediately precedes the corresponding future time point to generate the predicted agent state for the corresponding agent at the corresponding future time point.

6. The method of claim 4, wherein each respective predicted agent state comprises a predicted two-dimensional waypoint location of the corresponding agent at the corresponding future time point, and wherein each discrete motion token specifies a respective delta value for each of the two dimensions.

7. The method of claim 4, wherein the trajectory prediction neural network comprises (i) a scene encoder neural network and (ii) a trajectory decoder neural network.

8. The method of claim 7, wherein sampling a sequence of discrete motion tokens comprises: processing the scene context data using the scene encoder neural network to generate a respective scene encoding for each of the plurality of agents; and sampling the sequence of discrete motion tokens by, for each particular time step of at least a subset of the plurality of time steps: processing an input sequence comprising the discrete motion tokens corresponding to future time points that precede a particular future time point that corresponds to the particular time step using the trajectory decoder neural network while conditioned on the respective scene encoding for the agent corresponding the particular time step to generate a score distribution over a vocabulary of discrete motion tokens; and sampling the discrete motion token for the particular time step from the score distribution over the vocabulary of discrete motion tokens.

9. The method of claim 8, wherein processing the scene context data using the scene encoder neural network to generate a respective scene encoding for each of the plurality of agents comprises: for each agent, extracting features from the scene context with respect to a frame of reference of the agent to generate agent-specific scene context data and processing the agent-specific scene context data using the scene encoder neural network to generate the respective scene encoding for the agent.

10. The method of claim 8, wherein the trajectory decoder neural network comprises one or more self-attention layers that perform self-attention over the input sequence and one or more cross-attention layers that perform cross-attention into the respective scene encoding for the agent corresponding to the particular time step.

11. The method of claim 8, wherein the trajectory decoder neural network implements temporally causal conditioning such that the discrete motion tokens at each particular time step is sampled conditioned only on discrete motion tokens corresponding to future time points that precede the future time point that corresponds to the particular time step and not any discrete motion tokens corresponding to future time points that are after the future time point that corresponds to the particular time step.

12. The method of claim 1, further comprising: sampling one or more additional sequences of discrete motion tokens that each define a respective additional joint future trajectory for the plurality of agents using the trajectory prediction neural network that is conditioned on the scene context data; and aggregating a plurality of joint future trajectories comprising the joint future trajectory and the additional joint future trajectories to generate (i) a plurality of predicted trajectory modes and (ii) a respective probability for each predicted trajectory mode.

13. The method of claim 12, wherein the plurality of joint future trajectories comprise a plurality of further joint future trajectories that are each defined by a respective further sequence of discrete motion tokens generated by a respective replica of the trajectory prediction neural network that is conditioned on the scene context data.

14. The method of claim 1, wherein the trajectory prediction neural network has been trained through imitation learning.

15. The method of claim 1, further comprising: receiving a planned future trajectory for a particular one of the plurality of agents; and determining a set of discrete motion tokens that represent the planned future trajectory, wherein sampling a sequence of discrete motion tokens that defines a joint future trajectory for the plurality of agents using a trajectory prediction neural network that is conditioned on the scene context data comprises fixing each discrete motion token in the sequence that correspond to the particular agent to be equal to a corresponding discrete motion token from the set of discrete motion tokens that represent the planned future trajectory.

16. A system comprising: one or more computers; and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising: obtaining scene context data characterizing a scene in an environment at a current time point; and generating a respective predicted future trajectory for each of a plurality of agents in the scene in the environment at the current time point, the generating comprising: sampling a sequence of discrete motion tokens that defines a joint future trajectory for the plurality of agents using a trajectory prediction neural network that is conditioned on the scene context data.

17. The system of claim 16, wherein: the scene context data comprises data generated from data captured by one or more sensors of an autonomous vehicle, and the plurality of agents are agents in a vicinity of the autonomous vehicle in the environment.

18. The system of claim 17, the operations further comprising: co
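Claims 8, 11, 12, and 15 together describe a sampling procedure: tokens are drawn step by step from a score distribution, each step sees only tokens from earlier future time points, one agent's tokens may be fixed to a planned trajectory, and several sampled rollouts are aggregated into modes with probabilities. The sketch below illustrates that flow under loud assumptions: `toy_scores` is a hypothetical stand-in for the trajectory decoder neural network, the scene encoding is reduced to one integer per agent, the vocabulary has 9 tokens, and grouping identical token sequences by empirical frequency is only one possible aggregation, since the claims do not say how modes are formed.

```python
import random
from collections import Counter

VOCAB_SIZE = 9  # assumed; the claims do not give a vocabulary size


def toy_scores(history, agent, scene_encoding):
    """Hypothetical stand-in for the trajectory decoder.

    Returns unnormalised scores over the vocabulary. A real decoder would
    attend over `history` and the agent's scene encoding; this toy version
    ignores `history` and simply favours one token per agent.
    """
    preferred = scene_encoding[agent] % VOCAB_SIZE
    return [3.0 if tok == preferred else 1.0 for tok in range(VOCAB_SIZE)]


def sample_joint_tokens(scene_encoding, num_agents, horizon, rng, planned=None):
    """Sample one time-major token sequence for all agents.

    `planned` maps an agent index to a list of tokens, one per future time
    point; those tokens are fixed rather than sampled.
    """
    planned = planned or {}
    tokens = []
    for t in range(horizon):
        # Temporally causal: the history holds only tokens from time
        # points strictly before t, never tokens from time t or later.
        history = list(tokens)
        step_tokens = []
        for agent in range(num_agents):
            if agent in planned:
                step_tokens.append(planned[agent][t])
            else:
                weights = toy_scores(history, agent, scene_encoding)
                step_tokens.append(
                    rng.choices(range(VOCAB_SIZE), weights=weights)[0]
                )
        tokens.extend(step_tokens)
    return tokens


def aggregate_modes(samples):
    """Group identical rollouts and return (sequence, probability) pairs."""
    counts = Counter(tuple(sample) for sample in samples)
    total = len(samples)
    return [(list(seq), n / total) for seq, n in counts.most_common()]


if __name__ == "__main__":
    rng = random.Random(0)
    rollouts = [
        sample_joint_tokens([4, 7], num_agents=2, horizon=3, rng=rng,
                            planned={0: [7, 7, 7]})
        for _ in range(50)
    ]
    for sequence, probability in aggregate_modes(rollouts)[:3]:
        print(sequence, round(probability, 2))
```

In this sketch agent 0 follows its planned tokens in every rollout, so the variation across modes comes only from agent 1. Exact-match grouping would fragment quickly with longer horizons, so a practical aggregation would likely cluster decoded waypoints instead.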
using trajectory prediction for other traffic participants · CPC title
Predicting future conditions · CPC title
Direction of movement, e.g. backwards · CPC title
Indexing codes relating to the type of sensors based on the principle of their operation · CPC title