Refinement training for machine-learned vehicle control model

US12515698B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12515698-B2
Application number: US-202318522034-A
Country: US
Kind code: B2
Filing date: Nov 28, 2023
Priority date: Nov 28, 2023
Publication date: Jan 6, 2026
Grant date: Jan 6, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A machine-learned model that uses sensor and/or perception data to directly determine controls for operating an autonomous vehicle may be trained by identifying a preferred trajectory between a human-driven and vehicle-controlled trajectory, and using a first loss determined between the vehicle-controlled trajectory and the path the autonomous vehicle ultimately ended up taking in a scenario and a second loss determined between the vehicle-controlled trajectory and the human-driven trajectory to refine the machine-learned model. The machine-learned model may additionally or alternatively be refined by a learned reward model constructed by replacing one or more output heads of the machine-learned model with a regression head that is trained using performance metrics determined for the vehicle-controlled trajectory.
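
The two losses named in the abstract can be illustrated with a minimal sketch. It assumes trajectories and paths are equal-length lists of (x, y) waypoints and uses mean squared waypoint distance as the comparison; the representation, the distance measure, and the names `mean_sq_error` and `refinement_losses` are illustrative assumptions, not taken from the patent.

```python
from math import dist

# Assumption: a trajectory or path is a list of (x, y) waypoints.
Trajectory = list[tuple[float, float]]


def mean_sq_error(a: Trajectory, b: Trajectory) -> float:
    """Mean squared Euclidean distance between matching waypoints."""
    if not a or len(a) != len(b):
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(dist(p, q) ** 2 for p, q in zip(a, b)) / len(a)


def refinement_losses(
    vehicle_traj: Trajectory,
    executed_path: Trajectory,
    human_traj: Trajectory,
) -> tuple[float, float]:
    """Return the two loss terms described in the abstract.

    first_loss:  vehicle-controlled trajectory vs. the path actually driven.
    second_loss: vehicle-controlled trajectory vs. the human-driven trajectory.
    How the terms are combined for refinement is left open here.
    """
    first_loss = mean_sq_error(vehicle_traj, executed_path)
    second_loss = mean_sq_error(vehicle_traj, human_traj)
    return first_loss, second_loss
```

For example, a vehicle trajectory that matches the executed path exactly but runs one unit to the side of the human trajectory gives a first loss of 0 and a second loss of 1.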

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: one or more processors; and non-transitory memory storing processor-executable instructions that, when executed by the one or more processors, cause the system to perform operations comprising: receiving a scenario indicating at least one of: environment state data and object detection data, a set of vehicle trajectories, a vehicle path that an autonomous vehicle executed based at least in part on the set of vehicle trajectories, and a driver trajectory executed by a human controlling a vehicle based at least in part on simulation data of the scenario, wherein a vehicle trajectory of the set of vehicle trajectories is generated at a discrete time interval by a machine-learned model as part of controlling the autonomous vehicle during the scenario; determining, between two trajectories associated with a scenario comprising a first vehicle trajectory of the set of vehicle trajectories and the driver trajectory, one of the two trajectories as a preferred trajectory and the other of the two trajectories as a rejected trajectory, wherein the first vehicle trajectory indicates predicted controls to operate the vehicle over a time horizon during the scenario; determining a first intermediate loss based at least in part on: determining a first difference between the first vehicle trajectory and the vehicle path indicating a state of the vehicle according to output of the machine-learned model during the scenario; determining a second difference between the preferred trajectory and the first vehicle trajectory; altering, as a refined machine-learned model, one or more parameters of the machine-learned model based at least in part on the first intermediate loss; and transmitting the refined machine-learned model to the autonomous vehicle such that the autonomous vehicle is controlled based at least in part on an output of the refined machine-learned model.

2. The system of claim 1, wherein altering the one or more parameters of the machine-learned model comprises: determining, by a learned reward model, a reward based at least in part on the first difference and the preferred trajectory and a demerit based at least in part on the first difference and the rejected trajectory instead of determining the first intermediate loss; and altering, by reinforcement learning based at least in part on the reward and the demerit, the one or more parameters to increase a likelihood that the machine-learned model will generate the preferred trajectory and decrease the likelihood that the machine-learned model will generate the rejected trajectory.

3. The system of claim 2, wherein training the learned reward model comprises: replacing, as the learned reward model, at least one of an output head or an intermediate output head of the machine-learned model with a single output head that outputs a regressed value indicating a value that is used as the reward or the demerit; generating, by the learned reward model and based at least in part on at least one of the first vehicle trajectory or the vehicle path, an estimated reward or demerit; determining, by a ruleset and based at least in part on at least one of the first vehicle trajectory or the vehicle path, a performance metric indicating a level of at least one of performance, comfort, or safety associated with at least one of the first vehicle trajectory or the vehicle path; and altering a parameter of the learned reward model, including the single output head, based at least in part on a difference between the performance metric and the estimated reward or demerit.

4. The system of claim 1, wherein: determining the first intermediate loss comprises determining a third difference between the first difference and the second difference; altering the one or more parameters of the machine-learned model is based at least in part on determining a trajectory loss using the first intermediate loss and a second intermediate loss such that the trajectory loss is reduced; determining the second intermediate loss comprises: determining a fourth difference between a previous vehicle trajectory and a previous vehicle path, wherein the previous vehicle trajectory is generated by a previous version of the machine-learned model for the scenario and the previous vehicle path indicates a state of the vehicle according to output of the previous version of the machine-learned model during the scenario; determining a fifth difference between the previous vehicle trajectory and the preferred trajectory; and determining the second intermediate loss based at least in part on a sixth difference between the fourth difference and the fifth difference; and determining the trajectory loss comprises determining a seventh difference between the first intermediate loss and the second intermediate loss.

5. The system of claim 4, wherein: the trajectory loss is a first trajectory loss associated with the first vehicle trajectory and the first vehicle trajectory was generated at a first time in the scenario; altering the one or more parameters of the machine-learned model is based at least in part on determining a final loss using the first trajectory loss and a second trajectory loss; the second trajectory loss is determined based at least in part on a second vehicle trajectory of the set of vehicle trajectories associated with the scenario and the second vehicle trajectory was generated at a second time in the scenario later than the first time; and determining the final loss is based at least in part on: scaling the first trajectory loss using a first weight; and scaling the second trajectory loss using a second weight that is less than the first weight based at least in part on the second time being later than the first time.

6. The system of claim 1, wherein: the set of vehicle trajectories is part of a superset of vehicle trajectories associated with multiple scenarios; the operations further comprise determining that the set of vehicle trajectories are associated with a score that is below a score threshold; and determining the score is based at least in part on determining at least one of a safety cost, progress cost, or comfort cost associated with a set of vehicle trajectories.

7. One or more non-transitory computer-readable media storing processor-executable instructions that, when executed by one or more processors, perform operations comprising: receiving a scenario, a set of vehicle trajectories, a vehicle path that an autonomous vehicle executed based at least in part on the set of vehicle trajectories, and a driver trajectory executed by a human controlling a vehicle based at least in part on simulation data of the scenario, wherein a vehicle trajectory of the set of vehicle trajectories is generated by a machine-learned model as part of controlling the autonomous vehicle during the scenario; determining, between two trajectories associated with a scenario comprising a first vehicle trajectory of the set of vehicle trajectories and the driver trajectory, one of the two trajectories as a preferred trajectory and the other of the two trajectories as a rejected trajectory, wherein the first vehicle trajectory indicates predicted controls to operate the vehicle over a time horizon during the scenario; determining a first intermediate loss based at least in part on: determining a first difference between the first vehicle trajectory and the vehicle path; determining a second difference between the preferred trajectory and the first vehicle trajectory; and altering, as a refined machine-learned model, one or more parameters of the machine-learned model based at least in part on the first intermediate loss.

8. The one or more non-t
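
The chain of differences in claims 4 and 5 may be easier to follow as arithmetic. The sketch below is one possible reading under stated assumptions: trajectories are equal-length lists of (x, y) waypoints, a "difference" between two trajectories is the mean waypoint distance, every "difference between" two quantities is a plain subtraction in the order the claim names them, and the time weights follow a geometric decay that is summed. None of these choices, nor the function names, come from the patent text.

```python
from math import dist


def gap(a, b):
    """Assumed 'difference' between two trajectories: mean waypoint distance."""
    if not a or len(a) != len(b):
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(dist(p, q) for p, q in zip(a, b)) / len(a)


def intermediate_loss(traj, path, preferred):
    """Claim 4: difference between (traj vs. path) and (traj vs. preferred)."""
    return gap(traj, path) - gap(traj, preferred)


def trajectory_loss(cur_traj, cur_path, prev_traj, prev_path, preferred):
    """Claim 4: current-model intermediate loss minus the previous model's."""
    current = intermediate_loss(cur_traj, cur_path, preferred)
    previous = intermediate_loss(prev_traj, prev_path, preferred)
    return current - previous


def final_loss(trajectory_losses, decay=0.5):
    """Claim 5: later trajectory losses get smaller weights.

    Assumption: weight for the i-th loss is decay**i and the scaled losses
    are summed; the claim only requires the later weight to be smaller.
    """
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must be strictly between 0 and 1")
    return sum(decay**i * loss for i, loss in enumerate(trajectory_losses))
```

With `decay=0.5`, two trajectory losses of 2.0 (earlier) and 4.0 (later) are weighted 1.0 and 0.5, so both contribute 2.0 to the final loss. The structure of comparing the current model against a previous version on a preferred trajectory resembles preference-based fine-tuning objectives, though the claims do not name a specific method.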

Assignees

Zoox Inc

Inventors

Classifications

  • G06N20/00 (Primary)

    Machine learning · CPC title

  • B60W60/001 (Primary)

    Planning or execution of driving tasks · CPC title

  • based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title

  • involving control alternatives for a single driving scenario, e.g. planning several paths to avoid obstacles · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12515698B2 cover?
A machine-learned model that uses sensor and/or perception data to directly determine controls for operating an autonomous vehicle may be trained by identifying a preferred trajectory between a human-driven and vehicle-controlled trajectory, and using a first loss determined between the vehicle-controlled trajectory and the path the autonomous vehicle ultimately ended up taking in a scenario an…
Who is the assignee on this patent?
Zoox Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 6, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).