Systems and methods for foundation models based reward design for autonomous driving

US2025245516A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2025245516-A1
Application number  US-202418428515-A
Country             US
Kind code           A1
Filing date         Jan 31, 2024
Priority date       Jan 31, 2024
Publication date    Jul 31, 2025
Grant date          (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems for optimizing an action policy of an autonomous vehicle machine learning model. Images are generated corresponding to an environment about a vehicle. These images are passed through an image encoder to generate image-based embeddings of the current state of the vehicle. A text prompt representing a goal of the autonomous vehicle is passed through a text encoder to generate text-based embeddings of the goal. A similarity score is determined, representing a similarity between the image-based embeddings of the current state and the text-based embeddings of the goal. A reinforcement learning model for a closed-loop autonomous driving task is executed, with the similarity score used as the reward function. An action policy corresponding to a control of the vehicle is optimized based on the reward function.
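
The loop described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the one-dimensional lane-offset environment, the stub encoders `encode_image_stub` and `encode_text_stub` (standing in for real image and text encoders), the choice of cosine similarity as the similarity score, and the tabular Q-learning update used to optimize the policy.

```python
import math

ACTIONS = (-1, 0, 1)         # toy control commands: steer left, hold, steer right
OFFSETS = (-2, -1, 0, 1, 2)  # toy state: discrete lateral offset from lane center


def encode_image_stub(offset):
    """Hypothetical stand-in for an image encoder (not a real model)."""
    return [1.0, float(offset)]


def encode_text_stub(prompt):
    """Hypothetical stand-in for a text encoder; returns a fixed goal vector."""
    return [1.0, 0.0]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def step(offset, action):
    """Closed loop: the chosen action determines the next observed state."""
    return max(-2, min(2, offset + action))


def train(goal_prompt, sweeps=500, alpha=0.5, gamma=0.9):
    goal = encode_text_stub(goal_prompt)
    q = {(s, a): 0.0 for s in OFFSETS for a in ACTIONS}
    for _ in range(sweeps):
        for s in OFFSETS:
            for a in ACTIONS:
                nxt = step(s, a)
                # The similarity score between state and goal embeddings is the reward.
                reward = cosine_similarity(encode_image_stub(nxt), goal)
                target = reward + gamma * max(q[(nxt, b)] for b in ACTIONS)
                q[(s, a)] += alpha * (target - q[(s, a)])
    # Greedy action policy derived from the learned values.
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in OFFSETS}


policy = train("keep the vehicle centered in its lane")
print(policy)
```

In this toy setup the learned policy steers toward offset 0 and holds there, because that is the state whose stub embedding is most similar to the goal embedding. A real system would replace the stubs with trained encoders and the table with a learned policy.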

First claim

Opening claim text (preview).

What is claimed is:

1. A method for optimizing an action policy of a machine learning model of an autonomous vehicle, the method comprising: generating an image of an environment about an autonomous vehicle based on vehicle sensor data representing a current state of the autonomous vehicle; passing the generated image through an image encoder to generate image-based embeddings of the current state of the autonomous vehicle; receiving a text prompt representing a goal of the autonomous vehicle; passing the text prompt through a text encoder to generate text-based embeddings of the goal; determining a similarity score representing a similarity between the image-based embeddings of the current state and the text-based embeddings of the goal; executing a reinforcement learning model for a closed-loop autonomous driving task, wherein the similarity score is utilized as a reward in the reinforcement learning model; and optimizing an action policy of the reinforcement learning model based on the similarity score utilized as the reward, wherein the action policy is associated with a control command of the autonomous vehicle.

2. The method of claim 1, further comprising: executing a foundation model to perform the determining of the similarity score.

3. The method of claim 2, wherein the similarity score is determined as follows:

$$r = 1 - \frac{FM_{state}(\text{state description}) \cdot FM_{goal}(\text{goal description})}{\lvert FM_{goal}(\text{goal description}) \rvert \cdot \lvert FM_{state}(\text{state description}) \rvert}$$

wherein r represents the reward utilized in the reinforcement learning model, FM_state represents the image-based embeddings of the current state of the autonomous vehicle, and FM_goal represents the text-based embeddings of the goal.

4. The method of claim 1, wherein the text prompt is a human-crafted text prompt not generated by a machine learning model.

5. The method of claim 1, wherein the determining of the similarity score includes deriving an inverse of the similarity between the image-based embeddings of the current state and the text-based embeddings of the goal.

6. The method of claim 1, wherein the image encoder is part of a vision-language model (VLM) configured to generate a vector representing the generated image in a learned embedding space.

7. The method of claim 6, wherein the text encoder is part of a large language model (LLM) configured to generate a vector representing the goal in a learned embedding space.

8. A system for optimizing an action policy of a machine learning model of an autonomous vehicle, the system comprising: one or more image sensors mounted to an autonomous vehicle and configured to generate images external to the autonomous vehicle representing a current state of the autonomous vehicle; and one or more processors communicatively coupled to the one or more images sensors, the one or more processors programmed to: receive the generated images from the one or more image sensors, execute an image encoder on the generated images to generate image-based embeddings of the current state of the vehicle, receive a text prompt representing a goal of the autonomous vehicle, execute a text encoder on the text prompt to generate text-based embeddings of the goal, determine a similarity score representing a similarity between the image-based embeddings of the current state and the text-based embeddings of the goal, execute a reinforcement learning model for a closed-loop autonomous driving task, wherein the similarity score is utilized as a reward in the reinforcement learning model, and optimize an action policy of the reinforcement learning model based on the similarity score utilized as the reward, wherein the action policy is associated with a control command of the autonomous vehicle.

9. The system of claim 8, wherein the one or more processors are further programmed to: execute a foundation model to perform the determining of the similarity score.

10. The system of claim 9, wherein the similarity score is determined as follows: r = 1 - [FM_state(state description) · FM_goal(goal description)] / [|FM_goal( …

Assignees

Robert Bosch GmbH

Inventors

Classifications

  • Mathematical models, e.g. for simulation · CPC title

  • Control system elements or transfer functions · CPC title

  • Details of control systems for road vehicle drive control not related to the control of a particular sub-unit {, e.g. process diagnostic or vehicle driver interfaces} · CPC title

  • Planning or execution of driving tasks · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025245516A1 cover?
Methods and systems for optimizing an action policy of an autonomous vehicle machine learning model. Images are generated corresponding to an environment about a vehicle. These images are passed through an image encoder to generate image-based embeddings of the current state of the vehicle. A text prompt representing a goal of the autonomous vehicle is passed through a text encoder to generate …
Who is the assignee on this patent?
Robert Bosch GmbH
What technology area does this patent fall under?
Primary CPC classification G06N3/092. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 31, 2025 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).