Granular neural network architecture search over low-level primitives
US-2024428071-A1 · Dec 26, 2024 · US
US2025245516A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025245516-A1 |
| Application number | US-202418428515-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jan 31, 2024 |
| Priority date | Jan 31, 2024 |
| Publication date | Jul 31, 2025 |
| Grant date | — |
Official abstract text for this publication.
Methods and systems for optimizing an action policy of an autonomous vehicle machine learning model. Images are generated corresponding to an environment about a vehicle. These images are passed through an image encoder to generate image-based embeddings of the current state of the vehicle. A text prompt representing a goal of the autonomous vehicle is passed through a text encoder to generate text-based embeddings of the goal. A similarity score is determined, representing a similarity between the image-based embeddings of the current state and the text-based embeddings of the goal. A reinforcement learning model for a closed-loop autonomous driving task is executed, with the similarity score used as the reward function. An action policy corresponding to a control of the vehicle is optimized based on the reward function.
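The loop described in the abstract (encode the state, encode the goal, score their similarity, use the score as the reward) can be sketched with a toy example. This is a minimal illustration under stated assumptions, not the method disclosed in the publication: the embedding vectors, the three action names, and the epsilon-greedy value update are all hypothetical stand-ins for a real image encoder, text encoder, and reinforcement learning model.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical text-encoder output for a goal prompt (assumption).
GOAL_EMBEDDING = np.array([1.0, 0.0, 0.0])

# Hypothetical image-encoder outputs for the state reached by each action
# (assumption: a real system would encode camera images instead).
STATE_EMBEDDINGS = {
    "keep_lane": np.array([0.9, 0.1, 0.0]),
    "drift_left": np.array([0.2, 0.9, 0.1]),
    "brake_hard": np.array([0.1, 0.2, 0.9]),
}

def train_policy(episodes=200, epsilon=0.1, lr=0.5, seed=0):
    """Epsilon-greedy value learning with the similarity score as reward."""
    rng = np.random.default_rng(seed)
    actions = list(STATE_EMBEDDINGS)
    # Optimistic initial values so every action is tried at least once.
    q = {a: 1.0 for a in actions}
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = actions[rng.integers(len(actions))]  # explore
        else:
            action = max(q, key=q.get)                    # exploit
        # Reward is the similarity between state and goal embeddings.
        reward = cosine_similarity(STATE_EMBEDDINGS[action], GOAL_EMBEDDING)
        q[action] += lr * (reward - q[action])
    return q

q_values = train_policy()
best_action = max(q_values, key=q_values.get)
```

In this toy setup the learned values settle on each action's similarity score, so the greedy policy selects the action whose resulting state embedding lies closest to the goal embedding.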
Opening claim text (preview).
What is claimed is:

1. A method for optimizing an action policy of a machine learning model of an autonomous vehicle, the method comprising: generating an image of an environment about an autonomous vehicle based on vehicle sensor data representing a current state of the autonomous vehicle; passing the generated image through an image encoder to generate image-based embeddings of the current state of the autonomous vehicle; receiving a text prompt representing a goal of the autonomous vehicle; passing the text prompt through a text encoder to generate text-based embeddings of the goal; determining a similarity score representing a similarity between the image-based embeddings of the current state and the text-based embeddings of the goal; executing a reinforcement learning model for a closed-loop autonomous driving task, wherein the similarity score is utilized as a reward in the reinforcement learning model; and optimizing an action policy of the reinforcement learning model based on the similarity score utilized as the reward, wherein the action policy is associated with a control command of the autonomous vehicle.

2. The method of claim 1, further comprising: executing a foundation model to perform the determining of the similarity score.

3. The method of claim 2, wherein the similarity score is determined as follows:

   $$r = 1 - \frac{FM_{state}(\text{state description}) \cdot FM_{goal}(\text{goal description})}{\lvert FM_{goal}(\text{goal description}) \rvert \cdot \lvert FM_{state}(\text{state description}) \rvert}$$

   wherein $r$ represents the reward utilized in the reinforcement learning model, $FM_{state}$ represents the image-based embeddings of the current state of the autonomous vehicle, and $FM_{goal}$ represents the text-based embeddings of the goal.

4. The method of claim 1, wherein the text prompt is a human-crafted text prompt not generated by a machine learning model.

5. The method of claim 1, wherein the determining of the similarity score includes deriving an inverse of the similarity between the image-based embeddings of the current state and the text-based embeddings of the goal.

6. The method of claim 1, wherein the image encoder is part of a vision-language model (VLM) configured to generate a vector representing the generated image in a learned embedding space.

7. The method of claim 6, wherein the text encoder is part of a large language model (LLM) configured to generate a vector representing the goal in a learned embedding space.

8. A system for optimizing an action policy of a machine learning model of an autonomous vehicle, the system comprising: one or more image sensors mounted to an autonomous vehicle and configured to generate images external to the autonomous vehicle representing a current state of the autonomous vehicle; and one or more processors communicatively coupled to the one or more images sensors, the one or more processors programmed to: receive the generated images from the one or more image sensors, execute an image encoder on the generated images to generate image-based embeddings of the current state of the vehicle, receive a text prompt representing a goal of the autonomous vehicle, execute a text encoder on the text prompt to generate text-based embeddings of the goal, determine a similarity score representing a similarity between the image-based embeddings of the current state and the text-based embeddings of the goal, execute a reinforcement learning model for a closed-loop autonomous driving task, wherein the similarity score is utilized as a reward in the reinforcement learning model, and optimize an action policy of the reinforcement learning model based on the similarity score utilized as the reward, wherein the action policy is associated with a control command of the autonomous vehicle.

9. The system of claim 8, wherein the one or more processors are further programmed to: execute a foundation model to perform the determining of the similarity score.

10. The system of claim 9, wherein the similarity score is determined as follows: r = 1 - FM_state(state description) · FM_goal(goal description) / |FM_goal( …
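The expression recited in claim 3 is one minus the cosine similarity of the two embedding vectors. The following is a minimal sketch of that arithmetic, assuming the embeddings are already available as plain numeric vectors; the function name `claim3_score` is hypothetical and the foundation-model encoders that would produce the vectors are not shown.

```python
import numpy as np

def claim3_score(state_embedding, goal_embedding):
    """Compute r = 1 - (s . g) / (|g| * |s|) for two embedding vectors."""
    s = np.asarray(state_embedding, dtype=float)
    g = np.asarray(goal_embedding, dtype=float)
    cosine = s @ g / (np.linalg.norm(g) * np.linalg.norm(s))
    return float(1.0 - cosine)

# Hypothetical vectors chosen only to show the range of the expression.
aligned = claim3_score([3.0, 4.0], [6.0, 8.0])      # same direction
orthogonal = claim3_score([1.0, 0.0], [0.0, 1.0])   # unrelated directions
opposite = claim3_score([1.0, 0.0], [-1.0, 0.0])    # opposite directions
```

Because cosine similarity lies in [-1, 1], the expression ranges from 0 (embeddings pointing the same way) to 2 (embeddings pointing in opposite directions), and it depends only on direction, not on vector length.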
Mathematical models, e.g. for simulation · CPC title
Control system elements or transfer functions · CPC title
Details of control systems for road vehicle drive control not related to the control of a particular sub-unit {, e.g. process diagnostic or vehicle driver interfaces} · CPC title
Planning or execution of driving tasks · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title