US11745746B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11745746-B2 |
| Application number | US-202017136220-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 29, 2020 |
| Priority date | Jan 9, 2020 |
| Publication date | Sep 5, 2023 |
| Grant date | Sep 5, 2023 |
Abstract
A method for generating vehicle controlling data includes executing an operating process that operates an electronic device, executing an obtaining process that obtains a state of a vehicle, executing a reward calculation process that assigns a reward based on the state of the vehicle, and an updating process that updates relationship specifying data. The reward calculation process includes a changing process that changes a reward assigned when an area variable equals a second value and a property of the vehicle is a predetermined property from a reward assigned when the area variable equals a first value and the property of the vehicle is the predetermined property.
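The changing process in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the numeric codes for the first and second area values, the reward magnitudes, and the `VehicleState` type do not come from the patent, which states only that the reward for the same predetermined property differs between the two area values.

```python
from dataclasses import dataclass

# Hypothetical codes: the abstract only speaks of a "first" and a "second" value.
AREA_FIRST = 0
AREA_SECOND = 1

# Hypothetical reward magnitudes, chosen only to make the difference visible.
REWARD_BY_AREA = {AREA_FIRST: 1.0, AREA_SECOND: 2.0}
REWARD_OTHERWISE = -1.0


@dataclass(frozen=True)
class VehicleState:
    area: int           # area variable
    has_property: bool  # True when the vehicle shows the predetermined property


def calculate_reward(state: VehicleState) -> float:
    """Assign a reward; for the same property, the value depends on the area."""
    if state.has_property:
        return REWARD_BY_AREA[state.area]
    return REWARD_OTHERWISE
```

With these assumed values, a vehicle showing the predetermined property earns a different reward in the second area than in the first, which is the only behavior the abstract requires of the changing process.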
Claims

What is claimed is:

1. A method for generating vehicle controlling data, the method, comprising:

when relationship specifying data that specifies a relationship between a state of a vehicle and an action variable, which is a variable related to operation of an electronic device mounted on the vehicle, is stored in a storage device, executing an operating process that operates the electronic device with processing circuitry;

executing an obtaining process that obtains an area variable, which is a variable indicating an area in which the vehicle is located, a state of the vehicle based on a detection value of a sensor with the processing circuitry, and includes a value distinguishing between areas divided based on an average vehicle speed;

executing a reward calculation process that assigns a reward based on the state of the vehicle obtained by the obtaining process with the processing circuitry, the reward being larger when a property of the vehicle meets a predetermined criterion and relatively smaller when the property of the vehicle does not meet the predetermined criterion; and

executing an updating process that uses the state of the vehicle obtained by the obtaining process, a value of the action variable used for operation of the electronic device, and the reward corresponding to operation of the electronic device as inputs to a predetermined update mapping to update the relationship specifying data with the processing circuitry,

wherein the update mapping is configured to output the relationship specifying data that is updated to increase an expected return of the reward when the electronic device is operated in accordance with the relationship specifying data, and

the reward calculation process includes a changing process that changes a reward assigned when the area variable equals a second value indicating that the average vehicle speed is high and the property of the vehicle is a predetermined property from a reward assigned when the area variable equals a first value indicating that the average vehicle speed is low and the property of the vehicle is the predetermined property.

2. The method according to claim 1, wherein the predetermined criterion includes a criterion related to acceleration response and a criterion related to energy usage efficiency, the reward calculation process includes a first process that assigns a greater reward when the criterion related to the acceleration response is met than when not met and a second process that assigns a greater reward when the criterion related to the energy usage efficiency is met than when not met, and the changing process includes a process that changes at least one of the first process or the second process so that an increase in the energy usage efficiency results in a greater reward in an area where the average vehicle speed is low as compared to an area where the average vehicle speed is high.

3. The method according to claim 1, further comprising: executing a process that generates control mapping data that associates the state of the vehicle with a value of the action variable so that the state of the vehicle is used as an input to output the value of the action variable based on the relationship specifying data updated by the updating process with the processing circuitry.

4. A vehicle controller, comprising: a storage device for storing instructions; and the processing circuitry executes the instructions to cause the processing circuitry to execute the method according to claim 1.

5. The vehicle controller of claim 4, wherein: the processing circuitry further includes a first execution device that is mounted on a vehicle and a second execution device that is not mounted on the vehicle, the first execution device is configured to execute at least the obtaining process and the operating process, and the second execution device is configured to execute at least the updating process.

6. A learning device for a vehicle, the learning device, comprising: the second execution device according to claim 5.
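One possible reading of claims 1 to 3 can be sketched in a few lines. This is a sketch under stated assumptions, not the patented implementation: the claims require only an update mapping that increases the expected return, and tabular Q-learning is swapped in here as one well-known technique with that aim. The area codes, reward weights, action names, state labels, and the learning parameters `alpha` and `gamma` are all hypothetical.

```python
from collections import defaultdict

AREA_LOW_SPEED = 0   # first value: average vehicle speed is low (code is hypothetical)
AREA_HIGH_SPEED = 1  # second value: average vehicle speed is high

# Hypothetical weights per area: (acceleration response, energy usage efficiency).
REWARD_WEIGHTS = {
    AREA_LOW_SPEED: (1.0, 2.0),   # efficiency counts for more where traffic is slow
    AREA_HIGH_SPEED: (2.0, 1.0),
}


def calculate_reward(area, response_ok, efficiency_ok):
    """Claim 2 sketch: one term per criterion, each larger when its criterion is met."""
    w_response, w_efficiency = REWARD_WEIGHTS[area]
    first = w_response if response_ok else -w_response
    second = w_efficiency if efficiency_ok else -w_efficiency
    return first + second


def update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; `q` stands in for the relationship specifying data."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def make_control_map(q, states, actions):
    """Claim 3 sketch: map each state to the action with the highest learned value."""
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


# Usage with made-up states and actions.
q = defaultdict(float)
actions = ["small_throttle", "large_throttle"]
state = (AREA_LOW_SPEED, "cruising")
reward = calculate_reward(AREA_LOW_SPEED, response_ok=False, efficiency_ok=True)
update(q, state, "small_throttle", reward, state, actions)
control_map = make_control_map(q, [state], actions)
```

Putting the area variable inside the state key lets a single table hold different preferred actions for low-speed and high-speed areas. In the split of claim 5, `update` would run on the device that is not mounted on the vehicle, while obtaining the state and operating the electronic device stay on board.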
Details of control systems for road vehicle drive control not related to the control of a particular sub-unit {, e.g. process diagnostic or vehicle driver interfaces} · CPC title
Machine learning · CPC title
Registering or indicating driving, working, idle, or waiting time only (apparatus forming part of taximeters G07B13/00) · CPC title
Setting, resetting, calibration · CPC title
Longitudinal speed · CPC title