Training a self-driving vehicle
US-2018237005-A1 · Aug 23, 2018 · US
US10254759B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-10254759-B1 |
| Application number | US-201715704969-A |
| Country | US |
| Kind code | B1 |
| Filing date | Sep 14, 2017 |
| Priority date | Sep 14, 2017 |
| Publication date | Apr 9, 2019 |
| Grant date | Apr 9, 2019 |
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing an interactive autonomous vehicle agent. One of the methods includes receiving a request to generate an experience tuple for a vehicle in a particular driving context. A predicted environment observation representing a predicted environment of the autonomous vehicle after the candidate action is taken by the autonomous vehicle in an initial environment is generated, including providing an initial environment observation and the candidate action as input to a vehicle behavior model neural network trained to generate predicted environment observations. An immediate quality value is generated from a context-specific quality model that generates immediate quality values that are specific to the particular driving context. An experience tuple comprising the initial environment observation, the candidate action, and the immediate quality value is generated and used as input to a reinforcement learning system for the autonomous vehicle.
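The data flow in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the two-element observation, the one-second kinematic step standing in for the trained vehicle behavior model neural network, the two scoring functions, and every name below (`toy_behavior_model`, `merge_quality`, `highway_quality`, `generate_experience_tuple`) are hypothetical. Only the shape of the pipeline (predict the next environment, score it with a context-specific quality model, emit a tuple of initial observation, action, and immediate quality value) comes from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical toy observation: (gap to lead vehicle in m, ego speed in m/s).
Observation = Tuple[float, float]
Action = str

LEAD_SPEED_MPS = 10.0  # assumed constant speed of the vehicle ahead
ACCEL_MPS2 = {"accelerate": 1.0, "hold": 0.0, "brake": -1.0}  # assumed action set


@dataclass(frozen=True)
class ExperienceTuple:
    initial_observation: Observation
    action: Action
    immediate_quality: float


def toy_behavior_model(obs: Observation, action: Action) -> Observation:
    """Stand-in for the trained behavior network: a one-second kinematic step."""
    gap, speed = obs
    new_speed = max(0.0, speed + ACCEL_MPS2[action])
    return (gap - (new_speed - LEAD_SPEED_MPS), new_speed)


def merge_quality(predicted: Observation, action: Action) -> float:
    """Lane-merging stand-in: prefers a larger gap, capped at 30 m."""
    gap, _ = predicted
    return max(0.0, min(gap, 30.0)) / 30.0


def highway_quality(predicted: Observation, action: Action) -> float:
    """Highway stand-in: prefers a speed near an assumed 25 m/s target."""
    _, speed = predicted
    return 1.0 - abs(speed - 25.0) / 25.0


QualityModel = Callable[[Observation, Action], float]

# One quality model per driving context; the behavior model is shared.
QUALITY_MODELS: Dict[str, QualityModel] = {
    "lane_merging": merge_quality,
    "highway_driving": highway_quality,
}


def generate_experience_tuple(
    context: str,
    obs: Observation,
    action: Action,
    behavior_model: Callable[[Observation, Action], Observation] = toy_behavior_model,
    quality_models: Dict[str, QualityModel] = QUALITY_MODELS,
) -> ExperienceTuple:
    """Predict the next environment, score it for the context, build the tuple."""
    predicted = behavior_model(obs, action)
    quality = quality_models[context](predicted, action)
    return ExperienceTuple(obs, action, quality)


example = generate_experience_tuple("lane_merging", (20.0, 10.0), "brake")
```

The same initial observation and action yield different immediate quality values under `"lane_merging"` and `"highway_driving"`, which is the point of making the quality model context-specific while sharing the behavior model across contexts.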
Claims (preview; the text is cut off partway through claim 21)
What is claimed is:
1. A computer-implemented method comprising: receiving a request to generate an experience tuple for a vehicle in a particular driving context, the particular driving context being one of a plurality of driving contexts; receiving an initial environment observation representing an initial environment of the autonomous vehicle and a candidate action representing an action to be taken by the autonomous vehicle in the initial environment; generating a predicted environment observation representing a predicted environment of the autonomous vehicle after the candidate action is taken by the autonomous vehicle in the initial environment, including providing the initial environment observation and the candidate action as input to a vehicle behavior model neural network trained to generate predicted environment observations; generating an immediate quality value including providing the predicted environment of the autonomous vehicle and the candidate action as input to a context-specific quality model that generates immediate quality values that are specific to the particular driving context; generating an experience tuple comprising the initial environment observation, the candidate action, and the immediate quality value; and providing the experience tuple as input to a reinforcement learning system for the autonomous vehicle.
2. The method of claim 1, further comprising: receiving, from the reinforcement learning system, a cumulative action value for the candidate action; computing updated weights for the reinforcement learning system using the cumulative action value and the immediate quality value; and updating the reinforcement learning system using the updated weights.
3. The method of claim 1, further comprising: receiving, from the reinforcement learning system, a cumulative action value for the candidate action; ranking the candidate action with a plurality of other actions according to respective cumulative action values computed by the reinforcement learning system for the actions; determining that the candidate action is a highest-ranked action; and in response, selecting the candidate action.
4. The method of claim 1, wherein the autonomous vehicle is a physical autonomous vehicle having an onboard policy engine implementing a policy generated by the reinforcement learning system.
5. The method of claim 1, wherein the autonomous vehicle is a simulated autonomous vehicle.
6. The method of claim 1, wherein the vehicle behavior model neural network generates predicted environment observations for all of the plurality of driving contexts.
7. The method of claim 6, wherein the context-specific quality model is trained using training data from only the particular driving context of the plurality of driving contexts.
8. The method of claim 7, wherein the plurality of driving contexts include one or more of a lane merging context, an intersection navigation context, a lane changing context, or a highway driving context.
9. The method of claim 1, wherein the context-specific quality model uses a collision distance feature representing a sum of actions to be take by the autonomous vehicle and one or more other vehicles to cause a collision.
10. The method of claim 1, wherein the context-specific quality model uses a safety rating feature that quantifies the safety of each particular candidate action and a comfort rating feature that quantifies the passenger comfort of each particular candidate action.
11. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to implement: a vehicle behavior model neural network configured to generate a predicted environment observation representing a predicted environment of an autonomous vehicle in a particular driving context after a candidate action is taken by the autonomous vehicle in an initial environment; a quality model engine configured to generate an immediate quality value by providing the predicted environment of the autonomous vehicle and the candidate action as input to a context-specific quality model that generates immediate quality values that are specific to the particular driving context; and a reinforcement learning system for the autonomous vehicle that is configured to receive an experience tuple comprising the initial environment observation, the candidate action, and the immediate quality value and to generate a cumulative action value for the candidate action when performed by the autonomous vehicle in the initial environment.
12. The system of claim 11, further comprising: a training engine configured to receive, from the reinforcement learning system, a cumulative action value for the candidate action, to compute updated weights for the reinforcement learning system using the cumulative action value and the immediate quality value, and to update the reinforcement learning system using the updated weights.
13. The system of claim 11, further comprising: a policy engine configured to receive, from the reinforcement learning system, a cumulative action value for the candidate action, to rank the candidate action with a plurality of other actions according to respective cumulative action values computed by the reinforcement learning system for the actions, to determine that the candidate action is a highest-ranked action, and to select the candidate action in response.
14. The system of claim 11, further comprising a physical autonomous vehicle having an onboard policy engine implementing a policy generated by the reinforcement learning system.
15. The system of claim 11, wherein the autonomous vehicle is a simulated autonomous vehicle.
16. The system of claim 11, wherein the vehicle behavior model neural network is configured to generate predicted environment observations for all of the plurality of driving contexts.
17. The system of claim 16, wherein the context-specific quality model is trained using training data from only the particular driving context of the plurality of driving contexts.
18. The system of claim 17, wherein the plurality of driving contexts include one or more of a lane merging context, an intersection navigation context, a lane changing context, or a highway driving context.
19. The system of claim 11, wherein the context-specific quality model is configured to use a collision distance feature representing a sum of actions to be take by the autonomous vehicle and one or more other vehicles to cause a collision.
20. The system of claim 11, wherein the context-specific quality model is configured to use a safety rating feature that quantifies the safety of each particular candidate action and a comfort rating feature that quantifies the passenger comfort of each particular candidate action.
21. A computer program product, encoded on one or more non-transitory computer storage media, comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: receiving a request to generate an experience tuple for a vehicle in a particular driving context, the particular driving context being one of a plurality of driving contexts; receiving an initial environment observation representing an initial environment of the autonomous vehicle and a candidate action representing an action to be taken by the autonomous vehicle in the initial environment; generating a predicted en
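Claims 2 and 3 (and their system counterparts, claims 12 and 13) describe two uses of the cumulative action value: updating the reinforcement learning system's weights and ranking candidate actions. The sketch below illustrates both under loud assumptions. The claims do not specify an update rule or a function approximator, so this uses a linear action-value function with a standard one-step Q-learning update as a stand-in. All names (`q_value`, `rank_actions`, `td_update`), the learning rate, and the discount factor are hypothetical.

```python
from typing import Dict, List, Sequence

Weights = Dict[str, List[float]]  # one weight vector per action (assumed linear model)


def q_value(weights: Weights, obs: Sequence[float], action: str) -> float:
    """Cumulative action value as a dot product of action weights and observation."""
    return sum(w * x for w, x in zip(weights[action], obs))


def rank_actions(weights: Weights, obs: Sequence[float], actions: Sequence[str]) -> List[str]:
    """Order actions from highest to lowest cumulative action value."""
    return sorted(actions, key=lambda a: q_value(weights, obs, a), reverse=True)


def td_update(
    weights: Weights,
    obs: Sequence[float],
    action: str,
    immediate_quality: float,
    next_obs: Sequence[float],
    lr: float = 0.1,      # assumed learning rate
    gamma: float = 0.9,   # assumed discount factor
) -> Weights:
    """Return new weights after one Q-learning step; the input is not mutated."""
    current = q_value(weights, obs, action)
    best_next = max(q_value(weights, next_obs, a) for a in weights)
    td_error = immediate_quality + gamma * best_next - current
    updated = {a: list(w) for a, w in weights.items()}
    updated[action] = [w + lr * td_error * x for w, x in zip(weights[action], obs)]
    return updated


weights: Weights = {
    "accelerate": [0.0, 0.1],
    "hold": [0.0, 0.0],
    "brake": [0.1, 0.0],
}
ranking = rank_actions(weights, (1.0, 2.0), list(weights))
selected = ranking[0]  # the highest-ranked action is the one a policy engine would select
```

Here the immediate quality value plays the role of the reward, and `next_obs` would be the predicted environment observation produced by the behavior model; both mappings are an interpretation of the claims rather than something they state.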
Combinations of networks · CPC title
using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model · CPC title
Design optimisation, verification or simulation (optimisation, verification or simulation of circuit designs G06F30/30) · CPC title
based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title
using neural networks only · CPC title