Interactive autonomous vehicle agent

US10254759B1 · US · B1

Patent metadata
Field	Value
Publication number	US-10254759-B1
Application number	US-201715704969-A
Country	US
Kind code	B1
Filing date	Sep 14, 2017
Priority date	Sep 14, 2017
Publication date	Apr 9, 2019
Grant date	Apr 9, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing an interactive autonomous vehicle agent. One of the methods includes receiving a request to generate an experience tuple for a vehicle in a particular driving context. A predicted environment observation representing a predicted environment of the autonomous vehicle after the candidate action is taken by the autonomous vehicle in an initial environment is generated, including providing an initial environment observation and the candidate action as input to a vehicle behavior model neural network trained to generate predicted environment observations. An immediate quality value is generated from a context-specific quality model that generates immediate quality values that are specific to the particular driving context. An experience tuple comprising the initial environment observation, the candidate action, and the immediate quality value is generated and used as input to a reinforcement learning system for the autonomous vehicle.
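The data flow described in the abstract can be sketched in Python. This is an illustrative sketch only, not the patented implementation: `make_experience_tuple`, the `ExperienceTuple` field names, and the two toy models are hypothetical. In particular, a simple kinematic function stands in for the trained vehicle behavior model neural network, and a hand-written scoring rule stands in for the context-specific quality model.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical representations: the patent does not specify these layouts.
Observation = Tuple[float, float]  # (gap to lead vehicle in m, ego speed in m/s)
Action = Tuple[float]              # (acceleration in m/s^2,)


@dataclass(frozen=True)
class ExperienceTuple:
    initial_observation: Observation
    candidate_action: Action
    immediate_quality: float


def make_experience_tuple(
    initial_observation: Observation,
    candidate_action: Action,
    behavior_model: Callable[[Observation, Action], Observation],
    quality_model: Callable[[Observation, Action], float],
) -> ExperienceTuple:
    # Step 1: predict the environment after the candidate action is taken.
    predicted = behavior_model(initial_observation, candidate_action)
    # Step 2: score the predicted environment and the action for one context.
    quality = quality_model(predicted, candidate_action)
    # Step 3: bundle the initial observation, action, and quality value.
    return ExperienceTuple(initial_observation, candidate_action, quality)


def toy_behavior_model(obs: Observation, action: Action) -> Observation:
    # Assumption: one-second step with a stationary lead vehicle.
    gap, speed = obs
    new_speed = speed + action[0]
    return (gap - new_speed, new_speed)


def toy_merge_quality(predicted: Observation, action: Action) -> float:
    # Assumption: reward a safe gap (capped at 10 m), penalize harsh acceleration.
    gap, _speed = predicted
    return min(gap, 10.0) / 10.0 - 0.1 * abs(action[0])


experience = make_experience_tuple(
    (20.0, 5.0), (1.0,), toy_behavior_model, toy_merge_quality
)
```

The resulting tuple holds the initial observation and the action rather than the predicted observation, matching the tuple contents listed in the abstract; it would then be handed to the reinforcement learning system.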

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method comprising: receiving a request to generate an experience tuple for a vehicle in a particular driving context, the particular driving context being one of a plurality of driving contexts; receiving an initial environment observation representing an initial environment of the autonomous vehicle and a candidate action representing an action to be taken by the autonomous vehicle in the initial environment; generating a predicted environment observation representing a predicted environment of the autonomous vehicle after the candidate action is taken by the autonomous vehicle in the initial environment, including providing the initial environment observation and the candidate action as input to a vehicle behavior model neural network trained to generate predicted environment observations; generating an immediate quality value including providing the predicted environment of the autonomous vehicle and the candidate action as input to a context-specific quality model that generates immediate quality values that are specific to the particular driving context; generating an experience tuple comprising the initial environment observation, the candidate action, and the immediate quality value; and providing the experience tuple as input to a reinforcement learning system for the autonomous vehicle.
2. The method of claim 1, further comprising: receiving, from the reinforcement learning system, a cumulative action value for the candidate action; computing updated weights for the reinforcement learning system using the cumulative action value and the immediate quality value; and updating the reinforcement learning system using the updated weights.
3. The method of claim 1, further comprising: receiving, from the reinforcement learning system, a cumulative action value for the candidate action; ranking the candidate action with a plurality of other actions according to respective cumulative action values computed by the reinforcement learning system for the actions; determining that the candidate action is a highest-ranked action; and in response, selecting the candidate action.
4. The method of claim 1, wherein the autonomous vehicle is a physical autonomous vehicle having an onboard policy engine implementing a policy generated by the reinforcement learning system.
5. The method of claim 1, wherein the autonomous vehicle is a simulated autonomous vehicle.
6. The method of claim 1, wherein the vehicle behavior model neural network generates predicted environment observations for all of the plurality of driving contexts.
7. The method of claim 6, wherein the context-specific quality model is trained using training data from only the particular driving context of the plurality of driving contexts.
8. The method of claim 7, wherein the plurality of driving contexts include one or more of a lane merging context, an intersection navigation context, a lane changing context, or a highway driving context.
9. The method of claim 1, wherein the context-specific quality model uses a collision distance feature representing a sum of actions to be take by the autonomous vehicle and one or more other vehicles to cause a collision.
10. The method of claim 1, wherein the context-specific quality model uses a safety rating feature that quantifies the safety of each particular candidate action and a comfort rating feature that quantifies the passenger comfort of each particular candidate action.
11. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to implement: a vehicle behavior model neural network configured to generate a predicted environment observation representing a predicted environment of an autonomous vehicle in a particular driving context after a candidate action is taken by the autonomous vehicle in an initial environment; a quality model engine configured to generate an immediate quality value by providing the predicted environment of the autonomous vehicle and the candidate action as input to a context-specific quality model that generates immediate quality values that are specific to the particular driving context; and a reinforcement learning system for the autonomous vehicle that is configured to receive an experience tuple comprising the initial environment observation, the candidate action, and the immediate quality value and to generate a cumulative action value for the candidate action when performed by the autonomous vehicle in the initial environment.
12. The system of claim 11, further comprising: a training engine configured to receive, from the reinforcement learning system, a cumulative action value for the candidate action, to compute updated weights for the reinforcement learning system using the cumulative action value and the immediate quality value, and to update the reinforcement learning system using the updated weights.
13. The system of claim 11, further comprising: a policy engine configured to receive, from the reinforcement learning system, a cumulative action value for the candidate action, to rank the candidate action with a plurality of other actions according to respective cumulative action values computed by the reinforcement learning system for the actions, to determine that the candidate action is a highest-ranked action, and to select the candidate action in response.
14. The system of claim 11, further comprising a physical autonomous vehicle having an onboard policy engine implementing a policy generated by the reinforcement learning system.
15. The system of claim 11, wherein the autonomous vehicle is a simulated autonomous vehicle.
16. The system of claim 11, wherein the vehicle behavior model neural network is configured to generate predicted environment observations for all of the plurality of driving contexts.
17. The system of claim 16, wherein the context-specific quality model is trained using training data from only the particular driving context of the plurality of driving contexts.
18. The system of claim 17, wherein the plurality of driving contexts include one or more of a lane merging context, an intersection navigation context, a lane changing context, or a highway driving context.
19. The system of claim 11, wherein the context-specific quality model is configured to use a collision distance feature representing a sum of actions to be take by the autonomous vehicle and one or more other vehicles to cause a collision.
20. The system of claim 11, wherein the context-specific quality model is configured to use a safety rating feature that quantifies the safety of each particular candidate action and a comfort rating feature that quantifies the passenger comfort of each particular candidate action.
21. A computer program product, encoded on one or more non-transitory computer storage media, comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: receiving a request to generate an experience tuple for a vehicle in a particular driving context, the particular driving context being one of a plurality of driving contexts; receiving an initial environment observation representing an initial environment of the autonomous vehicle and a candidate action representing an action to be taken by the autonomous vehicle in the initial environment; generating a predicted en…
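Claims 2 and 3 (mirrored by claims 12 and 13) describe two uses of the cumulative action value: updating the reinforcement learning system's weights, and ranking candidate actions to select the highest-ranked one. The claims do not specify the model form or the update rule, so the following Python sketch makes loud assumptions: a linear action-value function and a temporal-difference-style gradient step. All function names, feature vectors, and numeric values are hypothetical.

```python
from typing import Dict, List, Sequence


def cumulative_action_value(weights: Sequence[float],
                            features: Sequence[float]) -> float:
    # Assumption: a linear model over hand-picked action features.
    return sum(w * x for w, x in zip(weights, features))


def updated_weights(weights: Sequence[float],
                    features: Sequence[float],
                    immediate_quality: float,
                    next_value: float = 0.0,
                    discount: float = 0.9,
                    learning_rate: float = 0.1) -> List[float]:
    # Assumption: one TD-style step that moves the cumulative action value
    # toward immediate_quality + discount * next_value.
    target = immediate_quality + discount * next_value
    error = target - cumulative_action_value(weights, features)
    return [w + learning_rate * error * x for w, x in zip(weights, features)]


def select_highest_ranked(weights: Sequence[float],
                          candidates: Dict[str, Sequence[float]]) -> str:
    # Rank candidate actions by cumulative action value, best first.
    ranked = sorted(
        candidates,
        key=lambda name: cumulative_action_value(weights, candidates[name]),
        reverse=True,
    )
    return ranked[0]


weights = [0.5, -1.0]
candidates = {
    "keep_lane": [1.0, 0.2],
    "merge_left": [1.0, 0.0],
    "brake": [0.0, 0.5],
}
best = select_highest_ranked(weights, candidates)
new_weights = updated_weights(weights, candidates[best], immediate_quality=0.9)
```

With these toy numbers the ranking step picks `merge_left`, and the update step nudges its value from 0.5 toward the immediate quality value of 0.9.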

Assignees

Waymo LLC

Inventors

Classifications

G06N3/045 · Combinations of networks
G06F30/27 · using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
G06F30/20 · Design optimisation, verification or simulation (optimisation, verification or simulation of circuit designs G06F30/30)
G06N3/006 · based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
G05B13/027 (Primary) · using neural networks only

Patent family

Related publications grouped by family.

Patent family ID: 65998203

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10254759B1 cover?: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing an interactive autonomous vehicle agent. One of the methods includes receiving a request to generate an experience tuple for a vehicle in a particular driving context. A predicted environment observation representing a predicted environment of the autonomous vehicle after the candida…
Who is the assignee on this patent?: Waymo LLC
What technology area does this patent fall under?: Primary CPC classification G05B13/027. Mapped technology areas include Physics.
When was this patent published?: Publication date Apr 9, 2019 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Training a self-driving vehicle

Machine learning navigational engine with imposed constraints

Reinforcement learning using advantage estimates

Training reinforcement learning neural networks

Continuous control with deep reinforcement learning
