Who is the assignee on this patent?

GM Global Technology Operations LLC, Carnegie Mellon University

What technology area does this patent fall under?

Primary CPC classification B60W60/001. Mapped technology areas include Operations & Transport.

When was this patent published?

Publication date: Aug 4, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method and apparatus for automatically generated curriculum sequence based reinforcement learning for autonomous vehicles

US10732639B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10732639-B2
Application number	US-201815915419-A
Country	US
Kind code	B2
Filing date	Mar 8, 2018
Priority date	Mar 8, 2018
Publication date	Aug 4, 2020
Grant date	Aug 4, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present application generally relates to a method and apparatus for generating an action policy for controlling an autonomous vehicle. In particular, the system performs a deep learning algorithm in order to determine the action policy and an automatically generated curriculum system to determine a number of increasingly difficult tasks in order to refine the action policy.
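One way to picture the automatically generated curriculum is as a bandit over candidate tasks, which is the formulation named in claims 3 and 11, with an incremental action-value update as in claim 1. The following is a minimal sketch under stated assumptions, not the patented implementation: the class name `BanditCurriculum`, the epsilon-greedy selection rule, the step size, and the idea of feeding a "learning progress" signal are all hypothetical choices made for illustration.

```python
import random


class BanditCurriculum:
    """Epsilon-greedy bandit over candidate training tasks.

    Hypothetical helper for illustration only; the patent text shown on
    this page does not specify the selection rule or the update constants.
    """

    def __init__(self, task_names, epsilon=0.1, step_size=0.5, seed=0):
        self.tasks = list(task_names)
        self.epsilon = epsilon          # probability of exploring a random task
        self.step_size = step_size      # alpha in the incremental update
        self.q = {task: 0.0 for task in self.tasks}  # action value per task
        self.rng = random.Random(seed)

    def select_task(self):
        """Pick the next task to train on."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.tasks)
        # Greedy choice: the task with the highest current action value.
        return max(self.tasks, key=lambda task: self.q[task])

    def update(self, task, signal):
        """Incremental action-value update: Q <- Q + alpha * (signal - Q)."""
        self.q[task] += self.step_size * (signal - self.q[task])
        return self.q[task]
```

A training loop would call `select_task()`, train the agent on that task for some episodes, and pass a scalar signal (for example, the change in average reward, which is an assumption here) to `update()`. Task names such as `"lane_keep"` or `"merge"` would likewise be placeholders.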

First claim

Opening claim text (preview).

What is claimed is:
1. A method of training a vehicle control system comprising: determining a final task; receiving an input from a first vehicle sensor; determining a first task and a second task in response to the final task and the input wherein the second task has a higher difficulty that the first task and wherein the first task and the second task are determined in response to a curriculum learning system and wherein the determination of the first task and the second task are made according to an action value based incremental method for an automatically generated curriculum sequence; training an agent to perform the first task in order to generate an action policy to maximize a first reward, and to perform the second task in response to the action policy to maximize a second reward; and controlling a vehicle in performance of the second task in response to the action policy.
2. The method of claim 1 wherein the action policy is generated in accordance with a reinforced learning system.
3. The method of claim 1 wherein the curriculum learning system utilizes an armed bandit problem methodology.
4. The method of claim 1 wherein a curriculum sequence is determined in response to a first difficulty of the first task and a second difficulty of the second task and wherein the curriculum sequence is used to train an agent to generate an optimal action policy.
5. The method of claim 1 wherein a transition is the transitions are stored to a replay buffer.
6. The method of claim 1 wherein the second task is performed a plurality of times and wherein an evaluation is made based on the performance of the second task and wherein the evaluation is stored in a replay buffer.
7. The method of claim 1 wherein a critic network is used to train the action policy in response to a performance of the second task.
8. The method of claim 1 wherein a critic network is used to train the action policy by updating a parameter of a neural network in response to a temporal difference error.
9. An apparatus comprising: a sensor for detecting an input; a processor for determining a final task, the processor being further operative for determining a first task and a second task in response to the final task and the input wherein the second task has a higher difficulty that the first task wherein the first task and the second task are determined in response to a curriculum learning system and wherein the determination of the first task and the second task are made according to an action value based incremental method for an automatically generated curriculum sequence, training an agent to perform the first task in order to generate an action policy to maximize a first reward, and to perform the second task in response to the action policy to maximize a second reward; and controlling a vehicle in performance of the second task in response to the action policy.
10. The apparatus of claim 9 wherein the action policy is generated in accordance with a reinforced learning system.
11. The apparatus of claim 9 wherein the curriculum learning system utilizes an armed bandit problem methodology.
12. The apparatus of claim 9 wherein a curriculum sequence is determined in response to a first difficulty of the first task and a second difficulty of the second task and wherein the curriculum sequence is used by the agent to generate the action policy.
13. The apparatus of claim 9 wherein a grade of the performance of controlling the vehicle according to the final task is stored to a replay buffer.
14. The apparatus of claim 9 wherein the second task is performed a plurality of times and wherein an evaluation is made based on the performance of the final task and wherein the evaluation is stored in a replay buffer.
15. The apparatus of claim 9 wherein a critic network is used to train the action policy in response to a performance of the final task.
16. The apparatus of claim 9 wherein a critic network is used to train the action policy by updating a parameter of a neural network in response to a temporal difference error.
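Claims 5 to 8 and 13 to 16 mention a replay buffer and a critic whose parameters are updated from a temporal difference (TD) error. The sketch below illustrates those two ideas in the simplest form we could write. It is an assumption-laden stand-in: it uses a linear value function in place of the neural network the claims recite, and the names `LinearCritic`, `td_error`, and `sample_batch`, along with the learning rate and discount factor, are hypothetical.

```python
import random
from collections import deque


def td_error(reward, gamma, value_s, value_next, done):
    """One-step TD error: r + gamma * V(s') - V(s), with V(s') = 0 at episode end."""
    target = reward + (0.0 if done else gamma * value_next)
    return target - value_s


class LinearCritic:
    """Linear value function V(s) = w . s (stand-in for a critic network)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.lr = lr

    def value(self, features):
        return sum(w * x for w, x in zip(self.w, features))

    def update(self, s, reward, s_next, done, gamma=0.99):
        """Move each weight along the TD error (semi-gradient TD(0))."""
        delta = td_error(reward, gamma, self.value(s), self.value(s_next), done)
        self.w = [w + self.lr * delta * x for w, x in zip(self.w, s)]
        return delta


def sample_batch(buffer, batch_size, rng=random):
    """Uniformly sample stored transitions without replacement."""
    return rng.sample(list(buffer), min(batch_size, len(buffer)))


# Replay buffer of (state, reward, next_state, done) transitions;
# the capacity is an arbitrary illustrative value.
replay_buffer = deque(maxlen=10_000)
```

In use, each transition observed while performing a task would be appended to `replay_buffer`, and the critic would be updated on batches drawn with `sample_batch`. An actor (the action policy) would then be adjusted using the critic's estimates, a step this sketch leaves out.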

Assignees

Inventors

Classifications

Code	CPC title
B60W60/00274	considering possible movement changes
B60W60/001 (primary)	Planning or execution of driving tasks
G06N5/01	Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
G06N3/092	Reinforcement learning
B60W2050/0088	Adaptive recalibration

Patent family

Related publications grouped by family.

Patent family ID: 67842618

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10732639B2 cover?: The present application generally relates to a method and apparatus for generating an action policy for controlling an autonomous vehicle. In particular, the system performs a deep learning algorithm in order to determine the action policy and an automatically generated curriculum system to determine a number of increasingly difficult tasks in order to refine the action policy.

Related patents

Trajectory selection for an autonomous vehicle

Training neural networks on partitioned training data

Autonomous vehicle policy generation

Trajectory generation using temporal logic and tree search
