What technology area does this patent fall under?

Primary CPC classification B60W50/0098. Mapped technology areas include Operations & Transport.

When was this patent published?

Publication date Jan 11, 2018 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Online learning and vehicle control method based on reinforcement learning without active exploration

Patent metadata
Field	Value
Publication number	US-2018009445-A1
Application number	US-201615205558-A
Country	US
Kind code	A1
Filing date	Jul 8, 2016
Priority date	Jul 8, 2016
Publication date	Jan 11, 2018
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method of adaptively controlling an autonomous operation of a vehicle is provided. The method includes steps of (a) in a critic network in a computing system configured to autonomously control the vehicle, determining, using samples of passively collected data and a state cost, an estimated average cost, and an approximated cost-to-go function that produces a minimum value for a cost-to-go of the vehicle when applied by an actor network; and (b) in an actor network in the computing system and operatively coupled to the critic network, determining a control input to apply to the vehicle that produces the minimum value for the cost-to-go, wherein the actor network is configured to determine the control input by estimating a noise level using the average cost, a cost-to-go determined from the approximated cost-to-go function, a control dynamics for a current state of the vehicle, and the passively collected data.

First claim

Opening claim text (preview).

1. A computer-implemented method of adaptively controlling an autonomous operation of a vehicle, the method comprising: a) in a critic network in a computing system configured to autonomously control the vehicle, determining, using samples of passively collected data and a state cost, an estimated average cost, and an approximated cost-to-go function that produces a minimum value for a cost-to-go of the vehicle when applied by an actor network; and b) in an actor network in the computing system and operatively coupled to the critic network, determining a control input to apply to the vehicle which produces the minimum value for the cost-to-go, wherein the actor network is configured to determine the control input by estimating a noise level using the estimated average cost, an estimated cost-to-go determined from the approximated cost-to-go function, a control dynamics for a current state of the vehicle, and the samples of passively collected data.

2. A computer-implemented method of adaptively controlling an autonomous operation of a vehicle, the method comprising: a) in a critic network in a computing system configured to autonomously control the vehicle, determining, using samples of passively collected data and a state cost, an estimated average cost, and an approximated cost-to-go function that produces a minimum value for a cost-to-go of the vehicle when applied by an actor network; and b) in an actor network in the computing system and operatively coupled to the critic network, determining a control input to apply to the vehicle which produces the minimum value for the cost-to-go, wherein the actor network is configured to determine the control input by estimating a noise level using the estimated average cost, an estimated cost-to-go determined from the approximated cost-to-go function, a control dynamics for a current state of the vehicle, and the samples of passively collected data, and wherein the approximated cost-to-go function is determined using a linear combination of weighted radial basis functions in accordance with the following relationship: $\hat{Z}(x) := \sum_{j=0}^{N} \omega_j f_j(x)$ where $\omega$ are weights, $f_j$ are j-th radial basis functions, $N$ is a number of radial basis functions used for determining the approximated cost-to-go function, and $\hat{Z}(x)$ is the approximated cost-to-go function.

3. The method of claim 2 wherein weights $\omega$ used in the approximated cost-to-go function are updated in accordance with the following relationship: $\omega^{i+1} = \tilde{\omega}^{i} + \lambda_1 + \sum_{i=0}^{N} \lambda_{2l}\,\delta_{jl} + \lambda_3 \hat{Z}_{avg} f_k$ where $\delta_{ij}$ denotes a Dirac delta function, superscript denotes a number of iterations, $\lambda_1$, $\lambda_2$, $\lambda_3$ are Lagrangian multipliers, and $\hat{Z}_{avg}$ is an estimated average cost.

4. The method of claim 1 further comprising the step of updating parameters of the critic network using an approximated temporal difference error determined using a linearized version of a Bellman equation.

5. The method of claim 4 wherein updating of the critic network parameters is performed when the vehicle is in motion.

6. A computer-implemented method of adaptively controlling an autonomous operation of a vehicle, the method comprising: a) in a critic network in a computing system configured to autonomously control the vehicle, determining, using samples of passively collected data and a state cost, an estimated average cost, and an approximated cost-to-go function that produces a minimum value for a cost-to-go of the vehicle when applied by an actor network; and b) in an actor network in the computing system and operatively coupled to the critic network, determining a control input to apply to the vehicle which produces the minimum value for the cost-to-go, wherein the actor network is configured to determine the control input by estimating a noise level using the estimated average cost, an estimated cost-to-go determined from the approximated cost-to-go function, a control dynamics for a current state of the vehicle, and the samples of passively collected data, the method further comprising the step of updating parameters of the critic network using an approximated temporal difference error determined using a linearized version of a Bellman equation, and wherein the estimated average cost determined by the critic network is updated in accordance with the following relationship: $\hat{Z}_{avg}^{i+1} = \hat{Z}_{avg}^{i} - \alpha_{Z}^{i}\, e_k\, \hat{Z}_k$ where β is a learning rate, $e_k$ is the approximated temporal difference error, $\hat{Z}_k$ is an estimated cost determined from the approximated cost-to-go function, $\hat{Z}_{avg}^{i}$ is an estimated average cost in state i, and $\hat{Z}_{avg}^{i+1}$ is an estimated average cost in state i+1.

7. The method of claim 4, wherein passively-collected data is the only data used during updating of the critic network parameters.

8. A computer-implemented method of adaptively controlling an autonomous operation of a vehicle, the method comprising: a) in a critic network in a computing system configured to autonomously control the vehicle, determining, using samples of passively collected d
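For readers who want to see the arithmetic behind the relationships quoted in claims 2 and 6, the following is a minimal Python sketch, not the patented implementation. It evaluates a weighted sum of radial basis functions and applies one step of the average-cost update. The Gaussian form of the basis functions, the centers, the width, the weights, and all function names are assumptions made for illustration; the claim preview does not specify them. The temporal difference error is passed in as a plain number because its derivation from the linearized Bellman equation is not shown on this page.

```python
import math


def gaussian_rbf(x, center, width):
    """One radial basis function f_j(x). The Gaussian form is an assumption."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def approx_cost_to_go(x, weights, centers, width):
    """Weighted sum of basis functions, Z_hat(x) = sum_j w_j * f_j(x) (claim 2)."""
    return sum(w * gaussian_rbf(x, c, width) for w, c in zip(weights, centers))


def update_average_cost(z_avg, learning_rate, td_error, z_hat_k):
    """One step of Z_avg <- Z_avg - rate * e_k * Z_hat_k (claim 6)."""
    return z_avg - learning_rate * td_error * z_hat_k


# Hypothetical values chosen only to exercise the formulas.
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
weights = [0.5, 0.3, 0.2]
state = (0.0, 0.0)

z_hat = approx_cost_to_go(state, weights, centers, width=1.0)
z_avg = update_average_cost(z_avg=1.0, learning_rate=0.1, td_error=0.2, z_hat_k=z_hat)
print(z_hat, z_avg)
```

The sketch keeps the two relationships as separate functions so that each can be checked against the claim text on its own. The weight update of claim 3 is left out because its Lagrangian multipliers depend on constraints that the preview does not state.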

Assignees

Toyota Eng & Mfg North America

Inventors

Nishi Tomoki

Classifications

G06N7/01
Probabilistic graphical models, e.g. probabilistic networks
G06N3/045
Combinations of networks
Y02T10/84
Data processing systems or methods, management, administration
G05B13/0265
the criterion being a learning criterion
G05B13/041
in which a variable is automatically adjusted to optimise the performance

Patent family

Related publications grouped by family.

View patent family 60892997

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018009445A1 cover?: A computer-implemented method of adaptively controlling an autonomous operation of a vehicle is provided. The method includes steps of (a) in a critic network in a computing system configured to autonomously control the vehicle, determining, using samples of passively collected data and a state cost, an estimated average cost, and an approximated cost-to-go function that produces a minimum valu…
Who is the assignee on this patent?: Toyota Eng & Mfg North America