Online learning and vehicle control method based on reinforcement learning without active exploration

US2018009445A1 · US · A1

Patent metadata
Publication number: US-2018009445-A1
Application number: US-201615205558-A
Country: US
Kind code: A1
Filing date: Jul 8, 2016
Priority date: Jul 8, 2016
Publication date: Jan 11, 2018
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method of adaptively controlling an autonomous operation of a vehicle is provided. The method includes steps of (a) in a critic network in a computing system configured to autonomously control the vehicle, determining, using samples of passively collected data and a state cost, an estimated average cost, and an approximated cost-to-go function that produces a minimum value for a cost-to-go of the vehicle when applied by an actor network; and (b) in an actor network in the computing system and operatively coupled to the critic network, determining a control input to apply to the vehicle that produces the minimum value for the cost-to-go, wherein the actor network is configured to determine the control input by estimating a noise level using the average cost, a cost-to-go determined from the approximated cost-to-go function, a control dynamics for a current state of the vehicle, and the passively collected data.

First claim

Opening claim text (preview).

1. A computer-implemented method of adaptively controlling an autonomous operation of a vehicle, the method comprising: a) in a critic network in a computing system configured to autonomously control the vehicle, determining, using samples of passively collected data and a state cost, an estimated average cost, and an approximated cost-to-go function that produces a minimum value for a cost-to-go of the vehicle when applied by an actor network; and b) in an actor network in the computing system and operatively coupled to the critic network, determining a control input to apply to the vehicle which produces the minimum value for the cost-to-go, wherein the actor network is configured to determine the control input by estimating a noise level using the estimated average cost, an estimated cost-to-go determined from the approximated cost-to-go function, a control dynamics for a current state of the vehicle, and the samples of passively collected data.

2. A computer-implemented method of adaptively controlling an autonomous operation of a vehicle, the method comprising: a) in a critic network in a computing system configured to autonomously control the vehicle, determining, using samples of passively collected data and a state cost, an estimated average cost, and an approximated cost-to-go function that produces a minimum value for a cost-to-go of the vehicle when applied by an actor network; and b) in an actor network in the computing system and operatively coupled to the critic network, determining a control input to apply to the vehicle which produces the minimum value for the cost-to-go, wherein the actor network is configured to determine the control input by estimating a noise level using the estimated average cost, an estimated cost-to-go determined from the approximated cost-to-go function, a control dynamics for a current state of the vehicle, and the samples of passively collected data, and wherein the approximated cost-to-go function is determined using a linear combination of weighted radial basis functions in accordance with the following relationship:

$$\hat{Z}(x) := \sum_{j=0}^{N} \omega_j f_j(x)$$

where $\omega_j$ are weights, $f_j$ are j-th radial basis functions, $N$ is a number of radial basis functions used for determining the approximated cost-to-go function, and $\hat{Z}(x)$ is the approximated cost-to-go function.

3. The method of claim 2 wherein weights $\omega$ used in the approximated cost-to-go function are updated in accordance with the following relationship:

$$\omega^{i+1} = \tilde{\omega}^{i} + \lambda_1 + \sum_{i=0}^{N} \lambda_{2l}\,\delta_{jl} + \lambda_3 \hat{Z}_{avg} f_k$$

where $\delta_{ij}$ denotes a Dirac delta function, superscript denotes a number of iterations, $\lambda_1$, $\lambda_2$, $\lambda_3$ are Lagrangian multipliers, and $\hat{Z}_{avg}$ is an estimated average cost.

4. The method of claim 1 further comprising the step of updating parameters of the critic network using an approximated temporal difference error determined using a linearized version of a Bellman equation.

5. The method of claim 4 wherein updating of the critic network parameters is performed when the vehicle is in motion.

6. A computer-implemented method of adaptively controlling an autonomous operation of a vehicle, the method comprising: a) in a critic network in a computing system configured to autonomously control the vehicle, determining, using samples of passively collected data and a state cost, an estimated average cost, and an approximated cost-to-go function that produces a minimum value for a cost-to-go of the vehicle when applied by an actor network; and b) in an actor network in the computing system and operatively coupled to the critic network, determining a control input to apply to the vehicle which produces the minimum value for the cost-to-go, wherein the actor network is configured to determine the control input by estimating a noise level using the estimated average cost, an estimated cost-to-go determined from the approximated cost-to-go function, a control dynamics for a current state of the vehicle, and the samples of passively collected data, the method further comprising the step of updating parameters of the critic network using an approximated temporal difference error determined using a linearized version of a Bellman equation, and wherein the estimated average cost determined by the critic network is updated in accordance with the following relationship:

$$\hat{Z}_{avg}^{i+1} = \hat{Z}_{avg}^{i} - \alpha_{Z}^{i}\, e_k \hat{Z}_k$$

where β is a learning rate, $e_k$ is the approximated temporal difference error, $\hat{Z}_k$ is an estimated cost determined from the approximated cost-to-go function, $\hat{Z}_{avg}^{i}$ is an estimated average cost in state i, and $\hat{Z}_{avg}^{i+1}$ is an estimated average cost in state i+1.

7. The method of claim 4, wherein passively-collected data is the only data used during updating of the critic network parameters.

8. A computer-implemented method of adaptively controlling an autonomous operation of a vehicle, the method comprising: a) in a critic network in a computing system configured to autonomously control the vehicle, determining, using samples of passively collected d
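
The relationships recited in claims 2 and 6 can be illustrated with a minimal Python sketch. This is not the patent's implementation. The Gaussian form of the radial basis functions, the centers, the width, the weights, and all function names are assumptions chosen for illustration; the claim text only requires a weighted sum of radial basis functions. The temporal difference error is passed in as a plain number because its formula does not appear in the claim preview.

```python
import math


def gaussian_rbf(x, center, width):
    """Hypothetical f_j(x): a Gaussian bump around `center` (assumed form)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def approx_cost_to_go(x, weights, centers, width):
    """Z_hat(x) = sum_j w_j * f_j(x), the linear combination recited in claim 2."""
    return sum(w * gaussian_rbf(x, c, width) for w, c in zip(weights, centers))


def update_average_cost(z_avg, learning_rate, td_error, z_hat_k):
    """One step of the claim 6 relationship:
    Z_avg(next) = Z_avg(current) - learning_rate * e_k * Z_hat_k.
    """
    return z_avg - learning_rate * td_error * z_hat_k


# Illustrative values only: three basis functions over a 2-D state.
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
weights = [0.5, 0.3, 0.2]
width = 1.0
state = (0.0, 0.0)

z_hat = approx_cost_to_go(state, weights, centers, width)

# td_error is supplied directly here; how it is computed is outside this sketch.
z_avg = update_average_cost(z_avg=1.0, learning_rate=0.1,
                            td_error=0.05, z_hat_k=z_hat)
```

With a temporal difference error of zero the average-cost estimate is left unchanged, which is the expected fixed-point behavior of an update of this shape.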

Assignees

Toyota Eng & Mfg North America

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks

  • Combinations of networks

  • Data processing systems or methods, management, administration

  • the criterion being a learning criterion

  • in which a variable is automatically adjusted to optimise the performance

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018009445A1 cover?
A computer-implemented method of adaptively controlling an autonomous operation of a vehicle is provided. The method includes steps of (a) in a critic network in a computing system configured to autonomously control the vehicle, determining, using samples of passively collected data and a state cost, an estimated average cost, and an approximated cost-to-go function that produces a minimum valu…
Who is the assignee on this patent?
Toyota Eng & Mfg North America
What technology area does this patent fall under?
Primary CPC classification B60W50/0098. Mapped technology areas include Operations & Transport.
When was this patent published?
Publication date: Jan 11, 2018 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).