Methods and apparatus for reinforcement learning
US-9679258-B2 · Jun 13, 2017 · US
US11170293B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11170293-B2 |
| Application number | US-201514985017-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 30, 2015 |
| Priority date | Dec 30, 2015 |
| Publication date | Nov 9, 2021 |
| Grant date | Nov 9, 2021 |
Official abstract text for this publication.
A processing unit can operate a first recurrent computational model (RCM) to provide first state information and a predicted result value. The processing unit can operate a first network computational model (NCM) to provide respective expectation values of a plurality of actions based at least in part on the first state information. The processing unit can provide an indication of at least one of the plurality of actions, and receive a reference result value, e.g., via a communications interface. The processing unit can train the first RCM based at least in part on the predicted result value and the reference result value to provide a second RCM, and can train the first NCM based at least in part on the first state information and the at least one of the plurality of actions to provide a second NCM.
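The operating half of the abstract (RCM produces state information and a predicted result value, NCM turns the state into per-action expectation values) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the layer sizes, the plain tanh recurrent cell, the single linear layer standing in for the NCM, the names `rcm_step` and `ncm_values`, and the greedy selection rule.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, HID, ACTIONS = 4, 8, 3  # hypothetical sizes, chosen only for the example

# Recurrent model parameters: a plain tanh cell plus a scalar result head (assumed form).
W_in = rng.normal(scale=0.1, size=(HID, OBS))
W_rec = rng.normal(scale=0.1, size=(HID, HID))
w_out = rng.normal(scale=0.1, size=HID)

# Network model parameters: a single linear layer, one row per action (assumed form).
W_q = rng.normal(scale=0.1, size=(ACTIONS, HID))

def rcm_step(obs, prev_state):
    """Return (state information, predicted result value) for one observation."""
    state = np.tanh(W_in @ obs + W_rec @ prev_state)
    return state, float(w_out @ state)

def ncm_values(state):
    """Return one expectation value per action for the given state information."""
    return W_q @ state

obs = rng.normal(size=OBS)              # stand-in for a received observation
state, predicted_result = rcm_step(obs, np.zeros(HID))
q_values = ncm_values(state)
action = int(np.argmax(q_values))       # greedy choice; one possible selection rule
```

The index `action` plays the role of the indication that would be provided over the communications interface; the training half of the abstract is not covered by this sketch.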
Opening claim text (preview).
What is claimed is:
1. A system, comprising: a communications interface; one or more processing unit(s); and one or more computer-readable media having thereon computer-executable instructions, the computer-executable instructions, upon execution, causing the one or more processing unit(s) to perform operations for coordinated training and operation of computational models, the operations comprising: operating a first recurrent neural network (RNN) computational model on a first observation value to provide first state information and a first predicted result value; operating a first Q network (QN) computational model on the first state information to provide respective expectation values of a plurality of actions; selecting an action among the plurality of actions based on the expectation values and causing the communications interface to provide an indication of the selected action; using a supervised-learning update rule to train the first RNN computational model based at least in part on the predicted result value and a corresponding reference result value received via the communications interface in response to the indication to provide a second RNN computational model; operating the second RNN computational model on a second observation value received via the communications interface in response to the indication to provide second state information and a second predicted result value; and using a reinforcement-learning update rule to train the first QN computational model based at least in part on the first state information, the second state information, the reference result value, and the selected action to provide a second QN computational model.
2. A system as recited in claim 1, wherein the selected action corresponds to a highest expectation value of the expectation values.
3. A system as recited in claim 1, wherein: the first RNN computational model is further operated to provide a predicted observation value and the first RNN computational model is trained further based on the predicted observation value and the second observation value.
4. A system as recited in claim 3, further comprising a sensor coupled to the communications interface and configured to provide the second observation value.
5. A system as recited in claim 1, further comprising an actuator coupled to the communications interface and responsive to the indication of the selected action to perform the selected action.
6. A system as recited in claim 1, further comprising a result subsystem coupled to the communications interface and configured to provide the reference result value.
7. A method for coordinated training and operation of computational models, the method comprising: operating a first recurrent neural network (RNN) computational model on a first observation value to provide first state information and a first predicted result value; operating a first Q network (QN) computational model on the first state information to provide respective expectation values of a plurality of actions; selecting an action among the plurality of actions based on the expectation values and providing an indication of the selected action via a communications interface; receiving a first reference result value and a second observation value via the communications interface; training the first RNN computational model, using a supervised-learning update rule, based at least in part on the first predicted result value and the first reference result value to provide a second RNN computational model; operating the second RNN computational model on the second observation value to provide second state information and a second predicted result value; and training the first QN computational model, using a reinforcement-learning update rule, based at least in part on the first state information, the second state information, the first reference result value, and the selected action to provide a second QN computational model.
8. A method as recited in claim 7, wherein the first observation value comprises a sensor reading.
9. A method as recited in claim 7, further comprising: operating the first RNN computational model to further provide a predicted observation value; and training the first RNN computational model further based on the predicted observation value and the second observation value to provide the second RNN computational model.
10. A method as recited in claim 7, wherein the receiving the second observation value is performed after the providing the indication.
11. A non-transitory computer-readable medium having thereon computer-executable instructions, the computer-executable instructions upon execution configuring a computer to perform operations for coordinated training and operation of computational models, the operations comprising: operating a first recurrent neural network (RNN) computational model on a first observation value to provide first state information and a first predicted result value; operating a first Q network (QN) computational model on the first state information to provide respective expectation values of a plurality of actions; selecting an action among the plurality of actions based on the expectation values and providing an indication of the selected action via a communications interface; receiving a first reference result value and a second observation value via the communications interface; training the first RNN computational model, using a supervised-learning update rule, based at least in part on the first predicted result value and the first reference result value to provide a second RNN computational model; operating the second RNN computational model on the second observation value to provide second state information and a second predicted result value; and training the first QN computational model, using a reinforcement-learning update rule, based at least in part on the first state information, the second state information, the first reference result value, and the selected action to provide a second QN computational model.
12. A non-transitory computer-readable medium as recited in claim 11, wherein one or more of the plurality of values of the training data comprise respective sensor readings.
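The sequence of operations recited in claims 1, 7, and 11 can be walked through in one small NumPy sketch. It is an illustration under stated assumptions, not the patented implementation: the claims name only "a supervised-learning update rule" and "a reinforcement-learning update rule", so the squared-error gradient step (single step, no backpropagation through time) and the one-step Q-learning update used here are stand-ins chosen for the example. The sizes, learning rate, discount factor, function names, the use of the reference result value as the reward signal, and the hard-coded environment values are likewise hypothetical.

```python
import numpy as np

def rnn_forward(p, obs, prev_state):
    """Return (state information, predicted result value)."""
    state = np.tanh(p["W_in"] @ obs + p["W_rec"] @ prev_state)
    return state, float(p["w_out"] @ state)

def train_rnn_supervised(p, obs, prev_state, reference, lr=0.1):
    """One gradient step on 0.5 * (predicted - reference)**2 (assumed rule)."""
    state, predicted = rnn_forward(p, obs, prev_state)
    err = predicted - reference
    d_pre = err * p["w_out"] * (1.0 - state ** 2)  # gradient at the tanh input
    return {
        "W_in": p["W_in"] - lr * np.outer(d_pre, obs),
        "W_rec": p["W_rec"] - lr * np.outer(d_pre, prev_state),
        "w_out": p["w_out"] - lr * err * state,
    }

def train_qn_td(W_q, s1, action, reward, s2, lr=0.1, gamma=0.9):
    """One-step Q-learning update for a linear Q network (assumed rule)."""
    target = reward + gamma * np.max(W_q @ s2)
    td_error = target - (W_q @ s1)[action]
    W_new = W_q.copy()
    W_new[action] += lr * td_error * s1  # only the selected action's row changes
    return W_new

rng = np.random.default_rng(1)
OBS, HID, ACTIONS = 4, 8, 3  # hypothetical sizes
rnn1 = {
    "W_in": rng.normal(scale=0.1, size=(HID, OBS)),
    "W_rec": rng.normal(scale=0.1, size=(HID, HID)),
    "w_out": rng.normal(scale=0.1, size=HID),
}
qn1 = rng.normal(scale=0.1, size=(ACTIONS, HID))
h0 = np.zeros(HID)

# Operate the first RNN and first QN, then select an action.
obs1 = rng.normal(size=OBS)
s1, pred1 = rnn_forward(rnn1, obs1, h0)
action = int(np.argmax(qn1 @ s1))

# Stand-ins for what would arrive via the communications interface.
reference, obs2 = 1.0, rng.normal(size=OBS)

# Supervised step yields the second RNN; it then processes the second observation.
rnn2 = train_rnn_supervised(rnn1, obs1, h0, reference)
s2, pred2 = rnn_forward(rnn2, obs2, s1)

# Reinforcement-learning step yields the second QN.
qn2 = train_qn_td(qn1, s1, action, reference, s2)
```

Returning new parameter objects rather than updating in place mirrors the claim wording, in which training a "first" model provides a "second" model.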
Combinations of networks · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
Reinforcement learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Supervised learning · CPC title