Methods and apparatus for reinforcement learning
US-9679258-B2 · Jun 13, 2017 · US
US2017193360A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2017193360-A1 |
| Application number | US-201514985017-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 30, 2015 |
| Priority date | Dec 30, 2015 |
| Publication date | Jul 6, 2017 |
| Grant date | — |
Abstract
A processing unit can operate a first recurrent computational model (RCM) to provide first state information and a predicted result value. The processing unit can operate a first network computational model (NCM) to provide respective expectation values of a plurality of actions based at least in part on the first state information. The processing unit can provide an indication of at least one of the plurality of actions, and receive a reference result value, e.g., via a communications interface. The processing unit can train the first RCM based at least in part on the predicted result value and the reference result value to provide a second RCM, and can train the first NCM based at least in part on the first state information and the at least one of the plurality of actions to provide a second NCM.
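The abstract describes one interaction cycle: the RCM turns an observation into state information and a predicted result, the NCM scores each action from that state, an action is sent out, a reference result comes back, and both models are trained. The following is a minimal pure-Python sketch of that cycle under stated assumptions, not the patent's implementation. The names `TinyRCM`, `TinyNCM`, and `cycle` are hypothetical; the RCM is assumed to be a single Elman-style tanh layer whose training step updates only its readout weights; the NCM is assumed to be a linear map from state to one expectation value per action, trained toward the reference result with no discounting. The "second" RCM and NCM of the abstract correspond here to the same objects after their in-place update.

```python
import math
import random


def matvec(w, x):
    """Multiply a matrix w (list of rows) by a vector x (list)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class TinyRCM:
    """Hypothetical Elman-style stand-in for the RCM (an assumption).

    state = tanh(W_in @ obs + W_h @ h_prev); predicted_result = w_out . state
    """

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def step(self, obs, h_prev):
        pre = [a + b for a, b in zip(matvec(self.w_in, obs), matvec(self.w_h, h_prev))]
        state = [math.tanh(p) for p in pre]
        predicted_result = sum(w * s for w, s in zip(self.w_out, state))
        return state, predicted_result

    def train(self, state, predicted_result, reference_result, lr=0.1):
        # Supervised squared-error gradient step. For brevity only the readout
        # weights are updated (no backpropagation through time).
        err = predicted_result - reference_result
        self.w_out = [w - lr * err * s for w, s in zip(self.w_out, state)]


class TinyNCM:
    """Hypothetical linear stand-in for the NCM: q[a] = w[a] . state."""

    def __init__(self, n_hidden, n_actions, seed=1):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_actions)]

    def values(self, state):
        return matvec(self.w, state)

    def train(self, state, action, target, lr=0.1):
        # Move only the chosen action's expectation value toward the target.
        err = self.values(state)[action] - target
        self.w[action] = [w - lr * err * s for w, s in zip(self.w[action], state)]


def cycle(rcm, ncm, obs, h_prev, send_action):
    """One pass of the loop in the abstract; both models are updated in place."""
    state, predicted = rcm.step(obs, h_prev)
    q = ncm.values(state)
    action = max(range(len(q)), key=lambda a: q[a])  # highest expectation value
    reference = send_action(action)  # stand-in for the communications interface
    rcm.train(state, predicted, reference)  # first RCM -> "second RCM"
    ncm.train(state, action, reference)  # assumed one-step target, no discount
    return state, action, reference


# Toy usage: the hypothetical environment rewards action 2 only.
rcm = TinyRCM(n_in=2, n_hidden=4)
ncm = TinyNCM(n_hidden=4, n_actions=3)
h = [0.0] * 4
for _ in range(5):
    h, action, reference = cycle(rcm, ncm, [1.0, -0.5], h, lambda a: 1.0 if a == 2 else 0.0)
```

The greedy choice mirrors claim 5. A working agent would usually add exploration (for example epsilon-greedy), because a purely greedy policy may never try actions whose initial estimates happen to be low.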
Claims
What is claimed is:
1. A system, comprising: a communications interface; one or more processing unit(s) adapted to execute modules; and one or more computer-readable media having thereon a plurality of modules, the plurality of modules comprising: a module of a representation engine that is configured to: operate a first recurrent computational model (RCM) to provide first state information and a predicted result value; and train the first RCM based at least in part on the predicted result value and a corresponding reference result value to provide a second RCM; a module of an action engine that is configured to: operate a first network computational model (NCM) to provide respective expectation values of a plurality of actions based at least in part on the first state information; select an action of the plurality of actions based at least in part on one or more of the expectation values; and train the first NCM based at least in part on the first state information and the selected action to provide a second NCM; and a module of a communications engine that is configured to: provide an indication of the selected action via the communications interface; and receive the reference result value via the communications interface.
2. A system as recited in claim 1, wherein: the representation engine is further configured to operate the second RCM to provide second state information; and the action engine is configured to train the first NCM further based on the second state information.
3. A system as recited in claim 1, wherein at least the first RCM or the second RCM comprises a recurrent neural network and the representation engine is configured to train the first RCM using a supervised-learning update rule.
4. A system as recited in claim 1, wherein at least the first NCM or the second NCM comprises a neural network and the action engine is configured to train the first NCM using a reinforcement-learning update rule.
5. A system as recited in claim 1, wherein the action engine is configured to select the action of the plurality of actions corresponding to a highest expectation value of the one or more of the expectation values.
6. A system as recited in claim 1, wherein: the representation engine is further configured to operate the first RCM to provide a predicted observation value and to train the first RCM further based on the predicted observation value and a reference observation value; and the communications engine is further configured to receive the reference observation value.
7. A system as recited in claim 6, further comprising a sensor coupled to the communications interface and configured to provide the reference observation value.
8. A system as recited in claim 1, further comprising an actuator coupled to the communications interface and responsive to the indication of the selected action to perform the selected action.
9. A system as recited in claim 1, further comprising a result subsystem coupled to the communications interface and configured to provide the reference result value.
10. A method, comprising: operating a first recurrent computational model (RCM) to provide first state information and a predicted result value; operating a first network computational model (NCM) to provide respective expectation values of a plurality of actions based at least in part on the first state information; providing an indication of at least one of the plurality of actions via a communications interface; receiving a reference result value via the communications interface; training the first RCM based at least in part on the predicted result value and the reference result value to provide a second RCM; and training the first NCM based at least in part on the first state information and the at least one of the plurality of actions to provide a second NCM.
11. A method as recited in claim 10, further comprising: receiving a first observation value via the communications interface; and operating the first RCM further based on the first observation value to provide the first state information and the predicted result value.
12. A method as recited in claim 11, wherein the first observation value comprises a sensor reading.
13. A method as recited in claim 11, further comprising: operating the first RCM to further provide a predicted observation value; receiving a second observation value via the communications interface; and training the first RCM further based on the predicted observation value and the second observation value to provide the second RCM.
14. A method as recited in claim 13, wherein the receiving the second observation value is performed after the providing the indication.
15. A method as recited in claim 10, further comprising operating the second RCM to provide second state information, wherein the training the first NCM is further based on the second state information.
16. A computer-readable medium having thereon computer-executable instructions, the computer-executable instructions upon execution configuring a computer to perform operations comprising: operating a first recurrent computational model (RCM) based at least in part on one or more values of training data to provide one or more state values and respective predicted result values; operating a first network computational model (NCM) based at least in part on the one or more state values to provide respective expectation vectors, at least one of the expectation vector including one or more expectation values corresponding to respective actions; training the first RCM based at least in part on the predicted result values and respective reference result values to provide a second RCM; and training the first NCM based at least in part on the state values, respective subsequent ones of the values of training data, and the respective reference result values to provide a second NCM.
17. A computer-readable medium as recited in claim 16, the operations further comprising: selecting an action for at least one of the values of the training data based at least in part on the respective expectation vector; and training the first NCM further based on the selected respective action to provide the second RCM.
18. A computer-readable medium as recited in claim 16, the operations further comprising: operating the second RCM based at least in part on at least one of the subsequent ones of the values of the training data to provide a second state value; and training the first NCM further based on the second state value to provide the second RCM.
19. A computer-readable medium as recited in claim 16, the operations further comprising: operating the first RCM to further provide a predicted training-data value; and training the first RCM further based on the predicted training-data value and the respective subsequent one of the values of the training data to provide the second RCM.
20. A computer-readable medium as recited in claim 16, wherein one or more of the plurality of values of the training data comprise respective sensor readings.
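Several dependent claims add training signals: claims 6, 13, and 19 train the RCM on a predicted observation (or training-data value) as well as a predicted result, and claims 2, 15, and 18 bring second state information from the updated RCM into the NCM's training. The sketch below shows one plausible reading, assuming a summed squared-error loss for the RCM and a one-step Q-learning target for the NCM. The claims only require "a supervised-learning update rule" and "a reinforcement-learning update rule" (claims 3 and 4), so these specific choices, the function names `rcm_loss`, `ncm_target`, and `q_step`, and the defaults `gamma=0.9` and `lr=0.5` are all hypothetical.

```python
from typing import List, Sequence


def rcm_loss(predicted_result: float, reference_result: float,
             predicted_obs: Sequence[float], reference_obs: Sequence[float]) -> float:
    """Assumed supervised objective for the RCM (cf. claims 6, 13, 19):
    squared error on the result plus squared error on the next observation."""
    result_term = (predicted_result - reference_result) ** 2
    obs_term = sum((p - r) ** 2 for p, r in zip(predicted_obs, reference_obs))
    return result_term + obs_term


def ncm_target(reference_result: float, next_values: Sequence[float],
               gamma: float = 0.9, terminal: bool = False) -> float:
    """Assumed one-step Q-learning target built from the NCM's values at the
    second state (cf. claims 2, 15, 18): r + gamma * max_a' Q(s', a')."""
    if terminal:
        # No successor state: the target is just the reference result.
        return reference_result
    return reference_result + gamma * max(next_values)


def q_step(q_values: Sequence[float], action: int, target: float,
           lr: float = 0.5) -> List[float]:
    """Move Q(s, action) a fraction lr toward the target; other entries unchanged."""
    updated = list(q_values)
    updated[action] += lr * (target - updated[action])
    return updated
```

For example, `rcm_loss(0.8, 1.0, [0.1, 0.2], [0.0, 0.4])` is about 0.09, and with a reference result of 1.0 and second-state values `[0.5, 1.0]`, `ncm_target` gives about 1.9. `q_step` updates a table-style list of values; with a neural-network NCM as in claim 4, the same target would instead drive a gradient step on the network's output for the chosen action.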
Recurrent networks, e.g. Hopfield networks · CPC title
Combinations of networks · CPC title
Reinforcement learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Supervised learning · CPC title