Multi-model controller

US2017193360A1 · US · A1

Patent metadata
Publication number: US-2017193360-A1
Application number: US-201514985017-A
Country: US
Kind code: A1
Filing date: Dec 30, 2015
Priority date: Dec 30, 2015
Publication date: Jul 6, 2017
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processing unit can operate a first recurrent computational model (RCM) to provide first state information and a predicted result value. The processing unit can operate a first network computational model (NCM) to provide respective expectation values of a plurality of actions based at least in part on the first state information. The processing unit can provide an indication of at least one of the plurality of actions, and receive a reference result value, e.g., via a communications interface. The processing unit can train the first RCM based at least in part on the predicted result value and the reference result value to provide a second RCM, and can train the first NCM based at least in part on the first state information and the at least one of the plurality of actions to provide a second NCM.
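The abstract describes one interaction cycle: the RCM turns observations into state information and a predicted result, the NCM scores each action from that state, an action is indicated, a reference result comes back, and both models are updated. The sketch below illustrates that cycle under loudly stated assumptions. The patent does not specify model shapes or update rules, so the following are all illustrative choices, not the patent's method: a scalar tanh recurrent unit for the RCM, a linear per-action value model for the NCM, a hypothetical `toy_environment` standing in for the communications interface, the reference result used directly as the reward, a truncated one-step gradient update for the RCM, a bandit-style value update for the NCM, and optimistic initial values so that greedy selection (as in claim 5) still explores. The names `TinyRCM`, `TinyNCM`, `toy_environment` and `run` are hypothetical.

```python
import math
import random


class TinyRCM:
    """Hypothetical scalar stand-in for an RCM.

    State: h = tanh(w_h * h_prev + w_x * x + b); predicted result: v * h + c.
    """

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w_h, self.w_x, self.b, self.v = (rng.uniform(-0.5, 0.5) for _ in range(4))
        self.c = 0.0

    def step(self, h_prev, x):
        h = math.tanh(self.w_h * h_prev + self.w_x * x + self.b)
        return h, self.v * h + self.c  # (state information, predicted result value)

    def train(self, h_prev, x, reference, lr=0.1):
        # Supervised gradient step on 0.5 * (pred - reference)**2, truncated to the
        # current time step: h_prev is treated as a constant (no full BPTT).
        h, pred = self.step(h_prev, x)
        err = pred - reference
        dz = err * self.v * (1.0 - h * h)  # gradient w.r.t. the tanh pre-activation
        self.v -= lr * err * h
        self.c -= lr * err
        self.w_h -= lr * dz * h_prev
        self.w_x -= lr * dz * x
        self.b -= lr * dz


class TinyNCM:
    """Hypothetical linear stand-in for an NCM: Q(s, a) = w[a] * s + b[a]."""

    def __init__(self, n_actions, optimism=1.0):
        self.w = [0.0] * n_actions
        self.b = [optimism] * n_actions  # optimistic start makes greedy choice explore

    def expectations(self, s):
        return [w * s + b for w, b in zip(self.w, self.b)]

    def select(self, s):
        q = self.expectations(s)
        return max(range(len(q)), key=q.__getitem__)  # highest expectation value

    def train(self, s, action, reward, lr=0.1):
        # One-step (bandit-style) update of the chosen action's value toward the reward.
        td = reward - self.expectations(s)[action]
        self.w[action] += lr * td * s
        self.b[action] += lr * td


def toy_environment(action):
    """Hypothetical actuator + result subsystem: only action 2 yields result 1.0."""
    reference = 1.0 if action == 2 else 0.0
    return reference, reference  # (reference result value, next observation)


def run(steps=200, n_actions=3):
    rcm, ncm = TinyRCM(), TinyNCM(n_actions)
    h, x = 0.0, 0.0
    actions, preds = [], []
    for _ in range(steps):
        state, pred = rcm.step(h, x)           # first state info + predicted result
        action = ncm.select(state)             # indication of the selected action
        reference, x_next = toy_environment(action)
        rcm.train(h, x, reference)             # first RCM -> second RCM (in place)
        ncm.train(state, action, reference)    # first NCM -> second NCM (in place)
        actions.append(action)
        preds.append(pred)
        h, x = state, x_next
    return actions, preds


actions, preds = run()
```

With these choices the greedy selector tries actions 0 and 1 once each, settles on action 2, and the RCM's predicted result approaches the reference value 1.0. Swapping the bandit target for a bootstrapped one that uses the next state would bring the sketch closer to claims 2 and 15.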

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a communications interface; one or more processing unit(s) adapted to execute modules; and one or more computer-readable media having thereon a plurality of modules, the plurality of modules comprising: a module of a representation engine that is configured to: operate a first recurrent computational model (RCM) to provide first state information and a predicted result value; and train the first RCM based at least in part on the predicted result value and a corresponding reference result value to provide a second RCM; a module of an action engine that is configured to: operate a first network computational model (NCM) to provide respective expectation values of a plurality of actions based at least in part on the first state information; select an action of the plurality of actions based at least in part on one or more of the expectation values; and train the first NCM based at least in part on the first state information and the selected action to provide a second NCM; and a module of a communications engine that is configured to: provide an indication of the selected action via the communications interface; and receive the reference result value via the communications interface.

2. A system as recited in claim 1, wherein: the representation engine is further configured to operate the second RCM to provide second state information; and the action engine is configured to train the first NCM further based on the second state information.

3. A system as recited in claim 1, wherein at least the first RCM or the second RCM comprises a recurrent neural network and the representation engine is configured to train the first RCM using a supervised-learning update rule.

4. A system as recited in claim 1, wherein at least the first NCM or the second NCM comprises a neural network and the action engine is configured to train the first NCM using a reinforcement-learning update rule.

5. A system as recited in claim 1, wherein the action engine is configured to select the action of the plurality of actions corresponding to a highest expectation value of the one or more of the expectation values.

6. A system as recited in claim 1, wherein: the representation engine is further configured to operate the first RCM to provide a predicted observation value and to train the first RCM further based on the predicted observation value and a reference observation value; and the communications engine is further configured to receive the reference observation value.

7. A system as recited in claim 6, further comprising a sensor coupled to the communications interface and configured to provide the reference observation value.

8. A system as recited in claim 1, further comprising an actuator coupled to the communications interface and responsive to the indication of the selected action to perform the selected action.

9. A system as recited in claim 1, further comprising a result subsystem coupled to the communications interface and configured to provide the reference result value.

10. A method, comprising: operating a first recurrent computational model (RCM) to provide first state information and a predicted result value; operating a first network computational model (NCM) to provide respective expectation values of a plurality of actions based at least in part on the first state information; providing an indication of at least one of the plurality of actions via a communications interface; receiving a reference result value via the communications interface; training the first RCM based at least in part on the predicted result value and the reference result value to provide a second RCM; and training the first NCM based at least in part on the first state information and the at least one of the plurality of actions to provide a second NCM.

11. A method as recited in claim 10, further comprising: receiving a first observation value via the communications interface; and operating the first RCM further based on the first observation value to provide the first state information and the predicted result value.

12. A method as recited in claim 11, wherein the first observation value comprises a sensor reading.

13. A method as recited in claim 11, further comprising: operating the first RCM to further provide a predicted observation value; receiving a second observation value via the communications interface; and training the first RCM further based on the predicted observation value and the second observation value to provide the second RCM.

14. A method as recited in claim 13, wherein the receiving the second observation value is performed after the providing the indication.

15. A method as recited in claim 10, further comprising operating the second RCM to provide second state information, wherein the training the first NCM is further based on the second state information.

16. A computer-readable medium having thereon computer-executable instructions, the computer-executable instructions upon execution configuring a computer to perform operations comprising: operating a first recurrent computational model (RCM) based at least in part on one or more values of training data to provide one or more state values and respective predicted result values; operating a first network computational model (NCM) based at least in part on the one or more state values to provide respective expectation vectors, at least one of the expectation vector including one or more expectation values corresponding to respective actions; training the first RCM based at least in part on the predicted result values and respective reference result values to provide a second RCM; and training the first NCM based at least in part on the state values, respective subsequent ones of the values of training data, and the respective reference result values to provide a second NCM.

17. A computer-readable medium as recited in claim 16, the operations further comprising: selecting an action for at least one of the values of the training data based at least in part on the respective expectation vector; and training the first NCM further based on the selected respective action to provide the second RCM.

18. A computer-readable medium as recited in claim 16, the operations further comprising: operating the second RCM based at least in part on at least one of the subsequent ones of the values of the training data to provide a second state value; and training the first NCM further based on the second state value to provide the second RCM.

19. A computer-readable medium as recited in claim 16, the operations further comprising: operating the first RCM to further provide a predicted training-data value; and training the first RCM further based on the predicted training-data value and the respective subsequent one of the values of the training data to provide the second RCM.

20. A computer-readable medium as recited in claim 16, wherein one or more of the plurality of values of the training data comprise respective sensor readings.
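Claims 16 to 19 describe an offline variant that runs over stored training data rather than a live interface. The sketch below is one possible reading under hypothetical choices that the claims do not specify: a scalar tanh recurrent state with two linear readouts (predicted result value and predicted next training-data value, cf. claim 19), and a linear per-action NCM trained with a Q-learning-style target that combines the reference result at step t with the best expectation value at the state computed for the subsequent value (cf. claims 16 to 18). Q-learning is our choice of reinforcement-learning update rule, and the discount `gamma=0.9`, the parameter values, and the data are made up. To keep it short, only the RCM readouts are trained, so the first and second RCM produce identical state values and the distinction in claim 18 collapses here. All function names are hypothetical.

```python
import math


def run_rcm(p, xs):
    """Run a hypothetical scalar RCM over the training data.

    Returns one (state, predicted result, predicted next value) tuple per value.
    """
    h, out = 0.0, []
    for x in xs:
        h = math.tanh(p["w_h"] * h + p["w_x"] * x + p["b"])
        out.append((h, p["v_r"] * h + p["c_r"], p["v_x"] * h + p["c_x"]))
    return out


def train_rcm_readouts(p, xs, refs, lr=0.1):
    """One supervised pass that updates only the two linear readouts (a
    simplification): predicted result vs. reference result, and predicted
    next value vs. the subsequent training-data value."""
    p2 = dict(p)  # leave the first RCM untouched; p2 plays the second RCM
    states = [s for s, _, _ in run_rcm(p, xs)]  # readouts do not affect states
    for t in range(len(xs) - 1):
        h = states[t]
        e_r = p2["v_r"] * h + p2["c_r"] - refs[t]
        e_x = p2["v_x"] * h + p2["c_x"] - xs[t + 1]
        p2["v_r"] -= lr * e_r * h
        p2["c_r"] -= lr * e_r
        p2["v_x"] -= lr * e_x * h
        p2["c_x"] -= lr * e_x
    return p2


def q_values(w, b, s):
    """Expectation vector of a hypothetical linear NCM: one value per action."""
    return [wa * s + ba for wa, ba in zip(w, b)]


def train_ncm(w, b, states, refs, gamma=0.9, lr=0.1):
    """One Q-learning-style pass: the target for step t is the reference
    result at t plus gamma times the best expectation value at state t + 1."""
    w, b = list(w), list(b)  # the returned lists play the role of the second NCM
    for t in range(len(states) - 1):
        s, s_next = states[t], states[t + 1]
        q = q_values(w, b, s)
        a = max(range(len(q)), key=q.__getitem__)  # greedy selection (cf. claim 17)
        target = refs[t] + gamma * max(q_values(w, b, s_next))
        td = target - q[a]
        w[a] += lr * td * s
        b[a] += lr * td
    return w, b


# Hypothetical usage: alternating "sensor readings" and constant reference results.
xs = [0.0, 1.0] * 10
refs = [1.0] * len(xs)
rcm = {"w_h": 0.2, "w_x": 1.0, "b": 0.0, "v_r": 0.0, "c_r": 0.0, "v_x": 0.0, "c_x": 0.0}
for _ in range(50):
    rcm = train_rcm_readouts(rcm, xs, refs)
states = [s for s, _, _ in run_rcm(rcm, xs)]
w, b = train_ncm([0.0] * 3, [0.0] * 3, states, refs)
```

After 50 passes the next-value readout has largely learned the alternation in `xs`. In the single NCM pass, greedy selection keeps choosing the first action it updated, so only that action's value moves. A fuller implementation would also backpropagate into the recurrent weights and add some form of exploration.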

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Combinations of networks · CPC title

  • G06N3/092 (primary)

    Reinforcement learning · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Supervised learning · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017193360A1 cover?
A processing unit can operate a first recurrent computational model (RCM) to provide first state information and a predicted result value. The processing unit can operate a first network computational model (NCM) to provide respective expectation values of a plurality of actions based at least in part on the first state information. The processing unit can provide an indication of at least on…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/092. Mapped technology areas include Physics.
When was this patent published?
Published July 6, 2017 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).