Multi-model controller

US11170293B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11170293-B2
Application number: US-201514985017-A
Country: US
Kind code: B2
Filing date: Dec 30, 2015
Priority date: Dec 30, 2015
Publication date: Nov 9, 2021
Grant date: Nov 9, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processing unit can operate a first recurrent computational model (RCM) to provide first state information and a predicted result value. The processing unit can operate a first network computational model (NCM) to provide respective expectation values of a plurality of actions based at least in part on the first state information. The processing unit can provide an indication of at least one of the plurality of actions, and receive a reference result value, e.g., via a communications interface. The processing unit can train the first RCM based at least in part on the predicted result value and the reference result value to provide a second RCM, and can train the first NCM based at least in part on the first state information and the at least one of the plurality of actions to provide a second NCM.
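The operating half of the abstract (recurrent model, then expectation values, then an indicated action) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the Elman-style tanh cell, the linear readout, the linear Q network, the greedy choice, and all weight values and function names are hypothetical.

```python
import math


def rnn_step(obs, prev_state, w_in, w_rec, w_out):
    """One step of a tiny Elman-style recurrent cell (hypothetical weights).

    Returns (state, predicted_result): the new hidden state and a scalar
    prediction read out linearly from that state.
    """
    state = [
        math.tanh(
            sum(w * x for w, x in zip(w_in[i], obs))
            + sum(w * h for w, h in zip(w_rec[i], prev_state))
        )
        for i in range(len(prev_state))
    ]
    predicted_result = sum(w * h for w, h in zip(w_out, state))
    return state, predicted_result


def q_values(state, w_q):
    """Linear Q network: one expectation value per action."""
    return [sum(w * h for w, h in zip(row, state)) for row in w_q]


def select_action(expectations):
    """Greedy choice: index of the highest expectation value."""
    return max(range(len(expectations)), key=lambda a: expectations[a])


# Hypothetical sizes: 2 observation features, 2 state units, 3 actions.
w_in = [[0.5, -0.2], [0.1, 0.3]]
w_rec = [[0.0, 0.1], [0.2, 0.0]]
w_out = [1.0, -1.0]
w_q = [[0.2, 0.1], [-0.3, 0.4], [0.5, -0.5]]

state, predicted = rnn_step([1.0, 2.0], [0.0, 0.0], w_in, w_rec, w_out)
expectations = q_values(state, w_q)
action = select_action(expectations)
```

In a deployed system the index in `action` would be sent out as the indication of the selected action, and the recurrent state would be carried forward to the next observation.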

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising:
    a communications interface;
    one or more processing unit(s); and
    one or more computer-readable media having thereon computer-executable instructions, the computer-executable instructions, upon execution, causing the one or more processing unit(s) to perform operations for coordinated training and operation of computational models, the operations comprising:
        operating a first recurrent neural network (RNN) computational model on a first observation value to provide first state information and a first predicted result value;
        operating a first Q network (QN) computational model on the first state information to provide respective expectation values of a plurality of actions;
        selecting an action among the plurality of actions based on the expectation values and causing the communications interface to provide an indication of the selected action;
        using a supervised-learning update rule to train the first RNN computational model based at least in part on the predicted result value and a corresponding reference result value received via the communications interface in response to the indication to provide a second RNN computational model;
        operating the second RNN computational model on a second observation value received via the communications interface in response to the indication to provide second state information and a second predicted result value; and
        using a reinforcement-learning update rule to train the first QN computational model based at least in part on the first state information, the second state information, the reference result value, and the selected action to provide a second QN computational model.

2. A system as recited in claim 1, wherein the selected action corresponds to a highest expectation value of the expectation values.

3. A system as recited in claim 1, wherein: the first RNN computational model is further operated to provide a predicted observation value and the first RNN computational model is trained further based on the predicted observation value and the second observation value.

4. A system as recited in claim 3, further comprising a sensor coupled to the communications interface and configured to provide the second observation value.

5. A system as recited in claim 1, further comprising an actuator coupled to the communications interface and responsive to the indication of the selected action to perform the selected action.

6. A system as recited in claim 1, further comprising a result subsystem coupled to the communications interface and configured to provide the reference result value.

7. A method for coordinated training and operation of computational models, the method comprising:
    operating a first recurrent neural network (RNN) computational model on a first observation value to provide first state information and a first predicted result value;
    operating a first Q network (QN) computational model on the first state information to provide respective expectation values of a plurality of actions;
    selecting an action among the plurality of actions based on the expectation values and providing an indication of the selected action via a communications interface;
    receiving a first reference result value and a second observation value via the communications interface;
    training the first RNN computational model, using a supervised-learning update rule, based at least in part on the first predicted result value and the first reference result value to provide a second RNN computational model;
    operating the second RNN computational model on the second observation value to provide second state information and a second predicted result value; and
    training the first QN computational model, using a reinforcement-learning update rule, based at least in part on the first state information, the second state information, the first reference result value, and the selected action to provide a second QN computational model.

8. A method as recited in claim 7, wherein the first observation value comprises a sensor reading.

9. A method as recited in claim 7, further comprising:
    operating the first RNN computational model to further provide a predicted observation value; and
    training the first RNN computational model further based on the predicted observation value and the second observation value to provide the second RNN computational model.

10. A method as recited in claim 7, wherein the receiving the second observation value is performed after the providing the indication.

11. A non-transitory computer-readable medium having thereon computer-executable instructions, the computer-executable instructions upon execution configuring a computer to perform operations for coordinated training and operation of computational models, the operations comprising:
    operating a first recurrent neural network (RNN) computational model on a first observation value to provide first state information and a first predicted result value;
    operating a first Q network (QN) computational model on the first state information to provide respective expectation values of a plurality of actions;
    selecting an action among the plurality of actions based on the expectation values and providing an indication of the selected action via a communications interface;
    receiving a first reference result value and a second observation value via the communications interface;
    training the first RNN computational model, using a supervised-learning update rule, based at least in part on the first predicted result value and the first reference result value to provide a second RNN computational model;
    operating the second RNN computational model on the second observation value to provide second state information and a second predicted result value; and
    training the first QN computational model, using a reinforcement-learning update rule, based at least in part on the first state information, the second state information, the first reference result value, and the selected action to provide a second QN computational model.

12. A non-transitory computer-readable medium as recited in claim 11, wherein one or more of the plurality of values of the training data comprise respective sensor readings.
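The claims name a supervised-learning update rule for the RNN and a reinforcement-learning update rule for the QN without fixing either one. As a rough sketch only, the example below assumes a squared-error gradient step on a linear result readout and a one-step Q-learning update on a linear Q network, with the reference result value treated as the reward. The learning rate `lr`, discount `gamma`, function names, and all numeric values are hypothetical and do not come from the patent. In the claimed flow the second state information would come from the already-updated RNN; here both states are simply given.

```python
def supervised_update(w_out, state, predicted, reference, lr=0.1):
    """Gradient step on 0.5 * (predicted - reference)**2 for a linear readout."""
    error = predicted - reference
    return [w - lr * error * h for w, h in zip(w_out, state)]


def q_learning_update(w_q, state, action, reward, next_state, lr=0.1, gamma=0.9):
    """One-step Q-learning update for a linear Q network.

    Only the weight row of the selected action changes; a new weight
    table is returned and the input table is left untouched.
    """
    def q(s, row):
        return sum(w * h for w, h in zip(row, s))

    # Temporal-difference target: reward plus discounted best next value.
    target = reward + gamma * max(q(next_state, row) for row in w_q)
    td_error = target - q(state, w_q[action])
    new_w_q = [list(row) for row in w_q]
    new_w_q[action] = [w + lr * td_error * h for w, h in zip(w_q[action], state)]
    return new_w_q


# Hypothetical values standing in for one pass through the claimed steps.
first_state = [1.0, 0.0]
second_state = [0.0, 1.0]
w_out = [0.5, 0.5]
w_q = [[0.0, 0.0], [0.0, 1.0]]
predicted = 0.5   # equals the dot product of w_out and first_state
reference = 1.0   # reference result value, used as the reward (assumption)
selected = 0      # index of the selected action

new_w_out = supervised_update(w_out, first_state, predicted, reference)
new_w_q = q_learning_update(w_q, first_state, selected, reference, second_state)
```

Returning new weight lists instead of mutating in place mirrors the claim language, where training a "first" model provides a "second" model.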

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Combinations of networks · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • G06N3/092 (Primary)

    Reinforcement learning · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Supervised learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11170293B2 cover?
A processing unit can operate a first recurrent computational model (RCM) to provide first state information and a predicted result value. The processing unit can operate a first network computational model (NCM) to provide respective expectation values of a plurality of actions based at least in part on the first state information. The processing unit can provide an indication of at least on…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/092. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 9, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).