Who is the assignee on this patent?

Microsoft Technology Licensing LLC

What technology area does this patent fall under?

Primary CPC classification G06N3/092. Mapped technology areas include Physics.

When was this patent published?

Publication date: Nov 9, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Multi-model controller

US11170293B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11170293-B2
Application number	US-201514985017-A
Country	US
Kind code	B2
Filing date	Dec 30, 2015
Priority date	Dec 30, 2015
Publication date	Nov 9, 2021
Grant date	Nov 9, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processing unit can operate a first recurrent computational model (RCM) to provide first state information and a predicted result value. The processing unit can operate a first network computational model (NCM) to provide respective expectation values of a plurality of actions based at least in part on the first state information. The processing unit can provide an indication of at least one of the plurality of actions, and receive a reference result value, e.g., via a communications interface. The processing unit can train the first RCM based at least in part on the predicted result value and the reference result value to provide a second RCM, and can train the first NCM based at least in part on the first state information and the at least one of the plurality of actions to provide a second NCM.
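
As a rough illustration of the data flow the abstract describes, the Python sketch below runs one forward pass: a recurrent model turns an observation into state information plus a predicted result value, and a second model maps that state to one expectation value per action. The Elman-style recurrence, the linear Q layer, the weight names (`w_in`, `w_rec`, `w_out`, `w_q`), and the greedy selection rule are assumptions made for illustration only; the abstract does not prescribe an architecture.

```python
import math


def rnn_step(obs, prev_state, w_in, w_rec, w_out):
    """One Elman-style recurrent step (an assumed architecture).

    Returns (state, predicted_result): the new hidden state and a scalar
    prediction read off that state by a linear readout.
    """
    state = [
        math.tanh(
            sum(wi * o for wi, o in zip(w_in[j], obs))
            + sum(wr * s for wr, s in zip(w_rec[j], prev_state))
        )
        for j in range(len(prev_state))
    ]
    predicted_result = sum(w * s for w, s in zip(w_out, state))
    return state, predicted_result


def q_values(state, w_q):
    """Assumed linear Q layer: one expectation value per action."""
    return [sum(w * s for w, s in zip(row, state)) for row in w_q]


def select_action(q):
    """Greedy rule: index of the highest expectation value."""
    return max(range(len(q)), key=lambda a: q[a])


# Hypothetical weights for a 2-input, 2-unit, 3-action toy model.
w_in = [[0.5, -0.2], [0.1, 0.3]]
w_rec = [[0.0, 0.0], [0.0, 0.0]]
w_out = [1.0, -1.0]
w_q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

state, predicted = rnn_step([1.0, 0.5], [0.0, 0.0], w_in, w_rec, w_out)
q = q_values(state, w_q)
action = select_action(q)
```

In this toy setup both state components are positive, so the third action, whose weights sum the two components, receives the largest expectation value and is the one selected.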

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a communications interface; one or more processing unit(s); and one or more computer-readable media having thereon computer-executable instructions, the computer-executable instructions, upon execution, causing the one or more processing unit(s) to perform operations for coordinated training and operation of computational models, the operations comprising: operating a first recurrent neural network (RNN) computational model on a first observation value to provide first state information and a first predicted result value; operating a first Q network (QN) computational model on the first state information to provide respective expectation values of a plurality of actions; selecting an action among the plurality of actions based on the expectation values and causing the communications interface to provide an indication of the selected action; using a supervised-learning update rule to train the first RNN computational model based at least in part on the predicted result value and a corresponding reference result value received via the communications interface in response to the indication to provide a second RNN computational model; operating the second RNN computational model on a second observation value received via the communications interface in response to the indication to provide second state information and a second predicted result value; and using a reinforcement-learning update rule to train the first QN computational model based at least in part on the first state information, the second state information, the reference result value, and the selected action to provide a second QN computational model.

2. A system as recited in claim 1, wherein the selected action corresponds to a highest expectation value of the expectation values.

3. A system as recited in claim 1, wherein: the first RNN computational model is further operated to provide a predicted observation value and the first RNN computational model is trained further based on the predicted observation value and the second observation value.

4. A system as recited in claim 3, further comprising a sensor coupled to the communications interface and configured to provide the second observation value.

5. A system as recited in claim 1, further comprising an actuator coupled to the communications interface and responsive to the indication of the selected action to perform the selected action.

6. A system as recited in claim 1, further comprising a result subsystem coupled to the communications interface and configured to provide the reference result value.

7. A method for coordinated training and operation of computational models, the method comprising: operating a first recurrent neural network (RNN) computational model on a first observation value to provide first state information and a first predicted result value; operating a first Q network (QN) computational model on the first state information to provide respective expectation values of a plurality of actions; selecting an action among the plurality of actions based on the expectation values and providing an indication of the selected action via a communications interface; receiving a first reference result value and a second observation value via the communications interface; training the first RNN computational model, using a supervised-learning update rule, based at least in part on the first predicted result value and the first reference result value to provide a second RNN computational model; operating the second RNN computational model on the second observation value to provide second state information and a second predicted result value; and training the first QN computational model, using a reinforcement-learning update rule, based at least in part on the first state information, the second state information, the first reference result value, and the selected action to provide a second QN computational model.

8. A method as recited in claim 7, wherein the first observation value comprises a sensor reading.

9. A method as recited in claim 7, further comprising: operating the first RNN computational model to further provide a predicted observation value; and training the first RNN computational model further based on the predicted observation value and the second observation value to provide the second RNN computational model.

10. A method as recited in claim 7, wherein the receiving the second observation value is performed after the providing the indication.

11. A non-transitory computer-readable medium having thereon computer-executable instructions, the computer-executable instructions upon execution configuring a computer to perform operations for coordinated training and operation of computational models, the operations comprising: operating a first recurrent neural network (RNN) computational model on a first observation value to provide first state information and a first predicted result value; operating a first Q network (QN) computational model on the first state information to provide respective expectation values of a plurality of actions; selecting an action among the plurality of actions based on the expectation values and providing an indication of the selected action via a communications interface; receiving a first reference result value and a second observation value via the communications interface; training the first RNN computational model, using a supervised-learning update rule, based at least in part on the first predicted result value and the first reference result value to provide a second RNN computational model; operating the second RNN computational model on the second observation value to provide second state information and a second predicted result value; and training the first QN computational model, using a reinforcement-learning update rule, based at least in part on the first state information, the second state information, the first reference result value, and the selected action to provide a second QN computational model.

12. A non-transitory computer-readable medium as recited in claim 11, wherein one or more of the plurality of values of the training data comprise respective sensor readings.
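
The two training steps recited in the independent claims can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the supervised rule is taken to be one gradient step on squared error for a linear readout only (a fuller version would also backpropagate through the recurrent weights), the reinforcement-learning rule is taken to be the standard one-step Q-learning update on a linear Q layer, and the reference result value is used as the reward. The function names, `lr`, and `gamma` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def supervised_update(w_out, state, predicted, reference, lr=0.1):
    """Gradient step on 0.5 * (predicted - reference)**2 for a linear readout.

    Returns new readout weights (the "second" model in the claims' wording)
    and leaves the input weights unmodified.
    """
    err = predicted - reference
    return [w - lr * err * s for w, s in zip(w_out, state)]


def q_learning_update(w_q, state1, state2, action, reward, lr=0.1, gamma=0.9):
    """One-step Q-learning update for an assumed linear Q layer.

    Uses the first state, the second state, the reward (here the reference
    result value), and the selected action. Returns a new weight matrix.
    """
    target = reward + gamma * max(dot(row, state2) for row in w_q)
    td_error = target - dot(w_q[action], state1)
    new_w_q = [list(row) for row in w_q]  # copy so the first model is kept
    new_w_q[action] = [
        w + lr * td_error * s for w, s in zip(w_q[action], state1)
    ]
    return new_w_q


# Toy values: zero-initialized weights and hand-picked states.
new_w_out = supervised_update([0.0, 0.0], [1.0, 2.0], predicted=0.0, reference=1.0)
old_w_q = [[0.0, 0.0], [0.0, 0.0]]
new_w_q = q_learning_update(old_w_q, [1.0, 0.0], [0.0, 1.0], action=0, reward=1.0)
```

Both functions return fresh weights rather than mutating their inputs, which mirrors the claims' distinction between a "first" and a "second" model. Only the row of the Q weights belonging to the selected action changes.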

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

G06N3/045
Combinations of networks · CPC title
G06N3/044
Recurrent networks, e.g. Hopfield networks · CPC title
G06N3/092 (Primary)
Reinforcement learning · CPC title
G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06N3/09
Supervised learning · CPC title

Patent family

Related publications grouped by family.

View patent family 59226386


Related patents

Methods and apparatus for reinforcement learning

Dueling deep neural networks

Method, system and artificial neural network

Stochastic apparatus and methods for implementing generalized learning rules

Generating representations of acoustic sequences

Apparatus and methods for reinforcement-guided supervised learning
