What technology area does this patent fall under?

Primary CPC classification G06Q40/04. Mapped technology areas include Physics.

When was this patent published?

Publication date: March 11, 2021 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

System and method for uncertainty-based advice for deep reinforcement learning agents

US2021073912A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2021073912-A1
Application number	US-202017011310-A
Country	US
Kind code	A1
Filing date	Sep 3, 2020
Priority date	Sep 5, 2019
Publication date	Mar 11, 2021
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed are systems, methods, and devices for training a learning agent. A learning agent that maintains a reinforcement learning neural network is instantiated. State data reflective of a state of an environment explored by the learning agent is received. An uncertainty metric [is] calculated upon processing the state data, the uncertainty metric measuring epistemic uncertainty of the learning agent. Upon determining that the uncertainty metric exceeds a pre-defined threshold: a request signal requesting an action suggestion from a demonstrator is sent; a suggestion signal reflective of the action suggestion is received; and an action signal to implement the action suggestion is sent.
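The decision rule in the abstract can be sketched as a single step function. This is a minimal Python sketch under stated assumptions, not the applicant's implementation: `act_with_advice`, `uncertainty`, `own_policy`, and `ask_demonstrator` are hypothetical names, the request, suggestion, and action signals are modelled as plain function calls, and falling back to the agent's own policy below the threshold is an assumption, since the abstract only describes what happens when the threshold is exceeded.

```python
from typing import Callable, Hashable, Tuple

Action = int


def act_with_advice(
    state: Hashable,
    uncertainty: Callable[[Hashable], float],
    own_policy: Callable[[Hashable], Action],
    ask_demonstrator: Callable[[Hashable], Action],
    threshold: float,
) -> Tuple[Action, bool]:
    """Return (action, advised) for one decision step (hypothetical helper).

    Advice is requested only when the uncertainty metric strictly exceeds
    the pre-defined threshold.
    """
    if uncertainty(state) > threshold:
        # Request signal out, suggestion signal back (modelled as one call).
        suggestion = ask_demonstrator(state)
        # The caller "sends the action signal" by executing the suggestion.
        return suggestion, True
    # Assumption: below the threshold the agent follows its own policy.
    return own_policy(state), False


# Toy usage: a lookup table stands in for the uncertainty metric.
toy_uncertainty = {"familiar": 0.1, "novel": 0.9}.get
print(act_with_advice("familiar", toy_uncertainty, lambda s: 0, lambda s: 1, 0.5))  # (0, False)
print(act_with_advice("novel", toy_uncertainty, lambda s: 0, lambda s: 1, 0.5))  # (1, True)
```

Returning the `advised` flag alongside the action is a design choice in this sketch: it lets a training loop treat demonstrator-chosen transitions differently, for example when updating the learning agent's policy from the suggestion.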

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented system for training a learning agent, the system comprising: at least one processor; memory in communication with the at least one processor, and software code stored in the memory, which when executed by the at least one processor causes the system to: instantiate a learning agent that maintains a reinforcement learning neural network; receive state data reflective of a state of an environment explored by the learning agent; calculate an uncertainty metric upon processing the state data, the uncertainty metric measuring epistemic uncertainty of the learning agent; upon determining that the uncertainty metric exceeds a pre-defined threshold: send a request signal requesting an action suggestion from a demonstrator; receive a suggestion signal reflective of the action suggestion; and send an action signal to implement the action suggestion.
2. The computer-implemented system of claim 1, wherein the demonstrator comprises an automated agent.
3. The computer-implemented system of claim 2, wherein the automated agent has a policy that differs from a policy of the learning agent.
4. The computer-implemented system of claim 1, wherein the demonstrator comprises a human.
5. The computer-implemented system of claim 1, wherein the reinforcement learning neural network comprises a plurality of hidden layers including a layer having a plurality of heads, each of the heads for generating predictions of action values for actions that can [be] taken by the learning agent.
6. The computer-implemented system of claim 1, wherein the environment is an electronic trading platform.
7. The computer-implemented system of claim 1, further comprising a network communication interface for transmitting signals through a network, and the request signal is sent by way of the network communication interface.
8. The computer-implemented system of claim 7, wherein the action signal is sent by way of the network communication interface.
9. A computer-implemented method for training a learning agent, the method comprising: instantiating a learning agent that maintains a reinforcement learning neural network; receiving state data reflective of a state of an environment explored by the learning agent; calculating an uncertainty metric upon processing the state data, the uncertainty metric measuring epistemic uncertainty of the learning agent; upon determining that the uncertainty metric exceeds a pre-defined threshold: sending a request signal requesting an action suggestion from a demonstrator; receiving a suggestion signal reflective of the action suggestion; and sending an action signal to implement the action suggestion.
10. The computer-implemented method of claim 9, wherein the reinforcement learning neural network comprises a plurality of hidden layers including a layer having a plurality of heads, each of the heads for generating predictions of action values for actions that can [be] taken by the learning agent.
11. The computer-implemented method of claim 10, wherein the calculating the uncertainty metric comprises: receiving, from each of the plurality of heads, a predicted action value; and computing a variance of the predicted action values received from the plurality of heads.
12. The computer-implemented method of claim 10, wherein each of the plurality [of] heads minimizes a loss function associated with that head.
13. The computer-implemented method of claim 9, further comprising determining whether the demonstrator is available.
14. The computer-implemented method of claim 13, further comprising maintaining an advice budget for the demonstrator and the determining comprises determining whether the advice budget is depleted.
15. The computer-implemented method of claim 9, further comprising selecting the demonstrator from among a plurality of demonstrators.
16. The computer-implemented method of claim 9, wherein the demonstrator comprises an automated agent.
17. The computer-implemented method of claim 16, wherein the automated agent has a policy that differs from a policy of the learning agent.
18. The computer-implemented method of claim 9, wherein the demonstrator comprises a human.
19. The computer-implemented method of claim 9, further comprising updating a policy of the learning agent based on the action suggestion.
20. A computer-implemented method for determining epistemic uncertainty of a learning agent, the method comprising: maintaining a neural network comprising a plurality of hidden layers including a layer having a plurality of heads, each of the heads generating predictions of action values for actions that can [be] taken by the learning agent; for a given state of an environment explored by the learning agent: receiving, from each of the plurality of heads, a predicted action value; and computing a variance of the predicted action values received from the plurality of heads.
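Claims 10, 11, 13, 14 and 20 combine a multi-head network, a variance-across-heads uncertainty metric, and an advice budget that decides whether the demonstrator is available. The pure-Python sketch below illustrates that combination under stated assumptions: the architecture (one shared tanh layer feeding linear heads), the use of population variance, scoring only the action with the highest mean value across heads, and the names `MultiHeadQNetwork`, `epistemic_uncertainty`, `AdviceBudget`, and `should_request_advice` are all hypothetical choices, and the per-head training of claim 12 is omitted.

```python
import math
import random
import statistics
from typing import List


class MultiHeadQNetwork:
    """Toy Q-network: one shared tanh hidden layer feeding several linear heads.

    Hypothetical architecture. Each head maps the shared features to one
    predicted action value per action. Weights are random; training each head
    on its own loss (claim 12) is omitted.
    """

    def __init__(self, state_dim: int, hidden_dim: int, n_actions: int,
                 n_heads: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.shared = [[rng.gauss(0.0, 1.0) for _ in range(state_dim)]
                       for _ in range(hidden_dim)]
        # Each head gets its own random weights, so heads can disagree.
        self.heads = [[[rng.gauss(0.0, 1.0) for _ in range(hidden_dim)]
                       for _ in range(n_actions)]
                      for _ in range(n_heads)]

    def head_q_values(self, state: List[float]) -> List[List[float]]:
        """Return one list of action values per head for the given state."""
        hidden = [math.tanh(sum(w * x for w, x in zip(row, state)))
                  for row in self.shared]
        return [[sum(w * h for w, h in zip(row, hidden)) for row in head]
                for head in self.heads]


def epistemic_uncertainty(net: MultiHeadQNetwork, state: List[float],
                          action: int) -> float:
    """Population variance across heads of the value predicted for `action`."""
    predictions = [q[action] for q in net.head_q_values(state)]
    return statistics.pvariance(predictions)


class AdviceBudget:
    """Hypothetical advice budget: the demonstrator is available until it is spent."""

    def __init__(self, budget: int) -> None:
        self.remaining = budget

    def available(self) -> bool:
        return self.remaining > 0

    def spend(self) -> None:
        self.remaining -= 1


def should_request_advice(net: MultiHeadQNetwork, state: List[float],
                          threshold: float, budget: AdviceBudget) -> bool:
    """Ask the demonstrator only if it is available and uncertainty is too high."""
    if not budget.available():
        return False  # budget depleted: demonstrator treated as unavailable
    q_per_head = net.head_q_values(state)
    n_actions = len(q_per_head[0])
    # Assumption: score the action with the highest mean value across heads.
    mean_q = [statistics.fmean(q[a] for q in q_per_head) for a in range(n_actions)]
    greedy = max(range(n_actions), key=lambda a: mean_q[a])
    if epistemic_uncertainty(net, state, greedy) > threshold:
        budget.spend()  # one unit of advice is consumed per request
        return True
    return False
```

With untrained random weights the variance only demonstrates the arithmetic. The usual intuition behind such head ensembles is that, once each head is trained on its own loss, the heads agree on familiar states and disagree on unfamiliar ones, so a high variance flags states where advice is most useful.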

Assignees

Royal Bank Of Canada

Inventors

Classifications

G06N7/01
Probabilistic graphical models, e.g. probabilistic networks · CPC title
G06N3/045
Combinations of networks · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06N3/092
Reinforcement learning · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 74851338

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021073912A1 cover?: Disclosed are systems, methods, and devices for training a learning agent. A learning agent that maintains a reinforcement learning neural network is instantiated. State data reflective of a state of an environment explored by the learning agent is received. An uncertainty metric calculated upon processing the state data, the uncertainty metric measuring epistemic uncertainty of the learning ag…
Who is the assignee on this patent?: Royal Bank Of Canada


Related patents

Environment navigation using reinforcement learning

Systems and methods for providing customized financial advice

Optimizing data center controls using neural networks
