System and method for uncertainty-based advice for deep reinforcement learning agents

US2021073912A1 · US · A1

Patent metadata
Publication number: US-2021073912-A1
Application number: US-202017011310-A
Country: US
Kind code: A1
Filing date: Sep 3, 2020
Priority date: Sep 5, 2019
Publication date: Mar 11, 2021
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed are systems, methods, and devices for training a learning agent. A learning agent that maintains a reinforcement learning neural network is instantiated. State data reflective of a state of an environment explored by the learning agent is received. An uncertainty metric is calculated upon processing the state data, the uncertainty metric measuring epistemic uncertainty of the learning agent. Upon determining that the uncertainty metric exceeds a pre-defined threshold: a request signal requesting an action suggestion from a demonstrator is sent; a suggestion signal reflective of the action suggestion is received; and an action signal to implement the action suggestion is sent.
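The decision step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the implementation disclosed in the patent: the function name `choose_action` and the callables `uncertainty_fn`, `own_policy`, and `demonstrator` are hypothetical stand-ins for the uncertainty metric, the learning agent's policy, and the request/suggestion signal exchange.

```python
from typing import Callable, Sequence, Tuple

State = Sequence[float]
Action = str


def choose_action(
    state: State,
    uncertainty_fn: Callable[[State], float],  # hypothetical: returns the uncertainty metric
    own_policy: Callable[[State], Action],     # hypothetical: the learning agent's own choice
    demonstrator: Callable[[State], Action],   # hypothetical: stands in for request + suggestion signals
    threshold: float,
) -> Tuple[Action, bool]:
    """Return (action, advised), where advised is True if the demonstrator was asked."""
    uncertainty = uncertainty_fn(state)
    # Advice is requested only when the metric exceeds the pre-defined threshold.
    if uncertainty > threshold:
        return demonstrator(state), True
    return own_policy(state), False


# Usage with toy callables (all values are made up for illustration).
action, advised = choose_action(
    state=[0.1, 0.2],
    uncertainty_fn=lambda s: 0.9,
    own_policy=lambda s: "hold",
    demonstrator=lambda s: "sell",
    threshold=0.5,
)
```

In this sketch a metric exactly equal to the threshold does not trigger a request, which follows the word "exceeds" in the abstract.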

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented system for training a learning agent, the system comprising: at least one processor; memory in communication with the at least one processor, and software code stored in the memory, which when executed by the at least one processor causes the system to: instantiate a learning agent that maintains a reinforcement learning neural network; receive state data reflective of a state of an environment explored by the learning agent; calculate an uncertainty metric upon processing the state data, the uncertainty metric measuring epistemic uncertainty of the learning agent; upon determining that the uncertainty metric exceeds a pre-defined threshold: send a request signal requesting an action suggestion from a demonstrator; receive a suggestion signal reflective of the action suggestion; and send an action signal to implement the action suggestion.

2. The computer-implemented system of claim 1, wherein the demonstrator comprises an automated agent.

3. The computer-implemented system of claim 2, wherein the automated agent has a policy that differs from a policy of the learning agent.

4. The computer-implemented system of claim 1, wherein the demonstrator comprises a human.

5. The computer-implemented system of claim 1, wherein the reinforcement learning neural network comprises a plurality of hidden layers including a layer having a plurality of heads, each of the heads for generating predictions of action values for actions that can be taken by the learning agent.

6. The computer-implemented system of claim 1, wherein the environment is an electronic trading platform.

7. The computer-implemented system of claim 1, further comprising a network communication interface for transmitting signals through a network, and the request signal is sent by way of the network communication interface.

8. The computer-implemented system of claim 7, wherein the action signal is sent by way of the network communication interface.

9. A computer-implemented method for training a learning agent, the method comprising: instantiating a learning agent that maintains a reinforcement learning neural network; receiving state data reflective of a state of an environment explored by the learning agent; calculating an uncertainty metric upon processing the state data, the uncertainty metric measuring epistemic uncertainty of the learning agent; upon determining that the uncertainty metric exceeds a pre-defined threshold: sending a request signal requesting an action suggestion from a demonstrator; receiving a suggestion signal reflective of the action suggestion; and sending an action signal to implement the action suggestion.

10. The computer-implemented method of claim 9, wherein the reinforcement learning neural network comprises a plurality of hidden layers including a layer having a plurality of heads, each of the heads for generating predictions of action values for actions that can be taken by the learning agent.

11. The computer-implemented method of claim 10, wherein the calculating the uncertainty metric comprises: receiving, from each of the plurality of heads, a predicted action value; and computing a variance of the predicted action values received from the plurality of heads.

12. The computer-implemented method of claim 10, wherein each of the plurality of heads minimizes a loss function associated with that head.

13. The computer-implemented method of claim 9, further comprising determining whether the demonstrator is available.

14. The computer-implemented method of claim 13, further comprising maintaining an advice budget for the demonstrator and the determining comprises determining whether the advice budget is depleted.

15. The computer-implemented method of claim 9, further comprising selecting the demonstrator from among a plurality of demonstrators.

16. The computer-implemented method of claim 9, wherein the demonstrator comprises an automated agent.

17. The computer-implemented method of claim 16, wherein the automated agent has a policy that differs from a policy of the learning agent.

18. The computer-implemented method of claim 9, wherein the demonstrator comprises a human.

19. The computer-implemented method of claim 9, further comprising updating a policy of the learning agent based on the action suggestion.

20. A computer-implemented method for determining epistemic uncertainty of a learning agent, the method comprising: maintaining a neural network comprising a plurality of hidden layers including a layer having a plurality of heads, each of the heads generating predictions of action values for actions that can be taken by the learning agent; for a given state of an environment explored by the learning agent: receiving, from each of the plurality of heads, a predicted action value; and computing a variance of the predicted action values received from the plurality of heads.
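Claims 11 and 20 describe the uncertainty metric as a variance over the action values predicted by the heads, and claim 14 adds an advice budget. The sketch below illustrates one way these could fit together. Several details are assumptions not stated in the claims: the variance is taken per action across heads and then averaged over actions, the population variance is used, and the names `epistemic_uncertainty` and `AdviceBudget` are hypothetical.

```python
from statistics import fmean, pvariance
from typing import Sequence


def epistemic_uncertainty(head_q_values: Sequence[Sequence[float]]) -> float:
    """head_q_values[h][a] is the action value predicted by head h for action a.

    Assumption: variance across heads is computed per action (population
    variance), then averaged over actions to give a single scalar.
    """
    n_actions = len(head_q_values[0])
    per_action_variance = [
        pvariance([head[a] for head in head_q_values]) for a in range(n_actions)
    ]
    return fmean(per_action_variance)


class AdviceBudget:
    """Hypothetical counter for how many suggestions may still be requested."""

    def __init__(self, total: int) -> None:
        self.remaining = total

    def is_depleted(self) -> bool:
        return self.remaining <= 0

    def spend(self) -> None:
        if self.is_depleted():
            raise RuntimeError("advice budget is depleted")
        self.remaining -= 1


# Heads that agree give zero uncertainty; heads that disagree give a positive value.
agreeing = epistemic_uncertainty([[1.0, 0.0], [1.0, 0.0]])
disagreeing = epistemic_uncertainty([[1.0, 0.0], [3.0, 0.0]])

budget = AdviceBudget(total=1)
if disagreeing > 0.25 and not budget.is_depleted():  # 0.25 is an arbitrary example threshold
    budget.spend()  # a request to the demonstrator would be sent here
```

Averaging over actions is only one possible aggregation; taking the maximum, or the variance for the greedy action alone, would also match the claim wording.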

Assignees

Royal Bank Of Canada

Inventors

Classifications

CPC titles:

  • Probabilistic graphical models, e.g. probabilistic networks

  • Combinations of networks

  • Convolutional networks [CNN, ConvNet]

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • Reinforcement learning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021073912A1 cover?
It covers systems, methods, and devices for training a reinforcement learning agent. The agent calculates an uncertainty metric measuring its epistemic uncertainty for the current state of the environment and, when that metric exceeds a pre-defined threshold, requests an action suggestion from a demonstrator and sends an action signal to implement the suggestion.
Who is the assignee on this patent?
Royal Bank Of Canada
What technology area does this patent fall under?
Primary CPC classification G06Q40/04. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 11, 2021 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).