Environment navigation using reinforcement learning
US-2019266449-A1 · Aug 29, 2019 · US
US2021073912A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2021073912-A1 |
| Application number | US-202017011310-A |
| Country | US |
| Kind code | A1 |
| Filing date | Sep 3, 2020 |
| Priority date | Sep 5, 2019 |
| Publication date | Mar 11, 2021 |
| Grant date | — |
Official abstract text for this publication.
Disclosed are systems, methods, and devices for training a learning agent. A learning agent that maintains a reinforcement learning neural network is instantiated. State data reflective of a state of an environment explored by the learning agent is received. An uncertainty metric is calculated upon processing the state data, the uncertainty metric measuring epistemic uncertainty of the learning agent. Upon determining that the uncertainty metric exceeds a pre-defined threshold: a request signal requesting an action suggestion from a demonstrator is sent; a suggestion signal reflective of the action suggestion is received; and an action signal to implement the action suggestion is sent.
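The decision rule in the abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the function `select_action`, its callable parameters, and the toy uncertainty values are all hypothetical names and numbers introduced here for illustration. The only behaviour taken from the abstract is that advice is requested when the uncertainty metric exceeds a pre-defined threshold.

```python
from typing import Callable, Hashable, Tuple

State = Hashable
Action = int


def select_action(
    state: State,
    uncertainty: Callable[[State], float],        # hypothetical: agent's epistemic-uncertainty estimate
    own_policy: Callable[[State], Action],        # hypothetical: the agent's own action choice
    ask_demonstrator: Callable[[State], Action],  # hypothetical: request/suggestion round trip
    threshold: float,                             # the pre-defined threshold
) -> Tuple[Action, bool]:
    """Return (action, advised).

    The demonstrator is consulted only when the uncertainty metric
    strictly exceeds the threshold; otherwise the agent acts alone.
    """
    if uncertainty(state) > threshold:
        return ask_demonstrator(state), True
    return own_policy(state), False


# Toy usage with made-up numbers: state 0 is familiar, state 1 is not.
toy_uncertainty = {0: 0.05, 1: 0.90}
action, advised = select_action(
    state=1,
    uncertainty=toy_uncertainty.__getitem__,
    own_policy=lambda s: 0,
    ask_demonstrator=lambda s: 2,
    threshold=0.5,
)
print(action, advised)
```

In this sketch the demonstrator is just a callable, so it could stand in for either an automated agent or a human interface, the two demonstrator types named in the claims.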
Opening claim text (preview).
What is claimed is:
1. A computer-implemented system for training a learning agent, the system comprising: at least one processor; memory in communication with the at least one processor, and software code stored in the memory, which when executed by the at least one processor causes the system to: instantiate a learning agent that maintains a reinforcement learning neural network; receive state data reflective of a state of an environment explored by the learning agent; calculate an uncertainty metric upon processing the state data, the uncertainty metric measuring epistemic uncertainty of the learning agent; upon determining that the uncertainty metric exceeds a pre-defined threshold: send a request signal requesting an action suggestion from a demonstrator; receive a suggestion signal reflective of the action suggestion; and send an action signal to implement the action suggestion.
2. The computer-implemented system of claim 1, wherein the demonstrator comprises an automated agent.
3. The computer-implemented system of claim 2, wherein the automated agent has a policy that differs from a policy of the learning agent.
4. The computer-implemented system of claim 1, wherein the demonstrator comprises a human.
5. The computer-implemented system of claim 1, wherein the reinforcement learning neural network comprises a plurality of hidden layers including a layer having a plurality of heads, each of the heads for generating predictions of action values for actions that can be taken by the learning agent.
6. The computer-implemented system of claim 1, wherein the environment is an electronic trading platform.
7. The computer-implemented system of claim 1, further comprising a network communication interface for transmitting signals through a network, and the request signal is sent by way of the network communication interface.
8. The computer-implemented system of claim 7, wherein the action signal is sent by way of the network communication interface.
9. A computer-implemented method for training a learning agent, the method comprising: instantiating a learning agent that maintains a reinforcement learning neural network; receiving state data reflective of a state of an environment explored by the learning agent; calculating an uncertainty metric upon processing the state data, the uncertainty metric measuring epistemic uncertainty of the learning agent; upon determining that the uncertainty metric exceeds a pre-defined threshold: sending a request signal requesting an action suggestion from a demonstrator; receiving a suggestion signal reflective of the action suggestion; and sending an action signal to implement the action suggestion.
10. The computer-implemented method of claim 9, wherein the reinforcement learning neural network comprises a plurality of hidden layers including a layer having a plurality of heads, each of the heads for generating predictions of action values for actions that can be taken by the learning agent.
11. The computer-implemented method of claim 10, wherein the calculating the uncertainty metric comprises: receiving, from each of the plurality of heads, a predicted action value; and computing a variance of the predicted action values received from the plurality of heads.
12. The computer-implemented method of claim 10, wherein each of the plurality of heads minimizes a loss function associated with that head.
13. The computer-implemented method of claim 9, further comprising determining whether the demonstrator is available.
14. The computer-implemented method of claim 13, further comprising maintaining an advice budget for the demonstrator and the determining comprises determining whether the advice budget is depleted.
15. The computer-implemented method of claim 9, further comprising selecting the demonstrator from among a plurality of demonstrators.
16. The computer-implemented method of claim 9, wherein the demonstrator comprises an automated agent.
17. The computer-implemented method of claim 16, wherein the automated agent has a policy that differs from a policy of the learning agent.
18. The computer-implemented method of claim 9, wherein the demonstrator comprises a human.
19. The computer-implemented method of claim 9, further comprising updating a policy of the learning agent based on the action suggestion.
20. A computer-implemented method for determining epistemic uncertainty of a learning agent, the method comprising: maintaining a neural network comprising a plurality of hidden layers including a layer having a plurality of heads, each of the heads generating predictions of action values for actions that can be taken by the learning agent; for a given state of an environment explored by the learning agent: receiving, from each of the plurality of heads, a predicted action value; and computing a variance of the predicted action values received from the plurality of heads.
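Claims 11 and 20 describe the uncertainty metric as a variance over the action values predicted by the network's heads, and claims 13 and 14 add an advice budget. The Python sketch below shows one possible reading, under stated assumptions: the variance is taken per action across heads, population variance (`statistics.pvariance`) is used, and the names `head_variance` and `AdviceBudget` are hypothetical. The claims do not specify the variance estimator or how per-action values are combined into a single metric, so none of these choices should be read as the claimed method.

```python
from statistics import pvariance
from typing import List, Sequence


def head_variance(head_q_values: Sequence[Sequence[float]]) -> List[float]:
    """Per-action variance of predicted action values across heads.

    head_q_values[h][a] is head h's predicted value for action a.
    Assumption: population variance, computed separately per action.
    """
    n_actions = len(head_q_values[0])
    return [
        pvariance([head[a] for head in head_q_values])
        for a in range(n_actions)
    ]


class AdviceBudget:
    """Hypothetical counter for how many suggestions may still be requested."""

    def __init__(self, total: int) -> None:
        self.remaining = total

    def available(self) -> bool:
        # The budget is "depleted" once no requests remain.
        return self.remaining > 0

    def spend(self) -> None:
        if not self.available():
            raise RuntimeError("advice budget depleted")
        self.remaining -= 1


# Toy numbers: three heads, two actions.
# The heads agree on action 0 and disagree on action 1.
q = [[1.0, 0.0], [1.0, 3.0], [1.0, 6.0]]
variances = head_variance(q)
print(variances)  # [0.0, 6.0]
```

High disagreement among heads for a state (here, action 1) is what would push the metric over the threshold and trigger a request, provided the budget check reports that advice is still available.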
Probabilistic graphical models, e.g. probabilistic networks · CPC title
Combinations of networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Reinforcement learning · CPC title