Action selection using interaction history graphs

US11636347B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11636347-B2
Application number	US-202016749252-A
Country	US
Kind code	B2
Filing date	Jan 22, 2020
Priority date	Jan 23, 2019
Publication date	Apr 25, 2023
Grant date	Apr 25, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent interacting with an environment. In one aspect, a method comprises: obtaining a graph of nodes and edges that represents an interaction history of the agent with the environment; generating an encoded representation of the graph representing the interaction history of the agent with the environment; processing an input based on the encoded representation of the graph using an action selection neural network, in accordance with current values of action selection neural network parameters, to generate an action selection output; and selecting an action from a plurality of possible actions to be performed by the agent using the action selection output generated by the action selection neural network.
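
The steps named in the abstract (obtain a graph, encode it, score actions, select one) can be illustrated with a minimal sketch. This is not the patented implementation: the state names, the mean-over-neighbours update, the `tanh` nonlinearity, the random weights, and every function name below are assumptions chosen only to make the data flow concrete.

```python
import math
import random


def build_graph(transitions):
    """Build undirected adjacency sets from observed (state, next_state) pairs."""
    neighbors = {}
    for src, dst in transitions:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    return neighbors


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def encode_graph(neighbors, features, weights, num_iterations=2):
    """Iteratively update node encodings, then sum them into one graph vector."""
    enc = dict(features)
    for _ in range(num_iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            group = nbrs | {node}  # the neighbourhood includes the node itself
            dim = len(enc[node])
            # Assumed aggregation: average the previous-iteration encodings.
            mean = [sum(enc[n][i] for n in group) / len(group) for i in range(dim)]
            updated[node] = [math.tanh(v) for v in matvec(weights, mean)]
        enc = updated
    # Combine node encodings by summation to get the graph encoding.
    return [sum(vec[i] for vec in enc.values()) for i in range(len(weights))]


def select_action(graph_encoding, policy_weights):
    """Score each action with a linear layer and return the best index."""
    scores = matvec(policy_weights, graph_encoding)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical interaction history of a UI-exploring agent.
rng = random.Random(0)
dim, num_actions = 4, 3
transitions = [("home", "menu"), ("menu", "settings"), ("menu", "profile")]
graph = build_graph(transitions)
features = {node: [rng.uniform(-1, 1) for _ in range(dim)] for node in sorted(graph)}
gnn_weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
policy_weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_actions)]

encoding = encode_graph(graph, features, gnn_weights)
action = select_action(encoding, policy_weights)
print("selected action index:", action)
```

In a trained system the weights would be learned rather than random, and the action selection output could be a probability distribution that is sampled instead of an arg-max; the sketch only shows how a graph-level encoding feeds the action choice.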

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed by one or more data processing apparatus for selecting actions to be performed by an agent interacting with an environment, the method comprising: obtaining data defining an interaction history graph of nodes and edges that represents an interaction history of the agent with the environment, wherein: each node in the interaction history graph represents a possible state of the environment, each edge in the interaction history graph connects a respective pair of nodes in the interaction history graph, and an edge in the interaction history graph connects a pair of nodes in the interaction history graph only if the state of the environment can transition from one of the nodes in the pair of nodes to the other node in the pair of nodes; processing data defining the interaction history graph representing the interaction history of the agent with the environment using a graph neural network to generate an encoded representation of the interaction history graph, comprising: obtaining a respective encoded representation of each node in the interaction history graph; iteratively updating the encoded representations of the nodes in the interaction history graph, over a plurality of update iterations, in accordance with values of a set of graph neural network parameters; and generating the encoded representation of the interaction history graph by combining the encoded representations of the nodes in the interaction history graph; processing an input based on the encoded representation of the interaction history graph using an action selection neural network, in accordance with current values of action selection neural network parameters, to generate an action selection output; and selecting an action from a plurality of possible actions to be performed by the agent using the action selection output generated by the action selection neural network.

2. The method of claim 1, further comprising: identifying one or more new states of the environment, wherein: (i) the state of the environment transitions into the one or more new states as a result of the agent performing the selected action, and (ii) the state of the environment did not previously transition into any of the new states as a result of the agent performing previously selected actions during the interaction of the agent with the environment; determining a reward based on the new states of the environment; and adjusting the current values of the action selection neural network parameters based on the reward using a reinforcement learning technique.

3. The method of claim 2, wherein determining the reward based on the new states of the environment comprises: determining the reward based on the number of new states of the environment.

4. The method of claim 1, wherein each node in the graph that represents the interaction history of the agent with the environment corresponds to a state of the environment that the environment previously transitioned into as a result of the agent performing previously selected actions during the interaction of the agent with the environment.

5. The method of claim 4, wherein each edge in the graph connects a pair of nodes in the graph only if the state of the environment previously transitioned from one of the nodes in the pair of nodes to the other node in the pair of nodes as a result of the agent performing previously selected actions during the interaction of the agent with the environment.

6. The method of claim 4, wherein the environment is a software environment, each state of the software environment corresponds to a respective state of a user interface of the software environment, and the action selected to be performed by the agent defines a particular interaction with the user interface of the software environment.

7. The method of claim 4, wherein the environment is a real-world environment, each state of the real-world environment corresponds to a respective spatial position in the real-world environment, the agent is a robotic agent interacting with the real-world environment, and the action selected to be performed by the agent defines a physical action that causes the agent to move in the real-world environment.

8. The method of claim 1, wherein: the nodes in the graph that represents the interaction history of the agent with the environment represent every possible state of the environment; and each node in the graph is associated with data that indicates whether the environment previously transitioned into the state represented by the node as a result of the agent performing previously selected actions during the interaction of the agent with the environment.

9. The method of claim 8, wherein the environment is a software environment defined by a set of program code, each state of the software environment corresponds to execution of a respective element of the set of program code, and the action selected to be performed by the agent defines an input to be provided to the software environment.

10. The method of claim 1, wherein combining the encoded representations of the nodes in the interaction history graph comprises summing the encoded representations of the nodes in the interaction history graph.

11. The method of claim 10, wherein summing the encoded representations of the nodes in the interaction history graph comprises: determining a respective weight factor for the encoded representation of each node in the graph; and scaling the encoded representation of each node in the graph using the respective weight factor prior to summing the encoded representations of the nodes in the interaction history graph.

12. The method of claim 1, wherein iteratively updating the encoded representations of the nodes in the interaction history graph over the plurality of update iterations, comprises, at each update iteration after a first iteration of the plurality of update iterations: for each given node of the graph, updating the encoded representation of the node at the current iteration based on the encoded representations of a set of neighboring nodes of the given node in the graph at a previous iteration in accordance with values of the set of graph neural network parameters, wherein the set of neighboring nodes of the given node in the graph comprises: (i) the given node, and (ii) each other node in the graph that is connected to the given node by an edge of the graph; and determining the encoded representation of each node of the interaction history graph as the encoded representation of the node after a last iteration of the plurality of update iterations.

13. The method of claim 12, further comprising: determining the encoded representation of each node of the interaction history graph at the first iteration of the plurality of update iterations based on characteristics of the state of the environment represented by the node.

14. The method of claim 1, wherein the current values of the action selection neural network parameters are determined during interaction of the agent with a previous environment and are not adjusted during the interaction of the agent with the environment.

15. The method of claim 1, wherein the input to the action selection neural network is based on: (i) the encoded representation of the graph, and (ii) encoded representations of one or more previous graphs, wherein each previous graph represents an interaction history of the agent with the environment as of a respective previous time step.

16. The method of claim 15, further comprising: using a recurrent neural network to process an input compri…
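
Claims 2 and 3 describe a reward based on the number of states the environment enters for the first time. A minimal sketch of that bookkeeping follows, assuming states are hashable labels; the function names and the example states are hypothetical, and the reinforcement learning update that would consume the reward is deliberately left out.

```python
def novelty_reward(visited_states, resulting_states):
    """Return (reward, new_states), where reward counts never-before-seen states."""
    new_states = set(resulting_states) - visited_states
    return len(new_states), new_states


def record_transition(visited_states, edges, previous_state, resulting_states):
    """Add the resulting states and the traversed edges to the interaction history."""
    for state in resulting_states:
        edges.add((previous_state, state))
        visited_states.add(state)


# Hypothetical history: the agent has already seen "home" and "menu".
visited = {"home", "menu"}
edges = {("home", "menu")}

# Step 1: acting in "menu" leads to the unseen state "settings".
first_reward, first_new = novelty_reward(visited, ["settings"])
record_transition(visited, edges, "menu", ["settings"])

# Step 2: acting in "settings" leads back to "menu", which is already known.
second_reward, second_new = novelty_reward(visited, ["menu"])
record_transition(visited, edges, "settings", ["menu"])

print(first_reward, second_reward)
```

Computing the reward before recording the transition matters: if the history were updated first, every resulting state would already count as visited and the reward would always be zero.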

Assignees

Deepmind Tech Ltd

Inventors

Classifications

G06N3/092
Reinforcement learning · CPC title
G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06N5/022
Knowledge engineering; Knowledge acquisition · CPC title
G06N3/006 · Primary
based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title
G06F18/2148
characterised by the process organisation or structure, e.g. boosting cascade · CPC title

Patent family

Related publications grouped by family.

View patent family 69192064

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11636347B2 cover? Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent interacting with an environment. In one aspect, a method comprises: obtaining a graph of nodes and edges that represents an interaction history of the agent with the environment; generating an encoded representation of the graph representing the in…
Who is the assignee on this patent? Deepmind Tech Ltd
What technology area does this patent fall under? Primary CPC classification G06N3/006. Mapped technology areas include Physics.
When was this patent published? Publication date Apr 25, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).