Graph neural network systems for behavior prediction and reinforcement learning in multiple agent environments

US2021192358A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2021192358-A1
Application number  US-201917054632-A
Country             US
Kind code           A1
Filing date         May 20, 2019
Priority date       May 18, 2018
Publication date    Jun 24, 2021
Grant date          (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for predicting the actions of, or influences on, agents in environments with multiple agents, in particular for reinforcement learning. In one aspect, a relational forward model (RFM) system receives agent data representing agent actions for each of multiple agents and implements: an encoder graph neural network subsystem to process the agent data as graph data to provide encoded graph data, a recurrent graph neural network subsystem to process the encoded graph data to provide processed graph data, a decoder graph neural network subsystem to decode the processed graph data to provide decoded graph data and an output to provide representation data for node and/or edge attributes of the decoded graph data relating to a predicted action of one or more of the agents. A reinforcement learning system includes the RFM system.
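The encoder, recurrent, and decoder stages named in the abstract can be pictured as three graph-to-graph functions applied once per time step, with the recurrent stage carrying state forward. The following is a minimal data-flow sketch only, not the patented implementation: the `Graph` class, `map_graph`, `run_rfm`, and the arithmetic inside each stage are hypothetical stand-ins for trained graph networks.

```python
import math
from dataclasses import dataclass

@dataclass
class Graph:
    nodes: list  # one attribute vector per node (agents and non-agent entities)
    edges: dict  # (sender, receiver) -> edge attribute vector

def map_graph(g, fn):
    # Apply fn to every scalar attribute and return an updated copy of the graph.
    return Graph([[fn(x) for x in n] for n in g.nodes],
                 {k: [fn(x) for x in e] for k, e in g.edges.items()})

def encoder(g):
    # Placeholder for a trained encoder graph network (assumption).
    return map_graph(g, math.tanh)

def recurrent_core(encoded, hidden):
    # Placeholder recurrence: mix new encoded attributes with the carried state.
    def mix(a, b):
        return [math.tanh(0.5 * x + 0.5 * y) for x, y in zip(a, b)]
    return Graph([mix(n, h) for n, h in zip(encoded.nodes, hidden.nodes)],
                 {k: mix(e, hidden.edges[k]) for k, e in encoded.edges.items()})

def decoder(g):
    # Placeholder for a trained decoder graph network (assumption).
    return map_graph(g, lambda x: 2.0 * x)

def run_rfm(sequence):
    # Encode, update the recurrent state, and decode once per time step.
    hidden = map_graph(sequence[0], lambda x: 0.0)  # zero initial state
    outputs = []
    for g in sequence:
        hidden = recurrent_core(encoder(g), hidden)
        outputs.append(decoder(hidden))
    return outputs

# Two agents observed for two identical time steps.
step = Graph(nodes=[[1.0, 0.0], [0.0, 1.0]],
             edges={(0, 1): [0.5], (1, 0): [-0.5]})
outputs = run_rfm([step, step])
```

Because the recurrent state accumulates, the second output differs from the first even though both inputs are identical. In a real system an output layer would turn each decoded node vector into a predicted action.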

First claim

Opening claim text (preview).

1. A neural network system for predicting or explaining the actions of multiple agents in a shared environment, the neural network system comprising: one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to implement: an encoder graph neural network subsystem to process the agent data as graph data to provide encoded graph data, wherein the agent data represents agent actions for each of multiple agents; wherein the graph data comprises (i) data representing at least nodes and edges of a graph and (ii) node attributes for at least some of the nodes in the graph, wherein the nodes represent the agents and one or more non-agent entities in the environment, wherein the edges connect nodes in the graph, wherein the node attributes represent the agent actions of the agents, and wherein the encoded graph data comprises node attributes and edge attributes representing an updated version of the graph data; a recurrent graph neural network subsystem comprising a recurrent neural network to process the encoded graph data and provide processed graph data comprising an updated version of the node attributes and edge attributes of the encoded graph data; a decoder graph neural network subsystem to decode the processed graph data and provide decoded graph data comprising an updated version of the node attributes and edge attributes of the processed graph data; and a system output to provide representation data comprising a representation of one or both of the node attributes and edge attributes of the decoded graph data for one or more of the agents, wherein the representation relates to a predicted or explained action of one or more of the agents.

2. A neural network system as claimed in claim 1 wherein the agent data representing agent actions comprises agent position and motion data for each of multiple agents, and wherein the node attributes for determining the actions of each agent further include attributes for the position and motion of each agent.

3. A neural network system as claimed in claim 1 wherein each of the agents is connected to each of the other agents by an edge and wherein each of the non-agent entities is connected to each of the agents by an edge.

4. A neural network system as claimed in claim 1 wherein the system output comprises one or more output neural network layers to combine the node attributes for a node in the decoded graph data to output the representation data, and wherein the representation comprises a predicted action of the agent represented by the node.

5. A neural network system as claimed in claim 4 wherein the representation data defines a spatial map of data derived from the node attributes of one or more nodes representing one or more of the agents and wherein, in the spatial map, the data derived from the node attributes is represented at or adjacent a position of the respective node.

6. A neural network system as claimed in claim 1 wherein the representation data comprises a representation of the edge attributes of the decoded graph data for the edges connecting to one or more of the nodes, and wherein the representation of the edge attributes for an edge is determined from a combination of the edge attributes for the edge.

7. A neural network system as claimed in claim 6 wherein the representation data defines a spatial map and wherein, in the spatial map, the representation of the edge attributes for an edge is represented at an origin node position for the edge.

8. A neural network system as claimed in claim 1 wherein one or more of the encoder, processing, and decoder graph neural network subsystems is configured to: for each of the edges, process the edge features using an edge neural network to determine output edge features, for each of the nodes, aggregate the output edge features for edges connecting to the node to determine aggregated edge features for the node, and for each of the nodes, process the aggregated edge features and the node features using a node neural network to determine output node features.

9. A neural network system as claimed in claim 8 wherein processing the edge features comprises, for each edge, providing the edge features and node features for the nodes connected by the edge to the edge neural network to determine the output edge features.

10. A neural network system as claimed in claim 8 wherein one or more of the encoder, processing, and decoder graph neural network subsystems is further configured to determine a global feature vector using a global feature neural network, the global feature vector representing the output edge features and the output node features, and wherein a subsequent graph neural network subsystem is configured to process the global feature vector when determining the output edge features and output node features.

11-15. (canceled)

16. A method of predicting or explaining the actions of multiple agents in a shared environment, the method comprising: receiving agent data representing actions for each of multiple agents; processing the agent data as graph data to provide encoded graph data, wherein the graph data comprises data representing at least nodes and edges of a graph, wherein each of the agents is represented by a node, wherein non-agent entities in the environment are each represented by a node, wherein the nodes have node attributes for determining the actions of each agent, wherein the edges connect the agents to each other and to the non-agent entities, and wherein the encoded graph data comprises node attributes and edge attributes representing an updated version of the graph data; processing the encoded graph data using a recurrent graph neural network to provide processed graph data comprising an updated version of the node attributes and edge attributes of the encoded graph data; decoding the processed graph data to provide decoded graph data comprising an updated version of the node attributes and edge attributes of the processed graph data; and outputting a representation of one or both of the node attributes and edge attributes of the decoded graph data for one or more of the agents, wherein the representation relates to a predicted or explained behaviour of the agent.

17. A method as claimed in claim 16 wherein the behaviours comprise actions of the agents, and wherein outputting the representation comprises processing the node attributes for a node of the decoded graph data to determine a predicted action of the agent represented by the node.

18. A method as claimed in claim 16 for explaining the actions of the agents, wherein outputting the representation comprises processing the edge attributes of an edge of the decoded graph data connecting an influencing node to an agent node to determine data representing the importance of the influencing node to the agent node.

19. (canceled)

20. One or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to implement a system comprising: an encoder graph neural network subsystem to process the agent data as graph data to provide encoded graph data, wherein the agent data represents agent actions for each of multiple agents; wherein the graph data comprises (i) data representing at least nodes and edges of a graph and (ii) node attributes for at least some of the nodes in the graph, wherein the nodes represent the agents and one or more non-agent entities in th
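The connectivity of claim 3 and the per-block update of claims 8 and 9 can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the claimed implementation: the helper names (`fully_connected_edges`, `make_dense`, `gn_block`, `edge_importance`) are hypothetical, the networks are single layers with random untrained weights, aggregation is a sum over incoming edges, and edge importance is taken as the Euclidean norm of the output edge features, which is only one possible "combination of the edge attributes". The global feature vector of claim 10 is omitted.

```python
import math
import random

def fully_connected_edges(num_agents, num_entities):
    # Claim 3 connectivity: every agent to every other agent, and every
    # non-agent entity to every agent. Agents are indexed first.
    edges = [(s, r) for s in range(num_agents)
             for r in range(num_agents) if s != r]
    edges += [(num_agents + k, r) for k in range(num_entities)
              for r in range(num_agents)]
    return edges

def make_dense(in_dim, out_dim, rng):
    # Hypothetical one-layer network with random, untrained weights.
    w = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    return lambda x: [math.tanh(sum(wi * xi for wi, xi in zip(row, x)))
                      for row in w]

def gn_block(nodes, edges, edge_feats, edge_net, node_net):
    # 1. Edge update: edge features plus sender and receiver node features.
    new_edges = [edge_net(e + nodes[s] + nodes[r])
                 for (s, r), e in zip(edges, edge_feats)]
    # 2. Aggregate output edge features per receiving node (sum is an assumption).
    dim = len(new_edges[0])
    agg = [[0.0] * dim for _ in nodes]
    for (s, r), e in zip(edges, new_edges):
        agg[r] = [a + x for a, x in zip(agg[r], e)]
    # 3. Node update: aggregated edge features plus the node's own features.
    new_nodes = [node_net(a + n) for a, n in zip(agg, nodes)]
    return new_nodes, new_edges

def edge_importance(edges, edge_feats):
    # Assumed importance measure: Euclidean norm of each edge's output features.
    return {sr: math.sqrt(sum(x * x for x in e))
            for sr, e in zip(edges, edge_feats)}

# Two agents (nodes 0 and 1) and one non-agent entity (node 2).
rng = random.Random(0)
edges = fully_connected_edges(num_agents=2, num_entities=1)
nodes = [[0.1, 0.2], [0.3, -0.1], [1.0, 0.0]]
edge_feats = [[0.0] for _ in edges]
edge_net = make_dense(1 + 2 + 2, 3, rng)  # edge + sender + receiver features
node_net = make_dense(3 + 2, 2, rng)      # aggregated edges + node features
new_nodes, new_edges = gn_block(nodes, edges, edge_feats, edge_net, node_net)
importance = edge_importance(edges, new_edges)
```

In this toy graph the entity node has no incoming edges, so its aggregated edge features stay at zero and its update depends only on its own features. The `importance` dictionary gives one score per directed edge, which could be drawn at the sender position to form the kind of spatial map described in claim 7.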

Assignees

Deepmind Tech Ltd

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks

  • Combinations of networks

  • Supervised learning

  • Auto-encoder networks; Encoder-decoder networks

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021192358A1 cover?
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for predicting the actions of, or influences on, agents in environments with multiple agents, in particular for reinforcement learning. In one aspect, a relational forward model (RFM) system receives agent data representing agent actions for each of multiple agents and implements: an encoder graph…
Who is the assignee on this patent?
Deepmind Tech Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/088. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 24, 2021 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).