Method for controlling and/or regulating a technical system in a computer-assisted manner
US-2015227121-A1 · Aug 13, 2015 · US
US11636347B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11636347-B2 |
| Application number | US-202016749252-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 22, 2020 |
| Priority date | Jan 23, 2019 |
| Publication date | Apr 25, 2023 |
| Grant date | Apr 25, 2023 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting actions to be performed by an agent interacting with an environment. In one aspect, a method comprises: obtaining a graph of nodes and edges that represents an interaction history of the agent with the environment; generating an encoded representation of the graph representing the interaction history of the agent with the environment; processing an input based on the encoded representation of the graph using an action selection neural network, in accordance with current values of action selection neural network parameters, to generate an action selection output; and selecting an action from a plurality of possible actions to be performed by the agent using the action selection output generated by the action selection neural network.
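The flow in the abstract (graph, then encoded representation, then action selection output, then selected action) can be illustrated with a minimal sketch. It is not the patented implementation. The function names `encode_graph` and `select_action`, the mean-over-neighbors update, the single scalar `weight`, the `tanh` nonlinearity, and the linear scoring layer are all assumptions chosen to keep the example short. Only the overall structure (iterative node updates, combining node encodings by summation, choosing an action from a set of possible actions) follows the text.

```python
import math


def encode_graph(node_features, edges, weight, num_iterations=2):
    """Encode a graph: iterate neighbor updates, then sum the node encodings.

    Illustrative assumption: each update is tanh(weight * mean of neighbors).
    """
    n = len(node_features)
    # Neighborhood of each node: the node itself plus every node sharing an edge.
    neighbors = [{i} for i in range(n)]
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    h = [list(v) for v in node_features]
    dim = len(h[0])
    for _ in range(num_iterations):
        new_h = []
        for i in range(n):
            # Average the previous-iteration encodings over the neighborhood.
            mean = [
                sum(h[j][d] for j in neighbors[i]) / len(neighbors[i])
                for d in range(dim)
            ]
            new_h.append([math.tanh(weight * x) for x in mean])
        h = new_h

    # Combine the node encodings into one graph encoding by summation.
    return [sum(h[i][d] for i in range(n)) for d in range(dim)]


def select_action(graph_embedding, action_weights):
    """Score each action linearly, apply softmax, and return the argmax."""
    scores = [
        sum(w * x for w, x in zip(row, graph_embedding))
        for row in action_weights
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Hypothetical three-state interaction history with two observed transitions.
embedding = encode_graph(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [(0, 1), (1, 2)], weight=0.5
)
action, probabilities = select_action(
    embedding, [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
)
```

In a trained system the scalar `weight` and the rows of `action_weights` would be replaced by learned graph neural network and action selection network parameters; the values here are placeholders.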
Opening claim text (preview).
What is claimed is:
1. A method performed by one or more data processing apparatus for selecting actions to be performed by an agent interacting with an environment, the method comprising: obtaining data defining an interaction history graph of nodes and edges that represents an interaction history of the agent with the environment, wherein: each node in the interaction history graph represents a possible state of the environment, each edge in the interaction history graph connects a respective pair of nodes in the interaction history graph, and an edge in the interaction history graph connects a pair of nodes in the interaction history graph only if the state of the environment can transition from one of the nodes in the pair of nodes to the other node in the pair of nodes; processing data defining the interaction history graph representing the interaction history of the agent with the environment using a graph neural network to generate an encoded representation of the interaction history graph, comprising: obtaining a respective encoded representation of each node in the interaction history graph; iteratively updating the encoded representations of the nodes in the interaction history graph, over a plurality of update iterations, in accordance with values of a set of graph neural network parameters; and generating the encoded representation of the interaction history graph by combining the encoded representations of the nodes in the interaction history graph; processing an input based on the encoded representation of the interaction history graph using an action selection neural network, in accordance with current values of action selection neural network parameters, to generate an action selection output; and selecting an action from a plurality of possible actions to be performed by the agent using the action selection output generated by the action selection neural network.
2. The method of claim 1, further comprising: identifying one or more new states of the environment, wherein: (i) the state of the environment transitions into the one or more new states as a result of the agent performing the selected action, and (ii) the state of the environment did not previously transition into any of the new states as a result of the agent performing previously selected actions during the interaction of the agent with the environment; determining a reward based on the new states of the environment; and adjusting the current values of the action selection neural network parameters based on the reward using a reinforcement learning technique.
3. The method of claim 2, wherein determining the reward based on the new states of the environment comprises: determining the reward based on the number of new states of the environment.
4. The method of claim 1, wherein each node in the graph that represents the interaction history of the agent with the environment corresponds to a state of the environment that the environment previously transitioned into as a result of the agent performing previously selected actions during the interaction of the agent with the environment.
5. The method of claim 4, wherein each edge in the graph connects a pair of nodes in the graph only if the state of the environment previously transitioned from one of the nodes in the pair of nodes to the other node in the pair of nodes as a result of the agent performing previously selected actions during the interaction of the agent with the environment.
6. The method of claim 4, wherein the environment is a software environment, each state of the software environment corresponds to a respective state of a user interface of the software environment, and the action selected to be performed by the agent defines a particular interaction with the user interface of the software environment.
7. The method of claim 4, wherein the environment is a real-world environment, each state of the real-world environment corresponds to a respective spatial position in the real-world environment, the agent is a robotic agent interacting with the real-world environment, and the action selected to be performed by the agent defines a physical action that causes the agent to move in the real-world environment.
8. The method of claim 1, wherein: the nodes in the graph that represents the interaction history of the agent with the environment represent every possible state of the environment; and each node in the graph is associated with data that indicates whether the environment previously transitioned into the state represented by the node as a result of the agent performing previously selected actions during the interaction of the agent with the environment.
9. The method of claim 8, wherein the environment is a software environment defined by a set of program code, each state of the software environment corresponds to execution of a respective element of the set of program code, and the action selected to be performed by the agent defines an input to be provided to the software environment.
10. The method of claim 1, wherein combining the encoded representations of the nodes in the interaction history graph comprises summing the encoded representations of the nodes in the interaction history graph.
11. The method of claim 10, wherein summing the encoded representations of the nodes in the interaction history graph comprises: determining a respective weight factor for the encoded representation of each node in the graph; and scaling the encoded representation of each node in the graph using the respective weight factor prior to summing the encoded representations of the nodes in the interaction history graph.
12. The method of claim 1, wherein iteratively updating the encoded representations of the nodes in the interaction history graph over the plurality of update iterations comprises, at each update iteration after a first iteration of the plurality of update iterations: for each given node of the graph, updating the encoded representation of the node at the current iteration based on the encoded representations of a set of neighboring nodes of the given node in the graph at a previous iteration in accordance with values of the set of graph neural network parameters, wherein the set of neighboring nodes of the given node in the graph comprises: (i) the given node, and (ii) each other node in the graph that is connected to the given node by an edge of the graph; and determining the encoded representation of each node of the interaction history graph as the encoded representation of the node after a last iteration of the plurality of update iterations.
13. The method of claim 12, further comprising: determining the encoded representation of each node of the interaction history graph at the first iteration of the plurality of update iterations based on characteristics of the state of the environment represented by the node.
14. The method of claim 1, wherein the current values of the action selection neural network parameters are determined during interaction of the agent with a previous environment and are not adjusted during the interaction of the agent with the environment.
15. The method of claim 1, wherein the input to the action selection neural network is based on: (i) the encoded representation of the graph, and (ii) encoded representations of one or more previous graphs, wherein each previous graph represents an interaction history of the agent with the environment as of a respective previous time step.
16. The method of claim 15, further comprising: using a recurrent neural network to process an input compri
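Claims 2 and 3 describe a reward determined from states the environment has not previously transitioned into, with claim 3 basing it on the number of such states. The sketch below shows one way that bookkeeping might look. The function name `novelty_reward`, the use of hashable state identifiers, and the choice to use the raw count as the reward are assumptions made for illustration; the claims only require that the reward be based on that number.

```python
def novelty_reward(visited, observed_states):
    """Return the count of newly reached states and record them as visited.

    `visited` is a set of state identifiers that is updated in place.
    Using the raw count as the reward is an illustrative assumption.
    """
    new_states = set(observed_states) - visited
    visited |= new_states
    return len(new_states)


# Hypothetical user-interface states identified by screen name.
visited_screens = {"home"}
first_reward = novelty_reward(visited_screens, ["home", "settings", "about"])
second_reward = novelty_reward(visited_screens, ["settings"])
```

Here the first call reaches two screens that were not seen before, so `first_reward` is 2; the second call revisits a known screen, so `second_reward` is 0. A reinforcement learning update would then use these values to adjust the action selection network parameters.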
Reinforcement learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Knowledge engineering; Knowledge acquisition · CPC title
based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title
characterised by the process organisation or structure, e.g. boosting cascade · CPC title