Systems and methods for performing behavior detection and behavioral intervention
US-2024424245-A1 · Dec 26, 2024 · US
US2024119308A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2024119308-A1 |
| Application number | US-202318159036-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jan 24, 2023 |
| Priority date | Sep 28, 2022 |
| Publication date | Apr 11, 2024 |
| Grant date | — |
Abstract
Embodiments provide a method for predicting agent actions for neural network based agents according to an intervention. The method includes obtaining a first agent action at a first time step and a first intervention generated according to an intervention policy. The method also includes generating, by the neural network based agent model, a predicted agent action conditioned on the first agent action and the first intervention. The method also includes generating, by a neural network based intervention model, a second intervention according to the intervention policy and conditioned on the first agent action, the first intervention, and the predicted agent action. The method further includes executing a second agent action according to an agent policy that incurs a reward based on the second intervention. The method further includes training the neural network based intervention model by updating parameters of the neural network based intervention model based on an expected return.
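The loop in the abstract can be read as a policy-gradient procedure. The following is a minimal sketch of that reading under stated assumptions, not the patented implementation. The neural network based intervention model is replaced by a linear softmax over four features. The agent model is a fixed placeholder (`predict_agent_action`), and the agent policy is a hypothetical `agent_policy` that behaves as desired more often when intervened on. The reward (1 for the desired action), the per-intervention cost of 0.1, the learning rate, and the running-average baseline are illustrative choices. The update rule is plain REINFORCE, swapped in as one common way to "update parameters based on an expected return". The agent policy is held fixed, so the agent-policy update of claim 3 is omitted.

```python
import math
import random

# Toy setup (hypothetical, not from the patent): binary agent actions
# (1 = desired behavior) and binary interventions (0 = none, 1 = intervene).
N_INTERVENTIONS = 2


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def features(prev_action, prev_intervention, predicted_action):
    # Bias term plus the three conditioning inputs named in claim 1.
    return [1.0, float(prev_action), float(prev_intervention), float(predicted_action)]


class InterventionModel:
    """Linear-softmax stand-in for the neural network based intervention model."""

    def __init__(self, n_features=4, lr=0.01):
        self.w = [[0.0] * n_features for _ in range(N_INTERVENTIONS)]
        self.lr = lr

    def distribution(self, x):
        logits = [sum(wj * xj for wj, xj in zip(row, x)) for row in self.w]
        return softmax(logits)

    def sample(self, x, rng):
        # Claim 5: produce a distribution over interventions, then sample from it.
        probs = self.distribution(x)
        return rng.choices(range(N_INTERVENTIONS), weights=probs)[0], probs

    def reinforce_update(self, trajectory, advantage):
        # REINFORCE: for a linear softmax, grad log pi(a|x) = (onehot(a) - probs) * x.
        for x, a, probs in trajectory:
            for k in range(N_INTERVENTIONS):
                g = (1.0 if k == a else 0.0) - probs[k]
                for j, xj in enumerate(x):
                    self.w[k][j] += self.lr * advantage * g * xj


def predict_agent_action(prev_action, prev_intervention):
    # Hypothetical placeholder for the trained agent model: predict the
    # desired action (1) if the previous step had an intervention, else 0.
    return prev_intervention


def agent_policy(intervention, rng):
    # Hypothetical fixed agent: behaves as desired more often when intervened on.
    p_desired = 0.8 if intervention == 1 else 0.2
    return 1 if rng.random() < p_desired else 0


def run_episode(model, rng, horizon=10, cost=0.1):
    action, intervention = 0, 0  # first agent action and first intervention
    trajectory, ret = [], 0.0
    for _ in range(horizon):
        predicted = predict_agent_action(action, intervention)
        x = features(action, intervention, predicted)
        next_intervention, probs = model.sample(x, rng)
        next_action = agent_policy(next_intervention, rng)
        reward = 1.0 if next_action == 1 else 0.0
        ret += reward - cost * next_intervention  # reward net of intervention cost
        trajectory.append((x, next_intervention, probs))
        action, intervention = next_action, next_intervention
    return trajectory, ret


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    model = InterventionModel()
    baseline = 0.0
    for episode in range(episodes):
        trajectory, ret = run_episode(model, rng)
        if episode == 0:
            baseline = ret
        # Claim 6: parameters are updated once, at the end of each episode.
        model.reinforce_update(trajectory, ret - baseline)
        baseline = 0.9 * baseline + 0.1 * ret  # running-average baseline
    return model
```

After `model = train()`, `model.distribution(features(0, 0, 0))[1]` should rise above its initial value of 0.5. In this toy, intervening is worth about 0.5 extra reward per step after its cost, so the gradient pushes toward intervention. The baseline only reduces variance. It is a conventional REINFORCE choice, not something the patent specifies.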
Claims (preview)
What is claimed is:

1. A method for predicting agent actions for a plurality of neural network based agents according to an intervention input, the method comprising: obtaining a first agent action at a first time step and a first intervention that is generated according to an intervention policy at the first time step; generating, by a neural network based agent model, a predicted agent action at a second time step conditioned on the first agent action, and the first intervention at the first time step; generating, by a neural network based intervention model, a second intervention at the second time step according to the intervention policy and conditioned on the first agent action, the first intervention, and the predicted agent action; executing a second agent action according to an agent policy at the second time step that incurs a reward that is based on the second intervention at the second time step; and training the neural network based intervention model by updating parameters of the neural network based intervention model based on a first expected return computed based on incurred rewards over a plurality of time steps.

2. The method of claim 1, wherein the first expected return is computed based on the incurred rewards and intervention costs associated with the interventions over the plurality of time steps.

3. The method of claim 1, further comprising: updating the agent policy by maximizing a second expected return computed based on incurred rewards including the incurred reward at the second time step, prior to the updating of the parameters of the neural network based intervention model.

4. The method of claim 1, wherein the second agent action is determined by sampling the second agent action according to the agent policy at the second time step.

5. The method of claim 1, wherein the generating of the second intervention at the second time step includes: generating, by the neural network based intervention model, a distribution over interventions; and sampling the second intervention according to the generated distribution.

6. The method of claim 1, wherein the updating of the parameters of the neural network based intervention model is performed at an end of the plurality of time steps including the first time step and the second time step.

7. The method of claim 1, further comprising, after training the neural network based intervention model: generating, by the neural network based agent model, a second predicted agent action at a fourth time step conditioned on a third agent action at a third time step, and a third intervention at the third time step, the third time step being after the plurality of time steps; generating, by a neural network based intervention model, a fourth intervention at the fourth time step according to the intervention policy after the plurality of time steps and conditioned on the third agent action, the third intervention, and the second predicted agent action; executing a fourth agent action at the fourth time step that incurs a reward that is based on the fourth intervention at the fourth time step; collecting a rollout including the fourth agent action, the fourth intervention, and an intervention distribution after the plurality of time steps; and training the neural network based intervention model by updating parameters of the neural network based intervention model based on collected rollouts over a second plurality of time steps.

8. The method of claim 7, further comprising training the neural network based agent model by maximizing a log-likelihood of expected agent actions over the first or second plurality of time steps.

9. A system for predicting agent actions for a plurality of neural network based agents according to an intervention input, the system comprising: a memory that stores a neural network based agent model and a neural network based intervention model, and a plurality of processor executable instructions; a communication interface that receives a first agent action at a first time step and a first intervention that is generated according to an intervention policy at the first time step; and one or more hardware processors that read and execute the plurality of processor-executable instructions from the memory to perform operations comprising: generating, by a neural network based agent model, a predicted agent action at a second time step conditioned on the first agent action, and the first intervention at the first time step; generating, by a neural network based intervention model, a second intervention at the second time step according to the intervention policy and conditioned on the first agent action, the first intervention, and the predicted agent action; executing a second agent action according to an agent policy at the second time step that incurs a reward that is based on the second intervention at the second time step; and training the neural network based intervention model by updating parameters of the neural network based intervention model based on a first expected return computed based on incurred rewards over a plurality of time steps.

10. The system of claim 9, wherein the first expected return is computed based on the incurred rewards and intervention costs associated with the interventions over the plurality of time steps.

11. The system of claim 9, wherein the operations further comprise: updating the agent policy by maximizing a second expected return computed based on incurred rewards including the incurred reward at the second time step, prior to the updating of the parameters of the neural network based intervention model.

12. The system of claim 9, wherein the second agent action is determined by sampling the second agent action according to the agent policy at the second time step.

13. The system of claim 9, wherein the generating of the second intervention at the second time step includes: generating, by the neural network based intervention model, a distribution over interventions; and sampling the second intervention according to the generated distribution.

14. The system of claim 9, wherein the updating of the parameters of the neural network based intervention model is performed at an end of the plurality of time steps including the first time step and the second time step.

15. The system of claim 9, wherein the operations further comprise, after training the neural network based intervention model: generating, by the neural network based agent model, a second predicted agent action at a fourth time step conditioned on a third agent action at a third time step, and a third intervention at the third time step, the third time step being after the plurality of time steps; generating, by a neural network based intervention model, a fourth intervention at the fourth time step according to the intervention policy after the plurality of time steps and conditioned on the third agent action, the third intervention, and the second predicted agent action; executing a fourth agent action at the fourth time step that incurs a reward that is based on the fourth intervention at the fourth time step; collecting a rollout including the fourth agent action, the fourth intervention, and an intervention distribution after the plurality of time steps; and training the neural network based intervention model by updating parameters of the neural network based intervention model based on collected rollouts over a second plurality of time steps.

16. The system of claim 15, wherein the operations further comprise training t
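The dependent claims name two concrete training quantities. The first is a return that nets intervention costs out of the incurred rewards (claims 2 and 10). The second is a log-likelihood objective for training the agent model (claim 8). The following is a minimal sketch of both under stated assumptions. It assumes a fixed linear cost per intervention, an optional discount factor `gamma`, and a logistic (single-layer) agent model over the previous action and previous intervention. The claims specify none of these forms, and the names `return_with_costs`, `mean_log_likelihood`, and `fit_agent_model` are hypothetical.

```python
import math


def return_with_costs(rewards, interventions, cost_per_intervention, gamma=1.0):
    """Return from incurred rewards net of intervention costs (cf. claims 2 and 10).

    gamma is a hypothetical discount factor; the claims do not specify one.
    """
    total = 0.0
    for t, (r, i) in enumerate(zip(rewards, interventions)):
        total += (gamma ** t) * (r - cost_per_intervention * i)
    return total


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_log_likelihood(w, data):
    """Mean log-likelihood of observed agent actions under a logistic agent model."""
    total = 0.0
    for prev_action, prev_intervention, action in data:
        p = sigmoid(w[0] + w[1] * prev_action + w[2] * prev_intervention)
        total += math.log(p if action == 1 else 1.0 - p)
    return total / len(data)


def fit_agent_model(data, lr=0.1, steps=200):
    """Gradient ascent on the mean log-likelihood (cf. claim 8).

    data: list of (prev_action, prev_intervention, action) triples with binary values.
    """
    w = [0.0, 0.0, 0.0]  # bias, prev_action weight, prev_intervention weight
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for prev_action, prev_intervention, action in data:
            x = (1.0, prev_action, prev_intervention)
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j in range(3):
                grad[j] += (action - p) * x[j]  # gradient of the Bernoulli log-likelihood
        w = [wj + lr * gj / len(data) for wj, gj in zip(w, grad)]
    return w
```

For example, rewards `[1, 0, 1]` with interventions `[1, 0, 1]` and a cost of 0.1 give a return of 0.9 + 0 + 0.9 = 1.8. The Bernoulli log-likelihood is concave in `w`, so with this small step size each gradient step should not decrease the fitted agent model's mean log-likelihood.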
Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title
based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title
Combinations of networks · CPC title
Backpropagation, e.g. using gradient descent · CPC title
Reinforcement learning · CPC title