Systems and methods for model-based meta-learning

US2024119308A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2024119308-A1
Application number  US-202318159036-A
Country             US
Kind code           A1
Filing date         Jan 24, 2023
Priority date       Sep 28, 2022
Publication date    Apr 11, 2024
Grant date

Abstract

Embodiments provide a method for predicting agent actions for neural network based agents according to an intervention. The method includes obtaining a first agent action at a first time step and a first intervention generated according to an intervention policy. The method also includes generating, by the neural network based agent model, a predicted agent action conditioned on the first agent action and the first intervention. The method also includes generating, by a neural network based intervention model, a second intervention according to the intervention policy and conditioned on the first agent action, the first intervention, and the predicted agent action. The method further includes executing a second agent action according to an agent policy that incurs a reward based on the second intervention. The method further includes training the neural network based intervention model by updating parameters of the neural network based intervention model based on an expected return.
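
The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the two neural networks are replaced by small lookup tables, the update rule is plain REINFORCE (the publication only says parameters are updated based on an expected return), and every name and constant here (`agent_policy`, `reward`, `N_ACTIONS`, the 0.8 compliance probability, the learning rate) is hypothetical.

```python
import math
import random

N_ACTIONS = 2        # hypothetical toy sizes, not from the publication
N_INTERVENTIONS = 2

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def init_tables():
    # Tabular stand-ins for the two neural network based models (assumption).
    counts = {(a, i): [1.0] * N_ACTIONS
              for a in range(N_ACTIONS) for i in range(N_INTERVENTIONS)}
    theta = {(a, i, p): [0.0] * N_INTERVENTIONS
             for a in range(N_ACTIONS) for i in range(N_INTERVENTIONS)
             for p in range(N_ACTIONS)}
    return counts, theta

def agent_policy(intervention):
    # Hypothetical agent: matches the intervention with probability 0.8.
    return [0.8 if a == intervention else 0.2 for a in range(N_ACTIONS)]

def reward(action, intervention):
    # Toy reward that depends on the intervention (assumption).
    return 0.5 * (action == 1) + 0.5 * (action == intervention)

def rollout(theta, counts, horizon, rng):
    action, intervention = 0, 0  # first agent action and first intervention
    trajectory = []
    for _ in range(horizon):
        # Agent model: predict the next action from (action, intervention).
        row = counts[(action, intervention)]
        predicted = max(range(N_ACTIONS), key=lambda a: row[a])
        # Intervention model: distribution conditioned on all three inputs.
        key = (action, intervention, predicted)
        dist = softmax(theta[key])
        next_intervention = sample(dist, rng)
        # The agent acts under its own policy and incurs a reward.
        next_action = sample(agent_policy(next_intervention), rng)
        r = reward(next_action, next_intervention)
        counts[(action, intervention)][next_action] += 1.0  # count-based model fit
        trajectory.append((key, dist, next_intervention, r))
        action, intervention = next_action, next_intervention
    return trajectory

def reinforce_update(theta, trajectory, lr=0.1):
    # Update once, at the end of the time steps, using the summed rewards.
    episode_return = sum(r for _, _, _, r in trajectory)
    for key, dist, chosen, _ in trajectory:
        for k in range(N_INTERVENTIONS):
            grad_log_prob = (1.0 if k == chosen else 0.0) - dist[k]
            theta[key][k] += lr * episode_return * grad_log_prob
    return episode_return
```

Calling `rollout` and then `reinforce_update` once per episode mirrors the ordering in the abstract: predict, intervene, act, collect the reward, and only then adjust the intervention model. Storing the sampled distribution in each trajectory entry keeps the gradient consistent with the parameters that actually produced the sample.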

First claim

What is claimed is:

1. A method for predicting agent actions for a plurality of neural network based agents according to an intervention input, the method comprising: obtaining a first agent action at a first time step and a first intervention that is generated according to an intervention policy at the first time step; generating, by a neural network based agent model, a predicted agent action at a second time step conditioned on the first agent action, and the first intervention at the first time step; generating, by a neural network based intervention model, a second intervention at the second time step according to the intervention policy and conditioned on the first agent action, the first intervention, and the predicted agent action; executing a second agent action according to an agent policy at the second time step that incurs a reward that is based on the second intervention at the second time step; and training the neural network based intervention model by updating parameters of the neural network based intervention model based on a first expected return computed based on incurred rewards over a plurality of time steps.

2. The method of claim 1, wherein the first expected return is computed based on the incurred rewards and intervention costs associated with the interventions over the plurality of time steps.

3. The method of claim 1, further comprising: updating the agent policy by maximizing a second expected return computed based on incurred rewards including the incurred reward at the second time step, prior to the updating of the parameters of the neural network based intervention model.

4. The method of claim 1, wherein the second agent action is determined by sampling the second agent action according to the agent policy at the second time step.

5. The method of claim 1, wherein the generating of the second intervention at the second time step includes: generating, by the neural network based intervention model, a distribution over interventions; and sampling the second intervention according to the generated distribution.

6. The method of claim 1, wherein the updating of the parameters of the neural network based intervention model is performed at an end of the plurality of time steps including the first time step and the second time step.

7. The method of claim 1, further comprising, after training the neural network based intervention model: generating, by the neural network based agent model, a second predicted agent action at a fourth time step conditioned on a third agent action at a third time step, and a third intervention at the third time step, the third time step being after the plurality of time steps; generating, by a neural network based intervention model, a fourth intervention at the fourth time step according to the intervention policy after the plurality of time steps and conditioned on the third agent action, the third intervention, and the second predicted agent action; executing a fourth agent action at the fourth time step that incurs a reward that is based on the fourth intervention at the fourth time step; collecting a rollout including the fourth agent action, the fourth intervention, and an intervention distribution after the plurality of time steps; and training the neural network based intervention model by updating parameters of the neural network based intervention model based on collected rollouts over a second plurality of time steps.

8. The method of claim 7, further comprising training the neural network based agent model by maximizing a log-likelihood of expected agent actions over the first or second plurality of time steps.

9. A system for predicting agent actions for a plurality of neural network based agents according to an intervention input, the system comprising: a memory that stores a neural network based agent model and a neural network based intervention model, and a plurality of processor executable instructions; a communication interface that receives a first agent action at a first time step and a first intervention that is generated according to an intervention policy at the first time step; and one or more hardware processors that read and execute the plurality of processor-executable instructions from the memory to perform operations comprising: generating, by a neural network based agent model, a predicted agent action at a second time step conditioned on the first agent action, and the first intervention at the first time step; generating, by a neural network based intervention model, a second intervention at the second time step according to the intervention policy and conditioned on the first agent action, the first intervention, and the predicted agent action; executing a second agent action according to an agent policy at the second time step that incurs a reward that is based on the second intervention at the second time step; and training the neural network based intervention model by updating parameters of the neural network based intervention model based on a first expected return computed based on incurred rewards over a plurality of time steps.

10. The system of claim 9, wherein the first expected return is computed based on the incurred rewards and intervention costs associated with the interventions over the plurality of time steps.

11. The system of claim 9, wherein the operations further comprise: updating the agent policy by maximizing a second expected return computed based on incurred rewards including the incurred reward at the second time step, prior to the updating of the parameters of the neural network based intervention model.

12. The system of claim 9, wherein the second agent action is determined by sampling the second agent action according to the agent policy at the second time step.

13. The system of claim 9, wherein the generating of the second intervention at the second time step includes: generating, by the neural network based intervention model, a distribution over interventions; and sampling the second intervention according to the generated distribution.

14. The system of claim 9, wherein the updating of the parameters of the neural network based intervention model is performed at an end of the plurality of time steps including the first time step and the second time step.

15. The system of claim 9, wherein the operations further comprise, after training the neural network based intervention model: generating, by the neural network based agent model, a second predicted agent action at a fourth time step conditioned on a third agent action at a third time step, and a third intervention at the third time step, the third time step being after the plurality of time steps; generating, by a neural network based intervention model, a fourth intervention at the fourth time step according to the intervention policy after the plurality of time steps and conditioned on the third agent action, the third intervention, and the second predicted agent action; executing a fourth agent action at the fourth time step that incurs a reward that is based on the fourth intervention at the fourth time step; collecting a rollout including the fourth agent action, the fourth intervention, and an intervention distribution after the plurality of time steps; and training the neural network based intervention model by updating parameters of the neural network based intervention model based on collected rollouts over a second plurality of time steps.

16. The system of claim 15, wherein the operations further comprise training t
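
Two quantities named in the dependent claims lend themselves to a short illustration: the return that nets intervention costs against incurred rewards (claims 2 and 10) and the log-likelihood used to train the agent model (claim 8). The sketch below is illustrative only. The function names, the optional discount factor `gamma`, and the use of a single recorded trajectory as an estimate of the expectation are assumptions that the claims do not state.

```python
import math

def net_return(rewards, costs, gamma=1.0):
    """Sum of (reward - intervention cost) per time step.

    gamma is a hypothetical discount factor; gamma=1.0 gives a plain sum.
    """
    return sum((gamma ** t) * (r - c)
               for t, (r, c) in enumerate(zip(rewards, costs)))

def log_likelihood(predicted_dists, observed_actions):
    """Log-likelihood of observed agent actions under the agent model.

    predicted_dists[t] is the model's distribution over actions at step t,
    observed_actions[t] is the index of the action the agent actually took.
    """
    return sum(math.log(dist[a])
               for dist, a in zip(predicted_dists, observed_actions))
```

For instance, `net_return([1.0, 0.0, 1.0], [0.5, 0.0, 0.5])` nets two costly interventions against two rewarded steps, and an agent model would be fit by adjusting its parameters to raise `log_likelihood` on recorded actions.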

Assignees

Salesforce Inc

Inventors

Classifications

  • G06N3/0985 (Primary)

    Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title

  • G06N3/006 (Primary)

    based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title

  • Combinations of networks · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

  • Reinforcement learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024119308A1 cover?
Embodiments provide a method for predicting agent actions for neural network based agents according to an intervention. The method includes obtaining a first agent action at a first time step and a first intervention generated according to an intervention policy. The method also includes generating, by the neural network based agent model, a predicted agent action conditioned on the first agent…
Who is the assignee on this patent?
Salesforce Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/0985. Mapped technology areas include Physics.
When was this patent published?
Published April 11, 2024 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).