Cooperative multi-goal, multi-agent, multi-stage reinforcement learning

US11657266B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11657266-B2
Application number  US-201816193291-A
Country             US
Kind code           B2
Filing date         Nov 16, 2018
Priority date       Nov 16, 2018
Publication date    May 23, 2023
Grant date          May 23, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

According to one aspect, cooperative multi-goal, multi-agent, multi-stage (CM3) reinforcement learning may include training a first agent using a first policy gradient and a first critic using a first loss function to learn goals in a single-agent environment using a Markov decision process, training a number of agents based on the first policy gradient and a second policy gradient and a second critic based on the first loss function and a second loss function to learn cooperation between the agents in a multi-agent environment using a Markov game to instantiate a second agent neural network, each of the agents instantiated with the first agent neural network in a pre-trained fashion, and generating a CM3 network policy based on the first agent neural network and the second agent neural network. The CM3 network policy may be implemented in a CM3 based autonomous vehicle to facilitate autonomous driving.
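
The two-stage structure described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it shows only how a second-stage policy might reuse a pre-trained first-stage network and add a new module for observations of other agents. The class names, the linear layers, the additive combination, and the zero initialisation of the new module are all assumptions made for this sketch; policy-gradient updates and critics are omitted.

```python
import math
from dataclasses import dataclass


def softmax(logits):
    """Convert action logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


@dataclass
class Stage1Policy:
    """Hypothetical first agent network: (own observation, goal) -> logits."""
    weights: list  # shape [n_actions][len(o_self) + len(goal)]

    def logits(self, o_self, goal):
        return linear(self.weights, o_self + goal)


@dataclass
class Stage2Policy:
    """Hypothetical combined policy: stage-1 network plus a new o_others module."""
    stage1: Stage1Policy
    others_weights: list  # shape [n_actions][len(o_others)]

    def logits(self, o_self, goal, o_others):
        base = self.stage1.logits(o_self, goal)
        extra = linear(self.others_weights, o_others)
        return [b + e for b, e in zip(base, extra)]


def init_stage2(pretrained, n_others):
    """Start stage 2 from the pre-trained network (zero-init is an assumption)."""
    n_actions = len(pretrained.weights)
    zeros = [[0.0] * n_others for _ in range(n_actions)]
    return Stage2Policy(stage1=pretrained, others_weights=zeros)


p1 = Stage1Policy(weights=[[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]])
p2 = init_stage2(p1, n_others=2)
probs = softmax(p2.logits([1.0, 2.0], [3.0], [4.0, 5.0]))
```

With the new module initialised to zero, the stage-2 policy starts out choosing the same actions as the pre-trained stage-1 policy, so multi-agent training in this sketch would begin from the single-agent behaviour rather than from scratch.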

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for cooperative multi-goal, multi-agent, multi-stage (CM3) reinforcement learning, comprising: training a first agent based on a first policy gradient and training a first critic based on a first loss function to learn one or more goals in a single-agent environment using a Markov decision process, wherein the first agent is associated with a first agent neural network and the first critic is associated with a first critic neural network; training a number of N agents based on the first policy gradient and training a second policy gradient and a second critic based on the first loss function and a second loss function to learn cooperation between the N agents in a multi-agent environment using a Markov game to instantiate a second agent neural network, wherein each of the N agents is instantiated with the first agent neural network in a pre-trained fashion, wherein the number of N agents are of the same type as the first agent; generating a cooperative multi-goal, multi-agent, multi-stage network policy based on the first agent neural network and the second agent neural network; and operating an autonomous vehicle in an autonomous fashion based on the cooperative multi-goal, multi-agent, multi-stage network policy.

2. The method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 1, wherein the first critic is a decentralized critic.

3. The method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 1, wherein the second critic is a centralized critic.

4. The method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 1, wherein training the first agent in the single-agent environment occurs prior to training the N agents in the multi-agent environment.

5. The method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 1, comprising training the number of N agents based on a combined policy gradient derived from the first policy gradient and the second policy gradient.

6. The method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 1, wherein the second agent neural network is associated with an o_others parameter for each of the N agents indicative of a local observation of each of the corresponding N agents.

7. The method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 6, wherein the o_others parameter is indicative of a velocity of the first agent, a number of lanes or sub-lanes between the first agent and one of the N agents, a distance from the first agent to a goal position, or a vehicle type associated with the first agent.

8. The method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 6, wherein the o_others parameter is indicative of a vehicle occupancy status associated with one of the N agents, a relative velocity of one of the N agents relative to the first agent, or a vehicle type associated with one of the N agents.

9. The method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 1, wherein the number of N agents includes the first agent.

10. The method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 1, wherein training the first agent and training the number of N agents includes generating one or more actions including a no-operation action, an acceleration action, a deceleration action, a shift left one sub-lane action, and a shift right one sub-lane action.

11. A system for cooperative multi-goal, multi-agent, multi-stage (CM3) reinforcement learning, comprising: a processor; a memory; and a simulator implemented via the processor and memory, performing: training a first agent based on a first policy gradient and training a first critic based on a first loss function to learn one or more goals in a single-agent environment using a Markov decision process, wherein the first agent is associated with a first agent neural network and the first critic is associated with a first critic neural network; training a number of N agents based on the first policy gradient and a second policy gradient and training a second critic based on the first loss function and a second loss function to learn cooperation between the N agents in a multi-agent environment using a Markov game to instantiate a second agent neural network, wherein each of the N agents is instantiated with the first agent neural network in a pre-trained fashion, wherein the number of N agents are of the same type as the first agent; and generating a cooperative multi-goal, multi-agent, multi-stage network policy based on the first agent neural network and the second agent neural network, wherein an autonomous vehicle is operated in an autonomous fashion based on the cooperative multi-goal, multi-agent, multi-stage network policy.

12. The system for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 11, wherein the first critic is a decentralized critic and the second critic is a centralized critic.

13. The system for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 11, wherein the simulator trains the first agent in the single-agent environment prior to training the N agents in the multi-agent environment.

14. The system for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 11, wherein the second agent neural network is associated with an o_others parameter for each of the N agents indicative of a local observation of each of the corresponding N agents.

15. A cooperative multi-goal, multi-agent, multi-stage (CM3) reinforcement learning based autonomous vehicle, comprising: a storage device storing a cooperative multi-goal, multi-agent, multi-stage network policy; and a controller operating the autonomous vehicle in an autonomous fashion based on the cooperative multi-goal, multi-agent, multi-stage network policy, wherein the cooperative multi-goal, multi-agent, multi-stage network policy is generated based on a first agent neural network and a second agent neural network, wherein a first agent is trained based on a first policy gradient and a first critic trained based on a first loss function to learn one or more goals in a single-agent environment using a Markov decision process, wherein the first agent is associated with the first agent neural network and the first critic is associated with a first critic neural network; and wherein a number of N agents are trained based on the first policy gradient and a second policy gradient and a second critic trained based on the first loss function and a second loss function to learn cooperation between the N agents in a multi-agent environment using a Markov game to instantiate the second agent neural network, wherein each of the N agents is instantiated with the first agent neural network in a pre-trained fashion, wherein the number of N agents are of the same type as the first agent.

16. The cooperative multi-goal, multi-agent, multi-stage reinforcement learning based autonomous vehicle of claim 15, wherein the second agent neural network is associated with an o_others parameter for each of the N agents indicative of a local observation of each of the corresponding N agents.

17. The cooperative multi-goal, multi-agent, multi-stage reinforcement learning based autonomous vehicle of claim 16, wherein the o_others parameter is indicative of a velocity of the first agent, a number of lanes or sub-lanes between
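
The discrete action set recited in claim 10 can be written down as a small enumeration. The sketch below is illustrative only: the five action names follow the claim, but the `VehicleState` fields, the `apply_action` helper, the speed step, the number of sub-lanes, and the convention that sub-lane 0 is the leftmost are hypothetical choices that the claims do not specify.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """The five actions listed in claim 10."""
    NO_OP = 0
    ACCELERATE = 1
    DECELERATE = 2
    SHIFT_LEFT_SUB_LANE = 3
    SHIFT_RIGHT_SUB_LANE = 4


@dataclass(frozen=True)
class VehicleState:
    """Hypothetical vehicle state used only for this sketch."""
    speed: float   # arbitrary units (assumption)
    sub_lane: int  # 0 is the leftmost sub-lane (assumption)


def apply_action(state, action, speed_step=1.0, n_sub_lanes=4):
    """Return the next state; all dynamics here are illustrative assumptions."""
    speed, sub_lane = state.speed, state.sub_lane
    if action is Action.ACCELERATE:
        speed += speed_step
    elif action is Action.DECELERATE:
        speed = max(0.0, speed - speed_step)
    elif action is Action.SHIFT_LEFT_SUB_LANE:
        sub_lane = max(0, sub_lane - 1)
    elif action is Action.SHIFT_RIGHT_SUB_LANE:
        sub_lane = min(n_sub_lanes - 1, sub_lane + 1)
    # NO_OP leaves the state unchanged.
    return VehicleState(speed, sub_lane)


start = VehicleState(speed=5.0, sub_lane=0)
after = apply_action(start, Action.SHIFT_RIGHT_SUB_LANE)
```

Clamping at the road edges and at zero speed is one simple way to keep the toy state valid; a real simulator would define its own boundary behaviour.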

Assignees

Honda Motor Co Ltd

Inventors

Classifications

  • Combinations of networks · CPC title

  • G06N3/092 (Primary)

    Reinforcement learning · CPC title

  • characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours (using knowledge based models G06N5/00) · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Transfer learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11657266B2 cover?
According to one aspect, cooperative multi-goal, multi-agent, multi-stage (CM3) reinforcement learning may include training a first agent using a first policy gradient and a first critic using a first loss function to learn goals in a single-agent environment using a Markov decision process, training a number of agents based on the first policy gradient and a second policy gradient and a second…
Who is the assignee on this patent?
Honda Motor Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/092. Mapped technology areas include Physics.
When was this patent published?
Publication date May 23, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).