Navigation Based on Liability Constraints
US-2019291727-A1 · Sep 26, 2019 · US
US11657266B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11657266-B2 |
| Application number | US-201816193291-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 16, 2018 |
| Priority date | Nov 16, 2018 |
| Publication date | May 23, 2023 |
| Grant date | May 23, 2023 |
Official abstract text for this publication.
According to one aspect, cooperative multi-goal, multi-agent, multi-stage (CM3) reinforcement learning may include training a first agent using a first policy gradient and a first critic using a first loss function to learn goals in a single-agent environment using a Markov decision process, training a number of agents based on the first policy gradient and a second policy gradient and a second critic based on the first loss function and a second loss function to learn cooperation between the agents in a multi-agent environment using a Markov game to instantiate a second agent neural network, each of the agents instantiated with the first agent neural network in a pre-trained fashion, and generating a CM3 network policy based on the first agent neural network and the second agent neural network. The CM3 network policy may be implemented in a CM3 based autonomous vehicle to facilitate autonomous driving.
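The two-stage curriculum in the abstract can be pictured with a small structural sketch. It is not the patented training procedure: the classes and functions (`AgentNetwork`, `train_stage1`, `init_stage2`), the flat weight lists, and the random placeholder update are all hypothetical stand-ins, and no real policy gradient or critic loss is computed. The sketch only shows the order of operations: pre-train one agent network in a single-agent setting, then instantiate each of the N agents from those pre-trained weights and attach a fresh second network for the cooperation stage.

```python
import copy
import random
from dataclasses import dataclass


@dataclass
class AgentNetwork:
    """Toy stand-in for an agent neural network: a flat list of weights."""
    weights: list


def train_stage1(num_weights=4, steps=10, seed=0):
    """Stage 1 (single-agent environment): pre-train the first agent network."""
    rng = random.Random(seed)
    net = AgentNetwork([0.0] * num_weights)
    for _ in range(steps):
        # Placeholder update only. A real step would apply the first policy
        # gradient, with a first critic trained on the first loss function.
        net.weights = [w + 0.1 * rng.uniform(-1.0, 1.0) for w in net.weights]
    return net


def init_stage2(first_net, n_agents, num_new_weights=2):
    """Stage 2 (multi-agent environment): start every agent from the
    pre-trained first network and add an untrained second network."""
    agents = []
    for _ in range(n_agents):
        agents.append({
            # Assumption: each agent gets its own copy of the pre-trained
            # weights; sharing one set of weights is an equally valid reading.
            "first": copy.deepcopy(first_net),
            "second": AgentNetwork([0.0] * num_new_weights),
        })
    return agents


first = train_stage1()
agents = init_stage2(first, n_agents=3)
```

In this sketch only the `second` networks start untrained, which mirrors the abstract's statement that the agents are instantiated with the first agent neural network "in a pre-trained fashion" before cooperation is learned.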
Opening claim text (preview).
The invention claimed is:
1. A method for cooperative multi-goal, multi-agent, multi-stage (CM3) reinforcement learning, comprising: training a first agent based on a first policy gradient and training a first critic based on a first loss function to learn one or more goals in a single-agent environment using a Markov decision process, wherein the first agent is associated with a first agent neural network and the first critic is associated with a first critic neural network; training a number of N agents based on the first policy gradient and training a second policy gradient and a second critic based on the first loss function and a second loss function to learn cooperation between the N agents in a multi-agent environment using a Markov game to instantiate a second agent neural network, wherein each of the N agents is instantiated with the first agent neural network in a pre-trained fashion, wherein the number of N agents are of the same type as the first agent; generating a cooperative multi-goal, multi-agent, multi-stage network policy based on the first agent neural network and the second agent neural network; and operating an autonomous vehicle in an autonomous fashion based on the cooperative multi-goal, multi-agent, multi-stage network policy.
2. The method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 1, wherein the first critic is a decentralized critic.
3. The method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 1, wherein the second critic is a centralized critic.
4. The method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 1, wherein training the first agent in the single-agent environment occurs prior to training the N agents in the multi-agent environment.
5. The method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 1, comprising training the number of N agents based on a combined policy gradient derived from the first policy gradient and the second policy gradient.
6. The method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 1, wherein the second agent neural network is associated with an o_others parameter for each of the N agents indicative of a local observation of each of the corresponding N agents.
7. The method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 6, wherein the o_others parameter is indicative of a velocity of the first agent, a number of lanes or sub-lanes between the first agent and one of the N agents, a distance from the first agent to a goal position, or a vehicle type associated with the first agent.
8. The method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 6, wherein the o_others parameter is indicative of a vehicle occupancy status associated with one of the N agents, a relative velocity of one of the N agents relative to the first agent, or a vehicle type associated with one of the N agents.
9. The method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 1, wherein the number of N agents includes the first agent.
10. The method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 1, wherein training the first agent and training the number of N agents includes generating one or more actions including a no-operation action, an acceleration action, a deceleration action, a shift left one sub-lane action, and a shift right one sub-lane action.
11. A system for cooperative multi-goal, multi-agent, multi-stage (CM3) reinforcement learning, comprising: a processor; a memory; and a simulator implemented via the processor and memory, performing: training a first agent based on a first policy gradient and training a first critic based on a first loss function to learn one or more goals in a single-agent environment using a Markov decision process, wherein the first agent is associated with a first agent neural network and the first critic is associated with a first critic neural network; training a number of N agents based on the first policy gradient and a second policy gradient and training a second critic based on the first loss function and a second loss function to learn cooperation between the N agents in a multi-agent environment using a Markov game to instantiate a second agent neural network, wherein each of the N agents is instantiated with the first agent neural network in a pre-trained fashion, wherein the number of N agents are of the same type as the first agent; and generating a cooperative multi-goal, multi-agent, multi-stage network policy based on the first agent neural network and the second agent neural network, wherein an autonomous vehicle is operated in an autonomous fashion based on the cooperative multi-goal, multi-agent, multi-stage network policy.
12. The system for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 11, wherein the first critic is a decentralized critic and the second critic is a centralized critic.
13. The system for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 11, wherein the simulator trains the first agent in the single-agent environment prior to training the N agents in the multi-agent environment.
14. The system for cooperative multi-goal, multi-agent, multi-stage reinforcement learning of claim 11, wherein the second agent neural network is associated with an o_others parameter for each of the N agents indicative of a local observation of each of the corresponding N agents.
15. A cooperative multi-goal, multi-agent, multi-stage (CM3) reinforcement learning based autonomous vehicle, comprising: a storage device storing a cooperative multi-goal, multi-agent, multi-stage network policy; and a controller operating the autonomous vehicle in an autonomous fashion based on the cooperative multi-goal, multi-agent, multi-stage network policy, wherein the cooperative multi-goal, multi-agent, multi-stage network policy is generated based on a first agent neural network and a second agent neural network, wherein a first agent is trained based on a first policy gradient and a first critic trained based on a first loss function to learn one or more goals in a single-agent environment using a Markov decision process, wherein the first agent is associated with the first agent neural network and the first critic is associated with a first critic neural network; and wherein a number of N agents are trained based on the first policy gradient and a second policy gradient and a second critic trained based on the first loss function and a second loss function to learn cooperation between the N agents in a multi-agent environment using a Markov game to instantiate the second agent neural network, wherein each of the N agents is instantiated with the first agent neural network in a pre-trained fashion, wherein the number of N agents are of the same type as the first agent.
16. The cooperative multi-goal, multi-agent, multi-stage reinforcement learning based autonomous vehicle of claim 15, wherein the second agent neural network is associated with an o_others parameter for each of the N agents indicative of a local observation of each of the corresponding N agents.
17. The cooperative multi-goal, multi-agent, multi-stage reinforcement learning based autonomous vehicle of claim 16, wherein the o_others parameter is indicative of a velocity of the first agent, a number of lanes or sub-lanes between
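Claim 10 lists five discrete actions available to each agent. One way to picture that action set is as an enumeration applied to a simple vehicle state. Everything beyond the five action names is an assumption made for illustration: the `VehicleState` fields, the `apply_action` helper, the speed step of 1.0, the non-negative speed floor, and the convention that shifting left decreases the sub-lane index are all hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """The five actions named in claim 10."""
    NO_OP = 0
    ACCELERATE = 1
    DECELERATE = 2
    SHIFT_LEFT = 3
    SHIFT_RIGHT = 4


@dataclass(frozen=True)
class VehicleState:
    """Hypothetical minimal state: speed plus a discrete sub-lane index."""
    speed: float
    sub_lane: int


def apply_action(state, action, speed_step=1.0):
    """Return the next state after one action (illustrative dynamics only)."""
    if action is Action.ACCELERATE:
        return VehicleState(state.speed + speed_step, state.sub_lane)
    if action is Action.DECELERATE:
        # Assumption: speed is clamped at zero rather than going negative.
        return VehicleState(max(0.0, state.speed - speed_step), state.sub_lane)
    if action is Action.SHIFT_LEFT:
        # Assumption: a lower index means further left.
        return VehicleState(state.speed, state.sub_lane - 1)
    if action is Action.SHIFT_RIGHT:
        return VehicleState(state.speed, state.sub_lane + 1)
    return state  # NO_OP leaves the state unchanged


start = VehicleState(speed=5.0, sub_lane=2)
after = apply_action(start, Action.SHIFT_LEFT)
```

Keeping the actions in an `Enum` makes the policy output a small fixed index set, which is the usual shape for a discrete-action policy network.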
Combinations of networks · CPC title
Reinforcement learning · CPC title
characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours (using knowledge based models G06N5/00) · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Transfer learning · CPC title