Optimization of planning trajectories for multiple agents
US-2022204055-A1 · Jun 30, 2022 · US
US12124282B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12124282-B2 |
| Application number | US-202117923114-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 13, 2021 |
| Priority date | Oct 18, 2021 |
| Publication date | Oct 22, 2024 |
| Grant date | Oct 22, 2024 |
Official abstract text for this publication.
The present invention discloses an intention-driven reinforcement learning-based path planning method, including the following steps: 1: acquiring, by a data collector, a state of a monitoring network; 2: selecting a steering angle of the data collector according to positions of surrounding obstacles, sensor nodes, and the data collector; 3: selecting a speed of the data collector, a target node, and a next target node as an action of the data collector according to an ε greedy policy; 4: determining, by the data collector, the next time slot according to the selected steering angle and speed; 5: obtaining rewards and penalties according to intentions of the data collector and the sensor nodes, and updating a Q value; 6: repeating step 1 to step 5 until a termination state or a convergence condition is satisfied; and 7: selecting, by the data collector, an action in each time slot having the maximum Q value as a planning result, and generating an optimal path. The method provided in the present invention can complete the data collection path planning with a higher probability of success and performance closer to the intention.
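Steps 3, 5, and 6 of the abstract can be illustrated with a minimal tabular Q-learning sketch. The abstract confirms an ε-greedy policy and a Q-value update, but this page does not show the update rule, so the standard one-step Q-learning update is assumed. The learning rate `alpha`, the discount factor `gamma`, the function names, and the toy states, actions, and reward are all hypothetical.

```python
import random
from collections import defaultdict

# An action bundles (speed, target node, next target node), as in step 3.
# The concrete values below are illustrative only.
ACTIONS = [
    (1.0, "node_1", "node_2"),
    (2.0, "node_1", "node_2"),
    (1.0, "node_2", "node_1"),
]


def epsilon_greedy(q_table, state, actions, epsilon, rng):
    """With probability epsilon explore at random, otherwise pick the max-Q action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])


def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Assumed standard update: Q <- Q + alpha * (r + gamma * max_a' Q' - Q)."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


# One illustrative transition with a made-up reward of 1.0.
rng = random.Random(0)
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
chosen = epsilon_greedy(q_table, "s0", ACTIONS, epsilon=0.2, rng=rng)
new_q = q_update(q_table, "s0", chosen, reward=1.0, next_state="s1", actions=ACTIONS)
```

Once training stops, step 7 corresponds to calling `epsilon_greedy` with `epsilon=0`, which always returns the action with the maximum Q value for the current state.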
Opening claim text (preview).
What is claimed is:

1. An intention-driven reinforcement learning-based path planning method, comprising the following steps: step A: acquiring, by a data collector, a state of a monitoring network; step B: determining a steering angle of the data collector according to positions of the data collector, sensor nodes, and surrounding obstacles; step C: selecting an action of the data collector according to an ε greedy policy, wherein the action comprises a speed of the data collector, a target node, and a next target node; step D: adjusting, by the data collector, a direction of sailing according to the steering angle and executing the action to the next time slot; step E: calculating rewards and penalties according to intentions of the data collector and the sensor nodes, and updating a Q value; step F: repeating step A to step E until the monitoring network reaches a termination state or Q-learning satisfies a convergence condition; and step G: selecting, by the data collector, an action in each time slot having the maximum Q value as a planning result, and generating an optimal data collection path.

2. The intention-driven reinforcement learning-based path planning method according to claim 1, wherein the state S of the monitoring network in step A comprises: a direction of sailing $\phi[n]$ of the data collector in a time slot $n$, coordinates $q_u[n]$ of the data collector, available storage space $\{b_{am}[n]\}_{m \in M}$ of the sensor nodes, data collection indicators $\{w_m[n]\}_{m \in M}$ of the sensor nodes, distances $\{d_{um}[n]\}_{m \in M}$ between the data collector and the sensor nodes, and distances $\{d_{uk}[n]\}_{k \in K}$ between the data collector and the surrounding obstacles, wherein $M$ is the set of sensor nodes, $K$ is the set of surrounding obstacles, $w_m[n] \in \{0, 1\}$ is a data collection indicator of the sensor node $m$, and $w_m[n] = 1$ indicates that the data collector completes the data collection of the sensor node $m$ in the time slot $n$, or otherwise indicates that the data collection is not completed.

3. The intention-driven reinforcement learning-based path planning method according to claim 1, wherein a formula for calculating the steering angle of the data collector in step B is:

$$
\Delta[n+1] =
\begin{cases}
\min\left(\phi_{up}[n] - \phi[n],\ \phi_{\max}\right), & \phi_{up}[n] \ge \phi[n] \\
\max\left(\phi_{up}[n] - \phi[n],\ -\phi_{\max}\right), & \phi_{up}[n] < \phi[n]
\end{cases}
\tag{11}
$$

where $\phi_{up}[n]$ is a relative angle between the coordinates $q_u[n]$ of the data collector and a target position $p[n]$, and $\phi_{\max}$ is the maximum steering angle of the data collector.

4. The intention-driven reinforcement learning-based path planning method according to claim 3, wherein steps of determining the target position in step B comprise: step B1: determining whether the
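The steering rule of claim 3 (Eq. 11) limits the heading change toward the target to the range between minus and plus the maximum steering angle. The sketch below implements that piecewise formula directly. The function names are hypothetical, and computing the relative angle with `atan2` on 2D coordinates is an assumption, since the claim only states that it is a relative angle between the collector coordinates and the target position. Angle wrap-around is not handled because Eq. 11 as quoted does not address it.

```python
import math


def relative_angle(q_u, p):
    """Assumed definition: bearing from collector position q_u to target p, in radians."""
    return math.atan2(p[1] - q_u[1], p[0] - q_u[0])


def steering_angle(phi_up, phi, phi_max):
    """Eq. 11: heading change toward the target, limited to [-phi_max, +phi_max]."""
    delta = phi_up - phi
    if phi_up >= phi:
        return min(delta, phi_max)
    return max(delta, -phi_max)


# Illustrative values only: collector at the origin heading along +x,
# target up and to the right, maximum steering angle of 0.5 rad.
phi_up = relative_angle((0.0, 0.0), (1.0, 1.0))  # pi / 4
turn = steering_angle(phi_up, phi=0.0, phi_max=0.5)
```

In this example the desired turn of π/4 (about 0.785 rad) exceeds the limit, so the function returns the maximum steering angle and the collector would need more than one time slot to face the target.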
specially adapted for water-borne vessels · CPC title
for information gathering, e.g. for academic research · CPC title
Oceans · CPC title
Water vehicles · CPC title
using machine learning, e.g. neural networks · CPC title