Artificial intelligence-based hierarchical planning for manned/unmanned platforms
US-2023394294-A1 · Dec 7, 2023 · US
US12568037B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12568037-B2 |
| Application number | US-202117529751-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 18, 2021 |
| Priority date | Nov 18, 2021 |
| Publication date | Mar 3, 2026 |
| Grant date | Mar 3, 2026 |
Official abstract text for this publication.
The present disclosure provides a deep reinforcement learning (DRL) based dynamic network traffic management system including a LAN router, a plurality of WAN routers, a network switch, and a GNAT controller configured to measure one or more traffic states of a plurality of data flows, obtain an expected reward at the current time point, obtain the one or more traffic states to input to a DRL model to provide an expected reward of each data flow estimated for a next time point, obtain a target reward at the current time point, adjust parameters of the DRL model, predict a plurality of long-term rewards using the trained DRL model, select one of the plurality of long-term rewards, and adjust the bandwidth assigned to each data flow based on the selected long-term reward.
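The final steps of the abstract (predict a long-term reward for each candidate bandwidth assignment, pick the largest, and apply the matching assignment) can be sketched as below. This is a minimal illustration, not the patented implementation: `select_assignment`, `toy_predictor`, the `State` and `Assignment` types, and the sample numbers are all hypothetical, and the predictor stands in for the trained DRL model.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical types: a measured traffic state and a per-flow bandwidth split.
State = Tuple[float, ...]
Assignment = Tuple[float, ...]


def select_assignment(
    state: State,
    candidates: Sequence[Assignment],
    predict_long_term_reward: Callable[[State, Assignment], float],
) -> Tuple[Assignment, float]:
    """Return the candidate with the largest predicted long-term reward."""
    if not candidates:
        raise ValueError("at least one candidate assignment is required")
    # Score every candidate once with the (assumed) trained model.
    scored = [(predict_long_term_reward(state, a), a) for a in candidates]
    best_reward, best = max(scored, key=lambda pair: pair[0])
    return best, best_reward


def toy_predictor(state: State, assignment: Assignment) -> float:
    # Stand-in for the trained DRL model: it simply prefers giving
    # 70% of the bandwidth to the first data flow.
    return -((assignment[0] - 0.7) ** 2)


state = (12.0, 4.5)  # made-up delay and data-rate measurements
candidates = [(0.5, 0.5), (0.7, 0.3), (0.2, 0.8)]
best, reward = select_assignment(state, candidates, toy_predictor)
```

Enumerating a fixed candidate list keeps the sketch short; how the candidate assignments are generated is not stated in the abstract and is left open here.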
Opening claim text (preview).
What is claimed is:

1. A deep reinforcement learning (DRL) based dynamic network traffic management (DNTM) system comprising: a local area network (LAN) router; a plurality of wireless area network (WAN) routers; a network switch; and a global network access terminal (GNAT) controller, configured to: measure one or more traffic states of a plurality of data flows at a current time point; obtain an expected reward at the current time point; input the one or more traffic states to a DRL model to provide an expected reward of each data flow estimated for a next time point; obtain a target reward at the current time point using the expected reward at the next time point; adjust parameters of the DRL model by minimizing a difference between the expected reward at the current time point and the target reward at the current time point to obtain a trained DRL model; predict a plurality of long-term rewards using the trained DRL model with different bandwidth assignments, a long-term reward representing a total contribution of bandwidth assigned to each data flow in the one or more traffic states in a future; select a maximum long-term reward from the plurality of long-term rewards; and adjust the bandwidth assigned to each data flow based on the selected long-term reward.

2. The system according to claim 1, wherein the GNAT controller is further configured to measure the one or more traffic states of the plurality of data flows periodically.

3. The system according to claim 1, wherein the DRL model includes a deep neural network (DNN) to provide an expected reward of each data flow estimated for the next time point.

4. The system according to claim 3, wherein parameters of the DNN are adjusted by minimizing the difference between the expected reward at the current time point and the target reward at the current time point.

5. The system according to claim 1, wherein the traffic state of each data flow includes traffic delay and data rate information.

6. The system according to claim 5, wherein the expected reward of each data flow is defined as:

$$R_t^j = -\xi \, \frac{\left( \max_{1 \le i \le N} \{ S_t[i,j,1] \} - D[j] \right)^{+}}{D[j]} + (1 - \xi) \, \frac{\left( \sum_{i=1}^{N} S_t[i,j,2] - C[j] \right)^{-}}{C[j]}$$

where $R_t^j$ represents the expected reward evaluated based on the traffic state $S_t$, $S_t[i,j,1]$ represents an average traffic delay of data flow $j$ on soft flow $i$ from time point $t-1$ to $t$, $S_t[i,j,2]$ represents an average data rate of data flow $j$ on soft flow $i$ from time point $t-1$ to $t$, $D[j]$ represents a packet delay required by data flow $j$, $C[j]$ represents a data rate required by data flow $j$, $\xi \in (0,1)$ indicates a relative importance between the packet delay required by data flow $j$ and the data rate required data flow $j$.

7. The system according to claim 1, wherein the GNAT controller is further configured to update the target reward at the current time point by:

$$\hat{Q}(S_t, A_t) \leftarrow R_{t+1} + \gamma \, \hat{Q}(S_{t+1}, A_{t+1})$$

where $\hat{Q}(S_t, A_t)$ represents the target reward at time point $t$, $\hat{Q}(S_{t+1}, A_{t+1})$ represents the target reward at time point $t+1$, $R_{t+1}$ represent the expected reward at time point $t+1$, $\gamma$ is a coefficient.

8. The system according to claim 1, wherein the GNAT controller is configured to adjust the bandwidth assigned to each data flow by controlling a transmission rate.

9. A deep reinforcement learning (DRL) based dynamic network traffic management (DNTM) method for communication between a local area network (LAN) router and a plurality of wireless area router (WAN) routers, comprising: measuring one or more traffic states of a plurality of data flows at a current time; obtaining an expected reward at the current time point; obtaining the one or more traffic states from a global network access terminal (GNAT) router to input to a DRL model to provide an expected reward of each
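The per-flow reward of claim 6 and the target update of claim 7 can be worked through numerically with the sketch below. It rests on an assumption about the flattened claim text: that the delay term is the excess over D[j], clipped at zero from below and divided by D[j], and that the data-rate term is the shortfall against C[j], clipped at zero from above and divided by C[j]. The function names `flow_reward` and `td_target` and all sample values are hypothetical.

```python
from typing import Sequence


def flow_reward(
    delays: Sequence[float],   # S_t[i, j, 1] for each soft flow i
    rates: Sequence[float],    # S_t[i, j, 2] for each soft flow i
    required_delay: float,     # D[j]
    required_rate: float,      # C[j]
    xi: float,                 # relative importance, must lie in (0, 1)
) -> float:
    """Reward R_t^j for one data flow j under the assumed normalized form."""
    if not 0.0 < xi < 1.0:
        raise ValueError("xi must lie strictly between 0 and 1")
    # Assumed positive part: only delay above the requirement is penalized.
    delay_excess = max(max(delays) - required_delay, 0.0)
    # Assumed negative part: only a rate below the requirement is penalized.
    rate_shortfall = min(sum(rates) - required_rate, 0.0)
    return (-xi * delay_excess / required_delay
            + (1.0 - xi) * rate_shortfall / required_rate)


def td_target(reward_next: float, q_next: float, gamma: float) -> float:
    """Claim 7 update: Q(S_t, A_t) <- R_{t+1} + gamma * Q(S_{t+1}, A_{t+1})."""
    return reward_next + gamma * q_next


# Worst delay 30 against a requirement of 20; total rate 5 against 10.
r = flow_reward([10.0, 30.0], [2.0, 3.0], 20.0, 10.0, 0.5)
target = td_target(r, -2.0, 0.9)
```

Under this reading the reward is zero when both requirements are met and becomes more negative as delay grows or data rate falls, so maximizing it pushes each flow toward its stated requirements.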
for predicting network behaviour · CPC title
using machine learning or artificial intelligence · CPC title
Learning-based routing, e.g. using neural networks or artificial intelligence · CPC title