Radio access network control with deep reinforcement learning

US11494649B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11494649-B2
Application number: US-202016778031-A
Country: US
Kind code: B2
Filing date: Jan 31, 2020
Priority date: Jan 31, 2020
Publication date: Nov 8, 2022
Grant date: Nov 8, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processing system including at least one processor may obtain operational data from a radio access network (RAN), format the operational data into state information and reward information for a reinforcement learning agent (RLA), processing the state information and the reward information via the RLA, where the RLA comprises a plurality of sub-agents, each comprising a respective neural network, each of the neural networks encoding a respective policy for selecting at least one setting of at least one parameter of the RAN to increase a respective predicted reward in accordance with the state information, and where each neural network is updated in accordance with the reward information. The processing system may further determine settings for parameters of the RAN via the RLA, where the RLA determines the settings in accordance with selections for the settings via the plurality of sub-agents, and apply the plurality of settings to the RAN.
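The loop described in the abstract (obtain data, format it into state and reward, let several sub-agents select a setting, apply the result, then update each sub-agent from the reward) can be sketched roughly as follows. This is a minimal illustration and not the patented implementation. A small linear Q-function stands in for each sub-agent's neural network, the reward is assumed to be the harmonic mean of endpoint throughputs, and the selections are combined by a plain majority vote. The field names, `TILT_SETTINGS`, `SubAgent`, and `apply_and_measure` are all hypothetical.

```python
import random
from statistics import harmonic_mean

TILT_SETTINGS = [0, 2, 4, 6]  # hypothetical allowable antenna tilt angles, in degrees


def format_operational_data(raw):
    """Split raw RAN counters (hypothetical field names) into state and reward."""
    state = [raw["prb_utilization"], raw["active_devices"] / 100.0]
    reward = harmonic_mean(raw["ue_throughputs_mbps"])  # assumed reward definition
    return state, reward


class SubAgent:
    """One sub-agent. A linear Q-function stands in for its neural network."""

    def __init__(self, n_features, actions, lr=0.01, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = actions
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.rng = random.Random(seed)
        self.weights = [[0.0] * n_features for _ in actions]

    def q_values(self, state):
        return [sum(w * s for w, s in zip(row, state)) for row in self.weights]

    def select(self, state):
        """Epsilon-greedy choice; returns the index of the selected setting."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.actions))
        q = self.q_values(state)
        return q.index(max(q))

    def update(self, state, action_idx, reward, next_state):
        """One-step Q-learning update of the weights for the chosen action."""
        target = reward + self.gamma * max(self.q_values(next_state))
        error = target - self.q_values(state)[action_idx]
        for i, s in enumerate(state):
            self.weights[action_idx][i] += self.lr * error * s


def control_step(agents, raw_before, apply_and_measure):
    """Run one obtain -> select -> apply -> update cycle."""
    state, _ = format_operational_data(raw_before)
    picks = [agent.select(state) for agent in agents]
    votes = [agent.actions[i] for agent, i in zip(agents, picks)]
    # Majority vote over the selections; ties go to the lowest setting.
    setting = max(sorted(set(votes)), key=votes.count)
    raw_after = apply_and_measure(setting)  # hypothetical hook into the RAN
    next_state, reward = format_operational_data(raw_after)
    for agent, i in zip(agents, picks):
        agent.update(state, i, reward, next_state)
    return setting, reward, raw_after
```

In this sketch every sub-agent is updated with the same shared reward, even when the applied setting differs from its own selection. That is a simplification made here for brevity, not something the abstract specifies.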

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: obtaining, by a processing system including at least one processor, operational data from a radio access network; formatting, by the processing system, the operational data into state information and reward information for a reinforcement learning agent; processing, by the processing system, the state information and the reward information via the reinforcement learning agent, wherein the reinforcement learning agent comprises a plurality of sub-agents, wherein each of the plurality of sub-agents comprises a respective neural network of a plurality of neural networks, wherein each of the plurality of neural networks encodes a respective policy for selecting at least one setting of at least one parameter of the radio access network to increase a respective predicted reward in accordance with the state information, wherein each of the plurality of neural networks is updated in accordance with the reward information; determining, by the processing system, a plurality of settings for a plurality of parameters of the radio access network via the reinforcement learning agent, wherein the reinforcement learning agent determines the plurality of settings in accordance with a plurality of selections for the plurality of settings via the plurality of sub-agents, wherein the plurality of settings includes the at least one setting and the plurality of parameters includes the at least one parameter; and applying, by the processing system, the plurality of settings to the radio access network.

2. The method of claim 1, wherein the processing the state information and the reward information via the reinforcement learning agent comprising: updating the plurality of neural networks in accordance with the reward information.

3. The method of claim 1, wherein the plurality of settings for the plurality of parameters comprises at least one of: a tilt angle of at least one antenna array of the radio access network; or a power level of the at least one antenna array of the radio access network.

4. The method of claim 1, wherein the plurality of settings for the plurality of parameters comprises a media access control scheduling algorithm from among a plurality of available media access control scheduling algorithms.

5. The method of claim 1, wherein the plurality of settings for the plurality of parameters comprises a handover offset setting from among a plurality of available handover offset settings.

6. The method of claim 1, wherein the state information comprises a plurality of performance indicators that include at least two of: a throughput; an uplink volume; a downlink volume; a physical resource block utilization; a number of active endpoint devices; a handover frequency; an average endpoint device bandwidth; a geographic distribution of endpoint devices; a radio frequency distribution; or a traffic volume.

7. The method of claim 1, wherein the reward information comprises at least one performance indicator, including at least one of: an endpoint device throughput; a harmonic user equipment throughput; a throughput differential among endpoint devices; a drop rate; a retainability metric; or an accessibility metric.

8. The method of claim 7, wherein the reward information is calculated from a plurality of the at least one performance indicator.

9. The method of claim 1, wherein the reinforcement learning agent updates the plurality of neural networks in accordance with at least one of: a Q reinforcement learning algorithm; a double deep Q reinforcement learning algorithm; a deterministic policy gradient algorithm; or an asynchronous advantage actor-critic algorithm.

10. The method of claim 1, wherein each neural network of the plurality of neural networks comprises a double deep Q network, wherein the double deep Q network comprises a memory replay learning and n-step temporal difference learning process.

11. The method of claim 1, wherein each neural network of the plurality of neural networks comprises: a recurrent neural network; or a long short-term memory neural network.

12. The method of claim 1, wherein the plurality of settings is applied to the radio access network via a self-optimizing network controller.

13. The method of claim 1, wherein at least a first setting for at least a first of the plurality of parameters is selected via a weighted average of at least a portion of the plurality of selections for the plurality of settings, wherein the at least the portion of the selections relates to the at least the first of the plurality of parameters.

14. The method of claim 1, wherein at least a first setting for at least a first of the plurality of parameters is selected via a weighted majority arbitration among at least a portion of the plurality of selections for the plurality of settings, wherein the at least the portion of the selections relates to the at least the first of the plurality of parameters.

15. The method of claim 1, wherein the reinforcement learning agent includes a plurality of critics, wherein each of the plurality of critics is for a corresponding sub-agent of the plurality of sub-agents, wherein each of the plurality of critics comprises a quality function that accounts for a policy of the corresponding sub-agent and at least one action of at least one other sub-agent of the plurality of sub-agents.

16. The method of claim 1, wherein each of the plurality of sub-agents is assigned: a respective value function; and a respective plurality of permitted actions, where the plurality of permitted actions comprises a plurality of allowable settings for the plurality of parameters of the radio access network.

17. The method of claim 16, wherein the state information and the reward information are published to at least one topic, wherein each of the plurality of sub-agents comprises a subscriber to the at least one topic.

18. The method of claim 17, wherein the at least one topic comprises a plurality of topics, wherein at least two of the plurality of sub-agents are subscribed to different topics of the plurality of topics.

19. A non-transitory computer-readable medium storing instructions which, when executed by a processing system including at least one processor, cause the processing system to perform operations, the operations comprising: obtaining operational data from a radio access network; formatting the operational data into state information and reward information for a reinforcement learning agent; processing the state information and the reward information via the reinforcement learning agent, wherein the reinforcement learning agent comprises a plurality of sub-agents, wherein each of the plurality of sub-agents comprises a respective neural network of a plurality of neural networks, wherein each of the plurality of neural networks encodes a respective policy for selecting at least one setting of at least one parameter of the radio access network to increase a respective predicted reward in accordance with the state information, wherein each of the plurality of neural networks is updated in accordance with the reward information; determining a plurality of settings for a plurality of parameters of the radio access network via the reinforcement learning agent, wherein the reinforcement learning agent determines the plurality of settings in accordance with a plurality of selections for the plurality of settings via the plurality of sub-agents, wherein the plurality of settings includes the at least one setting a
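Claims 13 and 14 name two ways of combining the sub-agents' selections for a single parameter: a weighted average and a weighted majority arbitration. The sketch below shows one plausible reading of each. The weights, the example settings, and the `nearest_allowed` helper (which snaps an averaged value back to an allowable setting) are assumptions for illustration; the claims do not say how weights are chosen or how ties are broken.

```python
from collections import defaultdict


def weighted_average(selections, weights):
    """Combine numeric selections (e.g. tilt angles) by their weighted mean."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(selections, weights)) / total


def nearest_allowed(value, allowed):
    """Hypothetical helper: snap an averaged value to the closest allowable setting."""
    return min(allowed, key=lambda a: abs(a - value))


def weighted_majority(selections, weights):
    """Pick the selection backed by the largest total weight.

    Works for categorical settings such as a scheduling algorithm name.
    On a tie, the selection that appeared first wins.
    """
    tally = defaultdict(float)
    for s, w in zip(selections, weights):
        tally[s] += w
    return max(tally, key=tally.get)


# Three sub-agents propose a tilt angle; weights are assumed trust scores.
tilt = nearest_allowed(weighted_average([2, 4, 6], [0.5, 0.3, 0.2]), [0, 2, 4, 6])

# Three sub-agents propose a MAC scheduling algorithm (names are illustrative).
scheduler = weighted_majority(
    ["proportional_fair", "round_robin", "proportional_fair"], [0.4, 0.5, 0.3]
)
```

A weighted average suits continuous or ordered parameters such as tilt or power, while weighted majority suits categorical ones such as the choice of scheduling algorithm, where averaging has no meaning.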

Assignees

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Combinations of networks · CPC title

  • H04W24/02 (Primary)

    Arrangements for optimising operational condition · CPC title

  • G06N3/08 (Primary)

    Learning methods · CPC title

  • Public Land Mobile systems, e.g. cellular systems · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11494649B2 cover?
A processing system including at least one processor may obtain operational data from a radio access network (RAN), format the operational data into state information and reward information for a reinforcement learning agent (RLA), processing the state information and the reward information via the RLA, where the RLA comprises a plurality of sub-agents, each comprising a respective neural netwo…
Who is the assignee on this patent?
AT&T Intellectual Property I, L.P.
What technology area does this patent fall under?
Primary CPC classification H04W24/02. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Nov 8, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).