Systems and methods for controlling rights related to digital knowledge
US-2021342836-A1 · Nov 4, 2021 · US
US11494649B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11494649-B2 |
| Application number | US-202016778031-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 31, 2020 |
| Priority date | Jan 31, 2020 |
| Publication date | Nov 8, 2022 |
| Grant date | Nov 8, 2022 |
Official abstract text for this publication.
A processing system including at least one processor may obtain operational data from a radio access network (RAN), format the operational data into state information and reward information for a reinforcement learning agent (RLA), and process the state information and the reward information via the RLA, where the RLA comprises a plurality of sub-agents, each comprising a respective neural network, each of the neural networks encoding a respective policy for selecting at least one setting of at least one parameter of the RAN to increase a respective predicted reward in accordance with the state information, and where each neural network is updated in accordance with the reward information. The processing system may further determine settings for parameters of the RAN via the RLA, where the RLA determines the settings in accordance with selections for the settings via the plurality of sub-agents, and apply the plurality of settings to the RAN.
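The formatting step in the abstract can be illustrated with a minimal Python sketch. It assumes one raw sample arrives as a dictionary of performance indicators; the field names, the `STATE_KEYS` tuple, the `drop_penalty` weight, and the reward formula (harmonic mean of per-device throughput minus a drop-rate penalty) are all hypothetical choices made for illustration, not details taken from the patent.

```python
from statistics import harmonic_mean

# Hypothetical indicator names: the source lists indicator types
# (throughput, PRB utilization, active devices, handover frequency)
# but does not define field names.
STATE_KEYS = ("throughput_mbps", "prb_utilization", "active_devices", "handover_rate")


def format_operational_data(sample, drop_penalty=10.0):
    """Split one raw RAN sample into (state, reward).

    state  : list of floats, one per indicator in STATE_KEYS
    reward : scalar combining two indicators (an assumed formula)
    """
    state = [float(sample[key]) for key in STATE_KEYS]
    # Assumed reward: harmonic mean of per-device throughput,
    # reduced in proportion to the drop rate.
    reward = harmonic_mean(sample["ue_throughputs_mbps"]) - drop_penalty * sample["drop_rate"]
    return state, reward


sample = {
    "throughput_mbps": 120.0,
    "prb_utilization": 0.5,
    "active_devices": 40,
    "handover_rate": 0.25,
    "ue_throughputs_mbps": [2.0, 2.0, 4.0, 4.0],
    "drop_rate": 0.01,
}
state, reward = format_operational_data(sample)
```

Each sub-agent would then receive `state` as the input to its policy and `reward` as the signal for updating its neural network; those parts are left out of this sketch.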
Opening claim text (preview).
What is claimed is:
1. A method comprising: obtaining, by a processing system including at least one processor, operational data from a radio access network; formatting, by the processing system, the operational data into state information and reward information for a reinforcement learning agent; processing, by the processing system, the state information and the reward information via the reinforcement learning agent, wherein the reinforcement learning agent comprises a plurality of sub-agents, wherein each of the plurality of sub-agents comprises a respective neural network of a plurality of neural networks, wherein each of the plurality of neural networks encodes a respective policy for selecting at least one setting of at least one parameter of the radio access network to increase a respective predicted reward in accordance with the state information, wherein each of the plurality of neural networks is updated in accordance with the reward information; determining, by the processing system, a plurality of settings for a plurality of parameters of the radio access network via the reinforcement learning agent, wherein the reinforcement learning agent determines the plurality of settings in accordance with a plurality of selections for the plurality of settings via the plurality of sub-agents, wherein the plurality of settings includes the at least one setting and the plurality of parameters includes the at least one parameter; and applying, by the processing system, the plurality of settings to the radio access network.
2. The method of claim 1, wherein the processing the state information and the reward information via the reinforcement learning agent comprising: updating the plurality of neural networks in accordance with the reward information.
3. The method of claim 1, wherein the plurality of settings for the plurality of parameters comprises at least one of: a tilt angle of at least one antenna array of the radio access network; or a power level of the at least one antenna array of the radio access network.
4. The method of claim 1, wherein the plurality of settings for the plurality of parameters comprises a media access control scheduling algorithm from among a plurality of available media access control scheduling algorithms.
5. The method of claim 1, wherein the plurality of settings for the plurality of parameters comprises a handover offset setting from among a plurality of available handover offset settings.
6. The method of claim 1, wherein the state information comprises a plurality of performance indicators that include at least two of: a throughput; an uplink volume; a downlink volume; a physical resource block utilization; a number of active endpoint devices; a handover frequency; an average endpoint device bandwidth; a geographic distribution of endpoint devices; a radio frequency distribution; or a traffic volume.
7. The method of claim 1, wherein the reward information comprises at least one performance indicator, including at least one of: an endpoint device throughput; a harmonic user equipment throughput; a throughput differential among endpoint devices; a drop rate; a retainability metric; or an accessibility metric.
8. The method of claim 7, wherein the reward information is calculated from a plurality of the at least one performance indicator.
9. The method of claim 1, wherein the reinforcement learning agent updates the plurality of neural networks in accordance with at least one of: a Q reinforcement learning algorithm; a double deep Q reinforcement learning algorithm; a deterministic policy gradient algorithm; or an asynchronous advantage actor-critic algorithm.
10. The method of claim 1, wherein each neural network of the plurality of neural networks comprises a double deep Q network, wherein the double deep Q network comprises a memory replay learning and n-step temporal difference learning process.
11. The method of claim 1, wherein each neural network of the plurality of neural networks comprises: a recurrent neural network; or a long short-term memory neural network.
12. The method of claim 1, wherein the plurality of settings is applied to the radio access network via a self-optimizing network controller.
13. The method of claim 1, wherein at least a first setting for at least a first of the plurality of parameters is selected via a weighted average of at least a portion of the plurality of selections for the plurality of settings, wherein the at least the portion of the selections relates to the at least the first of the plurality of parameters.
14. The method of claim 1, wherein at least a first setting for at least a first of the plurality of parameters is selected via a weighted majority arbitration among at least a portion of the plurality of selections for the plurality of settings, wherein the at least the portion of the selections relates to the at least the first of the plurality of parameters.
15. The method of claim 1, wherein the reinforcement learning agent includes a plurality of critics, wherein each of the plurality of critics is for a corresponding sub-agent of the plurality of sub-agents, wherein each of the plurality of critics comprises a quality function that accounts for a policy of the corresponding sub-agent and at least one action of at least one other sub-agent of the plurality of sub-agents.
16. The method of claim 1, wherein each of the plurality of sub-agents is assigned: a respective value function; and a respective plurality of permitted actions, where the plurality of permitted actions comprises a plurality of allowable settings for the plurality of parameters of the radio access network.
17. The method of claim 16, wherein the state information and the reward information are published to at least one topic, wherein each of the plurality of sub-agents comprises a subscriber to the at least one topic.
18. The method of claim 17, wherein the at least one topic comprises a plurality of topics, wherein at least two of the plurality of sub-agents are subscribed to different topics of the plurality of topics.
19. A non-transitory computer-readable medium storing instructions which, when executed by a processing system including at least one processor, cause the processing system to perform operations, the operations comprising: obtaining operational data from a radio access network; formatting the operational data into state information and reward information for a reinforcement learning agent; processing the state information and the reward information via the reinforcement learning agent, wherein the reinforcement learning agent comprises a plurality of sub-agents, wherein each of the plurality of sub-agents comprises a respective neural network of a plurality of neural networks, wherein each of the plurality of neural networks encodes a respective policy for selecting at least one setting of at least one parameter of the radio access network to increase a respective predicted reward in accordance with the state information, wherein each of the plurality of neural networks is updated in accordance with the reward information; determining a plurality of settings for a plurality of parameters of the radio access network via the reinforcement learning agent, wherein the reinforcement learning agent determines the plurality of settings in accordance with a plurality of selections for the plurality of settings via the plurality of sub-agents, wherein the plurality of settings includes the at least one setting a
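Claims 13 and 14 describe combining the sub-agents' selections for a parameter by weighted average or by weighted majority arbitration. The Python sketch below shows one plausible reading of each. The function names, the example weights, and the example setting values are hypothetical; the claims do not say how weights are assigned to sub-agents.

```python
from collections import defaultdict


def weighted_average(selections, weights):
    """Combine numeric selections (e.g. antenna tilt angles) into one setting."""
    total = sum(weights)
    return sum(s * w for s, w in zip(selections, weights)) / total


def weighted_majority(selections, weights):
    """Pick the discrete selection (e.g. a scheduler name) with the largest total weight."""
    tally = defaultdict(float)
    for selection, weight in zip(selections, weights):
        tally[selection] += weight
    return max(tally, key=tally.get)


# Three hypothetical sub-agents propose a tilt angle in degrees.
tilt = weighted_average([2.0, 4.0, 6.0], [1.0, 1.0, 2.0])

# The same sub-agents vote on a MAC scheduling algorithm.
scheduler = weighted_majority(
    ["proportional_fair", "round_robin", "round_robin"],
    [0.5, 0.2, 0.2],
)
```

A weighted average suits continuous parameters such as tilt angle or power level, while weighted majority suits categorical choices such as a scheduling algorithm, where averaging has no meaning.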