Energy saving in cellular wireless networks via transfer deep reinforcement learning

US2024406861A1 · US · A1

Patent metadata
Publication number: US-2024406861-A1
Application number: US-202418609797-A
Country: US
Kind code: A1
Filing date: Mar 19, 2024
Priority date: May 31, 2023
Publication date: Dec 5, 2024
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure provides methods, apparatuses, systems, and computer-readable mediums for operating a target base station by an apparatus. A method includes collecting a plurality of trajectories corresponding to the target base station and a plurality of source base stations, clustering, using an unsupervised reinforcement learning model, the plurality of trajectories into a plurality of clusters including a target cluster, selecting, as a target trajectory, a selected trajectory from the target cluster that maximizes an energy-saving parameter of the target base station, and applying, to the target base station, an energy-saving control policy corresponding to the target trajectory. The target cluster corresponds to the target base station and at least one source base station from among the plurality of source base stations.
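The four steps in the abstract (collect, cluster, select, apply) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method disclosed in the patent: plain k-means stands in for the unsupervised clustering model, and the `Trajectory` fields, the feature vectors, the `energy_saving` score, and the policy labels are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist
from typing import List, Sequence


@dataclass
class Trajectory:
    station_id: str             # base station that produced the trajectory
    features: Sequence[float]   # hypothetical summary, e.g. (mean load, mean throughput)
    energy_saving: float        # hypothetical energy-saving score for this trajectory
    policy: str                 # label of the control policy behind the trajectory


def kmeans(points: List[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means, used here only as a stand-in clustering model."""
    centroids = [tuple(p) for p in points[:k]]  # deterministic init: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def select_policy(trajectories: List[Trajectory], target_id: str, k: int = 2) -> str:
    """Cluster all trajectories, then pick the best policy inside the target cluster."""
    labels = kmeans([t.features for t in trajectories], k)
    # Target cluster: the cluster holding most of the target station's trajectories.
    target_label = Counter(
        lab for t, lab in zip(trajectories, labels) if t.station_id == target_id
    ).most_common(1)[0][0]
    candidates = [t for t, lab in zip(trajectories, labels) if lab == target_label]
    # Select the trajectory that maximizes the energy-saving score.
    best = max(candidates, key=lambda t: t.energy_saving)
    return best.policy  # the policy that would be applied to the target base station


demo = [
    Trajectory("target", (0.20, 0.30), 1.0, "pi_target"),
    Trajectory("src_a", (0.25, 0.35), 3.0, "pi_a"),
    Trajectory("src_b", (0.90, 0.80), 9.0, "pi_b"),
    Trajectory("src_c", (0.85, 0.90), 5.0, "pi_c"),
]
print(select_policy(demo, "target"))
```

In this toy data, `src_b` has the highest score overall but falls in a different cluster from the target, so only `pi_target` and `pi_a` are candidates. Restricting the search to the target cluster is what keeps a policy from a dissimilar source base station from being transferred.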

First claim

Opening claim text (preview).

What is claimed is:

1. A method for operating a target base station, by an apparatus, the method comprising: collecting a plurality of trajectories corresponding to the target base station and a plurality of source base stations; clustering, using an unsupervised reinforcement learning model, the plurality of trajectories into a plurality of clusters comprising a target cluster, the target cluster corresponding to the target base station and at least one source base station from among the plurality of source base stations; selecting, as a target trajectory, a selected trajectory from the target cluster that maximizes an energy-saving parameter of the target base station; and applying, to the target base station, an energy-saving control policy corresponding to the target trajectory.

2. The method of claim 1, further comprising: monitoring one or more energy-saving parameters of the target base station; and adjusting the energy-saving control policy applied to the target base station based on the one or more energy-saving parameters.

3. The method of claim 2, wherein the adjusting of the energy-saving control policy comprises: determining, based on the monitoring, that at least one of the one or more energy-saving parameters of the target base station is outside of a predetermined range of values; and adjusting the energy-saving control policy to cause the at least one of the one or more energy-saving parameters to be within the predetermined range of values.

4. The method of claim 1, further comprising: generating, using a base reinforcement learning model, a plurality of source control policies corresponding to the plurality of source base stations.

5. The method of claim 4, wherein the collecting of the plurality of trajectories comprises collecting a plurality of source base station trajectories corresponding to the plurality of source base stations, based on the plurality of source control policies, and wherein the applying of the energy-saving control policy comprises selecting the energy-saving control policy from among a control policy of the target base station and the plurality of source control policies.

6. The method of claim 1, further comprising: formulating the plurality of trajectories based on a Markov Decision Process (MDP), wherein each trajectory of the plurality of trajectories comprises a state space, an action space, a reward function, and a state transition probability function.

7. The method of claim 6, wherein the state space indicates at least one of a number of connected active devices per cell, a cell load ratio, and a throughput per cell, wherein the action space comprises at least one of activation thresholds and deactivation thresholds, wherein the reward function indicates a reward based on at least one of a power consumption and a minimum throughput, and wherein the state transition probability function indicates a probability of an action from the action space at a state of the state space.

8. The method of claim 1, wherein the selecting of the target trajectory comprises: performing iterative testing of respective control policies of each trajectory of the target cluster; determining, for each trajectory of the target cluster, an accumulated reward; and selecting, as the target trajectory, a trajectory of the target cluster that maximizes the accumulated reward.

9. The method of claim 8, wherein the performing of the iterative testing comprises performing testing of the respective control policies of each trajectory of the target cluster for a predetermined number of iterations.

10. An apparatus for operating a target base station, the apparatus comprising: a memory storing instructions; and one or more processors communicatively coupled to the memory; wherein the one or more processors are configured to execute the instructions to: collect a plurality of trajectories corresponding to the target base station and a plurality of source base stations; cluster, using an unsupervised reinforcement learning model, the plurality of trajectories into a plurality of clusters comprising a target cluster, the target cluster corresponding to the target base station and at least one source base station from among the plurality of source base stations; select, as a target trajectory, a selected trajectory from the target cluster that maximizes an energy-saving parameter of the target base station; and apply, to the target base station, an energy-saving control policy corresponding to the target trajectory.

11. The apparatus of claim 10, wherein the one or more processors are further configured to execute further instructions to: monitor one or more energy-saving parameters of the target base station; and adjust the energy-saving control policy applied to the target base station based on the one or more energy-saving parameters.

12. The apparatus of claim 11, wherein the one or more processors are further configured to execute further instructions to: determine, based on the monitoring, that at least one of the one or more energy-saving parameters of the target base station is outside of a predetermined range of values; and adjust the energy-saving control policy to cause the at least one of the one or more energy-saving parameters to be within the predetermined range of values.

13. The apparatus of claim 10, wherein the one or more processors are further configured to execute further instructions to: generate, using a base reinforcement learning model, a plurality of source control policies corresponding to the plurality of source base stations.

14. The apparatus of claim 13, wherein the one or more processors are further configured to execute further instructions to: collect a plurality of source base station trajectories corresponding to the plurality of source base stations, based on the plurality of source control policies; and select the energy-saving control policy from among a control policy of the target base station and the plurality of source control policies.

15. The apparatus of claim 10, wherein the one or more processors are further configured to execute further instructions to: formulate the plurality of trajectories based on a Markov Decision Process (MDP), wherein each trajectory of the plurality of trajectories comprises a state space, an action space, a reward function, and a state transition probability function.

16. The apparatus of claim 15, wherein the state space indicates at least one of a number of connected active devices per cell, a cell load ratio, and a throughput per cell, wherein the action space comprises at least one of activation thresholds and deactivation thresholds, wherein the reward function indicates a reward based on at least one of a power consumption and a minimum throughput, and wherein the state transition probability function indicates a probability of an action from the action space at a state of the state space.

17. The apparatus of claim 10, wherein the one or more processors are further configured to execute further instructions to: perform iterative testing of respective control policies of each trajectory of the target cluster; determine, for each trajectory of the target cluster, an accumulated reward; and select, as the target trajectory, the selected trajectory from the target cluster that maximizes the accumulated reward.

18. The apparatus of claim 17, wherein the one or more processors are further configured to execute further instructions to: pe
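Claims 6 to 9 describe an MDP formulation and a selection step based on accumulated reward over a predetermined number of iterations. The Python sketch below shows one way those pieces could fit together. It is an illustration only: the field names, the reward shape (negative power with a penalty below the minimum throughput), the `toy_step` environment, and the two example policies are assumptions made for this example and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class State:
    active_devices: int   # number of connected active devices in the cell
    load_ratio: float     # cell load ratio, assumed to lie in [0, 1]
    throughput: float     # throughput per cell (units assumed, e.g. Mbps)


@dataclass(frozen=True)
class Action:
    activation_threshold: float     # load above which a cell is switched on
    deactivation_threshold: float   # load below which a cell is switched off


Policy = Callable[[State], Action]
StepFn = Callable[[State, Action], Tuple[State, float]]  # -> (next state, power in W)


def reward(power_w: float, throughput: float, min_throughput: float,
           penalty: float = 100.0) -> float:
    """Hypothetical reward: lower power is better, with a penalty below the minimum throughput."""
    r = -power_w
    if throughput < min_throughput:
        r -= penalty
    return r


def accumulated_reward(policy: Policy, step: StepFn, start: State,
                       iterations: int, min_throughput: float) -> float:
    """Test one policy for a predetermined number of iterations and sum its rewards."""
    state, total = start, 0.0
    for _ in range(iterations):
        action = policy(state)
        state, power_w = step(state, action)
        total += reward(power_w, state.throughput, min_throughput)
    return total


def select_target_policy(policies: Dict[str, Policy], step: StepFn, start: State,
                         iterations: int, min_throughput: float) -> str:
    """Return the name of the candidate policy with the highest accumulated reward."""
    scores = {name: accumulated_reward(p, step, start, iterations, min_throughput)
              for name, p in policies.items()}
    return max(scores, key=scores.get)


def toy_step(state: State, action: Action) -> Tuple[State, float]:
    """Toy environment (assumption): a cell below the deactivation threshold sleeps."""
    if state.load_ratio < action.deactivation_threshold:
        power_w, throughput = 40.0, 8.0     # cell off: low power, low throughput
    else:
        power_w, throughput = 100.0, 20.0   # cell on: full power, full throughput
    return State(state.active_devices, state.load_ratio, throughput), power_w


candidates: Dict[str, Policy] = {
    "aggressive": lambda s: Action(activation_threshold=0.8, deactivation_threshold=0.5),
    "moderate": lambda s: Action(activation_threshold=0.6, deactivation_threshold=0.2),
}
start = State(active_devices=12, load_ratio=0.3, throughput=20.0)
print(select_target_policy(candidates, toy_step, start, iterations=5, min_throughput=10.0))
```

With these toy numbers the aggressive policy saves power but drops below the minimum throughput on every step, so the moderate policy accumulates the higher reward. A reward that depends on power alone would pick the aggressive policy instead, which is why the sketch combines both terms.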

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • using machine learning or artificial intelligence · CPC title

  • in access points, e.g. base stations · CPC title

  • in wireless communication networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024406861A1 cover?
The present disclosure provides methods, apparatuses, systems, and computer-readable mediums for operating a target base station by an apparatus. A method includes collecting a plurality of trajectories corresponding to the target base station and a plurality of source base stations, clustering, using an unsupervised reinforcement learning model, the plurality of trajectories into a plurality o…
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification H04W52/0206. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Dec 5, 2024 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).