Method of performing communication load balancing with multi-teacher reinforcement learning, and an apparatus for the same

US12238190B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12238190-B2
Application number  US-202318351201-A
Country             US
Kind code           B2
Filing date         Jul 12, 2023
Priority date       Oct 6, 2021
Publication date    Feb 25, 2025
Grant date          Feb 25, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A server may be provided to obtain a load balancing artificial intelligence (AI) model for a plurality of base stations in a communication system. The server may obtain teacher models based on traffic data sets collected from the base stations, respectively; perform a policy rehearsal process including obtaining student models based on knowledge distillation from the teacher models, obtaining an ensemble student model by ensembling the student models, and obtaining a policy model by interacting with the ensemble student model; provide the policy model to each of the base stations for a policy evaluation of the policy model; and based on a training continue signal being received from at least one of the base stations as a result of the policy evaluation, update the ensemble student model and the policy model by performing the policy rehearsal process on the student models.
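The feedback step at the end of the abstract can be illustrated with a short Python sketch. It assumes the margin rule stated in claim 9: a base station sends a training continue signal when the reward obtained from the ensemble student model is less than the reward from its existing load balancing model by a predetermined margin or more. The names `EvaluationResult`, `training_continue_signal`, and `server_should_rehearse_again`, and the use of a single scalar reward per base station, are hypothetical and chosen only for illustration; the patent does not prescribe this code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class EvaluationResult:
    """Hypothetical per-base-station outcome of the policy evaluation."""
    station_id: str
    ensemble_reward: float   # reward obtained from the ensemble student model
    existing_reward: float   # reward obtained from the existing load balancing model


def training_continue_signal(result: EvaluationResult, margin: float) -> bool:
    """True when the ensemble reward trails the existing reward by `margin` or more."""
    return result.existing_reward - result.ensemble_reward >= margin


def server_should_rehearse_again(results: Iterable[EvaluationResult], margin: float) -> bool:
    """The server repeats policy rehearsal if at least one station signals 'continue'."""
    return any(training_continue_signal(r, margin) for r in results)
```

In this sketch a single lagging base station is enough to trigger another round of policy rehearsal, which matches the "at least one of the base stations" wording of the abstract.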

First claim

Opening claim text (preview).

What is claimed is:

1. A server for obtaining a load balancing artificial intelligence (AI) model for a plurality of base stations in a communication system, the server comprising: at least one memory storing instructions; and at least one processor configured to execute the instructions to: obtain a plurality of teacher models based on a plurality of traffic data sets collected from the plurality of base stations, respectively; obtain a plurality of student models based on knowledge distillation from the plurality of teacher models, wherein the plurality of student models is trained based on a knowledge distillation loss that represents a difference between a teacher prediction of the plurality of teacher models and a student prediction of the plurality of student models, and obtain the load balancing AI model for the plurality of base stations based on the plurality of student models.

2. The server of claim 1, wherein the at least one processor further configured to execute the instructions to obtain the load balancing AI model for the plurality of base stations, by: obtaining an ensemble student model by ensembling the plurality of student models; and transmitting the ensemble student model to the plurality of base stations, respectively.

3. The server of claim 2, the at least one processor further configured to execute the instructions to: receive feedback information of the ensemble student model from the plurality of base stations, and update the ensemble student model based on the received feedback information.

4. The server of claim 1, wherein the least one processor is further configured to execute the instructions to: obtain the plurality of teacher models by receiving model parameters of the plurality of teacher models from the plurality of base stations, and updating initialized model parameters of the plurality of teacher models based on the received model parameters.

5. The server of claim 1, wherein the plurality of traffic data sets comprise state-action-reward trajectories that comprise states, actions, and rewards, the states comprise at least one of an active user equipment (UE) number, a bandwidth utilization, an internet protocol (IP) throughput, a cell physical resource usage, and a speed of a download link, the actions comprise a load balancing parameter that causes the states to be changed, and the rewards comprise at least one of a minimum of IP throughput, a total IP throughput, and a dead cell count.

6. The server of claim 1, wherein each of the plurality of teacher models comprises a state transition model and a reward transition model that are trained based on state-action-reward trajectories that are collected from the plurality of base stations, wherein the state transition model is configured to output a predicted next state based on an action taken in a current state, and wherein the reward transition model is configured to output a predicted reward based on the action taken in the current state.

7. The server of claim 1, wherein the at least one processor further configured to execute the instructions to obtain the plurality of student models based on knowledge distillation from the plurality of teacher models, by: computing a ground-truth loss based on a difference between a ground-truth value and a prediction of each of the plurality of student models; computing the knowledge distillation loss based on the difference between the teacher prediction of the plurality of teacher models and the student prediction of the plurality of student models; computing an aggregated loss that combines the ground-truth loss and the knowledge distillation loss; and training the plurality of student models by minimizing or converging the aggregated loss.

8. The server of claim 1, wherein the least one processor is further configured to execute the instructions to obtain a policy model by: obtaining state-reward pairs from the plurality of student models; computing an average of the state-reward pairs; inputting the average of the state-reward pairs to the policy model to obtain an action as an output of the policy model; increasing a time step by one; based on the increased time step being less than a predetermined value, inputting the action to the plurality of student models; and based on the increased time step being equal to the predetermined value, outputting the policy model.

9. The server of claim 2, wherein the least one processor is further configured to execute the instructions to: obtain a policy model by interacting with the ensemble student model; provide the policy model to each of the plurality of base stations for a policy evaluation of the policy model; and based on a training continue signal being received from at least one of the plurality of base stations as a result of the policy evaluation, update the ensemble student model and the policy model, wherein the training continue signal is provided as feedback information of the ensemble student model and indicates that a reward obtained from the ensemble student model is less than a reward obtained from an existing load balancing model by a predetermined margin or more.

10. A method for obtaining a load balancing artificial intelligence (AI) model for a plurality of base stations in a communication system, the method comprising: obtaining a plurality of teacher models based on a plurality of traffic data sets collected from the plurality of base stations, respectively; obtaining a plurality of student models based on knowledge distillation from the plurality of teacher models, wherein the plurality of student models is trained based on a knowledge distillation loss that represents a difference between a teacher prediction of the plurality of teacher models and a student prediction of the plurality of student models, and obtaining the load balancing AI model for the plurality of base stations based on the plurality of student models.

11. The method of claim 10, wherein the obtaining the load balancing AI model for the plurality of base stations comprises: obtaining an ensemble student model by ensembling the plurality of student models; and transmitting the ensemble student model to the plurality of base stations, respectively.

12. The method of claim 11, further comprising: receiving feedback information of the ensemble student model from the plurality of base stations, and updating the ensemble student model based on the received feedback information.

13. The method of claim 10, further comprising: obtaining the plurality of teacher models by receiving model parameters of the plurality of teacher models from the plurality of base stations, and updating initialized model parameters of the plurality of teacher models based on the received model parameters.

14. The method of claim 10, wherein the plurality of traffic data sets comprise state-action-reward trajectories that comprise states, actions, and rewards, the states comprise at least one of an active user equipment (UE) number, a bandwidth utilization, an internet protocol (IP) throughput, a cell physical resource usage, and a speed of a download link, the actions comprise a load balancing parameter that causes the states to be changed, and the rewards comprise at least one of a minimum of IP throughput, a total IP throughput, and a dead cell count.

15. The method of claim 10, wherein each of the plurality of teacher models comprises a state transition model and a reward transition model that are trained based on state-action-reward trajectories that are collected from the plurality of base stations, w
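
The aggregated loss of claim 7 and the rehearsal loop of claim 8 can be sketched in plain Python as follows. Several details here are assumptions that the claims leave open: the "difference" is taken to be a mean squared error, the two losses are combined as a weighted sum with a hypothetical weight `alpha`, each student model is modeled as a function from (state, action) to (next state, reward), and the averaged state is what gets fed back to the student models together with the action. The sketch also omits how the policy is updated during rehearsal; it only follows the rollout steps and returns the visited triples.

```python
from typing import Callable, List, Sequence, Tuple

State = List[float]
# Assumed interfaces (not specified by the claims):
StudentModel = Callable[[State, float], Tuple[State, float]]  # (state, action) -> (next_state, reward)
Policy = Callable[[State, float], float]                      # (avg_state, avg_reward) -> action


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error, used here as the assumed 'difference' measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def aggregated_loss(student_pred: Sequence[float], teacher_pred: Sequence[float],
                    ground_truth: Sequence[float], alpha: float = 0.5) -> float:
    """Combine ground-truth loss and knowledge distillation loss (assumed weighted sum)."""
    ground_truth_loss = mse(student_pred, ground_truth)
    distillation_loss = mse(student_pred, teacher_pred)
    return alpha * ground_truth_loss + (1.0 - alpha) * distillation_loss


def average_pairs(pairs: Sequence[Tuple[State, float]]) -> Tuple[State, float]:
    """Element-wise average of the state-reward pairs produced by the student models."""
    n = len(pairs)
    dim = len(pairs[0][0])
    avg_state = [sum(state[i] for state, _ in pairs) / n for i in range(dim)]
    avg_reward = sum(reward for _, reward in pairs) / n
    return avg_state, avg_reward


def rehearse(students: Sequence[StudentModel], policy: Policy,
             initial_state: State, initial_action: float,
             horizon: int) -> List[Tuple[State, float, float]]:
    """Roll the policy against the student models for `horizon` time steps."""
    pairs = [model(initial_state, initial_action) for model in students]
    trace = []
    t = 0
    while True:
        avg_state, avg_reward = average_pairs(pairs)
        action = policy(avg_state, avg_reward)
        trace.append((avg_state, avg_reward, action))
        t += 1                      # increase the time step by one
        if t >= horizon:            # predetermined value reached: stop rehearsing
            return trace
        # Time step still below the predetermined value: feed the action back.
        pairs = [model(avg_state, action) for model in students]
```

Averaging the student outputs is one simple way to realize the "ensemble student model" of claim 2; other ensembling schemes would change only `average_pairs`.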

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • H04L41/16 (Primary): using machine learning or artificial intelligence

  • Server selection for load balancing

  • Arrangements for optimising operational condition

  • based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

  • Reinforcement learning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12238190B2 cover?
A server may be provided to obtain a load balancing artificial intelligence (AI) model for a plurality of base stations in a communication system. The server may obtain teacher models based on traffic data sets collected from the base stations, respectively; perform a policy rehearsal process including obtaining student models based on knowledge distillation from the teacher models, obtaining a…
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification H04L41/16. Mapped technology areas include Electricity.
When was this patent published?
Publication date Feb 25, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).