US9667498B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9667498-B2 |
| Application number | US-201414450148-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 1, 2014 |
| Priority date | Dec 20, 2013 |
| Publication date | May 30, 2017 |
| Grant date | May 30, 2017 |
Official abstract text for this publication.
A self-adaptive control system based on proportional-integral (PI) control theory for dynamic capacity management of latency-sensitive application servers (e.g., application servers associated with a social networking application) is disclosed. A centralized controller of the system can adapt to changes in request rates, changes in application and/or system behaviors, underlying hardware upgrades, etc., by scaling the capacity of a cluster up or down so that just the right amount of capacity is maintained at any time. The centralized controller uses information relating to a current state of the cluster and historical information relating to past states of the cluster to predict a future state of the cluster, and uses that prediction to determine whether to scale the current capacity up or down to reduce latency and maximize energy savings. A load balancing system can then distribute traffic among the servers in the cluster using any load balancing method.
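The control cycle described in the abstract can be illustrated with a minimal sketch. This is not the patent's control law, which this page does not reproduce: the incremental (velocity-form) PI update, the class name `PIController`, the gain values, and the use of mean CPU utilization as the operating parameter are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class PIController:
    """Hypothetical velocity-form PI controller for cluster capacity."""
    target: float           # target operating parameter, e.g. 0.60 CPU utilization
    kp: float                # proportional gain (assumed value, not from the patent)
    ki: float                # integral gain (assumed value, not from the patent)
    per_server_rate: float   # per-server request rate from the previous cycle
    prev_error: float = 0.0  # deviation seen in the previous cycle

    def step(self, utilizations, total_rate):
        """Run one control cycle and return the required number of active servers."""
        # Current value of the operating parameter, averaged over active servers.
        current = sum(utilizations) / len(utilizations)
        error = self.target - current
        # Change in per-server request rate (incremental PI form, an assumption).
        delta = self.kp * (error - self.prev_error) + self.ki * error
        # Keep the rate strictly positive so the division below is safe.
        self.per_server_rate = max(self.per_server_rate + delta, 1e-9)
        self.prev_error = error
        # Required capacity follows from the total rate and the new per-server rate.
        return math.ceil(total_rate / self.per_server_rate)


controller = PIController(target=0.6, kp=50.0, ki=100.0, per_server_rate=100.0)
required = controller.step([0.8] * 10, total_rate=1000.0)
print(required)
```

In this run the ten active servers sit above the target utilization, so the sketch lowers the per-server request rate and asks for more active servers; a load balancer would then spread traffic across the enlarged active set.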
Opening claim text (preview).
What is claimed is:
1. A system, comprising: a processor and memory; a controller deployed in a cluster, the controller being a part of a self-adaptive feedback control system, the controller configured to periodically: determine a current value of an operating parameter based on values of the operating parameter from servers in an active mode in the cluster, wherein an amount of servers in the active mode represents a current capacity, determine a total request rate for the cluster, determine a change in a per server request rate to cause a next value of the operating parameter to approach a target value of the operating parameter, wherein the change in the per server request rate is determined as a function of at least the current value of the operating parameter, a value of the per server request rate in a previous control cycle of the self-adaptive feedback system, and control parameters, wherein the per server request rate is different from the total request rate, and determine, based at least in part on the change in per server request rate and the total request rate, a required capacity, wherein the controller is further configured to adjust the current capacity to match the required capacity for optimization of power and latency; and a load balancer configured to distribute request traffic among servers that remain in the active mode in the cluster following adjustment of the current capacity to match the required capacity.
2. The system of claim 1, wherein to adjust the current capacity to match the required capacity, the controller is further configured to: compare the current capacity to the required capacity to determine an excess capacity; and deallocate an amount of servers corresponding to the excess capacity from the active mode to an inactive mode to decrease the current capacity to match the required capacity.
3. The system of claim 1, wherein to adjust the current capacity to match the required capacity, the controller is configured to: compare the current capacity to the required capacity to determine an insufficient capacity; and allocate an amount of servers corresponding to the insufficient capacity to the active mode from an inactive mode to increase the current capacity to match the required capacity.
4. The system of claim 1, wherein the cluster includes an amount of servers in an inactive mode and the amount of servers in the active mode, wherein at least some of the amount of servers in the inactive mode are in an idle state for energy savings and rest of the amount of servers in the inactive mode are turned off or placed in a deep sleep mode for additional energy savings or used for processing non-latency sensitive jobs.
5. The system of claim 4, wherein the controller is further configured to determine, based on a historical trend in changes in per server request rate, how many of the amount of servers in the inactive mode are to be maintained in the idle state so as to reduce set up time when the current capacity needs to be increased to match the required capacity.
6. The system of claim 1, wherein the cluster includes an amount of servers in an inactive mode and the amount of servers in the active mode and wherein the controller maintains all servers in the inactive mode in idle state for energy savings.
7. The system of claim 1, wherein the operating parameter includes CPU utilization.
8. The system of claim 1, wherein the operating parameter includes latency.
9. The system of claim 1, wherein the controller is based on a Proportional-Integral (PI) controller and the control parameters include proportional and integral gains.
10. A method performed on a computer system, comprising: determining a current value of an operating parameter based on information from a current number of active servers in a server pool; determining a deviation between the current value of the operating parameter and a target value of the operating parameter; determining a total request rate for the server pool; utilizing a feedback controller to determine change in per server request rate so as to enable a next value of the operating parameter to converge to a vicinity of the target value of the operating parameter, wherein the change in the per server request rate is determined based at least in part on the deviation, a value of the per server request rate in a previous control cycle of the feedback controller and control parameters, and wherein the per server request rate is different from the total request rate; determining, based at least in part on the change in per server request rate and the total request rate, a required number of active servers in the server pool; adjusting the current number of active servers in the server pool based on the required number of active servers to optimize energy savings and latency; and distributing, by a load balancer, incoming requests among the active servers in the server pool.
11. The method of claim 10, wherein the feedback controller is a proportional-integral (PI) controller and the control parameters include a proportional gain and an integral gain.
12. The method of claim 10, wherein the feedback controller is a proportional-integral-derivative (PID) controller and the control parameters include a proportional gain, an integral gain and a derivative gain.
13. The method of claim 10, wherein the operating parameter includes any one of: CPU utilization or the latency.
14. The method of claim 10, wherein adjusting the current number of active servers in the server pool based on the required number of active servers includes: determining that the current number of active servers in the server pool is greater than the required number of active servers; and in response, scaling down the current number of active servers in the server pool by transitioning a number of active servers in the server pool into inactive servers so that the adjusted number of active servers in the server pool matches the required number of active servers.
15. The method of claim 10, wherein adjusting the current number of active servers in the server pool based on the required number of active servers includes: determining that the current number of active servers in the server pool is smaller than the required number of active servers; and in response, scaling up the current number of active servers in the server pool by transitioning a number of inactive servers in the server pool into active servers so that the adjusted number of active servers in the server pool matches the required number of active servers.
16. The method of claim 14, wherein an inactive server accepts no request traffic and is placed in an idle state, a powered off state, a deep sleep state or used for processing asynchronous jobs.
17. The method of claim 14, further comprising: maintaining a number of the inactive servers in the server pool in an idle state so that the inactive servers in the idle state can be transitioned into active servers without delay, wherein the number of the inactive servers to be maintained in the idle state is determined based on a historical trend of request rates.
18. The method of claim 14, further comprising: establishing a model for a server type in the server pool by using empirical data and a linear fitting method to estimate correlation between the operating parameter and request rate.
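The linear fitting step in the last claim could look roughly like the sketch below. It uses an ordinary least-squares line through (request rate, CPU utilization) samples; the function names `fit_line` and `rate_at_target`, the sample measurements, and the 0.60 target are hypothetical and are not taken from the patent.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rate_at_target(slope, intercept, target):
    """Invert the fitted line: request rate at which the parameter hits target."""
    return (target - intercept) / slope


# Hypothetical empirical data for one server type:
# per-server request rate (requests/s) and the CPU utilization observed at that rate.
rates = [50.0, 100.0, 150.0, 200.0]
cpu = [0.15, 0.25, 0.35, 0.45]

slope, intercept = fit_line(rates, cpu)
print(rate_at_target(slope, intercept, target=0.60))
```

A model of this kind gives a per-server-type starting estimate of how much traffic one server can absorb at the target utilization, which a feedback controller could then refine from cycle to cycle.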
Techniques for rebalancing the load in a distributed system · CPC title
Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters · CPC title
by checking functioning · CPC title
Bandwidth or capacity management, i.e. automatically increasing or decreasing capacities (flow or congestion control using dynamic resource allocation, e.g. in-call renegotiation, H04L47/76) · CPC title
Reserving resources in multiple paths to be used simultaneously (by balancing the load H04L47/125) · CPC title