Automated reinforcement-learning-based application manager that uses local agents
US-10970649-B2 · Apr 6, 2021 · US
US11275429B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11275429-B2 |
| Application number | US-202016915133-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 29, 2020 |
| Priority date | Jun 29, 2020 |
| Publication date | Mar 15, 2022 |
| Grant date | Mar 15, 2022 |
Official abstract text for this publication.
An apparatus comprises a processing device configured to obtain first parameters characterizing an operating state of information technology (IT) resources of a data center and second parameters characterizing an operating state of cooling systems of the data center, to determine an overall operating state of the data center by aggregating the first and second parameters, to identify a power consumption profile based on the overall operating state, and to perform a joint training of first and second reinforcement learning agents based on the overall operating state and the power consumption profile. The processing device is also configured to generate first controls for the heterogeneous IT resources utilizing the first reinforcement learning agent and second controls for the cooling systems utilizing the second reinforcement learning agent, the first and second controls being configured to reduce power consumption while maintaining specified performance benchmarks for workloads executing in the data center.
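The aggregation step in the abstract can be illustrated with a minimal sketch. It assumes the "overall operating state" is a flat vector formed by concatenating per-resource and per-unit readings; the patent text here does not specify the representation. The class names, field names, and the `aggregate_state` and `total_power` helpers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ITResourceParams:
    """Hypothetical per-resource readings (first set of parameters)."""
    temperature_c: float
    power_w: float
    cpu_load: float  # fraction between 0 and 1


@dataclass
class CoolingParams:
    """Hypothetical per-unit readings (second set of parameters)."""
    air_flow: float
    input_temp_c: float
    output_temp_c: float
    power_w: float


def aggregate_state(it_params: List[ITResourceParams],
                    cooling_params: List[CoolingParams]) -> List[float]:
    """Concatenate both parameter sets into one flat state vector.

    Assumption: simple concatenation; other aggregations are possible.
    """
    state: List[float] = []
    for p in it_params:
        state.extend([p.temperature_c, p.power_w, p.cpu_load])
    for c in cooling_params:
        state.extend([c.air_flow, c.input_temp_c, c.output_temp_c, c.power_w])
    return state


def total_power(it_params: List[ITResourceParams],
                cooling_params: List[CoolingParams]) -> float:
    """Sum IT and cooling power draw as a crude power consumption figure."""
    return (sum(p.power_w for p in it_params)
            + sum(c.power_w for c in cooling_params))
```

A state built this way could then be passed to both sets of agents, so that workload placement and cooling setpoints are chosen from the same view of the data center.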
Opening claim text (preview).
What is claimed is:
1. An apparatus comprising: at least one processing device comprising a processor coupled to a memory; the at least one processing device being configured to perform steps of: obtaining a first set of parameters characterizing an operating state of a plurality of heterogeneous information technology resources of a data center and a second set of parameters characterizing an operating state of one or more cooling systems of the data center; determining an overall operating state of the data center by aggregating the first and second sets of parameters; identifying a power consumption profile of the data center based at least in part on the determined overall operating state of the data center; performing a joint training of a first set of one or more reinforcement learning agents and a second set of one or more reinforcement learning agents based at least in part on the determined overall operating state of the data center and the identified power consumption profile; generating a first set of controls for the plurality of heterogeneous information technology resources of the data center utilizing the trained first set of one or more reinforcement learning agents and a second set of controls for the one or more cooling systems of the data center utilizing the trained second set of one or more reinforcement learning agents, the first and second sets of controls being configured to reduce power consumption by the data center while maintaining specified performance benchmarks for workloads executing in the data center; and controlling operation of the data center based at least in part on the first and second sets of controls.
2. The apparatus of claim 1 wherein the first set of parameters comprises telemetry information obtained from the plurality of heterogeneous information technology resources of the data center, the telemetry information comprising: temperature measurements for one or more hardware components of each of the plurality of heterogeneous information technology resources for a given period of time; and power consumption measurements for each of the plurality of heterogeneous information technology resources for the given period of time.
3. The apparatus of claim 1 wherein the first set of parameters comprises resource management information obtained from the plurality of heterogeneous information technology resources of the data center, the resource management information comprising two or more of: average central processing unit (CPU) speed measurements for each of the plurality of heterogeneous information technology resources for a given period of time; CPU load measurements for each of the plurality of heterogeneous information technology resources for the given period of time; average uptime measurements for each of the plurality of heterogeneous information technology resources for the given period of time; and average memory measurements for each of the plurality of heterogeneous information technology resources for the given period of time.
4. The apparatus of claim 1 wherein the first set of parameters comprises task management information for a plurality of workloads scheduled for execution on the plurality of heterogeneous information technology resources of the data center, the task management information comprising two or more of: expected central processing unit (CPU) requirements for at least a subset of the plurality of workloads scheduled for execution on the plurality of heterogeneous information technology resources for a given upcoming period of time; expected memory requirements for at least a subset of the plurality of workloads scheduled for execution on the plurality of heterogeneous information technology resources for the given upcoming period of time; expected time for completion for at least a subset of the plurality of workloads scheduled for execution on the plurality of heterogeneous information technology resources; a most recent wait time for the plurality of workloads scheduled for execution on the plurality of heterogeneous information technology resources; and a most recent execution time for the plurality of workloads scheduled for execution on the plurality of heterogeneous information technology resources.
5. The apparatus of claim 1 wherein the second set of parameters comprise telemetry information obtained from the one or more cooling systems, the telemetry information comprising two or more of: air flow measurements for each of a plurality of air conditioning units of the one or more cooling systems for a given period of time; input temperature measurements for each of the plurality of air conditioning units of the one or more cooling systems for the given period of time; output temperature measurements for each of the plurality of air conditioning units of the one or more cooling systems for the given period of time; and power consumption measurements for each of the plurality of air conditioning units of the one or more cooling systems for the given period of time.
6. The apparatus of claim 1 wherein obtaining the first and second sets of parameters, determining the overall operating state of the data center, identifying the power consumption profile, generating the first and second sets of controls, and controlling operation of the data center are performed for each of two or more time periods, each of the two or more time periods being associated with a change in the operating state of the plurality of heterogeneous information technology resources of the data center.
7. The apparatus of claim 6 wherein the change in the operating state of the plurality of heterogeneous information technology resources of the data center comprises at least one of: arrival of one or more new workloads in a queue of workloads to be scheduled on the plurality of heterogeneous information technology resources of the data center; and completion of one or more workloads currently operating on one or more of the plurality of heterogeneous information technology resources of the data center.
8. The apparatus of claim 1 wherein identifying the power consumption profile comprises identifying a joint reward characterizing power consumption by the data center as a weighted summation of reward components identified from the first and second sets of parameters.
9. The apparatus of claim 8 wherein the weighted summation comprises reward components for: at least one of central processing unit (CPU) speed measurements, CPU load measurements, uptime measurements and memory measurements for the plurality of heterogeneous information technology resources in the first set of parameters; at least one of a most recent wait time and a most recent execution time for workloads scheduled for execution on the plurality of heterogeneous information technology resources in the first set of parameters; power consumption measurements for the plurality of heterogeneous information technology resources in the first set of parameters; and power consumption measurements for each of a plurality of air conditioning units of the one or more cooling systems in the second set of parameters.
10. The apparatus of claim 1 wherein the first set of controls comprises identification of workloads to be assigned to respective ones of the plurality of heterogeneous information technology resources for execution in an upcoming period of time.
11. The apparatus of claim 1 wherein the second set of controls comprises temperature setpoint information for each of a plurality of air conditioning units of the one or more cooling system for an upcoming period of time.
12. The apparatus of claim 1 wherein jointly training the fi
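The joint reward of claims 8 and 9 is described as a weighted summation of reward components. A minimal sketch of that arithmetic follows. The component names, the weight values, and the convention of expressing each component as a negated cost (so that lower power and shorter waits yield a higher reward) are assumptions made for illustration; the claim text shown here does not specify them.

```python
from typing import Dict


def joint_reward(components: Dict[str, float],
                 weights: Dict[str, float]) -> float:
    """Return the weighted sum of reward components.

    Both dictionaries must use exactly the same keys; a mismatch is
    treated as a configuration error.
    """
    mismatched = set(components) ^ set(weights)
    if mismatched:
        raise KeyError(f"components and weights differ on: {sorted(mismatched)}")
    return sum(weights[name] * components[name] for name in weights)


# Hypothetical components, each a negated cost measured over one period.
components = {
    "cpu_load": -0.5,       # IT resource management term
    "wait_time": -2.0,      # task management term
    "it_power": -4.0,       # IT power consumption term
    "cooling_power": -1.0,  # cooling power consumption term
}

# Hypothetical weights trading off performance terms against power terms.
weights = {
    "cpu_load": 0.5,
    "wait_time": 0.25,
    "it_power": 1.0,
    "cooling_power": 1.0,
}

reward = joint_reward(components, weights)
print(reward)  # -5.75
```

Because both sets of agents are scored against the same scalar, a scheduling choice that saves IT power but forces extra cooling is penalized through the cooling term, which is one way to read the "joint" aspect of the training.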
Energy efficient computing, e.g. low power processors, power management or thermal management · CPC title
by task scheduling · CPC title
comprising thermal management · CPC title
Supervision thereof, e.g. detecting power-supply failure by out of limits supervision · CPC title
the criterion being a learning criterion · CPC title