Method and apparatus for control energy management system based on reinforcement learning

US11645728B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11645728-B2
Application number: US-202117513757-A
Country: US
Kind code: B2
Filing date: Oct 28, 2021
Priority date: Oct 29, 2020
Publication date: May 9, 2023
Grant date: May 9, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed is a method for controlling an energy management system that is performed by a computing device including at least one processor. The method may include acquiring a target temperature of one or more target points; and controlling one or more control variables using a reinforcement learning control model trained for a first condition regarding a state before a current temperature of the target points converges to the target temperature.
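
As a rough illustration of the two conditions the abstract describes, the sketch below decides which condition applies by checking whether every target point is within a tolerance of the target temperature. The tolerance value, the absolute-difference test, and the names `Condition` and `select_condition` are assumptions made for illustration; the text shown on this page does not say how convergence is detected.

```python
from enum import Enum
from typing import Sequence


class Condition(Enum):
    FIRST = "before_convergence"   # current temperature has not reached the target
    SECOND = "after_convergence"   # current temperature has converged to the target


def select_condition(
    current_temps: Sequence[float],
    target_temp: float,
    tolerance: float = 0.5,  # hypothetical convergence band, in degrees
) -> Condition:
    """Return SECOND once every target point is within `tolerance` of the target."""
    converged = all(abs(t - target_temp) <= tolerance for t in current_temps)
    return Condition.SECOND if converged else Condition.FIRST
```

A controller built this way would consult `select_condition` on each step and apply the behavior trained for the returned condition.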

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for controlling an energy management system (EMS) that is performed by a computing device including at least one processor, the method comprising: acquiring a target temperature of one or more target points; controlling one or more control variables using a reinforcement learning control model trained for a first condition regarding a state before a current temperature of the target points converges to the target temperature; controlling the one or more control variables using the reinforcement learning control model trained for a second condition regarding a state after the current temperature of the target points converges to the target temperature; and acquiring a target indirect indicator corresponding to the acquired target temperature, wherein the reinforcement learning control model is trained based on a reward that is calculated differently for the first condition and the second condition respectively, wherein the target indirect indicator includes a value obtained through at least one sensor from the environment, when the current temperature of the target points converges towards the target temperature, based on the reinforcement learning control model controlling one or more control variables, the reinforcement learning control model being trained to control one or more control variables based on a state information, wherein a training method of the reinforcement learning control model for the second condition regarding state after the current temperature of the target points converges to the target temperature includes: training a first control agent comprised in the reinforcement learning control model, based on a reward computed based on the current temperature of the target points, the target temperature, and total amount of work; and training a second control agent comprised in the reinforcement learning control model, based on a reward computed based on a current indirect indicator and the target indirect indicator.

2. The method for controlling EMS of claim 1, wherein the reinforcement learning control model comprises: a first control agent trained for controlling a first control variable; and a second control agent trained for controlling a second control variable.

3. The method for controlling EMS of claim 2, wherein the first control variable and the second control variable are dependent on each other, wherein the first control variable is an output of a compressor, and the second control variable is a degree of opening and closing of a valve, and wherein the reinforcement learning control model separately controls the output of the compressor and the degree of opening and closing of the valve.

4. The method for controlling EMS of claim 1, wherein the reinforcement learning control model includes an artificial neural network layer including at least one node, and wherein a training method of the reinforcement learning control model comprises: acquiring the state information from an environment including at least one sensor, by the reinforcement learning control model; controlling the one or more control variables based on the state information, by the reinforcement learning control model; acquiring the state information updated from the environment as a result of controlling the control variables, by the reinforcement learning control model; and training the reinforcement learning control model based on the acquired reward from the environment as the result of controlling the control variables.

5. The method for controlling the EMS of claim 4, wherein the reward comprises at least one of the following: a reward computed based on the current temperature of the target points and the target temperature; a reward computed based on total amount of work; or a reward computed based on a current indirect indicator and the target indirect indicator.

6. The method for controlling the EMS of claim 1, wherein state information that the reinforcement learning control model acquires from the environment is first state information that includes at least one of state data on temperature, state data on an output of a compressor, and state data on a degree of opening and closing of a valve.

7. The method for controlling the EMS of claim 4, wherein the training the reinforcement learning control model based on the acquired reward from the environment as the result of controlling the control variables, in the first condition, comprises: training a first control agent comprised in the reinforcement learning control model, based on a reward computed based on the current temperature of the target points and the target temperature; and training a second control agent comprised in the reinforcement learning control model, based on a reward computed based on total amount of work.

8. The method for controlling the EMS of claim 4, wherein the training the reinforcement learning control model based on the acquired reward from the environment as the result of controlling the control variables, in the second condition, comprises: training a first control agent comprised in the reinforcement learning control model, based on the current temperature of the target points, the target temperature, and total amount of work; and training a second control agent comprised in the reinforcement learning control model, based on a reward computed based on the total amount of work.

9. The method for controlling the EMS of claim 1, wherein the target indirect indicator includes a pre-determined value according to the target temperature.

10. The method for controlling the EMS of claim 1, wherein state information that the reinforcement learning control model acquires from the environment is: second state information additionally including state data for an indirect indicator to first state information that includes at least one of state data on temperature, state data on an output of a compressor, and state data on a degree of opening and closing of a valve.

11. A non-transitory computer readable storage medium, wherein when the non-transitory computer readable storage medium is executed in one or more processors, the non-transitory computer readable storage medium causes the following operations to be performed for controlling an energy management system, the operations comprising: acquiring a target temperature of one or more target points; controlling one or more control variables using a reinforcement learning control model trained for a first condition regarding a state before a current temperature of the target points converges to the target temperature; controlling the one or more control variables using the reinforcement learning control model trained for a second condition regarding a state after the current temperature of the target points converges to the target temperature; and acquiring a target indirect indicator corresponding to the acquired target temperature, wherein the reinforcement learning control model is trained based on a reward that is calculated differently for the first condition and the second condition respectively, and wherein the target indirect indicator includes a value obtained through at least one sensor from the environment, when the current temperature of the target points converges towards the target temperature, based on the reinforcement learning control model controlling one or more control variables, the reinforcement learning control model being trained to control one or more control variables based on a state information, and wherein a training method of the reinforcement learning control model for the second condition regarding state after the current temperature of the target points converges
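
The per-agent reward split recited in claims 1 and 7 can be sketched as below. The claims state only which quantities each reward is "computed based on"; the negative-absolute-error form, the `work_weight` factor, and every class, field, and function name here are assumptions chosen for illustration, not the patented formulas.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    current_temp: float     # current temperature at a target point
    target_temp: float      # acquired target temperature
    total_work: float       # total amount of work (units assumed)
    indirect: float         # current indirect indicator
    target_indirect: float  # target indirect indicator


def first_agent_reward(obs: Observation, converged: bool,
                       work_weight: float = 0.1) -> float:
    """Reward for the first control agent (hypothetical functional form)."""
    temp_term = -abs(obs.current_temp - obs.target_temp)
    if not converged:
        # First condition (claim 7): current vs. target temperature only.
        return temp_term
    # Second condition (claim 1): temperature error plus total amount of work.
    return temp_term - work_weight * obs.total_work


def second_agent_reward(obs: Observation, converged: bool) -> float:
    """Reward for the second control agent (hypothetical functional form)."""
    if not converged:
        # First condition (claim 7): total amount of work.
        return -obs.total_work
    # Second condition (claim 1): current vs. target indirect indicator.
    return -abs(obs.indirect - obs.target_indirect)
```

Claim 8 recites a variant in which the second agent's reward in the second condition is based on total amount of work instead; in this sketch that would mean returning `-obs.total_work` in both branches of `second_agent_reward`.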

Assignees

Makinarocks Co Ltd, Hanon Systems

Inventors

Classifications

  • G06Q50/06 (Primary)

    Energy or water supply · CPC title

  • Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations (thermal management in cooling arrangements of a computing system G06F1/206) · CPC title

  • the criterion being a learning criterion · CPC title

  • HVAC, heating, ventilation, climate control · CPC title

  • electric · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11645728B2 cover?
Disclosed is a method for controlling an energy management system that is performed by a computing device including at least one processor. The method may include acquiring a target temperature of one or more target points; and controlling one or more control variables using a reinforcement learning control model trained for a first condition regarding a state before a current temperature of th…
Who is the assignee on this patent?
Makinarocks Co Ltd, Hanon Systems
What technology area does this patent fall under?
Primary CPC classification G06Q50/06. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 9, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).