Method for controlling air conditioning device based on delayed reward

US12188672B2 · US · B2

Patent metadata
Publication number: US-12188672-B2
Application number: US-202318492215-A
Country: US
Kind code: B2
Filing date: Oct 23, 2023
Priority date: Nov 14, 2022
Publication date: Jan 7, 2025
Grant date: Jan 7, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed is a method for controlling an air conditioning device, which is performed by at least one computing device, which includes: determining a control action for the air conditioning device at a first time point by using a reinforcement learning agent; determining a reward for the control action at the first time point based on a reward delay time by using the reinforcement learning agent; and performing reinforcement learning related to the control of the air conditioning device based on the determined reward, in which a time point when the reward delay time elapses from the first time point corresponds to a second time point, and the reward for the control action at the first time point is calculated while excluding situations after the first time point and before the second time point.
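
As a rough illustration of the delayed reward described in the abstract, the sketch below scores an action taken at a first time point using only the reading at the second time point (first time point plus the reward delay time) and never inspects the readings in between. The function name `delayed_reward`, the step-indexed temperature list, and the negative-absolute-error reward are all assumptions made for this example; the patent text shown here does not specify a reward formula.

```python
from typing import Sequence


def delayed_reward(temps: Sequence[float], t1: int, delay_steps: int, target: float) -> float:
    """Reward for the action taken at step t1, judged only at t2 = t1 + delay_steps.

    Hypothetical reward: negative absolute distance from the target temperature.
    """
    t2 = t1 + delay_steps
    if t2 >= len(temps):
        raise IndexError("second time point lies beyond the recorded trajectory")
    # Readings at steps t1+1 .. t2-1 are deliberately never looked at.
    return -abs(temps[t2] - target)


# Cabin temperature per control step (made-up values for illustration).
temps = [30.0, 29.0, 27.5, 25.0, 24.0, 23.5]
print(delayed_reward(temps, t1=0, delay_steps=4, target=22.0))  # -2.0
```

Changing the intermediate readings (steps 1 to 3 here) leaves the result unchanged, which is the property the abstract describes as excluding situations between the two time points.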

First claim

Opening claim text (preview).

What is claimed is:

1. A method for controlling an air conditioning device, performed by a computing device, the method comprising: determining a control action for the air conditioning device at a first time point by using a reinforcement learning agent, wherein the control action is used to control the air conditioning device, and wherein the air conditioning device includes a heating, ventilation, and air conditioning (HVAC) system or a thermal management system; determining a reward for the control action at the first time point based on a delay time by using the reinforcement learning agent, wherein the delay time corresponds to a time for a target temperature to be reached using the air conditioning device, and wherein determining the reward for the control action at the first time point is based on an assumption that the control action for the air conditioning device at the first time point is maintained to a third time point; and performing reinforcement learning related to the control of the air conditioning device based on the determined reward, wherein a time point when the delay time elapses from the first time point corresponds to a second time point, and wherein the reward for the control action at the first time point is calculated while excluding situations after the first time point and before the second time point, and generating a control signal to control the air conditioning device based on the control action.

2. The method of claim 1, wherein the third time point is (i) equal to the second time point or (ii) a time point later than the second time point.

3. The method of claim 2, wherein the determining of the reward for the control action at the first time point when the third time point is equal to the delay time includes: determining the reward for the control action at the first time point based on a situation at the second time point without considering the situations after the first time point and before the second time point.

4. The method of claim 2, wherein the determining of the reward for the control action at the first time point when the third time point is longer than the delay time includes: determining the reward for the control action at the first time point based on situations at the second to third time points without considering the situations after the first time point and before the second time point.

5. The method of claim 4, wherein the determining of the reward for the control action at the first time point based on the situations at the second to third time points includes: calculating rewards considering situations of time points included in the second time point to third time point, calculating a representative value of the calculated rewards, and determining the representative value as the reward for the control action at the first time point.

6. The method of claim 1, wherein the control action includes at least one of: a control action for a compressor RPM value, a control action for a valve opening amount, a control action for a heating amount of a cooling water heater, a control action for a condenser, a control action for an evaporator, a control action for a radiator, a control action for an accumulator, a control action for a chiller, a control action for an outdoor heat exchanger, and a control action for an air purifying device, or a control action for a waste heat recovery device.

7. The method of claim 6, wherein the reward is calculated based on internal situation information of the HVAC system or the thermal management system, and wherein the internal situation information includes at least one of: compressor information, condenser information, evaporator information, valve opening amount information, heater information, waste heat recovery information, temperature information, humidity information, air cleanliness information, or air flow information.

8. An apparatus comprising: at least one processor; and a memory, wherein the processor is configured to: determine a control action for the air conditioning device at a first time point by using a reinforcement learning agent, wherein the control action is used to control the air conditioning device, and wherein the air conditioning device includes a heating, ventilation, and air conditioning (HVAC) system or a thermal management system, determine a reward for the control action at the first time point based on a delay time by using the reinforcement learning agent, wherein the delay time corresponds to a time for a target temperature to be reached using the air conditioning device, and wherein determining the reward for the control action at the first time point is based on an assumption that the control action for the air conditioning device at the first time point is maintained to a third time point, and perform reinforcement learning related to the control of the air conditioning device based on the determined reward, wherein a time point when the delay time elapses from the first time point corresponds to a second time point, and wherein the reward for the control action at the first time point is calculated while excluding situations after the first time point and before the second time point, and generating a control signal to directly control the air conditioning device based on the control action.

9. A non-transitory computer readable storage medium storing a computer program, wherein the computer program performs operations for controlling an air conditioning device when executed by one or more processors included in a computing device, the operations comprising: an operation of determining a control action for the air conditioning device at a first time point by using a reinforcement learning agent, wherein the control action is used to control the air conditioning device, and wherein the air conditioning device includes a heating, ventilation, and air conditioning (HVAC) system or a thermal management system; an operation of determining a reward for the control action at the first time point based on a delay time by using the reinforcement learning agent, wherein the delay time corresponds to a time for a target temperature to be reached using the air conditioning device, and wherein determining the reward for the control action at the first time point is based on an assumption that the control action for the air conditioning device at the first time point is maintained to a third time point; and an operation of performing reinforcement learning related to the control of the air conditioning device based on the determined reward, wherein a time point when the delay time elapses from the first time point corresponds to a second time point, and wherein the reward for the control action at the first time point is calculated while excluding situations after the first time point and before the second time point, and generating a control signal to control the air conditioning device based on the control action.
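
Claims 2 to 5 distinguish two cases: the third time point equals the second time point (the reward comes from the situation at the second time point alone), or it is later (rewards are computed for each time point from the second to the third, and a representative value is used). The following is a minimal sketch of that logic under stated assumptions: the name `windowed_reward`, the step-indexed temperature list, the negative-absolute-error per-step reward, and the default choice of the mean as the representative value are all hypothetical, since the claims do not fix any of them.

```python
from statistics import mean
from typing import Callable, Sequence


def windowed_reward(
    temps: Sequence[float],
    t1: int,
    delay_steps: int,
    hold_steps: int,
    target: float,
    representative: Callable[[Sequence[float]], float] = mean,
) -> float:
    """Reward for the action at t1, assuming it is held until t3 = t1 + hold_steps.

    Only steps t2 .. t3 (t2 = t1 + delay_steps) contribute; steps strictly
    between t1 and t2 are excluded.
    """
    t2 = t1 + delay_steps
    t3 = t1 + hold_steps
    if t3 < t2:
        raise ValueError("third time point must not precede the second time point")
    if t3 >= len(temps):
        raise IndexError("third time point lies beyond the recorded trajectory")
    # Hypothetical per-step reward: closeness to the target temperature.
    per_step = [-abs(temps[t] - target) for t in range(t2, t3 + 1)]
    # With t3 == t2 the list has one element, so the representative value is
    # just the reward at the second time point (the claim 3 case).
    return representative(per_step)


temps = [30.0, 28.0, 26.0, 24.0, 23.0, 22.0]
print(windowed_reward(temps, t1=0, delay_steps=3, hold_steps=3, target=22.0))  # -2.0
print(windowed_reward(temps, t1=0, delay_steps=3, hold_steps=5, target=22.0))  # -1.0
```

Passing a different callable as `representative` (for example `min` or `statistics.median`) swaps the aggregation without touching the windowing, which keeps the exclusion of the pre-delay interval separate from the choice of representative value.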

Assignees

Makinarocks Co Ltd, Hanon Systems

Classifications

  • the criterion being a learning criterion · CPC title

  • Machine learning · CPC title

  • Control systems or circuits characterised by particular algorithms or computational models, e.g. fuzzy logic or dynamic models · CPC title

  • B60H1/3205 · Primary

    Control means therefor · CPC title

  • Temperature · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12188672B2 cover?
Disclosed is a method for controlling an air conditioning device, which is performed by at least one computing device, which includes: determining a control action for the air conditioning device at a first time point by using a reinforcement learning agent; determining a reward for the control action at the first time point based on a reward delay time by using the reinforcement learning agent…
Who is the assignee on this patent?
Makinarocks Co Ltd, Hanon Systems
What technology area does this patent fall under?
Primary CPC classification B60H1/3205. Mapped technology areas include Operations & Transport.
When was this patent published?
Publication date: Jan 7, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).