US12523969B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12523969-B2 |
| Application number | US-202218020835-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 10, 2022 |
| Priority date | Jun 17, 2021 |
| Publication date | Jan 13, 2026 |
| Grant date | Jan 13, 2026 |
Official abstract text for this publication.
According to an exemplary embodiment of the present disclosure, a dynamic model based optimal control method performed by a computing device including at least one processor is disclosed. The method includes acquiring state information including at least one state variable; calculating first control information by inputting the state information to a reinforcement learning control model; calculating second control information from the state information based on a feedback control algorithm; determining a sample similarity of the state information using a trained dynamic model; and calculating optimal control information based on the first control information, the second control information, and the sample similarity.
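The combination step described in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the `PID` class, the `blend` function, and the choice of weights `s` and `1 - s` are assumptions made for illustration, and the reinforcement learning output is treated as an already-computed number. The only property taken from the disclosure is that the optimal control information is a weighted combination of the two control outputs, with the learned controller weighted more heavily as the sample similarity rises.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID loop (hypothetical stand-in for the feedback controller)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        # Accumulate the integral term and take a finite-difference derivative.
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def blend(u_rl: float, u_fb: float, similarity: float) -> float:
    """Weighted sum of the RL output (first control information) and the
    feedback output (second control information).

    Assumption: similarity lies in [0, 1] and the weights are s and 1 - s,
    so the ratio of the first weight to the second grows with similarity.
    """
    s = min(max(similarity, 0.0), 1.0)  # clamp to the assumed range
    return s * u_rl + (1.0 - s) * u_fb
```

With these assumed weights, a similarity of 1.0 passes the reinforcement learning output through unchanged, a similarity of 0.0 falls back entirely to the PID output, and intermediate values interpolate between the two.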
Opening claim text (preview).
The invention claimed is:
1. A dynamic model based optimal control method performed by a computing device including at least one processor, the dynamic model based optimal control method comprising: acquiring state information including at least one state variable from one or more physical sensors of a vehicle thermal-management system, the state information comprising sensor signals representing temperature, subcool, compressor output, valve position, or heater output, wherein the state information is sampled at a previously determined fixed time interval during a process of driving the vehicle; calculating first control information by inputting the state information to a reinforcement learning control model; calculating second control information from the state information based on a feedback control algorithm comprising a proportional-integral-differential (PID) control loop acting on a vehicle-cabin temperature error; determining a sample similarity of the state information using a trained dynamic model; calculating optimal control information based on the first control information, the second control information, and the sample similarity; and executing the optimal control information by generating and transmitting drive signals to at least one actuator of the vehicle thermal-management system, wherein the actuator comprises a compressor drive, an electronic expansion valve, or an electric heater, thereby physically changing the state of refrigerant or air in the vehicle.
2. The dynamic model based optimal control method according to claim 1, wherein the state information is acquired according to a previously determined time interval during a process of driving a vehicle.
3. The dynamic model based optimal control method according to claim 1, wherein the state information includes at least one state variable regarding temperature data, subcool data, compressor output data, valve open/close data, or heater output data.
4. The dynamic model based optimal control method according to claim 1, wherein the dynamic model is a model for simulating an operation of an environment model, input data of the dynamic model includes state information and control information, and output data of the dynamic model includes predicted state information.
5. The dynamic model based optimal control method according to claim 4, wherein the predicted state information included in the output data corresponds to a timing later than a timing corresponding to state information included in the input data.
6. The dynamic model based optimal control method according to claim 1, wherein the dynamic model is trained based on a simulation model which trains the reinforcement learning control model.
7. The dynamic model based optimal control method according to claim 6, wherein the dynamic model is trained based on a learning method including: acquiring current state information including at least one state variable from a simulation model; acquiring first control information from the reinforcement learning control model; inputting the current state information and the first control information to the dynamic model; acquiring predicted state information from the dynamic model; acquiring next state information from the simulation model; and comparing the predicted state information and the next state information.
8. The dynamic model based optimal control method according to claim 1, wherein the sample similarity is determined based on the predicted state information calculated based on the acquired state information and the dynamic model.
9. The dynamic model based optimal control method according to claim 8, wherein the sample similarity is determined by a smaller value in a previously determined similarity range as an error calculated between the state information and the predicted state information increases.
10. The dynamic model based optimal control method according to claim 1, wherein the calculating of optimal control information is performed based on a result of a weighted sum operation based on the first control information, the second control information, and the sample similarity.
11. The dynamic model based optimal control method according to claim 1, wherein the calculating of optimal control information includes: determining a weight corresponding to each of the first control information and the second control information based on the sample similarity; and calculating optimal control information based on a result of a weighted sum operation according to the determined weight.
12. The dynamic model based optimal control method according to claim 11, wherein in the determining as the sample similarity increases, a ratio obtained by dividing a first weight corresponding to the first control information by a second weight corresponding to the second control information is increased.
13. The dynamic model based optimal control method according to claim 1, wherein the dynamic model is consistently trained based on two or more state information acquired according to a time interval previously determined during the process of driving the vehicle.
14. The dynamic model based optimal control method according to claim 13, wherein the reinforcement learning control model is consistently trained by substituting a simulation model for learning into the consistently trained dynamic model.
15. A computer program stored in a non-transitory computer readable storage medium, wherein when the computer program is executed by one or more processors, the computer program causes the one or more processors to perform the following operations to perform dynamic model based optimal control and the operations include: an operation of acquiring state information including at least one state variable from one or more sensors of a vehicle thermal-management system, wherein the state information is acquired at a predetermined time interval during a process of driving the vehicle; an operation of calculating first control information by inputting the state information to a reinforcement learning control model implemented as a deep neural network executed on a tensor processing unit (TPU) or graphics processing unit (GPU); an operation of calculating second control information from the state information based on a feedback control algorithm executed on a microcontroller within the same computing device; an operation of determining a sample similarity of the state information using a trained dynamic model implemented as a recurrent neural network (RNN) configured to output a one-step-ahead predicted state information for the system; an operation of calculating optimal control information based on the first control information, the second control information, and the sample similarity; and an operation of executing all steps within a real-time control loop having a cycle time not exceeding the predetermined time interval such that the optimal control information is transmitted to actuators of the vehicle thermal-management system in real time.
16. An apparatus for performing optimal control based on a dynamic model, comprising: one or more processors; a memory; and a network unit, wherein the one or more processors are configured to: acquire state information including at least one state variable; calculate first control information by inputting the state information to a reinforcement learning control model; calculate second control information from the state information based on a feedback control algorithm; determine a sample similarity of the state inform
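The training procedure of claim 7 and the error-to-similarity relationship of claims 8 and 9 can be sketched as follows. This is a hedged illustration, not the claimed implementation: the scalar linear model `x_next = a * x + b * u` stands in for the recurrent network named in claim 15, the gradient update is an ordinary least-mean-squares step, and the exponential mapping in `sample_similarity` is only one possible way to make similarity fall within a fixed range as prediction error grows. The names `sim_step`, `policy`, `scale`, and `lr` are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def sample_similarity(state: Sequence[float],
                      predicted_state: Sequence[float],
                      scale: float = 1.0) -> float:
    """Map prediction error to a similarity in (0, 1].

    Assumption: Euclidean error and an exponential decay; the claims only
    require that similarity becomes smaller as the error increases.
    """
    err = math.sqrt(sum((s - p) ** 2 for s, p in zip(state, predicted_state)))
    return math.exp(-err / scale)


def train_dynamic_model(sim_step: Callable[[float, float], float],
                        policy: Callable[[float], float],
                        x0: float = 0.0,
                        steps: int = 5000,
                        lr: float = 0.01) -> Tuple[float, float]:
    """Fit x_next ~ a * x + b * u by following the steps listed in claim 7."""
    a, b = 0.0, 0.0
    x = x0                              # current state from the simulation model
    for _ in range(steps):
        u = policy(x)                   # first control information
        x_pred = a * x + b * u          # predicted state from the dynamic model
        x_next = sim_step(x, u)         # next state from the simulation model
        err = x_pred - x_next           # compare prediction with next state
        a -= lr * err * x               # gradient step on squared error
        b -= lr * err * u
        x = x_next
    return a, b
```

In this sketch the policy needs to vary its output independently of the state (for example, through exploration noise); otherwise the two coefficients of the toy model cannot be told apart from the collected samples.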
Machine learning · CPC title
in which a variable is automatically adjusted to optimise the performance · CPC title
involving the use of models or simulators · CPC title
electric · CPC title
using neural networks only · CPC title