Apparatus and method for controlling system
US-11579569-B2 · Feb 14, 2023 · US
US12153385B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12153385-B2 |
| Application number | US-202117314351-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 7, 2021 |
| Priority date | May 7, 2021 |
| Publication date | Nov 26, 2024 |
| Grant date | Nov 26, 2024 |
Official abstract text for this publication.
Systems and methods are used to adapt the coefficients of a proportional-integral-derivative (PID) controller through reinforcement learning. The approach for adapting PID coefficients can include an outer loop of reinforcement learning where the PID coefficients are tuned to changes in the environment and an inner loop of PID control for quickly reacting to changing errors. The outer loop can learn and adapt as the environment changes and be configured to only run at a predetermined frequency, after a given number of steps. The outer loop can use summary statistics about the error terms and any other information sensed about the environment to calculate an observation. This observation can be used to evaluate the next action, for example, by feeding it into a neural network representing the policy. The resulting action is the coefficients of the PID controller and the tunable parameters of things such as the filters.
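The two-rate structure described in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation. The first-order plant, the `summarize` statistics (mean and peak absolute error), the `placeholder_policy` function standing in for a trained neural network, and all numeric values are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Gains = Tuple[float, float, float]  # (kp, ki, kd)
Observation = Tuple[float, float]   # (mean |error|, peak |error|), an assumed choice


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        """One inner-loop update: return the control signal for this error."""
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

    def set_gains(self, gains: Gains) -> None:
        """Apply the action chosen by the outer loop."""
        self.kp, self.ki, self.kd = gains


def summarize(errors: List[float]) -> Observation:
    """Summary statistics over the errors seen since the last outer-loop run."""
    abs_err = [abs(e) for e in errors]
    return sum(abs_err) / len(abs_err), max(abs_err)


def placeholder_policy(observation: Observation) -> Gains:
    """Hypothetical stand-in for a trained policy network."""
    mean_abs, _peak = observation
    return (2.0 + mean_abs, 1.0, 0.0)


def run(pid: PID, policy: Callable[[Observation], Gains], setpoint: float,
        steps: int, outer_every: int, dt: float = 0.01) -> List[float]:
    """Run the PID at every step and the policy once every `outer_every` steps."""
    y = 0.0
    window: List[float] = []
    outputs: List[float] = []
    for t in range(1, steps + 1):
        # Inner loop: fast PID reaction to the current error.
        error = setpoint - y
        u = pid.step(error, dt)
        y += (-y + u) * dt  # assumed first-order plant: dy/dt = -y + u
        window.append(error)
        outputs.append(y)
        # Outer loop: slower gain update from summary statistics.
        if t % outer_every == 0:
            pid.set_gains(policy(summarize(window)))
            window.clear()
    return outputs
```

Calling `run(PID(2.0, 1.0, 0.0), placeholder_policy, setpoint=1.0, steps=1000, outer_every=100)` runs the PID update 1000 times and the policy 10 times. That is a 100:1 ratio, the low end of the range given in claim 5. Policy training is deliberately left out of this sketch.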
Opening claim text (preview).
What is claimed is:

1. A reinforcement learning process for automatically tuning proportional-integral-derivative (PID) coefficients, the process performing the steps of: operating a PID controller at a first frequency to minimize an error between a variable setpoint and a process output; training a policy by a reward function derived from sensed information after the policy acts on the PID controller by updating the PID coefficients; and operating a reinforcement learning model at a second frequency, the reinforcement learning model performing the following steps: receiving summary statistics about error terms from the PID controller and sensed information on the environment to calculate an observation; selecting an action based on the observation by feeding the observation into the trained policy; predicting a result of taking the action, the action including changing the PID coefficients; and updating the policy by the reward function derived from sensed information after the policy acts on the PID controller by updating the PID coefficients.

2. The process of claim 1, wherein the reward function is derived from the summary statistics about the error terms from the PID controller.

3. The process of claim 1, wherein the action includes changing tunable parameters of filters.

4. The process of claim 1, wherein the first frequency is greater than the second frequency.

5. The process of claim 1, wherein the first frequency is 100 to 10,000 times greater than the second frequency.

6. The process of claim 1, wherein the PID controller operates continuously in real time.

7. The process of claim 1, further comprising deploying the trained policy into a production environment.

8. The process of claim 1, wherein the reward function is based on one or more of minimizing the error, minimizing control variable changes, and minimizing an overshoot.

9. A method of automatically adjusting coefficients of a proportional-integral-derivative (PID) controller, the method comprising: training a policy by a reward function derived from sensed information after the policy acts on the PID controller by updating the PID coefficients; and operating a reinforcement learning model to perform the following steps: receiving summary statistics about error terms from the PID controller and receiving sensed information on the environment to calculate an observation; selecting an action based on the observation by feeding the observation into the trained policy; predicting a result of taking the action, the action including changing the PID coefficients; and updating the policy by the reward function derived from sensed information after the policy acts on the PID controller by updating the PID coefficients.

10. The method of claim 9, further comprising operating the PID controller at a first frequency to minimize an error between a variable setpoint and a process output, the reinforcement learning model being operated at a second frequency, the second frequency being less than the first frequency.

11. The method of claim 9, wherein the reward function is derived from the summary statistics about the error terms from the PID controller.

12. The method of claim 9, wherein the action includes changing tunable parameters of filters.

13. The method of claim 10, wherein the first frequency is 100 to 10,000 times greater than the second frequency.

14. The method of claim 10, wherein the PID controller operates continuously in real time.

15. A method of automatically adjusting coefficients of a proportional-integral-derivative (PID) controller, the method comprising: operating the PID controller at each time step to minimize an error between a variable setpoint and a process output; training a policy by a reward function derived from sensed information after the policy acts on the PID controller by updating the PID coefficients; and operating a reinforcement learning model after a plurality of the time steps to perform the following steps: receiving summary statistics about error terms from the PID controller and receiving sensed information on the environment to calculate an observation; and selecting an action based on the observation by feeding the observation into the trained policy, the trained policy predicting results of the action.

16. The method of claim 15, wherein the reward function compares the prediction of the policy with the summary statistics about the error terms over the plurality of time steps.

17. The method of claim 15, wherein the action includes changing tunable parameters of filters.

18. The method of claim 15, wherein the plurality of time steps is 100 to 10,000 times greater than the second frequency.

19. The method of claim 15, wherein the PID controller operates continuously in real time.

20. A reinforcement learning process for automatically tuning proportional-integral-derivative (PID) coefficients, the process performing the steps of: operating a PID controller at a first frequency to minimize an error between a variable setpoint and a process output; training a policy by a reward function derived from sensed information after the policy acts on the PID controller by updating the PID coefficients; and operating a reinforcement learning model at a second frequency, the reinforcement learning model performing the following steps: receiving summary statistics about error terms from the PID controller and sensed information on the environment to calculate an observation; and selecting an action based on the observation by feeding the observation into the trained policy, the action including changing the PID coefficients, wherein the first frequency is greater than the second frequency.

21. A method of automatically adjusting coefficients of a proportional-integral-derivative (PID) controller, the method comprising: training a policy by a reward function derived from sensed information after the policy acts on the PID controller by updating the PID coefficients; operating a reinforcement learning model to perform the following steps: receiving summary statistics about error terms from the PID controller and receiving sensed information on the environment to calculate an observation; and selecting an action based on the observation by feeding the observation into the trained policy, the action including changing the PID coefficients; and operating the PID controller at a first frequency to minimize an error between a variable setpoint and a process output, the reinforcement learning model being operated at a second frequency, the second frequency being less than the first frequency.
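Claim 8 names three quantities the reward function may be based on: the error, control variable changes, and overshoot. The sketch below is one possible reading and not the patent's formula. The weighted-sum form, the weight names and defaults, the use of mean absolute values, and the assumption that the output approaches the setpoint from below are all illustrative choices.

```python
from typing import Sequence


def overshoot(outputs: Sequence[float], setpoint: float) -> float:
    """Largest excursion above the setpoint (assumes approach from below)."""
    return max(0.0, max(outputs) - setpoint)


def reward(outputs: Sequence[float], controls: Sequence[float], setpoint: float,
           w_error: float = 1.0, w_control: float = 0.1,
           w_overshoot: float = 1.0) -> float:
    """Negative weighted cost over one outer-loop window; higher is better."""
    # Tracking term: mean absolute error between setpoint and process output.
    mean_abs_error = sum(abs(setpoint - y) for y in outputs) / len(outputs)
    # Smoothness term: mean absolute change between consecutive control values.
    changes = [abs(b - a) for a, b in zip(controls, controls[1:])]
    mean_control_change = sum(changes) / max(len(changes), 1)
    cost = (w_error * mean_abs_error
            + w_control * mean_control_change
            + w_overshoot * overshoot(outputs, setpoint))
    return -cost
```

Setting a weight to zero drops that term, which matches the "one or more of" wording of the claim. A window that tracks the setpoint exactly with a constant control signal scores zero, the maximum under this formulation.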
Machine learning · CPC title
electric · CPC title
Reinforcement learning · CPC title
Learning methods · CPC title
the criterion being a learning criterion · CPC title