Methods and systems to adapt PID coefficients through reinforcement learning

US12153385B2 · US · B2

Patent metadata
Publication number: US-12153385-B2
Application number: US-202117314351-A
Country: US
Kind code: B2
Filing date: May 7, 2021
Priority date: May 7, 2021
Publication date: Nov 26, 2024
Grant date: Nov 26, 2024

Abstract

Systems and methods are used to adapt the coefficients of a proportional-integral-derivative (PID) controller through reinforcement learning. The approach for adapting PID coefficients can include an outer loop of reinforcement learning where the PID coefficients are tuned to changes in the environment and an inner loop of PID control for quickly reacting to changing errors. The outer loop can learn and adapt as the environment changes and be configured to only run at a predetermined frequency, after a given number of steps. The outer loop can use summary statistics about the error terms and any other information sensed about the environment to calculate an observation. This observation can be used to evaluate the next action, for example, by feeding it into a neural network representing the policy. The resulting action is the coefficients of the PID controller and the tunable parameters of things such as the filters.
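
The two-loop structure in the abstract (a fast inner PID loop and a slower outer loop that observes error statistics and retunes the gains) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the plant is a hypothetical first-order system with time constant `tau`, the observation uses three assumed summary statistics (mean error, mean absolute error, population standard deviation), and `placeholder_policy` is a hypothetical rule standing in for the trained neural-network policy. The names `PID`, `observe`, `placeholder_policy`, and `run` are illustrative only. The outer loop runs once every `outer_every` steps (1,000 here, which falls within the 100 to 10,000 ratio recited in claim 5).

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass


@dataclass
class PID:
    """Discrete PID controller (positional form) with a fixed time step dt."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, setpoint: float, measurement: float) -> tuple[float, float]:
        """Inner loop: return (control output, error) for one time step."""
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        return u, error


def observe(errors: list[float]) -> list[float]:
    """Assumed observation: summary statistics of the error window."""
    return [
        statistics.fmean(errors),
        statistics.fmean(abs(e) for e in errors),
        statistics.pstdev(errors),
    ]


def placeholder_policy(obs: list[float], gains: tuple[float, float, float]):
    """Hypothetical stand-in for a trained policy network.

    Raises kp by 10% when the mean absolute error exceeds 0.1; otherwise
    returns the gains unchanged. A real system would evaluate a learned policy.
    """
    kp, ki, kd = gains
    if obs[1] > 0.1:
        kp *= 1.1
    return kp, ki, kd


def run(total_steps: int = 5000, outer_every: int = 1000, dt: float = 0.01):
    pid = PID(kp=0.5, ki=0.1, kd=0.0, dt=dt)
    y = 0.0    # hypothetical plant output
    tau = 0.5  # hypothetical plant time constant in seconds
    window: list[float] = []
    gain_history = [(pid.kp, pid.ki, pid.kd)]
    for t in range(total_steps):
        u, e = pid.step(1.0, y)      # inner loop: every step, setpoint 1.0
        y += dt * (u - y) / tau      # Euler update of dy/dt = (u - y) / tau
        window.append(e)
        if (t + 1) % outer_every == 0:  # outer loop: every outer_every steps
            obs = observe(window)
            pid.kp, pid.ki, pid.kd = placeholder_policy(obs, (pid.kp, pid.ki, pid.kd))
            gain_history.append((pid.kp, pid.ki, pid.kd))
            window.clear()
    return y, gain_history
```

Calling `run()` returns the final plant output and the gain history (the initial gains plus one entry per outer-loop update). With these assumed settings the placeholder policy raises `kp` at least once while `ki` and `kd` stay fixed, because it only adjusts `kp`; swapping in a learned policy would let the outer loop move all three coefficients and any filter parameters.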

Claims

What is claimed is:
1. A reinforcement learning process for automatically tuning proportional-integral-derivative (PID) coefficients, the process performing the steps of: operating a PID controller at a first frequency to minimize an error between a variable setpoint and a process output; training a policy by a reward function derived from sensed information after the policy acts on the PID controller by updating the PID coefficients; and operating a reinforcement learning model at a second frequency, the reinforcement learning model performing the following steps: receiving summary statistics about error terms from the PID controller and sensed information on the environment to calculate an observation; selecting an action based on the observation by feeding the observation into the trained policy; predicting a result of taking the action, the action including changing the PID coefficients; and updating the policy by the reward function derived from sensed information after the policy acts on the PID controller by updating the PID coefficients.
2. The process of claim 1, wherein the reward function is derived from the summary statistics about the error terms from the PID controller.
3. The process of claim 1, wherein the action includes changing tunable parameters of filters.
4. The process of claim 1, wherein the first frequency is greater than the second frequency.
5. The process of claim 1, wherein the first frequency is 100 to 10,000 times greater than the second frequency.
6. The process of claim 1, wherein the PID controller operates continuously in real time.
7. The process of claim 1, further comprising deploying the trained policy into a production environment.
8. The process of claim 1, wherein the reward function is based on one or more of minimizing the error, minimizing control variable changes, and minimizing an overshoot.
9. A method of automatically adjusting coefficients of a proportional-integral-derivative (PID) controller, the method comprising: training a policy by a reward function derived from sensed information after the policy acts on the PID controller by updating the PID coefficients; and operating a reinforcement learning model to perform the following steps: receiving summary statistics about error terms from the PID controller and receiving sensed information on the environment to calculate an observation; selecting an action based on the observation by feeding the observation into the trained policy; predicting a result of taking the action, the action including changing the PID coefficients; and updating the policy by the reward function derived from sensed information after the policy acts on the PID controller by updating the PID coefficients.
10. The method of claim 9, further comprising operating the PID controller at a first frequency to minimize an error between a variable setpoint and a process output, the reinforcement learning model being operated at a second frequency, the second frequency being less than the first frequency.
11. The method of claim 9, wherein the reward function is derived from the summary statistics about the error terms from the PID controller.
12. The method of claim 9, wherein the action includes changing tunable parameters of filters.
13. The method of claim 10, wherein the first frequency is 100 to 10,000 times greater than the second frequency.
14. The method of claim 10, wherein the PID controller operates continuously in real time.
15. A method of automatically adjusting coefficients of a proportional-integral-derivative (PID) controller, the method comprising: operating the PID controller at each time step to minimize an error between a variable setpoint and a process output; training a policy by a reward function derived from sensed information after the policy acts on the PID controller by updating the PID coefficients; and operating a reinforcement learning model after a plurality of the time steps to perform the following steps: receiving summary statistics about error terms from the PID controller and receiving sensed information on the environment to calculate an observation; and selecting an action based on the observation by feeding the observation into the trained policy, the trained policy predicting results of the action.
16. The method of claim 15, wherein the reward function compares the prediction of the policy with the summary statistics about the error terms over the plurality of time steps.
17. The method of claim 15, wherein the action includes changing tunable parameters of filters.
18. The method of claim 15, wherein the plurality of time steps is 100 to 10,000 times greater than the second frequency.
19. The method of claim 15, wherein the PID controller operates continuously in real time.
20. A reinforcement learning process for automatically tuning proportional-integral-derivative (PID) coefficients, the process performing the steps of: operating a PID controller at a first frequency to minimize an error between a variable setpoint and a process output; training a policy by a reward function derived from sensed information after the policy acts on the PID controller by updating the PID coefficients; and operating a reinforcement learning model at a second frequency, the reinforcement learning model performing the following steps: receiving summary statistics about error terms from the PID controller and sensed information on the environment to calculate an observation; and selecting an action based on the observation by feeding the observation into the trained policy, the action including changing the PID coefficients, wherein the first frequency is greater than the second frequency.
21. A method of automatically adjusting coefficients of a proportional-integral-derivative (PID) controller, the method comprising: training a policy by a reward function derived from sensed information after the policy acts on the PID controller by updating the PID coefficients; operating a reinforcement learning model to perform the following steps: receiving summary statistics about error terms from the PID controller and receiving sensed information on the environment to calculate an observation; and selecting an action based on the observation by feeding the observation into the trained policy, the action including changing the PID coefficients; and operating the PID controller at a first frequency to minimize an error between a variable setpoint and a process output, the reinforcement learning model being operated at a second frequency, the second frequency being less than the first frequency.
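
Claims 2 and 8 describe a reward derived from error summary statistics and based on minimizing the error, control variable changes, and overshoot. The sketch below shows one way such a reward might be computed over a single outer-loop window. It is illustrative only: the function name `window_reward`, the weights `w_err`, `w_du`, and `w_os`, the use of mean absolute values, and the overshoot definition (which assumes the setpoint is approached from below) are all assumptions, not terms taken from the claims.

```python
def window_reward(
    errors: list[float],
    controls: list[float],
    outputs: list[float],
    setpoint: float,
    w_err: float = 1.0,  # hypothetical weight on tracking error
    w_du: float = 0.1,   # hypothetical weight on control-variable changes
    w_os: float = 1.0,   # hypothetical weight on overshoot
) -> float:
    """Reward for one outer-loop window; higher (closer to 0) is better."""
    # Tracking error: mean absolute error over the window.
    mean_abs_error = sum(abs(e) for e in errors) / len(errors)
    # Control effort: mean absolute change between consecutive control outputs.
    deltas = [abs(b - a) for a, b in zip(controls, controls[1:])]
    mean_abs_du = sum(deltas) / len(deltas) if deltas else 0.0
    # Overshoot: how far the output rose above the setpoint (assumes approach from below).
    overshoot = max(0.0, max(outputs) - setpoint)
    # Negate the weighted penalties so that minimizing them maximizes the reward.
    return -(w_err * mean_abs_error + w_du * mean_abs_du + w_os * overshoot)
```

For example, a window with errors `[0.5, 0.1, -0.2]`, control outputs `[1.0, 1.5, 1.2]`, and plant outputs `[0.5, 0.9, 1.2]` against a setpoint of `1.0` gives a reward of about -0.507: roughly 0.267 from tracking error, 0.04 from control changes, and 0.2 from overshoot. Keeping the penalties as separate weighted terms makes it easy to trade responsiveness against actuator wear or overshoot.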

Assignees

Sony Group Corp, Sony Corp America

Classifications

Primary CPC: G05B13/0265 (mapped technology area: Physics)

Frequently asked questions

What does patent US12153385B2 cover?
A two-loop control scheme in which a fast inner PID loop reacts to changing errors while a slower outer reinforcement learning loop retunes the PID coefficients (and tunable filter parameters) as the environment changes. The outer loop runs at a predetermined lower frequency and builds its observation from summary statistics of the error terms and other sensed information about the environment.
Who is the assignee on this patent?
Sony Group Corp, Sony Corp America
What technology area does this patent fall under?
Primary CPC classification G05B13/0265. Mapped technology areas include Physics.
When was this patent published?
Published November 26, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).