Method and system for directly tuning PID parameters using a simplified actor-critic approach to reinforcement learning

US11500337B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11500337-B2
Application number: US-201916673901-A
Country: US
Kind code: B2
Filing date: Nov 4, 2019
Priority date: Nov 4, 2019
Publication date: Nov 15, 2022
Grant date: Nov 15, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and system for reinforcement learning can include an actor-critic framework comprising an actor and a critic, the actor comprising an actor network and the critic comprising a critic network; and a controller comprising a neural network embedded in the actor-critic framework and which can be tuned according to reinforcement learning based tuning including anti-windup tuning.

First claim

Opening claim text (preview).

What is claimed is:

1. A system for reinforcement learning, the system comprising: an actor-critic framework comprising an actor and a critic, the actor comprising an actor network including a proportional integral derivative (PID) controller having PID gains, wherein the PID gains are weights of the actor network, the weights of the actor network initialize the PID gains on each parameter of the PID controller, and the critic comprises a critic network including at least one function associated with the actor, wherein the PID controller comprises parameters including an anti-windup parameter associated with tuning of the PID controller, wherein the anti-windup parameter functions in discrete-time by feeding into a control signal using a scaled sum of past deviations of an actuator signal from an unsaturated signal with a nonnegative scaling constant, to redefine the PID controller; and a controller including the PID controller comprising a neural network embedded in the actor-critic framework and which is tuned according to reinforcement learning based tuning including anti-windup tuning.

2. The system of claim 1 wherein the controller allows for constraining of individual parameters.

3. The system of claim 1 wherein the actor network is initialized with gains, which are already in use or known to be stabilizing.

4. The system of claim 1 wherein the weights associated with the actor are initialized with selected PID gains.

5. The system of claim 1 wherein the PID controller comprises a (Proportional-Derivative) portion.

6. The system of claim 1 wherein the PID controller comprises an integral portion.

7. The system of claim 1 wherein the PID controller comprises a PD (Proportional-Derivative) portion and an integral portion.

8. A system for reinforcement learning, the system comprising: at least one processor; and a non-transitory computer-usable medium embodying computer program code, said non-transitory computer-usable medium capable of communicating with said at least one processor, said computer program code comprising instructions executable by said at least one processor and configured for: providing an actor-critic framework comprising an actor and a critic, the actor comprising an actor network including a proportional integral derivative (PID) controller having PID gains, wherein the PID gains are weights of the actor network, the weights of the actor network initialize the PID gains on each parameter of the PID controller, and the critic comprises a critic network including at least one function associated with the actor, wherein the PID controller comprises parameters including an anti-windup parameter associated with tuning of the PID controller, wherein the anti-windup parameter functions in discrete-time by feeding into a control signal using a scaled sum of past deviations of an actuator signal from an unsaturated signal with a nonnegative scaling constant, to redefine the PID controller; and tuning a controller including the PID controller comprising a neural network embedded in the actor-critic framework, wherein the tuning of the controller comprises reinforcement learning based tuning including anti-windup tuning.

9. The system of claim 8 wherein the controller allows for constraining of individual parameters.

10. The system of claim 8 wherein the instructions are further configured for initializing the actor network with gains, which are already in use or known to be stabilizing.

11. The system of claim 8 wherein the instructions are further configured for initializing the weights associated with the actor with selected PID gains.

12. A method for reinforcement learning, the method comprising: providing an actor-critic framework comprising an actor and a critic, the actor comprising an actor network including a proportional integral derivative (PID) controller having PID gains, wherein the PID gains are weights of the actor network, the weights of the actor network initialize the PID gains on each parameter of the PID controller, and the critic comprises a critic network including at least one function associated with the actor, and wherein the PID controller comprises parameters including an anti-windup parameter associated with tuning of the PID controller, wherein the anti-windup parameter functions in discrete-time by feeding into a control signal using a scaled sum of past deviations of an actuator signal from an unsaturated signal with a nonnegative scaling constant, to redefine the PID controller; and tuning a controller comprising a neural network embedded in the actor-critic framework, wherein the tuning of the controller comprises reinforcement learning based tuning including anti-windup tuning.

13. The method of claim 12 wherein the controller allows for constraining of individual parameters.

14. The method of claim 12 further comprising initializing the actor network with gains that are already in use or known to be stabilizing.

15. The method of claim 12 further comprising initializing the weights associated with the actor with selected PID gains.

16. The method of claim 12 further comprising: initializing the actor network with gains that are already in use or known to be stabilizing; and initializing the weights associated with the actor with selected PID gains.

17. The method of claim 16 wherein the PID controller comprises a PD (Proportional-Derivative) portion.

18. The method of claim 16 wherein the PID controller comprises an integral portion.

19. The method of claim 16 wherein the PID controller comprises a PD (Proportional-Derivative) portion and an integral portion.

20. The method of claim 16 wherein the controller comprises a deep reinforcement learning (DRL) controller, wherein reinforcement learning (RCL) is used to train the weights without a process model.
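The anti-windup language in the independent claims can be made concrete with a small discrete-time sketch. This is an illustration under stated assumptions, not the patented implementation: the class name `PIDActor`, the symbol `rho` for the nonnegative scaling constant, the sample time, and the actuator limits are all hypothetical. The sketch reads the PID law as a one-layer linear "network" whose weights are the gains, and feeds a scaled running sum of past (saturated minus unsaturated) signals back into the control signal. The reinforcement learning loop that would tune the weights (critic, gradient updates) is deliberately omitted.

```python
from dataclasses import dataclass


@dataclass
class PIDActor:
    """Hypothetical PID 'actor': the control signal is a weighted sum of
    features, so the gains can be read as weights of a linear network."""
    kp: float
    ki: float
    kd: float
    rho: float = 0.0      # anti-windup scaling constant (assumed name), >= 0
    dt: float = 1.0       # sample time (assumption)
    u_min: float = -1.0   # actuator limits (assumption)
    u_max: float = 1.0

    def __post_init__(self):
        # Enforce the nonnegativity constraint on the anti-windup constant.
        if self.rho < 0:
            raise ValueError("anti-windup scaling constant must be nonnegative")
        self.integral = 0.0
        self.prev_error = 0.0
        self.windup_sum = 0.0  # sum of past (saturated - unsaturated) signals

    def weights(self):
        # The parameter vector a learning algorithm would adjust.
        return [self.kp, self.ki, self.kd, self.rho]

    def step(self, error):
        """Advance one sample; return (saturated, unsaturated) control."""
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        features = [error, self.integral, derivative, self.windup_sum]
        u_unsat = sum(w * f for w, f in zip(self.weights(), features))
        u_sat = min(max(u_unsat, self.u_min), self.u_max)
        # Record this sample's deviation; it is used from the next step on.
        self.windup_sum += u_sat - u_unsat
        return u_sat, u_unsat


def run_constant_error(rho, steps=5):
    """Drive an integral-only controller with a constant unit error."""
    actor = PIDActor(kp=0.0, ki=1.0, kd=0.0, rho=rho)
    out = (0.0, 0.0)
    for _ in range(steps):
        out = actor.step(1.0)
    return out
```

With a constant error that keeps the actuator saturated, `run_constant_error(0.5)` ends with a smaller unsaturated signal than `run_constant_error(0.0)`, which is the windup-limiting effect the claim language describes. Starting `kp`, `ki`, and `kd` from gains already in use corresponds to the initialization described in claims 3 and 4.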

Assignees

  • Honeywell Int Inc

Inventors

Classifications

  • using neural networks only · CPC title

  • the criterion being a learning criterion · CPC title

  • G05B6/02 · Primary

    electric · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11500337B2 cover?
A method and system for reinforcement learning can include an actor-critic framework comprising an actor and a critic, the actor comprising an actor network and the critic comprising a critic network; and a controller comprising a neural network embedded in the actor-critic framework and which can be tuned according to reinforcement learning based tuning including anti-windup tuning.
Who is the assignee on this patent?
Honeywell Int Inc
What technology area does this patent fall under?
Primary CPC classification G05B13/0265. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 15, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).