Data-efficient reinforcement learning for continuous control tasks
US-2019354813-A1 · Nov 21, 2019 · US
US11500337B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11500337-B2 |
| Application number | US-201916673901-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 4, 2019 |
| Priority date | Nov 4, 2019 |
| Publication date | Nov 15, 2022 |
| Grant date | Nov 15, 2022 |
Official abstract text for this publication.
A method and system for reinforcement learning can include an actor-critic framework comprising an actor and a critic, the actor comprising an actor network and the critic comprising a critic network; and a controller comprising a neural network embedded in the actor-critic framework and which can be tuned according to reinforcement learning based tuning including anti-windup tuning.
Claim text.
What is claimed is:
1. A system for reinforcement learning, the system comprising: an actor-critic framework comprising an actor and a critic, the actor comprising an actor network including a proportional integral derivative (PID) controller having PID gains, wherein the PID gains are weights of the actor network, the weights of the actor network initialize the PID gains on each parameter of the PID controller, and the critic comprises a critic network including at least one function associated with the actor, wherein the PID controller comprises parameters including an anti-windup parameter associated with tuning of the PID controller, wherein the anti-windup parameter functions in discrete-time by feeding into a control signal using a scaled sum of past deviations of an actuator signal from an unsaturated signal with a nonnegative scaling constant, to redefine the PID controller; and a controller including the PID controller comprising a neural network embedded in the actor-critic framework and which is tuned according to reinforcement learning based tuning including anti-windup tuning.
2. The system of claim 1 wherein the controller allows for constraining of individual parameters.
3. The system of claim 1 wherein the actor network is initialized with gains, which are already in use or known to be stabilizing.
4. The system of claim 1 wherein the weights associated with the actor are initialized with selected PID gains.
5. The system of claim 1 wherein the PID controller comprises a (Proportional-Derivative) portion.
6. The system of claim 1 wherein the PID controller comprises an integral portion.
7. The system of claim 1 wherein the PID controller comprises a PD (Proportional-Derivative) portion and an integral portion.
8. A system for reinforcement learning, the system comprising: at least one processor; and a non-transitory computer-usable medium embodying computer program code, said non-transitory computer-usable medium capable of communicating with said at least one processor, said computer program code comprising instructions executable by said at least one processor and configured for: providing an actor-critic framework comprising an actor and a critic, the actor comprising an actor network including a proportional integral derivative (PID) controller having PID gains, wherein the PID gains are weights of the actor network, the weights of the actor network initialize the PID gains on each parameter of the PID controller, and the critic comprises a critic network including at least one function associated with the actor, wherein the PID controller comprises parameters including an anti-windup parameter associated with tuning of the PID controller, wherein the anti-windup parameter functions in discrete-time by feeding into a control signal using a scaled sum of past deviations of an actuator signal from an unsaturated signal with a nonnegative scaling constant, to redefine the PID controller; and tuning a controller including the PID controller comprising a neural network embedded in the actor-critic framework, wherein the tuning of the controller comprises reinforcement learning based tuning including anti-windup tuning.
9. The system of claim 8 wherein the controller allows for constraining of individual parameters.
10. The system of claim 8 wherein the instructions are further configured for initializing the actor network with gains, which are already in use or known to be stabilizing.
11. The system of claim 8 wherein the instructions are further configured for initializing the weights associated with the actor with selected PID gains.
12. A method for reinforcement learning, the method comprising: providing an actor-critic framework comprising an actor and a critic, the actor comprising an actor network including a proportional integral derivative (PID) controller having PID gains, wherein the PID gains are weights of the actor network, the weights of the actor network initialize the PID gains on each parameter of the PID controller, and the critic comprises a critic network including at least one function associated with the actor, and wherein the PID controller comprises parameters including an anti-windup parameter associated with tuning of the PID controller, wherein the anti-windup parameter functions in discrete-time by feeding into a control signal using a scaled sum of past deviations of an actuator signal from an unsaturated signal with a nonnegative scaling constant, to redefine the PID controller; and tuning a controller comprising a neural network embedded in the actor-critic framework, wherein the tuning of the controller comprises reinforcement learning based tuning including anti-windup tuning.
13. The method of claim 12 wherein the controller allows for constraining of individual parameters.
14. The method of claim 12 further comprising initializing the actor network with gains that are already in use or known to be stabilizing.
15. The method of claim 12 further comprising initializing the weights associated with the actor with selected PID gains.
16. The method of claim 12 further comprising: initializing the actor network with gains that are already in use or known to be stabilizing; and initializing the weights associated with the actor with selected PID gains.
17. The method of claim 16 wherein the PID controller comprises a PD (Proportional-Derivative) portion.
18. The method of claim 16 wherein the PID controller comprises an integral portion.
19. The method of claim 16 wherein the PID controller comprises a PD (Proportional-Derivative) portion and an integral portion.
20. The method of claim 16 wherein the controller comprises a deep reinforcement learning (DRL) controller, wherein reinforcement learning (RCL) is used to train the weights without a process model.
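The independent claims recite a PID controller whose gains are the weights of the actor network, with an anti-windup term built from a scaled sum of past deviations of the actuator signal from the unsaturated signal. The following is a minimal sketch of one possible reading of that structure, not the patented implementation. The class name `PIDActor`, the helper `run`, the sample time `dt`, the actuator limits `u_min`/`u_max`, and the backward-difference derivative are all illustrative assumptions that do not appear in the claims. No critic or learning update is shown; only the forward pass of the controller is sketched.

```python
from dataclasses import dataclass


@dataclass
class PIDActor:
    """Hypothetical PID 'actor': a linear map from four features to a control signal."""
    kp: float
    ki: float
    kd: float
    rho: float            # anti-windup scaling constant, required to be nonnegative
    dt: float = 1.0       # sample time (assumed)
    u_min: float = -1.0   # lower actuator limit (assumed)
    u_max: float = 1.0    # upper actuator limit (assumed)
    int_e: float = 0.0    # running sum of the tracking error
    int_dev: float = 0.0  # running sum of (actuator signal - unsaturated signal)
    prev_e: float = 0.0   # previous error, used for the derivative feature

    def __post_init__(self):
        if self.rho < 0:
            raise ValueError("rho must be nonnegative")

    def weights(self):
        # The tunable gains, laid out like the weight vector of a one-layer network.
        return [self.kp, self.ki, self.kd, self.rho]

    def step(self, error):
        """Advance one sample; return (saturated, unsaturated) control signals."""
        self.int_e += error * self.dt
        deriv = (error - self.prev_e) / self.dt
        features = [error, self.int_e, deriv, self.int_dev]
        # Unsaturated control signal: inner product of the gains and the features.
        u_unsat = sum(w * x for w, x in zip(self.weights(), features))
        # Actuator signal: the unsaturated signal clipped to the actuator limits.
        u_sat = min(max(u_unsat, self.u_min), self.u_max)
        # Accumulate the deviation; it feeds into the control signal on later steps.
        self.int_dev += (u_sat - u_unsat) * self.dt
        self.prev_e = error
        return u_sat, u_unsat


def run(rho, steps=4, error=2.0):
    """Hold a constant error on an integral-only controller; return the last unsaturated signal."""
    actor = PIDActor(kp=0.0, ki=1.0, kd=0.0, rho=rho)
    u_unsat = 0.0
    for _ in range(steps):
        _, u_unsat = actor.step(error)
    return u_unsat
```

Under these assumed settings, `run(0.0)` lets the unsaturated signal grow to 8.0 over four steps while the actuator stays pinned at its limit, whereas `run(1.0)` holds it at 3.0, because the accumulated deviation offsets the growing integral term. In a reinforcement learning setup of the kind claimed, the four entries of `weights()` would be the parameters adjusted by the tuning procedure, and a per-parameter constraint such as the nonnegativity of `rho` could be enforced after each update.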
using neural networks only · CPC title
the criterion being a learning criterion · CPC title
electric · CPC title