Rule creation using MDP and inverse reinforcement learning

US11003998B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11003998-B2
Application number: US-201715812002-A
Country: US
Kind code: B2
Filing date: Nov 14, 2017
Priority date: Apr 11, 2017
Publication date: May 11, 2021
Grant date: May 11, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method is provided for rule creation that includes receiving (i) a MDP model with a set of states, a set of actions, and a set of transition probabilities, (ii) a policy that corresponds to rules for a rule engine, and (iii) a set of candidate states that can be added to the set of states. The method includes transforming the MDP model to include a reward function using an inverse reinforcement learning process on the MDP model and on the policy. The method includes finding a state from the candidate states, and generating a refined MDP model with the reward function by updating the transition probabilities related to the state. The method includes obtaining an optimal policy for the refined MDP model with the reward function, based on the reward policy, the state, and the updated probabilities. The method includes updating the rule engine based on the optimal policy.
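The "obtaining an optimal policy" step can be illustrated with a small sketch. The code below is not the patent's implementation: it assumes a state-based reward R(s), uses plain value iteration (the claims name value iteration and policy iteration, but give no code), and runs on a made-up two-state MDP. The names `value_iteration`, `transitions`, `reward`, and the toy states and actions are all hypothetical.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# transitions[(s, a)] maps each next state to its probability (hypothetical layout).
Transitions = Dict[Tuple[State, Action], Dict[State, float]]


def expected_value(values: Dict[State, float], next_probs: Dict[State, float]) -> float:
    """Expected value of the next state under one (state, action) pair."""
    return sum(p * values[s2] for s2, p in next_probs.items())


def value_iteration(
    states: List[State],
    actions: List[Action],
    transitions: Transitions,
    reward: Dict[State, float],
    gamma: float = 0.9,
    tol: float = 1e-8,
    max_iter: int = 10_000,
) -> Tuple[Dict[State, float], Dict[State, Action]]:
    """Return (state values, greedy policy) for an MDP with reward R(s)."""
    values = {s: 0.0 for s in states}
    for _ in range(max_iter):
        new_values = {
            s: reward[s]
            + gamma * max(expected_value(values, transitions[(s, a)]) for a in actions)
            for s in states
        }
        delta = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if delta < tol:
            break
    # Greedy policy: in each state, pick the action with the best expected value.
    policy = {
        s: max(actions, key=lambda a: expected_value(values, transitions[(s, a)]))
        for s in states
    }
    return values, policy


# Toy MDP (illustrative numbers only, not from the patent).
states = ["idle", "goal"]
actions = ["stay", "move"]
transitions = {
    ("idle", "stay"): {"idle": 1.0},
    ("idle", "move"): {"goal": 0.8, "idle": 0.2},
    ("goal", "stay"): {"goal": 1.0},
    ("goal", "move"): {"idle": 1.0},
}
reward = {"idle": 0.0, "goal": 1.0}

values, policy = value_iteration(states, actions, transitions, reward)
print(policy)
```

In the method described above, the resulting policy (a mapping from state to action) is what would be translated back into rules for the rule engine; how that translation is done is not shown here.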

First claim

Opening claim text (preview).

What is claimed is:

1. A computer program product for rule creation, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method comprising: receiving (i) a Markov Decision Process (MDP) model with a set of states, a set of actions, and a set of transition probabilities, (ii) a policy that corresponds to rules for a rule engine, and (iii) a set of candidate states that can be added to the set of states, wherein the candidate states in the set of candidate states are determined from a merged state formed by merging two or more different states in the set of states, and wherein the set of transition probabilities relate to the set of states and the set of candidate states; transforming the MDP model to include a reward function using an inverse reinforcement learning process on the MDP model and on the policy; finding a new state for the set of states from the set of candidate states based on predetermined criteria; generating a refined MDP model with the reward function by updating any of the transition probabilities related to the new state; obtaining an optimal policy for the refined MDP model with the reward function, based on a reward policy, the state, and the updated transition probabilities related to the new state; and updating the rule engine based on the optimal policy.

2. The computer program product of claim 1, wherein said receiving step receives the MDP model further with a discount factor and an initial-state distribution.

3. The computer program product of claim 2, wherein the discount factor comprises a constant having a value between zero and one.

4. The computer program product of claim 1, wherein the reward function comprises one or more weights corresponding to respective bits in the reward function.

5. The computer program product of claim 4, wherein the new state from the set of candidate states is found based on the one or more weights.

6. The computer program product of claim 1, wherein each of the states in the set of states is associated with a K-dimensional bit feature vector.

7. The computer program product of claim 1, wherein the predetermined criteria comprises the new state having a value equal to argmax_s {w*φ(s)|s in the candidate state set}, wherein s is the state, w* is a K-dimensional real-valued vector, and φ(s) is a K-dimensional bit feature vector.

8. The computer program product of claim 1, wherein the predetermined criteria comprises finding the new state from the set of candidate states such that the optimal policy most differs from the policy of said receiving step.

9. The computer program product of claim 1, wherein the policy is emulated by the reward function, and wherein the reward function is equal to w*φ(s), where w* is a K-dimensional real-valued vector, and where φ(s) is a K-dimensional bit feature vector.

10. The computer program product of claim 1, wherein the optimal policy is obtained by applying a value iteration and policy iteration process to the reward policy, the state, and the updated transition probabilities related to the new state.

11. The computer program product of claim 1, wherein the method further comprises applying the updated rule engine to an input signal representative of a current state of a particular object in order to change the current state of the particular object to another state.

12. The computer program product of claim 11, wherein said applying step comprises controlling a function affecting a motion of a vehicle.

13. The computer program product of claim 1, wherein the method further comprises adding the new state from the set of candidate states to the refined MDP model.

14. A computer processing system configured to perform rule creation, comprising: a processor configured to: receive (i) a Markov Decision Process (MDP) model with a set of states, a set of actions, and a set of transition probabilities, (ii) a policy that corresponds to rules for a rule engine, and (iii) a set of candidate states that can be added to the set of states, wherein the candidate states in the set of candidate states are determined from a merged state formed by merging two or more different states in the set of states, and wherein the set of transition probabilities relate to the set of states and the set of candidate states; transform the MDP model to include a reward function using an inverse reinforcement learning process on the MDP model and on the policy; find a new state for the set of states from the set of candidate states based on predetermined criteria; generate a refined MDP model with the reward function by updating any of the transition probabilities related to the new state; obtain an optimal policy for the refined MDP model with the reward function, based on reward policy, the state, and the updated transition probabilities related to the new state; and update the rule engine based on the optimal policy.
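Claims 7 and 9 describe a reward that is linear in a K-dimensional bit feature vector, with the new state chosen as the candidate that maximizes that reward. A minimal sketch of that selection rule follows, assuming w*φ(s) denotes the dot product of w* and φ(s). In the claimed method the weight vector would come from the inverse reinforcement learning step; here `w_star` and the candidate feature vectors are made-up values, and the function names are hypothetical.

```python
from typing import Dict, Sequence


def linear_reward(weights: Sequence[float], features: Sequence[int]) -> float:
    """Reward as the dot product of a real-valued weight vector and a bit feature vector."""
    if len(weights) != len(features):
        raise ValueError("weights and features must both have length K")
    return sum(w * bit for w, bit in zip(weights, features))


def select_new_state(
    weights: Sequence[float], candidate_features: Dict[str, Sequence[int]]
) -> str:
    """Return the candidate state whose feature vector gives the largest reward."""
    return max(
        candidate_features,
        key=lambda s: linear_reward(weights, candidate_features[s]),
    )


# Illustrative values only (K = 3); not taken from the patent.
w_star = [0.5, -1.0, 2.0]
candidates = {
    "c1": [1, 0, 0],  # reward 0.5
    "c2": [1, 1, 1],  # reward 1.5
    "c3": [0, 0, 1],  # reward 2.0
}

new_state = select_new_state(w_star, candidates)
print(new_state)
```

Claim 8 describes an alternative criterion (choosing the candidate whose resulting optimal policy differs most from the received policy), which this sketch does not cover.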

Assignees

IBM

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks

  • Distributed expert systems; Blackboards

  • Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)}

  • Machine learning

  • Inference or reasoning models

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11003998B2 cover?
A method is provided for rule creation that includes receiving (i) a MDP model with a set of states, a set of actions, and a set of transition probabilities, (ii) a policy that corresponds to rules for a rule engine, and (iii) a set of candidate states that can be added to the set of states. The method includes transforming the MDP model to include a reward function using an inverse reinforceme…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N5/025. Mapped technology areas include Physics.
When was this patent published?
Publication date May 11, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).