Rule-based deconfliction of overlapping data
US-2024185097-A1 · Jun 6, 2024 · US
US2019332951A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2019332951-A1 |
| Application number | US-201716475540-A |
| Country | US |
| Kind code | A1 |
| Filing date | Nov 9, 2017 |
| Priority date | Feb 15, 2017 |
| Publication date | Oct 31, 2019 |
| Grant date | — |
Official abstract text for this publication.
Provided are an apparatus and a method enabling efficient reinforcement learning to be performed by input of an annotation. Included are a database configured to store respective pieces of information of a state, an action, and a reward of a processing execution unit, a learning execution unit configured to execute learning processing in accordance with a reinforcement learning algorithm to which the information stored in the database is applied, and an annotation input unit configured to input annotation information including sub reward setting information and store the annotation information in the database. The learning execution unit executes learning processing to which the respective pieces of information of the state, the action, and the reward input from the processing execution unit and the sub reward setting information are applied, derives an action determination rule, and determines an action which is caused to be executed in accordance with the action determination rule.
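The abstract's flow (stored state, action, and reward records, plus annotation-supplied sub rewards, feeding a learner that derives an action determination rule) can be sketched as follows. This is a minimal illustration only, not the method disclosed in the publication. It assumes tabular Q-learning as the reinforcement learning algorithm and assumes the sub reward is simply added to the reward; the abstract specifies neither. The names `Record`, `learn`, and `determine_action` are hypothetical.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    """One database row: a transition plus an optional annotated sub reward."""
    state: int
    action: int
    reward: float
    next_state: int
    sub_reward: float = 0.0  # assumption: 0.0 when no annotation was given


def learn(database, n_actions, alpha=0.5, gamma=0.9, sweeps=200, seed=0):
    """Tabular Q-learning over stored records (assumed algorithm)."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(sweeps):
        rec = rng.choice(database)
        # Assumption: the sub reward is added to the environment reward.
        total = rec.reward + rec.sub_reward
        best_next = max(q[(rec.next_state, a)] for a in range(n_actions))
        key = (rec.state, rec.action)
        q[key] += alpha * (total + gamma * best_next - q[key])
    return q


def determine_action(q, state, n_actions):
    """Greedy action determination rule derived from the learned values."""
    return max(range(n_actions), key=lambda a: q[(state, a)])


# Two actions from state 0, both with zero environment reward.
# Only action 1 received a positive sub reward from an annotation.
database = [
    Record(state=0, action=0, reward=0.0, next_state=1),
    Record(state=0, action=1, reward=0.0, next_state=1, sub_reward=1.0),
]
q = learn(database, n_actions=2)
chosen = determine_action(q, state=0, n_actions=2)
```

In this toy setup the environment reward alone cannot separate the two actions, so the annotated sub reward is what tips the derived rule toward action 1.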
Opening claim text (preview).
1. An information processing apparatus comprising: a database configured to store respective pieces of information of a state, an action, and a reward of a processing execution unit; a learning execution unit configured to execute learning processing in accordance with a reinforcement learning algorithm to which the respective pieces of information of the state, the action, and the reward stored in the database are applied; and an annotation input unit configured to input annotation information including sub reward setting information and store the annotation information in the database, wherein the learning execution unit executes learning processing to which the respective pieces of information of the state, the action, and the reward input from the processing execution unit and the sub reward setting information input via the annotation input unit are applied.
2. The information processing apparatus according to claim 1, wherein the learning execution unit derives an action determination rule for estimating an action to be executed to raise an expected reward by the learning processing.
3. The information processing apparatus according to claim 1, further comprising: an action determination unit configured to determine an action which the processing execution unit is caused to execute in accordance with the action determination rule.
4. The information processing apparatus according to claim 1, further comprising: a data input unit configured to input the respective pieces of information of the state, the action, and the reward input from the processing execution unit, wherein the database stores input data of the data input unit and stores the sub reward setting information input via the annotation input unit.
5. The information processing apparatus according to claim 1, wherein the annotation input unit inputs the annotation information including the sub reward setting information input via an annotation input apparatus enabling input processing at an arbitrary time to be performed by a user and stores the annotation information in the database.
6. The information processing apparatus according to claim 1, further comprising: a control unit configured to store the respective pieces of information of the state and the action of the processing execution unit at time of input of the annotation in the database in association with the sub reward setting information included in the annotation.
7. The information processing apparatus according to claim 6, wherein the learning execution unit executes learning processing to which the respective pieces of information of the state, the action, and the reward input from the processing execution unit and respective pieces of information of a state, an action, and a sub reward stored in the database in association with the sub reward setting information input via the annotation input unit are applied.
8. The information processing apparatus according to claim 1, wherein the sub reward setting information input via the annotation input unit is information input by a user observing processing that the processing execution unit executes.
9. The information processing apparatus according to claim 1, wherein the sub reward setting information input via the annotation input unit is information input by a user controlling processing that the processing execution unit executes.
10. The information processing apparatus according to claim 1, wherein the sub reward setting information input via the annotation input unit is reward setting information which is input by a user observing processing that the processing execution unit executes and includes a positive reward value input by the user that has confirmed that the processing that the processing execution unit executes is correct.
11. The information processing apparatus according to claim 1, wherein the sub reward setting information input via the annotation input unit is reward setting information which is input by a user observing processing that the processing execution unit executes and includes a negative reward value input by the user that has confirmed that the processing that the processing execution unit executes is not correct.
12. The information processing apparatus according to claim 1, wherein the processing execution unit is an independent apparatus different from the information processing apparatus, and the information processing apparatus performs data transmission and reception by communication processing with the processing execution unit and controls the processing execution unit.
13. The information processing apparatus according to claim 1, wherein the annotation input unit is configured to input the annotation information input by an independent annotation input apparatus different from the information processing apparatus.
14. An information processing method executed in an information processing apparatus, the information processing apparatus comprising: a database configured to store respective pieces of information of a state, an action, and a reward of a processing execution unit; a learning execution unit configured to execute learning processing in accordance with a reinforcement learning algorithm to which the respective pieces of information of the state, the action, and the reward stored in the database are applied; and an annotation input unit configured to input annotation information including sub reward setting information and store the annotation information in the database, wherein the learning execution unit executes learning processing to which the respective pieces of information of the state, the action, and the reward input from the processing execution unit and the sub reward setting information input via the annotation input unit are applied.
15. A program causing information processing to be executed in an information processing apparatus, the information processing apparatus including: a database configured to store respective pieces of information of a state, an action, and a reward of a processing execution unit; a learning execution unit configured to execute learning processing in accordance with a reinforcement learning algorithm to which the respective pieces of information of the state, the action, and the reward stored in the database are applied; and an annotation input unit configured to input annotation information including sub reward setting information and store the annotation information in the database, wherein the program causes the learning execution unit to execute learning processing to which the respective pieces of information of the state, the action, and the reward input from the processing execution unit and the sub reward setting information input via the annotation input unit are applied.
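The annotation path in the dependent claims (a user entering a positive or negative sub reward at an arbitrary time, and a control unit storing the state and action at the time of input in association with it) might look like the sketch below. It is an assumption-laden illustration, not the claimed implementation. `Database`, `ControlUnit`, `on_step`, and `on_annotation` are hypothetical names, and "state and action at time of input" is interpreted here as the most recently observed pair.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Holds ordinary transitions and annotation-derived sub rewards."""
    transitions: list = field(default_factory=list)  # (state, action, reward)
    annotations: list = field(default_factory=list)  # (state, action, sub_reward)


class ControlUnit:
    """Associates each annotation with the state and action current at input time."""

    def __init__(self, db):
        self.db = db
        self.current = None  # most recent (state, action); assumed meaning of "at time of input"

    def on_step(self, state, action, reward):
        # Data arriving from the processing execution unit.
        self.current = (state, action)
        self.db.transitions.append((state, action, reward))

    def on_annotation(self, sub_reward):
        # A user may annotate at any time; positive means the observed
        # processing was judged correct, negative means it was not.
        if self.current is None:
            raise RuntimeError("no state/action observed yet")
        state, action = self.current
        self.db.annotations.append((state, action, sub_reward))


db = Database()
unit = ControlUnit(db)
unit.on_step("s0", "left", 0.0)
unit.on_annotation(+1.0)   # user confirms the processing is correct
unit.on_step("s1", "right", 0.0)
unit.on_annotation(-1.0)   # user confirms the processing is not correct
```

Keeping annotations in their own table, separate from environment rewards, is a design choice made for this sketch so that a learner can decide how to combine the two.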
Extracting rules from data · CPC title
automatically for the purpose of assisting the player, e.g. automatic braking in a driving game · CPC title
based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title
Machine learning · CPC title
Computing the game score · CPC title