Information processing apparatus, and information processing method, and program

US2019332951A1 · US · A1

Patent metadata
Publication number: US-2019332951-A1
Application number: US-201716475540-A
Country: US
Kind code: A1
Filing date: Nov 9, 2017
Priority date: Feb 15, 2017
Publication date: Oct 31, 2019
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided are an apparatus and a method enabling efficient reinforcement learning to be performed by input of an annotation. Included are a database configured to store respective pieces of information of a state, an action, and a reward of a processing execution unit, a learning execution unit configured to execute learning processing in accordance with a reinforcement learning algorithm to which the information stored in the database is applied, and an annotation input unit configured to input annotation information including sub reward setting information and store the annotation information in the database. The learning execution unit executes learning processing to which the respective pieces of information of the state, the action, and the reward input from the processing execution unit and the sub reward setting information are applied, derives an action determination rule, and determines an action which is caused to be executed in accordance with the action determination rule.
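The arrangement in the abstract can be illustrated with a minimal Python sketch. It is not the publication's implementation: the names `Database`, `learn`, and `determine_action` are hypothetical, tabular Q-learning stands in for the unspecified "reinforcement learning algorithm", and adding the sub reward to the environment reward is an assumption, since the abstract does not say how the sub reward is applied.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Database:
    """Hypothetical store for transitions and user-supplied sub rewards."""
    transitions: list = field(default_factory=list)  # (state, action, reward, next_state)
    sub_rewards: dict = field(default_factory=dict)  # (state, action) -> sub reward

    def store_transition(self, state, action, reward, next_state):
        self.transitions.append((state, action, reward, next_state))

    def store_annotation(self, state, action, sub_reward):
        self.sub_rewards[(state, action)] = sub_reward


def learn(db, actions, alpha=0.5, gamma=0.9, sweeps=50):
    """Tabular Q-learning over the stored data.

    Assumption: the sub reward is simply added to the environment reward.
    """
    q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s2 in db.transitions:
            total = r + db.sub_rewards.get((s, a), 0.0)
            best_next = max(q[(s2, b)] for b in actions)
            q[(s, a)] += alpha * (total + gamma * best_next - q[(s, a)])
    return q


def determine_action(q, state, actions):
    """Action determination rule: pick the action with the highest Q-value."""
    return max(actions, key=lambda a: q[(state, a)])


ACTIONS = ["left", "right"]

# Sparse environment: neither stored transition carries a reward.
db = Database()
db.store_transition("start", "left", 0.0, "start")
db.store_transition("start", "right", 0.0, "mid")

q_plain = learn(db, ACTIONS)

# A user annotation marks ("start", "right") as desirable.
db.store_annotation("start", "right", 1.0)
q_annotated = learn(db, ACTIONS)

chosen = determine_action(q_annotated, "start", ACTIONS)
```

In this toy setup the environment rewards alone give the learner nothing to distinguish the two actions, while the single annotated sub reward is enough to make "right" the preferred action from "start".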

First claim

Opening claim text (preview).

1. An information processing apparatus comprising: a database configured to store respective pieces of information of a state, an action, and a reward of a processing execution unit; a learning execution unit configured to execute learning processing in accordance with a reinforcement learning algorithm to which the respective pieces of information of the state, the action, and the reward stored in the database are applied; and an annotation input unit configured to input annotation information including sub reward setting information and store the annotation information in the database, wherein the learning execution unit executes learning processing to which the respective pieces of information of the state, the action, and the reward input from the processing execution unit and the sub reward setting information input via the annotation input unit are applied.

2. The information processing apparatus according to claim 1, wherein the learning execution unit derives an action determination rule for estimating an action to be executed to raise an expected reward by the learning processing.

3. The information processing apparatus according to claim 1, further comprising: an action determination unit configured to determine an action which the processing execution unit is caused to execute in accordance with the action determination rule.

4. The information processing apparatus according to claim 1, further comprising: a data input unit configured to input the respective pieces of information of the state, the action, and the reward input from the processing execution unit, wherein the database stores input data of the data input unit and stores the sub reward setting information input via the annotation input unit.

5. The information processing apparatus according to claim 1, wherein the annotation input unit inputs the annotation information including the sub reward setting information input via an annotation input apparatus enabling input processing at an arbitrary time to be performed by a user and stores the annotation information in the database.

6. The information processing apparatus according to claim 1, further comprising: a control unit configured to store the respective pieces of information of the state and the action of the processing execution unit at time of input of the annotation in the database in association with the sub reward setting information included in the annotation.

7. The information processing apparatus according to claim 6, wherein the learning execution unit executes learning processing to which the respective pieces of information of the state, the action, and the reward input from the processing execution unit and respective pieces of information of a state, an action, and a sub reward stored in the database in association with the sub reward setting information input via the annotation input unit are applied.

8. The information processing apparatus according to claim 1, wherein the sub reward setting information input via the annotation input unit is information input by a user observing processing that the processing execution unit executes.

9. The information processing apparatus according to claim 1, wherein the sub reward setting information input via the annotation input unit is information input by a user controlling processing that the processing execution unit executes.

10. The information processing apparatus according to claim 1, wherein the sub reward setting information input via the annotation input unit is reward setting information which is input by a user observing processing that the processing execution unit executes and includes a positive reward value input by the user that has confirmed that the processing that the processing execution unit executes is correct.

11. The information processing apparatus according to claim 1, wherein the sub reward setting information input via the annotation input unit is reward setting information which is input by a user observing processing that the processing execution unit executes and includes a negative reward value input by the user that has confirmed that the processing that the processing execution unit executes is not correct.

12. The information processing apparatus according to claim 1, wherein the processing execution unit is an independent apparatus different from the information processing apparatus, and the information processing apparatus performs data transmission and reception by communication processing with the processing execution unit and controls the processing execution unit.

13. The information processing apparatus according to claim 1, wherein the annotation input unit is configured to input the annotation information input by an independent annotation input apparatus different from the information processing apparatus.

14. An information processing method executed in an information processing apparatus, the information processing apparatus comprising: a database configured to store respective pieces of information of a state, an action, and a reward of a processing execution unit; a learning execution unit configured to execute learning processing in accordance with a reinforcement learning algorithm to which the respective pieces of information of the state, the action, and the reward stored in the database are applied; and an annotation input unit configured to input annotation information including sub reward setting information and store the annotation information in the database, wherein the learning execution unit executes learning processing to which the respective pieces of information of the state, the action, and the reward input from the processing execution unit and the sub reward setting information input via the annotation input unit are applied.

15. A program causing information processing to be executed in an information processing apparatus, the information processing apparatus including: a database configured to store respective pieces of information of a state, an action, and a reward of a processing execution unit; a learning execution unit configured to execute learning processing in accordance with a reinforcement learning algorithm to which the respective pieces of information of the state, the action, and the reward stored in the database are applied; and an annotation input unit configured to input annotation information including sub reward setting information and store the annotation information in the database, wherein the program causes the learning execution unit to execute learning processing to which the respective pieces of information of the state, the action, and the reward input from the processing execution unit and the sub reward setting information input via the annotation input unit are applied.
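Claims 5, 6, 10, and 11 describe annotations that arrive at an arbitrary time, are associated with the state and action in effect at that moment, and carry a positive or negative sub reward. A minimal sketch of that bookkeeping follows. The class and method names (`ControlUnit`, `observe`, `on_annotation`, `AnnotationRecord`) and the `magnitude` parameter are hypothetical and chosen only for illustration; the claims do not specify any data layout.

```python
from dataclasses import dataclass
from typing import Hashable, List, Optional, Tuple


@dataclass(frozen=True)
class AnnotationRecord:
    """State, action, and sub reward stored together (hypothetical layout)."""
    state: Hashable
    action: Hashable
    sub_reward: float


class ControlUnit:
    """Tracks the latest state/action and attaches them to incoming annotations."""

    def __init__(self) -> None:
        self.current: Optional[Tuple[Hashable, Hashable]] = None
        self.records: List[AnnotationRecord] = []

    def observe(self, state: Hashable, action: Hashable) -> None:
        # Called whenever the processing execution unit reports what it is doing.
        self.current = (state, action)

    def on_annotation(self, correct: bool, magnitude: float = 1.0) -> AnnotationRecord:
        # Called at an arbitrary time when the user judges the observed behavior.
        if self.current is None:
            raise RuntimeError("no state/action has been observed yet")
        state, action = self.current
        # Positive value if the user confirmed the behavior is correct, negative otherwise.
        value = abs(magnitude) if correct else -abs(magnitude)
        record = AnnotationRecord(state, action, value)
        self.records.append(record)
        return record


unit = ControlUnit()
unit.observe("corridor", "forward")
unit.on_annotation(correct=True)           # user confirms the behavior
unit.observe("dead_end", "forward")
unit.on_annotation(correct=False, magnitude=0.5)  # user flags the behavior
```

Each stored record can then be supplied to the learning step alongside the ordinary state, action, and reward data, as claim 7 describes.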

Assignees

Sony Corp

Inventors

Classifications

  • G06N5/025 (Primary)

    Extracting rules from data · CPC title

  • automatically for the purpose of assisting the player, e.g. automatic braking in a driving game · CPC title

  • based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title

  • Machine learning · CPC title

  • Computing the game score · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2019332951A1 cover?
Provided are an apparatus and a method enabling efficient reinforcement learning to be performed by input of an annotation. Included are a database configured to store respective pieces of information of a state, an action, and a reward of a processing execution unit, a learning execution unit configured to execute learning processing in accordance with a reinforcement learning algorithm to whi…
Who is the assignee on this patent?
Sony Corp
What technology area does this patent fall under?
Primary CPC classification G06N5/025. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 31, 2019 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).