Apparatus and methods for control of robot actions based on corrective user inputs

US9789605B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9789605-B2
Application number  US-201615174858-A
Country             US
Kind code           B2
Filing date         Jun 6, 2016
Priority date       Feb 3, 2014
Publication date    Oct 17, 2017
Grant date          Oct 17, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Robots have the capacity to perform a broad range of useful tasks, such as factory automation, cleaning, delivery, assistive care, environmental monitoring and entertainment. Enabling a robot to perform a new task in a new environment typically requires a large amount of new software to be written, often by a team of experts. It would be valuable if future technology could empower people, who may have limited or no understanding of software coding, to train robots to perform custom tasks. Some implementations of the present invention provide methods and systems that respond to users' corrective commands to generate and refine a policy for determining appropriate actions based on sensor-data input. Upon completion of learning, the system can generate control commands by deriving them from the sensory data. Using the learned control policy, the robot can behave autonomously.

First claim

Opening claim text (preview).

What is claimed is:

1. A method for performing robot actions by a robot, the method comprising: defining a policy comprising a plurality of parameters for determining robot actions based at least in part on sensory-data inputs, the defining of the policy comprising mapping the sensory-data inputs to robot actions; receiving a first sensory-data input from a sensor; performing a first robot action at a first action time, wherein the first robot action is determined based at least in part on the first sensory-data input and application of the policy; determining that a user input was received at an input time corresponding to the first action time, wherein a corrective command at least partially derived from the user input specifies a corrective robot action for physical performance, the user input being indicative of at least partial dissatisfaction with the first robot action; and modifying the policy based on the corrective command and the first sensory-data input.

2. The method of claim 1, further comprising determining a second robot action at a second action time, wherein the second robot action is based at least in part on the modified policy and a second sensory-data input from the sensor.

3. The method of claim 1, wherein the modifying of the policy further comprises using a learning model.

4. The method of claim 1, wherein the at least partial dissatisfaction includes a discrepancy between a target robot action and the first robot action.

5. The method of claim 1, wherein the modifying of the policy comprises changing parameters relating sensory-data inputs to actuator responses that correspond to robot actions.

6. The method of claim 3, wherein the learning model includes updating parameters based on a gradient of error determined at least in part by a difference between the first robot action and a second robot action specified by a combination of the corrective command and the policy.

7. The method of claim 1, further comprising determining a first context-variable value for a context variable, wherein the first context-variable value is determined from the first sensory-data input and the policy is further determined based at least in part on the context variable.

8. A robot, comprising: an actuator configured to perform robot actions for robotic tasks; a sensor configured to detect an environmental context of the robot and generate sensory-data inputs; and a processor apparatus configured to: define a policy comprising a plurality of parameters configured to determine robot actions based at least in part on sensory-data inputs; determine that a user input was received at an input time corresponding to a performance of a first robot action corresponding to a detection of a first sensory-data input; generate a corrective command at least partially derived from the user input, the user input being indicative of at least partial dissatisfaction with the first robot action, and modify the policy based on the corrective command and the first sensory-data input.

9. The robot of claim 8, further comprising a user interface configured to receive the user input.

10. The robot of claim 8, wherein the at least partial dissatisfaction includes a discrepancy between a target robot action and the first robot action.

11. The robot of claim 8, wherein the modification of the policy further comprises usage of a learning model.

12. The robot of claim 8, wherein the processor apparatus is further configured to determine a first context-variable value for a context variable, wherein the first context-variable value is determined from the first sensory-data input and the policy is further determined based at least in part on the context variable.

13. The robot of claim 8, wherein the sensor is at least one of a light sensor, a motion detector, an inertial measurement unit, and a global positioning system receiver.

14. A non-transitory computer-readable storage medium having a plurality of instructions stored thereon, the instructions being executable by a processing apparatus to operate a robot, the instructions configured to, when executed by the processing apparatus, cause the processing apparatus to: define a policy comprising a plurality of parameters configured to determine robot actions based at least in part on sensory-data inputs, wherein the policy maps the sensory-data inputs to robot actions; receive a first sensory-data input; perform a first robot action at a first action time, wherein the first action is determined based at least in part on the first sensory-data input and application of the policy; determine that a user input was received at an input time corresponding to the first action time, wherein a corrective command at least partially derived from the user input specifies a corrective robot action for physical performance, the user input being indicative of at least partial dissatisfaction with the first robot action; and modify the policy based on the corrective command and the first sensory-data input.

15. The non-transitory computer-readable storage medium of claim 14, wherein the instructions are further configured to, when executed by the processing apparatus, determine a second robot action at a second action time, wherein the second robot action is based at least in part on the modified policy and a second sensory-data input.

16. The non-transitory computer-readable storage medium of claim 14, wherein the modification of the policy further comprises usage of a learning model.

17. The non-transitory computer-readable storage medium of claim 14, wherein the instructions are further configured to, when executed by the processing apparatus, assess whether the modified policy comprises an improvement over the policy prior to modification, the improvement being determined by a threshold being exceeded.

18. The non-transitory computer-readable storage medium of claim 14, wherein the modification of the policy comprises changing parameters relating sensory-data inputs to actuator responses that correspond to robot actions.

19. The non-transitory computer-readable storage medium of claim 16, wherein the learning model includes updating parameters based on a gradient of error determined at least in part by a difference between the first robot action and a second robot action specified by a combination of the corrective command and the policy.

20. The non-transitory computer-readable storage medium of claim 14, wherein the instructions are further configured to, when executed by the processing apparatus, determine a first context-variable value for a context variable, wherein the first context-variable value is determined from the first sensory-data input and the policy is further determined based at least in part on the context variable.

21. The non-transitory computer-readable storage medium of claim 14, wherein the at least partial dissatisfaction includes a discrepancy between a target robot action and the first robot action.
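
For readers who find the claim language abstract, the loop described in claims 1, 2 and 6 can be illustrated with a small sketch. This is not the patent's implementation: the linear policy, the squared-error gradient step, the learning rate, and every name below (`LinearPolicy`, `act`, `modify`, `train`) are assumptions chosen for illustration. The user's corrective command is simulated as the gap between a desired action and the action the robot actually performed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinearPolicy:
    """Hypothetical policy: action = sum of weight * sensory value."""
    weights: List[float]
    learning_rate: float = 0.1  # assumed step size, not from the patent

    def act(self, sensory_input: List[float]) -> float:
        # Map sensory-data inputs to a (scalar) robot action.
        return sum(w * x for w, x in zip(self.weights, sensory_input))

    def modify(self, sensory_input: List[float], first_action: float,
               corrective_command: float) -> None:
        # Target action: the performed action combined with the correction.
        target_action = first_action + corrective_command
        # Error between the target and the action the policy produced.
        error = target_action - first_action
        # Gradient step on 0.5 * error**2 with respect to each weight.
        self.weights = [w + self.learning_rate * error * x
                        for w, x in zip(self.weights, sensory_input)]


def train(policy: LinearPolicy, sensory_input: List[float],
          desired_action: float, steps: int) -> float:
    """Repeat act -> correct -> modify; return the final action."""
    for _ in range(steps):
        first_action = policy.act(sensory_input)
        # Stand-in for the user: push the robot toward the desired action.
        corrective_command = desired_action - first_action
        policy.modify(sensory_input, first_action, corrective_command)
    return policy.act(sensory_input)
```

In this toy setting, each correction shrinks the gap between the policy's action and the desired action for the same sensory input, so after enough corrections the policy reproduces the desired action without further user input.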

Assignees

Brain Corp

Inventors

Classifications

  • using neural networks only · CPC title

  • characterised by programming, planning systems for manipulators · CPC title

  • B25J9/163 (Primary)

    learning, adaptive, model based, rule based expert control · CPC title

  • Sensing device · CPC title

  • G06N20/00 (Primary)

    Machine learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9789605B2 cover?
Robots have the capacity to perform a broad range of useful tasks, such as factory automation, cleaning, delivery, assistive care, environmental monitoring and entertainment. Enabling a robot to perform a new task in a new environment typically requires a large amount of new software to be written, often by a team of experts. It would be valuable if future technology could empower people, who m…
Who is the assignee on this patent?
Brain Corp
What technology area does this patent fall under?
Primary CPC classification B25J9/163. Mapped technology areas include Operations & Transport.
When was this patent published?
Publication date Oct 17, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).