Apparatus and methods for control of robot actions based on corrective user inputs

US9358685B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9358685-B2
Application number: US-201414171762-A
Country: US
Kind code: B2
Filing date: Feb 3, 2014
Priority date: Feb 3, 2014
Publication date: Jun 7, 2016
Grant date: Jun 7, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Robots have the capacity to perform a broad range of useful tasks, such as factory automation, cleaning, delivery, assistive care, environmental monitoring and entertainment. Enabling a robot to perform a new task in a new environment typically requires a large amount of new software to be written, often by a team of experts. It would be valuable if future technology could empower people, who may have limited or no understanding of software coding, to train robots to perform custom tasks. Some implementations of the present invention provide methods and systems that respond to users' corrective commands to generate and refine a policy for determining appropriate actions based on sensor-data input. Upon completion of learning, the system can generate control commands by deriving them from the sensory data. Using the learned control policy, the robot can behave autonomously.
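The loop the abstract describes (act on sensor data, receive a corrective command, refine the policy) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the patent: the single scalar context variable, the linear form of the policy, the `LinearPolicy` class, the learning rate, and the target behaviour "action = 2 × context" that the simulated user is correcting toward.

```python
from dataclasses import dataclass


@dataclass
class LinearPolicy:
    """Hypothetical policy: action = weight * context + bias."""
    weight: float = 0.0
    bias: float = 0.0
    learning_rate: float = 0.1  # assumed step size, not from the patent

    def act(self, context: float) -> float:
        """Map a context-variable value to a robot-action value."""
        return self.weight * context + self.bias

    def correct(self, context: float, corrective_action: float) -> None:
        """Nudge the parameters toward the action the user asked for."""
        error = corrective_action - self.act(context)
        self.weight += self.learning_rate * error * context
        self.bias += self.learning_rate * error


def train(policy: LinearPolicy, steps) -> LinearPolicy:
    """Run the act-then-correct loop over (context, correction) pairs."""
    for context, correction in steps:
        policy.act(context)  # the robot acts under the current policy
        if correction is not None:
            policy.correct(context, correction)  # user was dissatisfied
    return policy


# Simulated user who always corrects toward action = 2 * context (assumed).
contexts = [c / 10.0 for c in range(-10, 11)]
steps = [(c, 2.0 * c) for c in contexts] * 200

policy = train(LinearPolicy(), steps)
print(round(policy.act(0.5), 3))
```

After training, the policy produces actions from context values alone, which corresponds to the autonomous behaviour the abstract mentions. A real system would use richer sensor data and a more expressive learning model.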

First claim

Opening claim text (preview).

What is claimed is:

1. A method for controlling actions of robots, the method comprising: identifying, at a device that includes a processor, a first context-variable value for a context variable detected by a robot at a first sensory-detection time; accessing, at the device, a policy comprising one or more parameters configured to map the context variable to a robot action variable; determining that a first robot action characterized by a first value of the robot action variable was performed at an action time in response to detection of the first context-variable value, the first robot action being in accordance with application of the policy; determining that a user input was received at an input time corresponding to the action time, wherein user input data derived from the user input at least partly defines a corrective command that specifies a corrective robot action for physical performance, the user input being indicative of at least partial dissatisfaction with the first robot action; modifying the policy based on the corrective command and the first context-variable value; and causing the modified policy to be used to: determine a second robot action characterized by a second value of the robot action variable based on a second context-variable value for the context variable detected at a second sensory-detection time; and initiate performance of the second robot action in accordance with the second value of the robot action variable.

2. The method of claim 1, further comprising: identifying a third context-variable value for the context variable, the third context-variable value being detected at a third sensory-detection time that is after the second sensory-detection time; determining that the robot performed a third action in response to the third context-variable value, the third action being in accordance with application of the accessed policy; and inferring that the third action was satisfactory based on a lack of input data at least partly defining a corrective command corresponding to the third action; wherein the modification of the policy is further based on the third context-variable value.

3. The method of claim 1, further comprising: identifying initial user input data derived from an initial user input received, the initial user input data at least partly defining an initial command that specifies an initial robot action for the robot to physically perform; identifying an initial context-variable value for the context variable detected by the robot at an initial sensory-detection time that corresponds to an initial input time; and determining the accessed policy based on the initial command and the first context-variable value for the context variable.

4. The method of claim 1, further comprising: determining the first value of the robot action variable based on the first context-variable value for the context variable; and initiating the first robot action in accordance with the first value of the robot action variable.

5. The method of claim 1, wherein the modifying of the policy further comprises using a learning model.

6. The method of claim 1, wherein the corrective command is indicative of a magnitude of action.

7. The method of claim 1, wherein the robot includes the device and further includes a motor used to perform at least part of the first robot action or the second robot action.

8. The method of claim 1, wherein the user input includes input received at an interface at a user device remote from the robot.

9. A system, comprising: one or more data processors; and a non-transitory computer readable storage medium containing instructions which when executed on the one or more data processors, cause the processor to: identify a first context-variable value for a context variable detected by a robot at a first sensory-detection time; access a policy comprising one or more parameters configured to map the context variable to a robot action variable; determine that a first robot action characterized by a first value of the robot action variable was performed at an action time in response to detection of the first context-variable value, the first robot action being in accordance with application of the policy; determine that a user input was received at an input time configured to correspond to the action time, wherein user input data derived from the user input at least partly defines a corrective command that specifies a corrective robot action for physical performance, the user input being indicative of at least partial dissatisfaction with the first robot action, wherein the corrective command defined by the user input data is configured to minimize an error associated with the robot action; modify the policy based on the corrective command and the first context-variable value; and cause the modified policy to be used to: determine a second robot action characterized by a second value of the robot action variable based on a second context-variable value for the context variable detected at a second sensory-detection time; and initiate performance of the second robot action in accordance with the second value of the robot action variable.

10. The system of claim 9, wherein the instructions further cause the processor to: identify a third context-variable value for the context variable, the third context-variable value being detected at a third sensory-detection time that is after the second sensory-detection time; determine that the robot performed a third action in response to the third context-variable value, the third action being in accordance with application of the accessed policy; and infer that the third action was satisfactory based on a lack of input data at least partly defining a corrective command corresponding to the third action; wherein the modification of the policy is further based on the third context-variable value.

11. The system of claim 9, wherein the instructions further cause the processor to: identify initial user input data derived from an initial user input received, the initial user input data at least partly defining an initial command that specifies an initial robot action for the robot to physically perform; identify an initial context-variable value for the context variable detected by the robot at an initial sensory-detection time that corresponds to the initial input time; and determine the accessed policy based on the initial command and the first context-variable value for the context variable.

12. The system of claim 9, wherein the instructions further cause the processor to: determine the first value of the robot action variable based on the first context-variable value for the context variable; and initiate the first robot action in accordance with the first value of the robot action variable.

13. The system of claim 9, wherein the policy is configured to be modified by use of a learning model.

14. The system of claim 9, wherein the corrective command is indicative of a magnitude of action.

15. The system of claim 9, wherein the robot includes the system and further includes a motor used to perform at least part of the first robot action or the second robot action.

16. The system of claim 9, wherein the user input includes input received at an interface at a user device remote from the system.

17. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to cause one or more data processors to: identify a first context-variable value for a context variable detected by a robo
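Claims 2 and 10 add the idea that an action with no corrective input can be treated as satisfactory and still inform the policy update. A minimal Python sketch of that labelling step follows. The helper names `label_step` and `fit_line`, the least-squares refit, and the example log are all hypothetical choices for illustration; the claims do not prescribe a particular learning model.

```python
def label_step(context, performed_action, correction):
    """Turn one logged step into a (context, target_action) training pair."""
    if correction is None:
        # No corrective input arrived: infer the performed action was fine.
        return (context, performed_action)
    # A corrective command arrived: the corrected action is the target.
    return (context, correction)


def fit_line(pairs):
    """Ordinary least-squares fit of action = w * context + b."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    if var_x == 0:
        return 0.0, mean_y  # contexts are all equal; fall back to a constant
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    w = cov / var_x
    return w, mean_y - w * mean_x


# Hypothetical log of (context, performed action, correction or None).
log = [
    (0.0, 0.0, None),  # uncorrected, so treated as satisfactory
    (1.0, 1.0, 2.0),   # user corrected the action from 1.0 to 2.0
    (2.0, 4.0, None),  # uncorrected, so treated as satisfactory
]

pairs = [label_step(*step) for step in log]
w, b = fit_line(pairs)
print(w, b)  # prints 2.0 0.0
```

Using uncorrected steps as implicit positive examples lets the policy keep what already works while moving toward the corrected behaviour elsewhere.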

Assignees

Inventors

Classifications

  • Learn by operator observation, symbiosis, show, watch · CPC title

  • using neural networks only · CPC title

  • Sensing device · CPC title

  • based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour · CPC title

  • B25J9/1602 · Primary

    characterised by the control system, structure, architecture · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9358685B2 cover?
Robots have the capacity to perform a broad range of useful tasks, such as factory automation, cleaning, delivery, assistive care, environmental monitoring and entertainment. Enabling a robot to perform a new task in a new environment typically requires a large amount of new software to be written, often by a team of experts. It would be valuable if future technology could empower people, who m…
Who is the assignee on this patent?
Brain Corp
What technology area does this patent fall under?
Primary CPC classification B25J9/1602. Mapped technology areas include Operations & Transport.
When was this patent published?
Publication date Jun 7, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).