Deep machine learning methods and apparatus for robotic grasping

US11045949B2 · US · B2

Patent metadata
Publication number: US-11045949-B2
Application number: US-202016823947-A
Country: US
Kind code: B2
Filing date: Mar 19, 2020
Priority date: Mar 3, 2016
Publication date: Jun 29, 2021
Grant date: Jun 29, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Deep machine learning methods and apparatus related to manipulation of an object by an end effector of a robot. Some implementations relate to training a semantic grasping model to predict a measure that indicates whether motion data for an end effector of a robot will result in a successful grasp of an object; and to predict an additional measure that indicates whether the object has desired semantic feature(s). Some implementations are directed to utilization of the trained semantic grasping model to servo a grasping end effector of a robot to achieve a successful grasp of an object having desired semantic feature(s).

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: identifying, by one or more processors, a plurality of training examples generated based on sensor output from one or more robots during a plurality of grasp attempts by the robots, each of the training examples including training example input comprising: an image for a corresponding instance of time of a corresponding grasp attempt of the grasp attempts, the image capturing a robotic end effector and one or more environmental objects at the corresponding instance of time, an end effector motion vector defining motion of the end effector to move from an instance of time pose of the end effector at the corresponding instance of time to a final pose of the end effector for the corresponding grasp attempt, and each of the training examples including training example output comprising: at least one grasped object label indicating a semantic feature of an object grasped by the corresponding grasp attempt; and training, by one or more of the processors, a convolutional neural network based on the training examples, wherein training the convolutional neural network based on the training examples comprises performing instances of backpropagation, on the convolutional neural network, that are based on the training examples.

2. The method of claim 1, wherein training the convolutional neural network based on the training examples comprises: applying, to an additional convolutional neural network, the training example input of a given training example of the training examples; generating, over the additional convolutional neural network based on the training example input of the given training example, spatial transformer network parameters; using the spatial transformer network parameters to generate a spatial transformation of the image of the given training example; generating output over the convolutional neural network based on the spatial transformation of the image; and updating the convolutional neural network based on the output and the training example output of the given training example.

3. The method of claim 1, further comprising: training the convolutional neural network based on additional training examples that are not generated based on grasp attempts.

4. The method of claim 1, further comprising: training an additional convolutional neural network based on the training examples.

5. The method of claim 4, wherein training the additional convolutional neural network based on the training examples comprises: generating, over the additional convolutional neural network based on applying the training example input of the given training example to the additional convolutional neural network, a predicted grasp measure; and updating the additional convolutional neural network based on the predicted grasp measure and the training example output of the given training example.

6. The method of claim 1, wherein the training examples comprise: a first group of the training examples generated based on output from a plurality of first robot sensors of a first robot during a plurality of the grasp attempts by the first robot; and a second group of the training examples generated based on output from a plurality of second robot sensors of a second robot during a plurality of the grasp attempts by the second robot.

7. The method of claim 6, wherein the first robot sensors comprise a first vision sensor generating the images for the training examples of the first group, wherein the second robot sensors comprise a second vision sensor generating the images for the training examples of the second group, and wherein a first pose of the first vision sensor relative to a first base of the first robot is distinct from a second pose of the second vision sensor relative to a second base of the second robot.

8. The method of claim 1, wherein the grasp attempts on which a plurality of training examples are based each comprise a plurality of random actuator commands that randomly move the end effector from a starting pose of the end effector to the final pose of the end effector, then grasp with the end effector at the final pose.

9. A robot, comprising: a vision sensor viewing an environment of the robot; an end effector; actuators that control a pose of the end effector; a user interface input device; one or more deep neural networks stored in one or more non-transitory computer readable media; at least one processor configured to: receive, via the user interface input device, user interface input of a user; identify, based on the user interface input, a desired object semantic feature for a grasp attempt; generate a candidate end effector motion vector defining motion to move the end effector from a current pose to an additional pose; identify a current image captured by the vision sensor, the current image capturing an object in the environment of the robot; generate output based on processing the candidate end effector motion vector and the current image using the one or more deep neural networks; determine, based on the output: that the object has the desired object semantic feature indicated by the user interface input; and that a measure of successful grasp, of the object with application of the motion defined by the candidate end effector motion vector, satisfies one or more criteria; responsive to determining that the object has the desired object semantic feature and that the measure of successful grasp satisfies the one or more criteria: providing an end effector command, that is based on the candidate end effector motion vector, to cause the one or more actuators to adjust the pose of the end effector.

10. The robot of claim 9, wherein the user interface input device includes a microphone and wherein the user interface input is spoken input.

11. The robot of claim 10, wherein the desired object semantic feature defines an object classification.

12. The robot of claim 9, wherein the end effector command is a grasp command.

13. The robot of claim 9, wherein, in the processing, the image is applied to first portion of the one or more deep neural networks and the candidate end effector motion vector is applied to a separate portion of the one or more deep neural networks.

14. The robot of claim 9, wherein the end effector is an astrictive end effector.

15. The robot of claim 9, wherein the end effector is an impactive end effector.

16. A system, comprising: memory storing a convolutional neural network; one or more processors executing instructions to: identify a plurality of training examples generated based on sensor output from one or more robots during a plurality of grasp attempts by the robots, each of the training examples including training example input comprising: an image for a corresponding instance of time of a corresponding grasp attempt of the grasp attempts, the image capturing a robotic end effector and one or more environmental objects at the corresponding instance of time, an end effector motion vector defining motion of the end effector to move from an instance of time pose of the end effector at the corresponding instance of time to a final pose of the end effector for the corresponding grasp attempt, and each of the training examples including training example output comprising: at least one grasped object label indicating a semantic feature of an object grasped by the corresponding grasp attempt; and train the convolutional neural network based on the training examples.

17. The system of claim 16, wherein in executing the instructions t
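Claim 1 describes each training example as an image at some instant of a grasp attempt, a motion vector from the end effector pose at that instant to the final pose of the attempt, and a label for the grasped object. A minimal sketch of assembling such examples from one recorded attempt follows. The names `TrainingExample` and `build_examples` are hypothetical, poses are simplified to 3D positions (a real pose would also carry orientation), and images are represented by placeholder strings.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumption: a pose is position-only here; orientation is omitted for brevity.
Pose = Tuple[float, float, float]


@dataclass
class TrainingExample:
    image: object                # image captured at this instant of the attempt
    motion_vector: Pose          # final pose minus the pose at this instant
    grasped_object_label: str    # semantic feature of the object that was grasped


def build_examples(
    images: Sequence[object],
    poses: Sequence[Pose],
    grasped_object_label: str,
) -> List[TrainingExample]:
    """Turn one grasp attempt (one image and pose per time step) into
    training examples that all point at the attempt's final pose."""
    if not poses or len(images) != len(poses):
        raise ValueError("need one image per pose and at least one time step")
    final_pose = poses[-1]
    examples = []
    for image, pose in zip(images, poses):
        # Component-wise displacement from the current pose to the final pose.
        motion = tuple(f - p for f, p in zip(final_pose, pose))
        examples.append(TrainingExample(image, motion, grasped_object_label))
    return examples


# Hypothetical recorded attempt with three time steps.
images = ["frame_0", "frame_1", "frame_2"]
poses = [(0.0, 0.0, 0.5), (0.25, 0.0, 0.25), (0.5, 0.25, 0.0)]
examples = build_examples(images, poses, grasped_object_label="mug")

for ex in examples:
    print(ex.image, ex.motion_vector, ex.grasped_object_label)
```

Because every time step of an attempt shares the same final pose and label, a single attempt yields several examples. Whether the last time step (with a zero motion vector) is kept is a choice made here for simplicity, not something the claim text specifies.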

Assignees

Google LLC

Inventors

Classifications

  • G06N3/045 · Primary

    Combinations of networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

  • Actuating means · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/045. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 29, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).