Deep machine learning methods and apparatus for robotic grasping

US9914213B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9914213-B2
Application number  US-201715448013-A
Country             US
Kind code           B2
Filing date         Mar 2, 2017
Priority date       Mar 3, 2016
Publication date    Mar 13, 2018
Grant date          Mar 13, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Deep machine learning methods and apparatus related to manipulation of an object by an end effector of a robot. Some implementations relate to training a semantic grasping model to predict a measure that indicates whether motion data for an end effector of a robot will result in a successful grasp of an object; and to predict an additional measure that indicates whether the object has desired semantic feature(s). Some implementations are directed to utilization of the trained semantic grasping model to servo a grasping end effector of a robot to achieve a successful grasp of an object having desired semantic feature(s).
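The servoing idea in the abstract, combined with the grasp criteria recited in claims 8 and 9, can be illustrated with a minimal sketch. It assumes the two network outputs are already available as plain numbers. The names `Measures` and `choose_command`, the ratio test, and both threshold values are hypothetical choices for illustration; the patent text only says the comparison must satisfy "one or more criteria".

```python
from dataclasses import dataclass


@dataclass
class Measures:
    # Predicted grasp success if the candidate motion is applied first.
    grasp_with_motion: float
    # Predicted grasp success if the gripper closes at the current pose.
    grasp_without_motion: float
    # Predicted likelihood that the desired semantic feature is present.
    semantic: float


def choose_command(m: Measures,
                   grasp_ratio: float = 0.9,
                   semantic_threshold: float = 0.5) -> str:
    """Return "grasp" or "move" (thresholds are illustrative assumptions)."""
    has_feature = m.semantic >= semantic_threshold
    # Closing now is judged good enough when it scores at least
    # `grasp_ratio` times the score expected after moving.
    close_now_ok = m.grasp_without_motion >= grasp_ratio * m.grasp_with_motion
    if has_feature and close_now_ok:
        return "grasp"
    return "move"


print(choose_command(Measures(0.8, 0.75, 0.9)))
print(choose_command(Measures(0.8, 0.30, 0.9)))
```

In a real controller the "move" branch would issue a motion command conforming to the candidate motion vector, and the check would repeat with a new image at the new pose.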

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented by one or more processors, comprising: generating a candidate end effector motion vector defining motion to move a grasping end effector of a robot from a current pose to an additional pose; identifying a current image captured by a vision sensor associated with the robot, the current image capturing the grasping end effector and at least one object in an environment of the robot; applying the current image and the candidate end effector motion vector as input to a trained grasp convolutional neural network; generating, over the trained grasp convolutional neural network, a measure of successful grasp of the object with application of the motion, the measure being generated based on the application of the image and the end effector motion vector to the trained grasp convolutional neural network; identifying a desired object semantic feature; applying, as input to a semantic convolutional neural network, a spatial transformation of the current image or of an additional image captured by the vision sensor; generating, over the semantic convolutional neural network based on the spatial transformation, an additional measure that indicates whether the desired object semantic feature is present in the spatial transformation; generating an end effector command based on the measure of successful grasp and the additional measure that indicates whether the desired object semantic feature is present; and providing the end effector command to one or more actuators of the robot.

2. The method of claim 1, further comprising: generating, over the trained grasp convolutional neural network based on the application of the image and the end effector motion vector to the trained grasp convolutional neural network, spatial transformation parameters; and generating the spatial transformation over a spatial transformation network based on the spatial transformation parameters.

3. The method of claim 1, wherein the desired object semantic feature defines an object classification.

4. The method of claim 1, further comprising: receiving user interface input from a user interface input device; wherein identifying the desired object semantic feature is based on the user interface input.

5. The method of claim 4, wherein the user interface input device is a microphone of the robot.

6. The method of claim 1, wherein the spatial transformation is of the current image.

7. The method of claim 6, wherein the spatial transformation crops out a portion of the current image.

8. The method of claim 1, further comprising: determining a current measure of successful grasp of the object without application of the motion; wherein generating the end effector command based on the measure comprises generating the end effector command based on comparison of the measure to the current measure.

9. The method of claim 8, wherein the end effector command is a grasp command and wherein generating the grasp command is in response to: determining that the additional measure indicates that the desired object feature is present in the spatial transformation; and determining that comparison of the measure to the current measure satisfies one or more criteria.

10. The method of claim 1, wherein the end effector command is an end effector motion command and wherein generating the end effector motion command comprises generating the end effector motion command to conform to the candidate end effector motion vector.

11. The method of claim 1, wherein the end effector command is an end effector motion command and wherein generating the end effector motion command comprises generating the end effector motion command to effectuate a trajectory correction to the end effector.

12. The method of claim 1, wherein the end effector command is an end effector motion command and conforms to the candidate end effector motion vector, wherein providing the end effector motion command to the one or more actuators moves the end effector to a new pose, and further comprising: generating, by one or more processors, an additional candidate end effector motion vector defining new motion to move the grasping end effector from the new pose to a further additional pose; identifying, by one or more of the processors, a new image captured by a vision sensor associated with the robot, the new image capturing the end effector at the new pose and capturing the objects in the environment; applying, by one or more of the processors, the new image and the additional candidate end effector motion vector as input to the trained grasp convolutional neural network; generating, over the trained grasp convolutional neural network, a new measure of successful grasp of the object with application of the new motion, the new measure being generated based on the application of the new image and the additional end effector motion vector to the trained grasp convolutional neural network; applying, as input to the semantic convolutional neural network, an additional spatial transformation of the new image or a new additional image captured by the vision sensor; generating, over the semantic convolutional neural network based on the additional spatial transformation, a new additional measure that indicates whether the desired object feature is present in the spatial transformation; generating a new end effector command based on the new measure of successful grasp and the new additional measure that indicates whether the desired object feature is present; and providing the new end effector command to one or more actuators of the robot.

13. The method of claim 1, wherein applying the image and the candidate end effector motion vector as input to the trained grasp convolutional neural network comprises: applying the image as input to an initial layer of the trained grasp convolutional neural network; and applying the candidate end effector motion vector to an additional layer of the trained grasp convolutional neural network, the additional layer being downstream of the initial layer.

14. The method of claim 1, wherein generating the candidate end effector motion vector comprises: generating a plurality of candidate end effector motion vectors; and performing one or more iterations of cross-entropy optimization on the plurality of candidate end effector motion vectors to select the candidate end effector motion vector from the plurality of candidate end effector motion vectors.

15. A method implemented by one or more processors, comprising: identifying a current image captured by a vision sensor associated with a robot; generating, over a grasp convolutional neural network based on application of the current image to the grasp convolutional neural network: a measure of successful grasp, by a grasping end effector of the robot, of an object captured in the current image, and spatial transformation parameters; generating, over a spatial transformer network, a spatial transformation based on the spatial transformation parameters, the spatial transformation being of the current image or an additional image captured by the vision sensor; applying the spatial transformation as input to a semantic convolutional neural network; generating, over the semantic convolutional neural network based on the spatial transformation, an additional measure that indicates whether a desired object semantic feature is present in the spatial transformation; generating an end effector command based on the measure and the additional measure; and providing the end effector command to one or more actuators of the robot.

1…
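Claim 14 selects the candidate motion vector by iterating cross-entropy optimization over a set of sampled candidates. The sketch below shows one common form of that technique (the cross-entropy method with a per-dimension Gaussian): sample, score, keep the best few, refit, and repeat. The function names, the sample and elite counts, the 3-dimensional vector, and the toy scoring function are all assumptions for illustration. In the claimed method the score would be the grasp network's measure for the current image and the candidate vector.

```python
import random
from statistics import mean, pstdev
from typing import Callable, List, Sequence


def select_motion_vector(score: Callable[[Sequence[float]], float],
                         dim: int = 3,
                         samples: int = 64,
                         elites: int = 6,
                         iterations: int = 3,
                         seed: int = 0) -> List[float]:
    """Pick a motion vector by the cross-entropy method (illustrative sizes)."""
    rng = random.Random(seed)
    mu = [0.0] * dim       # mean of the sampling distribution
    sigma = [1.0] * dim    # per-dimension standard deviation
    best = list(mu)        # start from "no motion" as the fallback
    best_score = score(best)
    for _ in range(iterations):
        # Sample a batch of candidate vectors around the current mean.
        batch = [[rng.gauss(mu[i], sigma[i]) for i in range(dim)]
                 for _ in range(samples)]
        # Keep the highest-scoring candidates (the elites).
        batch.sort(key=score, reverse=True)
        top = batch[:elites]
        top_score = score(top[0])
        if top_score > best_score:
            best, best_score = top[0], top_score
        # Refit the Gaussian to the elites for the next iteration.
        mu = [mean(v[i] for v in top) for i in range(dim)]
        sigma = [max(pstdev([v[i] for v in top]), 1e-6) for i in range(dim)]
    return best


# Hypothetical stand-in for the grasp network: higher is better near TARGET.
TARGET = [0.5, -0.2, 0.1]


def toy_score(v: Sequence[float]) -> float:
    return -sum((a - b) ** 2 for a, b in zip(v, TARGET))


chosen = select_motion_vector(toy_score)
print(chosen, toy_score(chosen))
```

Because the search starts from the zero vector and only replaces it with a strictly better candidate, the returned vector never scores worse than staying in place under the given scoring function.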

Assignees

Google Inc, Google LLC

Inventors

Classifications

  • based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour · CPC title

  • G06N3/045 · Primary

    Combinations of networks · CPC title

  • G05B13/027 · Primary

    using neural networks only · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

  • Vision controlled systems · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9914213B2 cover?
Deep machine learning methods and apparatus related to manipulation of an object by an end effector of a robot. Some implementations relate to training a semantic grasping model to predict a measure that indicates whether motion data for an end effector of a robot will result in a successful grasp of an object; and to predict an additional measure that indicates whether the object has desired s…
Who is the assignee on this patent?
Google Inc, Google LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/045. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 13, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).