Machine learning methods and apparatus for semantic robotic grasping

US11717959B2 · US · B2

Patent metadata
Publication number: US-11717959-B2
Application number: US-201816622309-A
Country: US
Kind code: B2
Filing date: Jun 28, 2018
Priority date: Jun 28, 2017
Publication date: Aug 8, 2023
Grant date: Aug 8, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Deep machine learning methods and apparatus related to semantic robotic grasping are provided. Some implementations relate to training a grasp neural network, a semantic neural network, and a joint neural network of a semantic grasping model. In some of those implementations, the joint network is a deep neural network and can be trained based on both: grasp losses generated based on grasp predictions generated over a grasp neural network, and semantic losses generated based on semantic predictions generated over the semantic neural network. Some implementations are directed to utilization of the trained semantic grasping model to servo, or control, a grasping end effector of a robot to achieve a successful grasp of an object having desired semantic feature(s).
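The loss routing described in the abstract can be illustrated with a toy model. The sketch below is not the patented implementation: it assumes a scalar stand-in in which a single shared "joint" weight feeds two sigmoid heads, and every name in it (`ToySemanticGraspingModel`, `grasp_step`, `semantic_step`, the learning rate) is hypothetical. It only shows that a grasp loss updates the joint weight and the grasp head, while a semantic loss updates the joint weight and the semantic head.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ToySemanticGraspingModel:
    """Hypothetical scalar stand-in: one shared joint weight, two heads."""

    def __init__(self, w_joint=0.5, w_grasp=0.5, w_semantic=0.5):
        self.w_joint = w_joint        # stands in for the joint network
        self.w_grasp = w_grasp        # stands in for the grasp network
        self.w_semantic = w_semantic  # stands in for the semantic network

    def forward(self, x):
        joint_out = self.w_joint * x
        grasp_pred = sigmoid(self.w_grasp * joint_out)
        semantic_pred = sigmoid(self.w_semantic * joint_out)
        return joint_out, grasp_pred, semantic_pred

    def grasp_step(self, x, grasp_label, lr=0.1):
        """Grasp loss (binary cross-entropy) updates joint and grasp weights only."""
        joint_out, grasp_pred, _ = self.forward(x)
        err = grasp_pred - grasp_label          # gradient of BCE w.r.t. the logit
        grad_head = err * joint_out
        grad_joint = err * self.w_grasp * x
        self.w_grasp -= lr * grad_head
        self.w_joint -= lr * grad_joint

    def semantic_step(self, x, semantic_label, lr=0.1):
        """Semantic loss updates joint and semantic weights only."""
        joint_out, _, semantic_pred = self.forward(x)
        err = semantic_pred - semantic_label
        grad_head = err * joint_out
        grad_joint = err * self.w_semantic * x
        self.w_semantic -= lr * grad_head
        self.w_joint -= lr * grad_joint
```

In this toy setting, calling `grasp_step` leaves `w_semantic` untouched and calling `semantic_step` leaves `w_grasp` untouched, while both calls move `w_joint`. A real model would use deep networks and an automatic differentiation library rather than hand-written scalar gradients.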

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented by one or more processors, comprising: identifying a desired object semantic feature; generating a candidate end effector motion vector defining motion to move a grasping end effector of a robot from a given pose to an additional pose; identifying an image captured by a vision component of the robot, the image capturing the grasping end effector and an object in an environment of the robot; applying the image and the candidate end effector motion vector as input to a trained joint neural network; generating a joint output based on the application of the image and the end effector motion vector to the trained joint neural network, wherein the trained joint neural network is trained based on: grasp losses generated based on grasp predictions generated over a grasp neural network based on training outputs generated using the joint neural network, and semantic losses generated based on semantic predictions generated over a semantic neural network based on training outputs generated using the joint neural network; applying the joint output to a trained version of the semantic neural network; generating, using the trained version of the semantic neural network based on the joint output, semantic neural network output that indicates whether the object includes the desired object semantic feature; generating a grasp success measure, generating the grasp success measure comprising: generating the grasp success measure based on application of the joint output to a trained version of the grasp neural network, or generating the grasp success measure based on application of the current image and the end effector motion vector to an additional trained grasp neural network; generating an end effector command based on the grasp success measure and the semantic model output that indicates whether the object includes the desired object semantic feature; and providing the end effector command to one or more actuators of the robot.

2. The method of claim 1, wherein generating the grasp success measure comprises generating the grasp success measure based on application of the joint output to the trained version of the grasp neural network.

3. The method of claim 1, wherein generating the grasp success measure comprises generating the grasp success measure based on application of the image and the end effector motion vector to the additional trained grasp neural network.

4. The method of claim 3, wherein the additional trained grasp neural network is trained independently of the grasp neural network, the joint neural network, and the semantic neural network.

5. The method of claim 1, wherein the image is not applied directly as input to the semantic neural network in generating the semantic model output.

6. The method of claim 1, wherein the joint output is the only input applied to the semantic neural network in generating the semantic model output.

7. The method of claim 1, wherein in training the joint neural network based on the grasp losses generated based on the grasp predictions generated over the grasp neural network, the grasp neural network is also trained based on the grasp losses, without training of the semantic neural network based on the grasp losses.

8. The method of claim 1, wherein in training the joint neural network based on the semantic losses generated based on the semantic predictions generated over the semantic neural network, the semantic neural network is also trained based on the semantic losses, without training of the grasp neural network based on the semantic losses.

9. The method of claim 1, wherein the desired object semantic feature defines an object classification.

10. The method of claim 1, wherein the semantic model output indicates, for each of a plurality of object classifications, a likelihood that the object has a corresponding one of the object classifications.

11. The method of claim 1, further comprising: receiving user interface input from a user interface input device; wherein identifying the desired object semantic feature is based on the user interface input.

12. The method of claim 1, further comprising: determining a current grasp success measure of the object without application of the motion; wherein generating the end effector command based on the grasp success measure comprises generating the end effector command based on comparison of the grasp success measure to the current grasp success measure.

13. The method of claim 1, wherein the end effector command is an end effector motion command and wherein generating the end effector motion command comprises generating the end effector motion command to conform to the candidate end effector motion vector.

14. The method of claim 13, wherein generating the end effector command is in response to: determining, based on the semantic neural network output, a likelihood that the object includes the desired object feature; and determining that the likelihood satisfies one or more criteria and that the grasp success measure satisfies one or more criteria.

15. The method of claim 13, wherein generating the end effector command is in response to: determining, based on the semantic neural network output, a likelihood that the object includes the desired object feature; generating a value as a function of the likelihood and the grasp success measure; and determining that the value satisfies a threshold.

16. A method implemented by one or more processors, comprising: identifying a desired object semantic feature; generating a candidate end effector motion vector defining motion to move a grasping end effector of a robot from a given pose to an additional pose; identifying an image captured by a vision component of the robot, the image capturing the grasping end effector and an object in an environment of the robot; applying the image and the candidate end effector motion vector as input to a trained joint neural network; generating a joint output based on the application of the image and the end effector motion vector to the trained joint neural network; applying the joint output to a trained semantic neural network; generating, using the trained semantic neural network based on the joint output, semantic neural network output that indicates whether the object includes the desired object semantic feature; applying the joint output to a trained grasp neural network; generating, using the trained grasp neural network based on the joint output, a grasp success measure; generating an end effector command based on the grasp success measure and the semantic model output that indicates whether the object includes the desired object semantic feature; and providing the end effector command to one or more actuators of the robot.

17. The method of claim 16, wherein during training of the trained grasp neural network, grasp losses generated based on grasp predictions generated over the grasp neural network are utilized to update the grasp neural network and the joint prediction model, without being utilized to update the semantic neural network.

18. The method of claim 16, wherein during training of the trained semantic neural network, grasp losses generated based on semantic predictions generated over the semantic neural network are utilized to update the semantic neural network and the joint prediction model, without being utilized to update the grasp neural network.
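The decision step recited in claims 1, 14, and 15 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method itself: the `Prediction` container, the `predict` callable, the threshold values, the use of a product as the "function of the likelihood and the grasp success measure", and the `"move"` / `"hold"` command labels are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Prediction:
    grasp_success: float        # grasp success measure in [0, 1]
    semantic_likelihood: float  # likelihood the object has the desired feature


def passes_separate_criteria(pred, grasp_min=0.9, semantic_min=0.5):
    """Claim 14 style: each quantity must satisfy its own criterion.
    The minimum values here are assumed, not taken from the patent."""
    return pred.grasp_success >= grasp_min and pred.semantic_likelihood >= semantic_min


def combined_value(pred):
    """Claim 15 style: one value as a function of both quantities.
    A product is an assumption; the claim does not fix the function."""
    return pred.grasp_success * pred.semantic_likelihood


def choose_command(
    predict: Callable[[object, Sequence[float]], Prediction],
    image: object,
    candidate: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[str, Optional[Sequence[float]]]:
    """Return a motion command that conforms to the candidate motion vector
    when the combined value satisfies the threshold, otherwise hold."""
    pred = predict(image, candidate)
    if combined_value(pred) >= threshold:
        return ("move", candidate)
    return ("hold", None)
```

Here `predict` stands in for the chain of trained joint, semantic, and grasp networks: it takes the image and the candidate end effector motion vector and returns both predictions. A caller might sample several candidate vectors and pass each through `choose_command`, but how candidates are generated is outside the scope of this sketch.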

Assignees

Inventors

Classifications

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning

  • Supervised learning

  • Transfer learning

  • Convolutional networks [CNN, ConvNet]

  • B25J9/163 (Primary): learning, adaptive, model based, rule based expert control

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11717959B2 cover?
Deep machine learning methods and apparatus related to semantic robotic grasping are provided. Some implementations relate to training a grasp neural network, a semantic neural network, and a joint neural network of a semantic grasping model. In some of those implementations, the joint network is a deep neural network and can be trained based on both: grasp losses generated based on …
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification B25J9/163. Mapped technology areas include Operations & Transport.
When was this patent published?
Publication date: Aug 8, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).