Shared dense network with robot task-specific heads

US11945106B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11945106-B2
Application number: US-202318158072-A
Country: US
Kind code: B2
Filing date: Jan 23, 2023
Priority date: Dec 17, 2019
Publication date: Apr 2, 2024
Grant date: Apr 2, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method includes receiving image data representing an environment of a robotic device from a camera on the robotic device. The method further includes applying a trained dense network to the image data to generate a set of feature values, where the trained dense network has been trained to accomplish a first robot vision task. The method additionally includes applying a trained task-specific head to the set of feature values to generate a task-specific output to accomplish a second robot vision task, where the trained task-specific head has been trained to accomplish the second robot vision task based on previously generated feature values from the trained dense network, where the second robot vision task is different from the first robot vision task. The method also includes controlling the robotic device to operate in the environment based on the task-specific output generated to accomplish the second robot vision task.
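The pipeline in the abstract (one shared dense network producing feature values, lightweight task-specific heads consuming them, and a control decision based on the head outputs) can be sketched as follows. This is a minimal illustration, not the patented implementation: `dense_network`, `graspable_head`, `in_gripper_head`, `run_heads`, and `choose_action` are hypothetical names, and the "network" is a trivial row-mean function standing in for a trained model.

```python
from typing import Callable, Dict, List

Image = List[List[float]]
Features = List[float]


def dense_network(image: Image) -> Features:
    """Hypothetical stand-in for the trained dense network.

    A real backbone would be a deep convolutional model; here each
    feature value is simply the mean intensity of one image row.
    """
    return [sum(row) / len(row) for row in image]


def graspable_head(features: Features) -> bool:
    """Hypothetical head: is any region bright enough to count as graspable?"""
    return max(features) > 0.5


def in_gripper_head(features: Features) -> bool:
    """Hypothetical head: does the bottom row suggest an object in the gripper?"""
    return features[-1] > 0.5


def run_heads(image: Image,
              heads: Dict[str, Callable[[Features], bool]]) -> Dict[str, bool]:
    """Compute the shared features once, then apply every active head to them."""
    features = dense_network(image)
    return {name: head(features) for name, head in heads.items()}


def choose_action(outputs: Dict[str, bool]) -> str:
    """Toy control policy driven by the task-specific outputs."""
    if outputs["in_gripper"]:
        return "carry"
    return "grasp" if outputs["graspable"] else "search"


HEADS = {"graspable": graspable_head, "in_gripper": in_gripper_head}
frame = [[0.9, 0.7], [0.0, 0.2]]  # made-up 2x2 "image"
outputs = run_heads(frame, HEADS)
print(outputs, choose_action(outputs))
```

With the made-up frame above, the top row is bright and the bottom row is dark, so the sketch reports a graspable region, no object in the gripper, and selects the "grasp" action. The point of the design is that the expensive feature computation runs once per frame regardless of how many heads are active.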

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving a trained dense network and a first trained task-specific head, wherein the trained dense network has been trained to generate feature values based on image data, and wherein the trained dense network and a first trained task-specific head have been trained to accomplish a first robot vision task; receiving first image data representing an environment of a robotic device from a camera on the robotic device; training a second task-specific head based on the first image data to generate a task-specific output to accomplish a second robot vision task, wherein the second task-specific head is trained to accomplish the second robot vision task based on feature values generated by the trained dense network, wherein the second trained task-specific head is trained to accomplish the second robot vision task after the trained dense network and the first trained task-specific head were trained to accomplish the first robot vision task, wherein each of the first robot vision task and the second robot vision task involves processing image data to acquire specific information needed to accomplish a different corresponding robot task, wherein each robot task involves a different type of physical manipulation of the environment by the robotic device; and outputting the trained second task-specific head.

2. The method of claim 1, wherein the trained dense network is a feature pyramid network (FPN).

3. The method of claim 1, further comprising periodically retraining the first or second trained task-specific head without changing the trained dense network.

4. The method of claim 1, wherein the trained dense network has more network layers than each of the first trained task-specific head and the second trained task-specific head.

5. The method of claim 1, wherein the trained dense network has been trained using image data from one or more other robotic devices having a same or similar camera as the camera of the robotic device.

6. The method of claim 1, wherein the first robot vision task involves determining whether an area is robotically manipulatable.

7. The method of claim 1, wherein the first robot vision task involves determining whether a first type of robotic manipulation is performable on the environment and the second robot vision task involves determining whether a second type of robotic manipulation is performable on the environment.

8. The method of claim 7, wherein the first type of robotic manipulation involves a first robotic manipulator and the second type of robotic manipulation involves a second robotic manipulator.

9. The method of claim 1, wherein the first or second trained task-specific head is one of at least three trained task-specific heads corresponding to respective functions of detection, segmentation, and classification.

10. The method of claim 1, wherein the second robot vision task for the second trained task-specific head involves determining whether an object is partially occluded by a portion of the robotic device.

11. The method of claim 1, wherein the second robot vision task for the second trained task-specific head involves determining whether an object is in a gripper of the robotic device.

12. The method of claim 1, wherein the first or second trained task-specific head is one of a plurality of trained task-specific heads corresponding to identifying a plurality of respective object types.

13. The method of claim 12, wherein the plurality of respective object types comprise at least one object type that is robotically manipulatable to enable the robotic device to enter or exit an area in the environment.

14. The method of claim 13, wherein the at least one object type is robotically manipulatable to open or close a door in the environment.

15. The method of claim 1, wherein a control system of the robotic device comprises a plurality of task-specific heads, wherein the method further comprises periodically adjusting which of the plurality of task-specific heads are active.

16. The method of claim 1, wherein the second trained task-specific head is applied to both the set of feature values and a different task-specific output from a different task-specific head.

17. The method of claim 1, wherein the image data comprises red green blue depth (RGBD) data.

18. The method of claim 1, wherein layers of the trained dense network are processed by a graphics processing unit (GPU) of the robotic device, and wherein layers of the first trained task-specific head or the second trained task-specific head are processed by a central processing unit (CPU) of the robotic device.

19. A robotic device comprising: a camera; and a control system configured to: receive a trained dense network and a first trained task-specific head, wherein the trained dense network has been trained to generate feature values based on image data, and wherein the trained dense network and a first trained task-specific head have been trained to accomplish a first robot vision task; receive first image data representing an environment of the robotic device from the camera on the robotic device; train a second task-specific head based on the first image data to generate a task-specific output to accomplish a second robot vision task, wherein the second task-specific head is trained to accomplish the second robot vision task based on feature values generated by the trained dense network, wherein the second trained task-specific head is trained to accomplish the second robot vision task after the trained dense network and the first trained task-specific head were trained to accomplish the first robot vision task, wherein each of the first robot vision task and the second robot vision task involves processing image data to acquire specific information needed to accomplish a different corresponding robot task, wherein each robot task involves a different type of physical manipulation of the environment by the robotic device; and apply the trained dense network and the trained second task-specific head to subsequently captured image data to facilitate the robotic device performing the second robot vision task.

20. A non-transitory computer-readable medium comprising program instructions executable by at least one processor to cause the at least one processor to perform operations comprising: receiving a trained dense network and a first trained task-specific head, wherein the trained dense network has been trained to generate feature values based on image data, and wherein the trained dense network and a first trained task-specific head have been trained to accomplish a first robot vision task; receiving first image data representing an environment of a robotic device from a camera on the robotic device; training a second task-specific head based on the first image data to generate a task-specific output to accomplish a second robot vision task, wherein the second task-specific head is trained to accomplish the second robot vision task based on feature values generated by the trained dense network, wherein the second trained task-specific head is trained to accomplish the second robot vision task after the trained dense network and the first trained task-specific head were trained to accomplish the first robot vision task, wherein each of the first robot vision task and the second robot vision task involves processing image data to acquire specific information needed to accomplish a different corresponding robot task, wherein each robot task involves a different type of ph
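
The training step recited in the independent claims, where a second task-specific head is trained on feature values produced by an already-trained dense network, can be illustrated with a small sketch. It assumes the dense network stays fixed while the head is fitted (consistent with claim 3, which recites retraining heads without changing the dense network). Everything here is hypothetical: `frozen_backbone` is a two-feature stand-in for a real network, the head is a plain logistic-regression layer, and the images and labels are made-up data.

```python
import math
from typing import List, Sequence, Tuple

Head = Tuple[List[float], float]  # (weights, bias) of a linear head


def frozen_backbone(image: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the trained dense network.

    It has no trainable state, so nothing in train_head can modify it.
    Features: mean intensity and intensity spread of the flattened image.
    """
    return [sum(image) / len(image), max(image) - min(image)]


def train_head(images: Sequence[Sequence[float]], labels: Sequence[int],
               epochs: int = 500, lr: float = 0.5) -> Head:
    """Fit a logistic-regression head on backbone features with plain SGD."""
    feats = [frozen_backbone(img) for img in images]  # backbone is only read
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(head: Head, image: Sequence[float]) -> bool:
    """Apply the backbone, then the trained head, to a new image."""
    w, b = head
    x = frozen_backbone(image)
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0.0


# Made-up training data for a hypothetical second task
# (for example, "is an object in the gripper?"): bright images are positive.
IMAGES = [[0.9, 0.8, 1.0], [0.8, 0.9, 0.7], [0.1, 0.2, 0.0], [0.2, 0.1, 0.3]]
LABELS = [1, 1, 0, 0]

second_head = train_head(IMAGES, LABELS)
print(predict(second_head, [1.0, 0.9, 0.95]), predict(second_head, [0.0, 0.1, 0.05]))
```

Because only the small head is fitted, adding a new vision task in this sketch costs a few thousand scalar updates and leaves the shared feature extractor, and therefore every previously trained head, untouched.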

Assignees

Google LLC

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • B25J5/007 (Primary)

    mounted on wheels · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • Extraction of image or video features · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11945106B2 cover?
A method includes receiving image data representing an environment of a robotic device from a camera on the robotic device. The method further includes applying a trained dense network to the image data to generate a set of feature values, where the trained dense network has been trained to accomplish a first robot vision task. The method additionally includes applying a trained task-specific h…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification B25J5/007. Mapped technology areas include Operations & Transport.
When was this patent published?
Publication date: Apr 2, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).