Shared dense network with robot task-specific heads

US11587302B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11587302-B2
Application number  US-201916717498-A
Country             US
Kind code           B2
Filing date         Dec 17, 2019
Priority date       Dec 17, 2019
Publication date    Feb 21, 2023
Grant date          Feb 21, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method includes receiving image data representing an environment of a robotic device from a camera on the robotic device. The method further includes applying a trained dense network to the image data to generate a set of feature values, where the trained dense network has been trained to accomplish a first robot vision task. The method additionally includes applying a trained task-specific head to the set of feature values to generate a task-specific output to accomplish a second robot vision task, where the trained task-specific head has been trained to accomplish the second robot vision task based on previously generated feature values from the trained dense network, where the second robot vision task is different from the first robot vision task. The method also includes controlling the robotic device to operate in the environment based on the task-specific output generated to accomplish the second robot vision task.
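The flow described in the abstract can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the patented implementation: the one-layer `trunk`, the logistic-regression `head`, the `train_head` helper, the hard-coded weights, the 3-value "images", and the action names are all hypothetical stand-ins. It shows only the shape of the idea: a frozen shared network produces feature values, and a second head is trained later on previously generated features without changing that network.

```python
import math

# Hypothetical stand-in for the trained dense network: one fixed linear layer
# followed by ReLU. The weights are assumed to be already trained and frozen.
TRUNK_WEIGHTS = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, -0.5, 0.0],
    [0.0, 0.5, 0.5],
]


def trunk(image):
    """Map a toy 3-value 'image' to a set of feature values."""
    return [max(0.0, sum(w * x for w, x in zip(row, image)))
            for row in TRUNK_WEIGHTS]


def head(features, params):
    """Task-specific head: logistic regression over the shared features.

    params holds one weight per feature plus a trailing bias term.
    """
    z = params[-1] + sum(p * f for p, f in zip(params, features))
    z = max(-60.0, min(60.0, z))  # keep math.exp within range
    return 1.0 / (1.0 + math.exp(-z))


def train_head(feature_sets, labels, epochs=500, lr=0.5):
    """Fit a new head by SGD on stored features; the trunk is never updated."""
    params = [0.0] * (len(feature_sets[0]) + 1)
    for _ in range(epochs):
        for feats, label in zip(feature_sets, labels):
            err = head(feats, params) - label
            for i, f in enumerate(feats):
                params[i] -= lr * err * f
            params[-1] -= lr * err
    return params


# Toy data for the second task (labels are made up for illustration).
images = [[0.9, 0.1, 0.2], [0.8, 0.7, 0.1], [0.1, 0.9, 0.3], [0.2, 0.2, 0.8]]
second_task_labels = [1, 1, 0, 0]

# Features are generated once by the frozen trunk and reused by every head.
stored_features = [trunk(img) for img in images]

first_head = [0.0, 4.0, 0.0, 0.0, -2.0]  # assumed trained earlier, with the trunk
second_head = train_head(stored_features, second_task_labels)

# At run time, one trunk pass feeds both heads.
new_features = trunk([0.85, 0.3, 0.4])
first_output = head(new_features, first_head)
second_output = head(new_features, second_head)

# Hypothetical control decision based on the second task's output.
action = "close_gripper" if second_output > 0.5 else "keep_searching"
```

The design point the sketch tries to capture is cost sharing: the expensive feature computation runs once per image, while each head stays small enough to be trained or replaced independently.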

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving image data representing an environment of a robotic device from a camera on the robotic device; applying a trained dense network to the image data to generate a set of feature values, wherein the trained dense network and a first trained task-specific head have been trained to accomplish a first robot vision task; applying a second trained task-specific head to the set of feature values to generate a task-specific output to accomplish a second robot vision task, wherein the second trained task-specific head has been trained to accomplish the second robot vision task based on previously generated feature values from the trained dense network, wherein the second trained task-specific head was trained to accomplish the second robot vision task after the trained dense network and the first trained task-specific head were trained to accomplish the first robot vision task, wherein each of the first robot vision task and the second robot vision task involves processing image data to acquire specific information needed to accomplish a different corresponding robot task, wherein each robot task involves a different type of physical manipulation of the environment by the robotic device; and controlling the robotic device to operate in the environment based on the task-specific output generated to accomplish the second robot vision task.

2. The method of claim 1, wherein the trained dense network is a feature pyramid network (FPN).

3. The method of claim 1, further comprising periodically retraining the first or second trained task-specific head without changing the trained dense network.

4. The method of claim 1, wherein the trained dense network has more network layers than each of the first trained task-specific head and the second trained task-specific head.

5. The method of claim 1, wherein the trained dense network has been trained using image data from one or more other robotic devices having a same or similar camera as the camera of the robotic device.

6. The method of claim 1, wherein the first robot vision task involves determining whether an area is robotically manipulatable.

7. The method of claim 1, wherein the first robot vision task involves determining whether a first type of robotic manipulation is performable on the environment and the second robot vision task involves determining whether a second type of robotic manipulation is performable on the environment.

8. The method of claim 7, wherein the first type of robotic manipulation involves a first robotic manipulator and the second type of robotic manipulation involves a second robotic manipulator.

9. The method of claim 1, wherein the first or second trained task-specific head is one of at least three trained task-specific heads corresponding to respective functions of detection, segmentation, and classification.

10. The method of claim 1, wherein the second robot vision task for the second trained task-specific head involves determining whether an object is partially occluded by a portion of the robotic device.

11. The method of claim 1, wherein the second robot vision task for the second trained task-specific head involves determining whether an object is in a gripper of the robotic device.

12. The method of claim 1, wherein the first or second trained task-specific head is one of a plurality of trained task-specific heads corresponding to identifying a plurality of respective object types.

13. The method of claim 12, wherein the plurality of respective object types comprise at least one object type that is robotically manipulatable to enable the robotic device to enter or exit an area in the environment.

14. The method of claim 13, wherein the at least one object type is robotically manipulatable to open or close a door in the environment.

15. The method of claim 1, wherein a control system of the robotic device comprises a plurality of task-specific heads, wherein the method further comprises periodically adjusting which of the plurality of task-specific heads are active.

16. The method of claim 1, wherein the second trained task-specific head is applied to both the set of feature values and a different task-specific output from a different task-specific head.

17. The method of claim 1, wherein the image data comprises red green blue depth (RGBD) data.

18. The method of claim 1, wherein layers of the trained dense network are processed by a graphics processing unit (GPU) of the robotic device, and wherein layers of the first trained task-specific head or the second trained task-specific head are processed by a central processing unit (CPU) of the robotic device.

19. A robotic device comprising: a camera; and a control system configured to: receive image data representing an environment of the robotic device from the camera on the robotic device; receiving image data representing an environment of a robotic device from a camera on the robotic device; applying a trained dense network to the image data to generate a set of feature values, wherein the trained dense network and a first trained task-specific head have been trained to accomplish a first robot vision task; applying a second trained task-specific head to the set of feature values to generate a task-specific output to accomplish a second robot vision task, wherein the second trained task-specific head has been trained to accomplish the second robot vision task based on previously generated feature values from the trained dense network, wherein the second trained task-specific head was trained to accomplish the second robot vision task after the trained dense network and the first trained task-specific head were trained to accomplish the first robot vision task, wherein each of the first robot vision task and the second robot vision task involves processing image data to acquire specific information needed to accomplish a different corresponding robot task, wherein each robot task involves a different type of physical manipulation of the environment by the robotic device; and control the robotic device to operate in the environment based on the task-specific output generated to accomplish the second robot vision task.

20. A non-transitory computer-readable medium comprising program instructions executable by at least one processor to cause the at least one processor to perform operations comprising: receiving image data representing an environment of a robotic device from a camera on the robotic device; applying a trained dense network to the image data to generate a set of feature values, wherein the trained dense network and a first trained task-specific head have been trained to accomplish a first robot vision task; applying a second trained task-specific head to the set of feature values to generate a task-specific output to accomplish a second robot vision task, wherein the second trained task-specific head has been trained to accomplish the second robot vision task based on previously generated feature values from the trained dense network, wherein the second trained task-specific head was trained to accomplish the second robot vision task after the trained dense network and the first trained task-specific head were trained to accomplish the first robot vision task, wherein each of the first robot vision task and the second robot vision task involves processing image data to acquire specific information needed to acc
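
Two of the dependent claims describe behavior that is easy to picture in code: claim 15 (periodically adjusting which heads are active) and claim 16 (a head applied to both the feature values and another head's output). The sketch below is one possible arrangement, assuming a hypothetical `HeadRegistry` class, made-up head names, and simple threshold functions in place of trained heads. None of these names come from the patent.

```python
from typing import Callable, Dict, Iterable, Optional, Sequence, Set, Tuple


class HeadRegistry:
    """Hypothetical container for task-specific heads sharing one feature set."""

    def __init__(self) -> None:
        # name -> (head function, name of the head whose output it also consumes)
        self._heads: Dict[str, Tuple[Callable[..., float], Optional[str]]] = {}
        self._active: Set[str] = set()

    def register(self, name: str, fn: Callable[..., float],
                 depends_on: Optional[str] = None) -> None:
        """Add a head. A dependency must be registered before its dependent."""
        if depends_on is not None and depends_on not in self._heads:
            raise KeyError(f"unknown dependency: {depends_on}")
        self._heads[name] = (fn, depends_on)

    def set_active(self, names: Iterable[str]) -> None:
        """Replace the set of active heads (the adjustment in claim 15)."""
        names = set(names)
        unknown = names - self._heads.keys()
        if unknown:
            raise KeyError(f"unknown heads: {sorted(unknown)}")
        self._active = names

    def run(self, features: Sequence[float]) -> Dict[str, float]:
        """Apply every active head to the same feature values."""
        outputs: Dict[str, float] = {}
        for name, (fn, dep) in self._heads.items():  # registration order
            if name not in self._active:
                continue
            if dep is None:
                outputs[name] = fn(features)
            elif dep in outputs:
                # Chained head: sees the features and another head's output.
                outputs[name] = fn(features, outputs[dep])
            # A chained head is skipped when its dependency is inactive.
        return outputs


registry = HeadRegistry()
registry.register("detect_object", lambda f: 1.0 if f[0] > 0.5 else 0.0)
registry.register(
    "object_in_gripper",
    lambda f, detected: detected * (1.0 if f[1] > 0.5 else 0.0),
    depends_on="detect_object",
)
registry.register("door_handle", lambda f: 1.0 if f[2] > 0.5 else 0.0)

features = [0.9, 0.8, 0.1]  # stand-in for one pass of the shared dense network

registry.set_active({"detect_object", "object_in_gripper"})
grasp_outputs = registry.run(features)

registry.set_active({"door_handle"})
door_outputs = registry.run(features)
```

Because every head reads the same feature values, switching the active set changes only which small heads run, not the cost of the shared network pass.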

Assignees

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • using classification, e.g. of video objects · CPC title

  • Terrestrial scenes (scenes under surveillance with static cameras G06V20/52; scenes perceived from the exterior of a vehicle G06V20/56; scenes perceived from the interior of a vehicle G06V20/59) · CPC title

  • Validation; Performance evaluation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11587302B2 cover?
A method includes receiving image data representing an environment of a robotic device from a camera on the robotic device. The method further includes applying a trained dense network to the image data to generate a set of feature values, where the trained dense network has been trained to accomplish a first robot vision task. The method additionally includes applying a trained task-specific h…
Who is the assignee on this patent?
X Dev Llc
What technology area does this patent fall under?
Primary CPC classification G06V10/40. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 21, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).