Method and system of deep supervision object detection for reducing resource usage
US-2021365716-A1 · Nov 25, 2021 · US
US11449079B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11449079-B2 |
| Application number | US-201916262448-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 30, 2019 |
| Priority date | Jan 30, 2019 |
| Publication date | Sep 20, 2022 |
| Grant date | Sep 20, 2022 |
Official abstract text for this publication.
Systems and techniques are described that provide for generalizable approach policy learning and implementation for robotic object approaching. Described techniques provide fast and accurate approaching of a specified object, or type of object, in many different environments. The described techniques enable a robot to receive an identification of an object or type of object from a user, and then navigate to the desired object, without further control from the user. Moreover, the approach of the robot to the desired object is performed efficiently, e.g., with a minimum number of movements. Further, the approach techniques may be used even when the robot is placed in a new environment, such as when the same type of object must be approached in multiple settings.
Opening claim text (preview).
What is claimed is:
1. A computer program product, the computer program product being tangibly embodied on a non-transitory computer-readable storage medium and comprising instructions that, when executed by at least one computing device, are configured to cause the at least one computing device to: receive an approach request identifying a target object within an environment, wherein the approach request indicates that the target object is to be approached using a movement system of a robot; obtain an image of the target object within the environment of the robot; determine, from the image, a semantic segmentation in which image pixels of the image corresponding to the target object are labeled with a semantic label corresponding to the target object; determine a depth map in which the image pixels of the image with the semantic label corresponding to the target object are associated with a distance of the target object from the robot; generate an attention mask based on the semantic segmentation and the target object identified in the approach request; select a movement action based on the attention mask and on the depth map using a navigation policy generator, wherein the navigation policy generator is trained using reinforcement learning; and execute the movement action to move the robot toward the target object within the environment.
2. The computer program product of claim 1, wherein the instructions, when executed, are further configured to cause the at least one computing device to: train at least one convolutional neural network to generate the semantic segmentation and the depth map using a single encoder structure, with a first decoder branch trained to generate the semantic segmentation using the encoder structure and a second decoder branch trained to generate the depth map using the encoder structure.
3. The computer program product of claim 1, wherein the instructions, when executed to select the movement action, are further configured to cause the at least one computing device to: train the navigation policy generator to use at least one additional convolutional neural network to utilize ground truth semantic information and depth information to predict at least one movement action from among a plurality of available movement actions available to the robot within the environment.
4. The computer program product of claim 3, wherein the instructions, when executed to select the movement action, are further configured to cause the at least one computing device to: use the trained navigation policy generator to represent a state of the robot, relative to the target object, using the attention mask and the depth map, wherein the movement action is selected based on the state of the robot.
5. The computer program product of claim 4, wherein the instructions, when executed to select the movement action, are further configured to cause the at least one computing device to: execute the trained navigation policy generator to generate a probabilistic distribution of movement actions from among the plurality of available movement actions, based on the state of the robot; and select the movement action from the probabilistic distribution of movement actions.
6. The computer program product of claim 5, wherein the instructions, when executed, are further configured to cause the at least one computing device to: obtain a second image of the target object from a camera of the robot, following execution of the movement action; determine a second semantic segmentation and a second depth map, using at least one convolutional neural network; select, using the trained navigation policy generator, a second movement action, based on the second semantic segmentation, the second depth map, and the state of the robot following the movement action; and execute the second movement action to move the robot toward the target object within the environment.
7. The computer program product of claim 1, wherein the instructions, when executed, are further configured to cause the at least one computing device to: obtain a second image of the target object from a camera of the robot, following execution of the movement action; determine a second semantic segmentation and a second depth map, using at least one convolutional neural network and the second image; select a second movement action, based on the second semantic segmentation, the second depth map, and a state of the robot following the movement action, relative to a preceding state of the robot prior to the movement action; and execute the second movement action to move the robot toward the target object within the environment.
8. The computer program product of claim 7, wherein the instructions, when executed, are further configured to cause the at least one computing device to: continue further iterations of obtaining a current image of the target object following a preceding movement action, determining a current semantic segmentation, current depth map, and current state, based on the current image, selecting a current movement action, based on the current semantic segmentation, the current depth map, and the current state, executing the current movement action, and evaluating whether the current movement action achieves a success condition of an approach request; and complete the approach request when the evaluation indicates the success condition has been reached.
9. The computer program product of claim 1, wherein the attention mask comprises a 2-dimensional matrix whose values indicate a focus on the target object.
10. A robot comprising: a movement system configured to receive an approach request identifying a target object within an environment, wherein the approach request includes an instruction to move the robot and approach the target object; a camera configured to capture an image of the target object within the environment; and a control system configured to fulfill the approach request including executing iterations of moving the robot towards the target object through a plurality of iterative movements until an approach success condition is reached, the iterations including determining, from a first image from the camera and using at least one convolutional neural network, a first semantic segmentation in which image pixels of the image corresponding to the target object are labeled with a semantic label corresponding to the target object in the approach request, a first depth map, and a first state of the robot, relative to the target object; generate a first attention mask based on the first semantic segmentation and the target object identified in the approach request; determining a first movement action of the robot, using the first attention mask, the first depth map, and the first state, wherein the first movement action is selected using a navigation policy generator, and wherein the navigation policy generator and the at least one convolutional neural network are trained using reinforcement learning; executing the first movement action of the robot toward the target object within the environment; determining a second semantic segmentation, a second depth map, and a second state of the robot, relative to the target object, using a second image from the camera; generate a second attention mask based on the second semantic segmentation and the target object identified in the approach request; determining a second movement action of the robot, using the second attention mask, the second depth map, and the second state, relative to the first state; and executing the second movement action of the robot toward the target object within the environment.
11. The robot of claim 10 ,
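For readers who want a concrete picture of the data flow recited in claims 1, 5, and 9, the following is a minimal illustrative sketch, not the patented implementation. It assumes a binary attention mask (the claims only say the mask is a 2-dimensional matrix whose values indicate a focus on the target object), and it replaces the trained navigation policy generator with a hand-written action distribution. The function names `attention_mask`, `masked_mean_depth`, and `sample_action` are hypothetical.

```python
import random

def attention_mask(segmentation, target_label):
    """Build a 2-D 0/1 matrix that is 1 where a pixel carries the target label.

    Assumption: a binary mask; the claims do not fix the value range.
    """
    return [[1 if label == target_label else 0 for label in row]
            for row in segmentation]

def masked_mean_depth(mask, depth_map):
    """Average the depth values over pixels selected by the mask.

    Returns None when the target is not visible in the image.
    """
    total, count = 0.0, 0
    for mask_row, depth_row in zip(mask, depth_map):
        for m, d in zip(mask_row, depth_row):
            if m:
                total += d
                count += 1
    return total / count if count else None

def sample_action(action_probs, rng):
    """Draw one movement action from a probability distribution over actions.

    In the claims this distribution comes from a trained policy; here it is
    supplied by the caller as a plain dict (an illustrative stand-in).
    """
    actions = list(action_probs)
    weights = [action_probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

# Toy 2x2 "image": per-pixel semantic labels and per-pixel distances.
segmentation = [["wall", "chair"],
                ["chair", "floor"]]
depth_map = [[3.0, 2.0],
             [1.0, 4.0]]

mask = attention_mask(segmentation, "chair")
distance = masked_mean_depth(mask, depth_map)
action = sample_action({"forward": 1.0, "left": 0.0}, random.Random(0))
```

In an iterative approach as described in claim 8, these steps would be repeated on each new camera image until a success condition is met; how that condition is evaluated is not specified in the claim text shown here.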
Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title
Terrestrial scenes (scenes under surveillance with static cameras G06V20/52; scenes perceived from the exterior of a vehicle G06V20/56; scenes perceived from the interior of a vehicle G06V20/59) · CPC title
using neural networks · CPC title
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title
using classification, e.g. of video objects · CPC title