Controlling position of robot by determining goal proposals by using neural networks

US11958529B2 · US · B2

Patent metadata
Publication number: US-11958529-B2
Application number: US-202016998941-A
Country: US
Kind code: B2
Filing date: Aug 20, 2020
Priority date: Aug 20, 2020
Publication date: Apr 16, 2024
Grant date: Apr 16, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A framework for offline learning from a set of diverse and suboptimal demonstrations operates by selectively imitating local sequences from the dataset. At least one embodiment recovers performant policies from large manipulation datasets by decomposing the problem into a goal-conditioned imitation and a high-level goal selection mechanism.
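The "goal-conditioned imitation" half of this decomposition can be illustrated with a small data-preparation sketch. It is not taken from the patent: the function name `goal_conditioned_pairs`, the `horizon` parameter, and the tuple-based `State` and `Action` types are assumptions made for illustration. The idea shown is that each short local sequence in a demonstration is relabeled with the state it actually reached, so even a suboptimal demonstration yields valid examples of "how to get from here to there".

```python
from typing import List, Sequence, Tuple

# Hypothetical types: a robot pose and a control command, flattened to floats.
State = Tuple[float, ...]
Action = Tuple[float, ...]


def goal_conditioned_pairs(
    states: Sequence[State],
    actions: Sequence[Action],
    horizon: int,
) -> List[Tuple[State, State, List[Action]]]:
    """Slice one demonstration into (start, goal, actions) training examples.

    The goal for step t is simply the state the demonstrator reached
    `horizon` steps later, and the label is the action sequence that got there.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    if len(actions) != len(states) - 1:
        raise ValueError("expected exactly one action between consecutive states")

    examples = []
    for t in range(len(states) - horizon):
        start = states[t]
        goal = states[t + horizon]          # relabeled goal: a state actually reached
        local_actions = list(actions[t:t + horizon])
        examples.append((start, goal, local_actions))
    return examples


# Toy 1-D demonstration: the robot moves one unit per step.
demo_states = [(0.0,), (1.0,), (2.0,), (3.0,)]
demo_actions = [(1.0,), (1.0,), (1.0,)]
pairs = goal_conditioned_pairs(demo_states, demo_actions, horizon=2)
```

The same (start, goal) pairs could also serve as training data for a model that proposes goals reachable within a fixed number of steps, which is the role the claims assign to the first neural network.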

First claim

Opening claim text (preview).

What is claimed is:
1. A processor, comprising one or more circuits to: use a first neural network to determine a set of intermediate goal proposals based on a current position of a robot; select, based at least in part on a value function, an intermediate goal from the set of intermediate goal proposals; use a second neural network to determine a set of actions that, as a result of being performed by the robot, reposition the robot from the current position to the selected intermediate goal; and cause the robot to at least partially perform a task by at least performing the set of actions.
2. The processor of claim 1, wherein the first neural network was trained using a set of demonstrations of task performance.
3. The processor of claim 2, wherein the set of demonstrations are collected at least in part by having a group of humans direct the robot to successfully perform the task using a manual interface.
4. The processor of claim 1, wherein the set of intermediate goal proposals comprises a set of robotic poses reachable in a set number of time increments from the current position of the robot.
5. The processor of claim 1, wherein: the value function provides a score for each proposal in the set of intermediate goal proposals; and the selected intermediate goal is selected as a goal proposal based at least in part on a respective score of the selected intermediate goal.
6. The processor of claim 1, wherein the value function is trained using Batch Constrained Q-Learning.
7. The processor of claim 1, wherein the one or more circuits: determine that the robot has not completed performing the task; and as a result of determining that the task is not complete, determine a new intermediate goal.
8. The processor of claim 1, wherein the value function is based at least in part on a distance in units of time that a goal proposal is from successful task completion.
9. A system, comprising: one or more processors coupled to computer-readable media, the computer-readable media storing executable instructions that, as a result of being executed by the one or more processors, cause the system to: cause a robot to perform a task by at least causing the robot to achieve a set of intermediate goals, the set of intermediate goals determined using a neural network that proposes the set of intermediate goals based on a current position of the robot.
10. The system of claim 9, wherein the neural network is trained using a dataset of demonstrations of performances of the task.
11. The system of claim 9, wherein an intermediate goal in the set of intermediate goals is selected from a plurality of goal proposals based at least in part on a value function.
12. The system of claim 11 wherein the value function is trained based at least in part on a temporal difference loss.
13. The system of claim 9, wherein the executable instructions, as a result of being executed by the one or more processors, further cause the system to: cause the robot to achieve an intermediate goal in the set of intermediate goals; and as a result of achieving the intermediate goal, determine a next intermediate goal.
14. The system of claim 9, wherein each intermediate goal in the set of intermediate goals identifies a pose for the robot.
15. The system of claim 9, wherein the neural network is a variational autoencoder.
16. The system of claim 9, wherein the executable instructions, as a result of being executed by the one or more processors, further cause the system to: use a recurrent neural network to determine a set of actions that, as a result of being performed by the robot, reposition the robot from the current position to an intermediate goal in the set of intermediate goals.
17. A machine-readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors to at least: use a first neural network to determine a set of intermediate goal proposals based on a current position of a robot; select, based at least in part on a value function, an intermediate goal from the set of intermediate goal proposals; use a second neural network to determine a set of actions that, as a result of being performed by the robot, reposition the robot from the current position to the selected intermediate goal; and cause the robot to at least partially perform a task by at least performing the set of actions.
18. The machine-readable medium of claim 17, wherein the first neural network was trained using a set of demonstrations of task performance.
19. The machine-readable medium of claim 17, wherein: the first neural network is trained to model a distribution of states that are an amount of time from a given state; and each state describes a position and orientation of the robot.
20. The machine-readable medium of claim 18, wherein the set of demonstrations is a set of observations of the robot performing the task.
21. The machine-readable medium of claim 17, wherein the instructions further comprise instructions that, as a result of being executed by the one or more processors, cause the one or more processors to: generate a score for each intermediate goal proposal of the set of intermediate goal proposals using the value function; and select the selected intermediate goal based on the score.
22. The machine-readable medium of claim 17, wherein the set of intermediate goal proposals is a set of robotic poses reachable in a set amount of time from the current position of the robot.
23. The machine-readable medium of claim 17, wherein: the robot is an autonomous vehicle; and the task is a parking operation.
24. The machine-readable medium of claim 17, wherein the instructions further comprise instructions that, as a result of being executed by the one or more processors, cause the one or more processors to: determine that the robot has not completed the task; and as a result of determining that the task is not complete, determine an additional intermediate goal.
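The control flow recited in claims 1, 5, and 7 (propose goals, score them with a value function, act toward the best one, repeat until the task is complete) can be sketched as a short loop. This is a toy illustration under stated assumptions, not the patented implementation: the robot is a single number on a line, and `propose_goals`, `value`, `plan_actions`, and `is_done` are hypothetical hand-written stand-ins for the first neural network, the learned value function, the second neural network, and a task-completion check.

```python
from typing import Callable, List


def run_task(
    position: float,
    propose_goals: Callable[[float], List[float]],
    value: Callable[[float], float],
    plan_actions: Callable[[float, float], List[float]],
    is_done: Callable[[float], bool],
    max_rounds: int = 50,
) -> float:
    """Repeat: propose intermediate goals, pick the best-scoring one, act toward it."""
    for _ in range(max_rounds):
        if is_done(position):
            break
        proposals = propose_goals(position)       # stand-in for the first network
        goal = max(proposals, key=value)          # selection by value-function score
        for action in plan_actions(position, goal):  # stand-in for the second network
            position += action                    # toy dynamics: action is a displacement
    return position


# Hand-written stand-ins (assumptions for illustration only).
TARGET = 10.0


def propose_goals(p: float) -> List[float]:
    # Nearby positions, standing in for poses reachable in a fixed number of steps.
    return [p - 1.0, p + 0.5, p + 1.0]


def value(g: float) -> float:
    # Higher is better; here, simply closer to the target.
    return -abs(TARGET - g)


def plan_actions(p: float, g: float) -> List[float]:
    # Two equal displacements that carry the robot from p to g.
    return [(g - p) / 2.0] * 2


def is_done(p: float) -> bool:
    return abs(TARGET - p) < 0.25


final_position = run_task(0.0, propose_goals, value, plan_actions, is_done)
```

In a learned system the value function would be trained from data, for example with a temporal difference loss as in claim 12, instead of being computed from a known target as in this toy.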

Assignees

Nvidia Corp

Inventors

Classifications

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning

  • Supervised learning

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • Generative networks

  • Auto-encoder networks; Encoder-decoder networks

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11958529B2 cover?
A framework for offline learning from a set of diverse and suboptimal demonstrations operates by selectively imitating local sequences from the dataset. At least one embodiment recovers performant policies from large manipulation datasets by decomposing the problem into a goal-conditioned imitation and a high-level goal selection mechanism.
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification B62D15/0285. Mapped technology areas include Operations & Transport.
When was this patent published?
Publication date: Apr 16, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).