Guided uncertainty-aware policy optimization: combining model-free and model-based strategies for sample-efficient learning

US12109701B2 · US · B2

Patent metadata
Publication number: US-12109701-B2
Application number: US-202016780465-A
Country: US
Kind code: B2
Filing date: Feb 3, 2020
Priority date: Nov 20, 2019
Publication date: Oct 8, 2024
Grant date: Oct 8, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A robot is controlled using a combination of model-based and model-free control methods. In some examples, the model-based method uses a physical model of the environment around the robot to guide the robot. The physical model is oriented using a perception system such as a camera. Characteristics of the perception system may be used to determine an uncertainty for the model. Based at least in part on this uncertainty, the system transitions from the model-based method to a model-free method where, in some embodiments, information provided directly from the perception system is used to direct the robot without reliance on the physical model.
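The switching idea in the abstract can be sketched as follows. This is a minimal illustration only: the abstract does not state the actual switching rule, so the isotropic uncertainty, the scale factor `k`, and the names `PoseEstimate` and `select_controller` are all hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PoseEstimate:
    """Hypothetical perception output: estimated target position and its uncertainty."""
    mean: Sequence[float]  # estimated target position, e.g. (x, y, z) in meters
    std: float             # scalar standard deviation (assumed isotropic for simplicity)


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def select_controller(robot_pos: Sequence[float], estimate: PoseEstimate, k: float = 2.0) -> str:
    """Pick a control mode from the robot position and the perception uncertainty.

    Assumption: the uncertain region is a ball of radius k * std around the
    estimated target. Outside it the physical model is trusted (model-based);
    inside it control is handed to a learned, model-free policy.
    """
    uncertain_radius = k * estimate.std
    if distance(robot_pos, estimate.mean) <= uncertain_radius:
        return "model_free"
    return "model_based"


if __name__ == "__main__":
    target = PoseEstimate(mean=(1.0, 0.0, 0.0), std=0.1)
    print(select_controller((0.0, 0.0, 0.0), target))   # far from the target
    print(select_controller((0.95, 0.0, 0.0), target))  # inside the uncertain region
```

A larger `k` hands control to the model-free policy earlier, which is one way to trade reliance on the physical model against reliance on the learned policy.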

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: dividing at least a portion of a physical model created based at least in part on information from a perception system into a plurality of regions; generating estimates of uncertainty for the plurality of regions based at least in part on at least one uncertainty estimation provided by the perception system; using the physical model to control a robot in any of the plurality of regions associated with any of the estimates of uncertainty that indicate the robot is unlikely to interact with its environment; and using at least one reinforcement learning process instead of using the physical model to control the robot in any of the plurality of regions associated with any of the estimates of uncertainty that indicate the robot is likely to interact with its environment.

2. The computer-implemented method of claim 1, wherein: the perception system is a stationary camera; and the at least one reinforcement learning process controls the robot using data collected by a camera mounted on the robot.

3. The computer-implemented method of claim 1, wherein the plurality of regions comprises a first set of regions that are associated with any of the estimates of uncertainty that indicate the robot is unlikely to interact with its environment and a second set of regions that are associated with any of the estimates of uncertainty that indicate the robot is likely to interact with its environment, and the method further comprises: using the physical model to move the robot within at least one region in the first set of regions to at least one region in the second set of regions; and using the at least one reinforcement learning process to control the robot to complete a task in one or more of the second set of regions.

4. The computer-implemented method of claim 1, further comprising: generating the physical model based at least in part on image data collected by at least one camera.

5. The computer-implemented method of claim 1, wherein the at least one uncertainty estimation provided by the perception system comprises a nonparametric distribution of a plurality of poses of a region of the plurality of regions and an associated weights for each of the plurality of poses.

6. The computer-implemented method of claim 1, wherein the at least one uncertainty estimation provided by the perception system comprises a parametric distribution.

7. The computer-implemented method of claim 1, wherein using the physical model to control the robot comprises moving the robot using a controller that uses target attractors defined by motion policies of the robot.

8. The computer-implemented method of claim 1, wherein the reinforcement learning process is performed using an autoencoder that is trained using input from at least one camera mounted on the robot instead of relying on the physical model.

9. A computer system comprising: one or more processors; and computer-readable memory storing executable instructions that, as a result of being executed by the one or more processors, cause the computer system to: divide at least a portion of a physical model created based at least in part on information from a perception system into a plurality of regions; generate estimates of uncertainty for the plurality of regions based at least in part on at least one uncertainty estimation provided by the perception system, a first set of the plurality of regions to comprise any of the plurality of regions associated with any of the estimates of uncertainty that indicate a robot is unlikely to interact with its environment, a second set of the plurality of regions to comprise any of the plurality of regions associated with any of the estimates of uncertainty that indicate the robot is likely to interact with its environment; determine if the robot is positioned in at least one of the first set of regions or at least one of the second set of regions; use the physical model to control the robot if it is determined that the robot is in at least one of the first set of regions; and use at least one reinforcement learning process instead of using the physical model to control the robot if it is determined that the robot is in at least one of the second set of regions.

10. The computer system of claim 9, wherein the at least one reinforcement learning process is to control the robot using data collected by a wrist-mounted camera positioned on the robot.

11. The computer system of claim 9, wherein the instructions, as a result of being executed by the one or more processors, cause the computer system to updates the second set of regions using a result of controlling the robot in at least a portion of the plurality of regions.

12. The computer system of claim 11, wherein the instructions, as a result of being executed by the one or more processors, cause the computer system to: perform a task using the robot, and use a result of the task to modify at least one of the first or second sets of regions.

13. The computer system of claim 9, wherein the instructions, as a result of being executed by the one or more processors, cause the computer system to orient the physical model by at least using a deep object pose estimator to process data obtained by the perception system.

14. The computer system of claim 9, wherein: the perception system comprises a first camera; and the at least one reinforcement learning process controls the robot using data collected by a second camera; and the first and the second cameras are different cameras.

15. The computer system of claim 9, wherein the perception system comprises a camera; and the instructions, as a result of being executed by the one or more processors, cause the computer system to use image data collected by the camera to generate a plurality of possible poses that are consistent with the image data collected by the camera.

16. The computer system of claim 9, wherein the instructions, as a result of being executed by the one or more processors, cause the computer system to cause a controller that uses target attractors defined by motion policies of the robot to move the robot within the first set of regions.

17. A non-transitory computer-readable medium having stored thereon instructions that, which if performed by one or more processors, cause the one or more processors to at least: partition at least a portion of a physical model into a plurality of regions; generate estimates of uncertainty for the plurality of regions; identifying a first set of the plurality of regions based at least in part on any of the estimates of uncertainty associated with the first set of regions; identifying a second set of the plurality of regions based at least in part on any of the estimates of uncertainty associated with the second set of regions; use the physical model to control a robot in any of the first set of regions; and use at least one reinforcement learning process instead of using the physical model to control the robot in any of the second set of regions.

18. The non-transitory computer-readable medium of claim 17, wherein: the physical model is to be created based at least in part on information obtained from a perception system comprising at least one stationary camera; and the at least one reinforcement learning process controls the robot using data collected by at least one camera that moves with the robot.

19. The non-transitory computer-readable medium of claim 17, wherein the instructions, if performed by the one or more proc
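As a rough illustration of the partitioning described in claims 1 and 5, the sketch below labels regions using a nonparametric (weighted-sample) pose distribution. The claims do not say how the interaction likelihood is computed, so the contact radius, the probability threshold, and the function names here are hypothetical assumptions rather than the patented method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def interaction_probability(cell_center: Point, poses: Sequence[Point],
                            weights: Sequence[float], contact_radius: float) -> float:
    """Weighted fraction of sampled object poses lying within contact_radius of a cell.

    poses and weights together stand in for a nonparametric pose distribution,
    such as weighted samples from a pose estimator (an assumption of this sketch).
    """
    total = sum(weights)
    near = sum(w for p, w in zip(poses, weights)
               if math.dist(cell_center, p) <= contact_radius)
    return near / total


def partition_regions(cell_centers: Sequence[Point], poses: Sequence[Point],
                      weights: Sequence[float], contact_radius: float = 0.05,
                      threshold: float = 0.05) -> Tuple[List[Point], List[Point]]:
    """Split cells into (model_based, model_free) sets.

    A cell goes to the model-free (reinforcement learning) set when the
    estimated chance of the robot touching the object there exceeds threshold.
    """
    model_based: List[Point] = []
    model_free: List[Point] = []
    for center in cell_centers:
        p = interaction_probability(center, poses, weights, contact_radius)
        (model_free if p > threshold else model_based).append(center)
    return model_based, model_free


if __name__ == "__main__":
    cells = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
    sampled_poses = [(1.0, 0.0), (1.02, 0.0), (0.98, 0.01)]
    sample_weights = [0.5, 0.3, 0.2]
    free_space, contact_zone = partition_regions(cells, sampled_poses, sample_weights)
    print("model-based cells:", free_space)
    print("model-free cells:", contact_zone)
```

Lowering `threshold` enlarges the second set of regions, so the learned policy takes over earlier when the pose estimate is spread out.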

Assignees

Nvidia Corp

Inventors

Classifications

  • Reinforcement learning · CPC title

  • Feedforward networks · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Supervised learning · CPC title

  • involving the use of models or simulators · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12109701B2 cover?
A robot is controlled using a combination of model-based and model-free control methods. In some examples, the model-based method uses a physical model of the environment around the robot to guide the robot. The physical model is oriented using a perception system such as a camera. Characteristics of the perception system may be used to determine an uncertainty for the model. Based at least…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification B25J9/163. Mapped technology areas include Operations & Transport.
When was this patent published?
Publication date Oct 8, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).