Fine-tuning policies to facilitate chaining

US12430564B2 · US · B2

Patent metadata
Publication number: US-12430564-B2
Application number: US-202217684245-A
Country: US
Kind code: B2
Filing date: Mar 1, 2022
Priority date: Mar 1, 2022
Publication date: Sep 30, 2025
Grant date: Sep 30, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A manipulation task may include operations performed by one or more manipulation entities on one or more objects. This manipulation task may be broken down into a plurality of sequential sub-tasks (policies). These policies may be fine-tuned so that a terminal state distribution of a given policy matches an initial state distribution of another policy that immediately follows the given policy within the plurality of policies. The fine-tuned plurality of policies may then be chained together and implemented within a manipulation environment.
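The fine-tuning idea in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the patent: the environment state is a single float, the hypothetical first policy drives the state to a tunable parameter `theta` plus noise, and "matching" is reduced to matching sample means with a plain gradient step. The patent text does not specify a distance measure or an optimizer.

```python
import random
import statistics


def sample_terminal_states(theta, n, rng):
    """Hypothetical first policy: leaves a 1-D state at `theta` plus noise."""
    return [theta + rng.gauss(0.0, 0.05) for _ in range(n)]


def fine_tune(theta, initial_states_next, steps=200, lr=0.1, n=256, seed=0):
    """Adjust `theta` so terminal states of the first policy line up with
    the initial states expected by the policy that follows it.

    Assumption: only the means are matched, which stands in for the
    distribution matching described in the abstract.
    """
    rng = random.Random(seed)
    target_mean = statistics.fmean(initial_states_next)
    for _ in range(steps):
        terminal_mean = statistics.fmean(sample_terminal_states(theta, n, rng))
        # Gradient of 0.5 * (terminal_mean - target_mean) ** 2 w.r.t. theta;
        # d(terminal_mean)/d(theta) is 1 for this toy policy.
        grad = terminal_mean - target_mean
        theta -= lr * grad
    return theta


if __name__ == "__main__":
    rng = random.Random(1)
    # Samples standing in for the second policy's initial state distribution.
    initial_states = [rng.gauss(1.0, 0.05) for _ in range(500)]
    tuned = fine_tune(0.0, initial_states)
    print(f"tuned theta: {tuned:.3f}")
```

A real system would replace the scalar state with the full state of the manipulation entities and objects, and the mean-matching loss with whatever distribution-matching objective is actually used.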

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising, at a device: determining an initial state distribution of a second state-action policy, the initial state distribution including possible states of an environment immediately before the second state-action policy is implemented; fine-tuning a first state-action policy to match a terminal state distribution of the first state-action policy to the initial state distribution of the second state-action policy, the terminal state distribution including possible states of the environment immediately after the first state-action policy is implemented; implementing the fine-tuned first state-action policy and the second state-action policy in sequence, wherein a terminal state of the environment resulting from implementation of the first state-action policy is provided as an initial state of the environment to the second state-action policy.

2. The method of claim 1, wherein the first state-action policy and the second state-action policy each describes one or more manipulations performed by one or more manipulation entities on one or more objects.

3. The method of claim 2, wherein the one or more manipulation entities include one or more robotic manipulation devices.

4. The method of claim 2, wherein the one or more manipulation entities include one or more vehicle manipulation devices.

5. The method of claim 2, wherein the one or more objects include one or more components of a product being assembled.

6. The method of claim 2, wherein the one or more objects include one or more components of a vehicle being controlled.

7. The method of claim 2, wherein the initial state distribution of the second state-action policy identifies all possible states of the one or more manipulation entities and the one or more objects being manipulated immediately before the second state-action policy is implemented.

8. The method of claim 2, wherein the terminal state distribution of the first state-action policy identifies all possible states of the one or more manipulation entities and the one or more objects being manipulated immediately after the first state-action policy is implemented.

9. The method of claim 1, wherein the first state-action policy is adjusted so that the terminal state distribution of the first policy is within a predetermined threshold of the initial state distribution of the second state-action policy.

10. The method of claim 1, wherein the fine-tuning is performed within a simulation of the environment.

11. The method of claim 1, wherein the fine-tuning is performed within the environment.

12. The method of claim 1, comprising implementing the chained policies within the environment.

13. A system comprising: a hardware processor of a device that is configured to: determine an initial state distribution of a second state-action policy, the initial state distribution including possible states of an environment immediately before the second state-action policy is implemented; fine-tune a first state-action policy to match a terminal state distribution of the first state-action policy to the initial state distribution of the second state-action policy, the terminal state distribution including possible states of the environment immediately after the first state-action policy is implemented; implement the fine-tuned first state-action policy and the second state-action policy in sequence, wherein a terminal state of the environment resulting from implementation of the first state-action policy is provided as an initial state of the environment to the second state-action policy.

14. The system of claim 13, wherein the first state-action policy and the second state-action policy each describes one or more manipulations performed by one or more manipulation entities on one or more objects.

15. The system of claim 14, wherein the one or more manipulation entities include one or more robotic manipulation devices.

16. The system of claim 14, wherein the one or more manipulation entities include one or more vehicle manipulation devices.

17. The system of claim 14, wherein the one or more objects include one or more components of a product being assembled.

18. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor of a device, causes the processor to cause the device to: determine an initial state distribution of a second state-action policy, the initial state distribution including possible states of an environment immediately before the second state-action policy is implemented; fine-tune a first state-action policy to match a terminal state distribution of the first state-action policy to the initial state distribution of the second state-action policy, the terminal state distribution including possible states of the environment immediately after the first state-action policy is implemented; implement the fine-tuned first state-action policy and the second state-action policy in sequence, wherein a terminal state of the environment resulting from implementation of the first state-action policy is provided as an initial state of the environment to the second state-action policy.

19. The non-transitory computer-readable medium of claim 18, wherein the first state-action policy is adjusted so that the terminal state distribution of the first policy is within a predetermined threshold of the initial state distribution of the second state-action policy.
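Two elements of the claims lend themselves to a short sketch: the "predetermined threshold" test of claims 9 and 19, and the sequential hand-off of state in claims 1, 13, and 18. The code below is a minimal illustration under stated assumptions, not the patented implementation. States are modeled as floats, policies as plain callables, and the gap between distributions is measured with an empirical 1-D Wasserstein-1 distance over equal-size sample sets. The claims do not name a metric, so that choice and all function names are hypothetical.

```python
from typing import Callable, List, Sequence

State = float
Policy = Callable[[State], State]  # maps an initial state to a terminal state


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Empirical Wasserstein-1 distance between two equal-size 1-D samples.

    Assumption: this metric is chosen for illustration only.
    """
    if not a or len(a) != len(b):
        raise ValueError("expected two non-empty samples of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def within_threshold(terminal: Sequence[float],
                     initial: Sequence[float],
                     threshold: float) -> bool:
    """Check that terminal-state samples of the first policy lie within a
    predetermined threshold of the second policy's initial-state samples."""
    return wasserstein_1d(terminal, initial) <= threshold


def run_chain(policies: Sequence[Policy], state: State) -> List[State]:
    """Run policies in sequence; each terminal state is handed to the next
    policy as its initial state. Returns every state visited."""
    trajectory = [state]
    for policy in policies:
        state = policy(state)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    print(within_threshold([0.0, 1.0], [0.1, 1.1], threshold=0.2))
    print(run_chain([lambda s: s + 1.0, lambda s: s * 2.0], 0.0))
```

In practice the threshold check would gate the end of fine-tuning, and `run_chain` would only be invoked once every adjacent pair of policies passes it.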

Assignees

Nvidia Corp

Inventors

Classifications

  • characterised by modeling, simulation of the manufacturing system · CPC title

  • using automatic guided vehicles [AGV] (control of position or course of AGV's G05D1/00) · CPC title

  • characterised by job scheduling, process planning, material flow · CPC title

  • G06N3/092 (Primary)

    Reinforcement learning · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12430564B2 cover?
A manipulation task may include operations performed by one or more manipulation entities on one or more objects. This manipulation task may be broken down into a plurality of sequential sub-tasks (policies). These policies may be fine-tuned so that a terminal state distribution of a given policy matches an initial state distribution of another policy that immediately follows the given policy w…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/092. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 30, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).