Method and apparatus for constructing informative outcomes to guide multi-policy decision making

US12001934B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12001934-B2
Application number  US-202318196897-A
Country             US
Kind code           B2
Filing date         May 12, 2023
Priority date       Mar 17, 2017
Publication date    Jun 4, 2024
Grant date          Jun 4, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In Multi-Policy Decision-Making (MPDM), many computationally-expensive forward simulations are performed in order to predict the performance of a set of candidate policies. In risk-aware formulations of MPDM, only the worst outcomes affect the decision-making process, and efficiently finding these influential outcomes becomes the core challenge. Recently, stochastic gradient optimization algorithms, using a heuristic function, were shown to be significantly superior to random sampling. In this disclosure, it was shown that accurate gradients can be computed, even through a complex forward simulation, using approaches similar to those in deep networks. The proposed approach finds influential outcomes more reliably, and is faster than earlier methods, allowing one to evaluate more policies while simultaneously eliminating the need to design an easily-differentiable heuristic function.
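The idea of computing gradients through a forward simulation can be illustrated with a toy example. The sketch below is not the patented method: it assumes a hypothetical one-dimensional scene (one agent, one pedestrian), a made-up Gaussian blame cost, and made-up constants. It runs a forward rollout, backpropagates through the timesteps by hand to get the derivative of the total cost with respect to the pedestrian's initial position, and then uses gradient ascent to move that initial position toward a higher-cost (more influential) outcome.

```python
import math

DT, STEPS = 0.1, 30          # hypothetical timestep and horizon
AGENT_V, PED_V = 1.0, -0.5   # hypothetical constant velocities
REPEL, WIDTH = 0.8, 0.5      # hypothetical repulsion gain and blame width


def rollout(y0, x0=0.0):
    """Forward-simulate; return (total cost, gaps d_t = y_t - x_t at each step)."""
    x, y, gaps, cost = x0, y0, [], 0.0
    for _ in range(STEPS):
        d = y - x
        gaps.append(d)
        cost += math.exp(-d * d / (2 * WIDTH ** 2))  # blame grows as the gap closes
        y += DT * (PED_V + REPEL * math.tanh(d))     # pedestrian drifts, repelled by agent
        x += DT * AGENT_V                            # agent follows its fixed policy
    return cost, gaps


def cost_gradient(y0):
    """Return (cost, d cost / d y0) by backpropagating through the rollout."""
    cost, gaps = rollout(y0)
    adjoint = 0.0  # d(cost from later steps) / d(gap at the next step)
    for d in reversed(gaps):
        local = -d / WIDTH ** 2 * math.exp(-d * d / (2 * WIDTH ** 2))
        step_jac = 1.0 + DT * REPEL * (1.0 - math.tanh(d) ** 2)  # d gap_{t+1} / d gap_t
        adjoint = local + step_jac * adjoint
    return cost, adjoint


def find_influential(y0, lr=0.05, iters=50):
    """Gradient ascent on the initial pedestrian position; keep the worst case seen."""
    best_y, best_cost = y0, rollout(y0)[0]
    y = y0
    for _ in range(iters):
        cost, grad = cost_gradient(y)
        if cost > best_cost:
            best_y, best_cost = y, cost
        y += lr * grad
    return best_y, best_cost


start = 2.0
cost0, grad0 = cost_gradient(start)
worst_y, worst_cost = find_influential(start)
print(f"nominal cost {cost0:.3f}, gradient {grad0:.3f}")
print(f"worst case found at y0={worst_y:.3f} with cost {worst_cost:.3f}")
```

The sketch maximizes cost alone. A risk-aware formulation would also weigh how probable each perturbed initial state is, which is omitted here for brevity.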

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: for each policy of a set of policies: receiving a set of state estimates comprising: a state estimate for an agent and a state estimate for each of a set of objects; perturbing the set of state estimates to generate a set of perturbed state estimates; based on the policy, simulating movement of the set of objects and the agent with a set of simulations based on the set of perturbed state estimates; and determining a respective score for the policy based on an outcome quantified for the set of simulations; selecting a policy from the set of policies based on the respective score; and commanding the agent based on the selected policy.
2. The method of claim 1, further comprising, for each policy: with a first set of simulations, simulating movement of the set of environmental objects and the controlled object based on the set of state estimates, wherein the perturbation of the set of state estimates is based on the first set of simulations.
3. The method of claim 2, wherein the set of perturbed state estimates is associated with a set of influential outcomes of the first set of simulations, wherein each of the set of influential outcomes is associated with a set of cost metrics, wherein each of the set of cost metrics has a greater value than a cost metric associated with the set of state estimates.
4. The method of claim 3, wherein the set of influential outcomes is determined with an anytime algorithm.
5. The method of claim 3, wherein the set of influential outcomes is determined with a backpropagation process.
6. The method of claim 1, wherein the set of simulations comprises a series of forward simulations.
7. The method of claim 6, wherein the forward simulations are conducted iteratively over the set of perturbed state estimates based on a gradient computed iteratively over a series of timesteps.
8. The method of claim 1, wherein the score is determined with a cost function which evaluates a Blame metric and a Progress metric, wherein the Progress metric is based on proximity to an agent goal point, wherein the Blame metric is based on agent proximity to objects of the set of object.
9. The method of claim 8, wherein the cost function comprises a linear combination of the Blame metric and the Progress metric.
10. The method of claim 8, wherein the Blame metric is determined as a function of a velocity of the agent and a distance between the agent and an object of the set of objects.
11. The method of claim 8, wherein the score is further determined based on a probability of the perturbed state estimates.
12. The method of claim 1, wherein the respective score for each policy is determined based on a distance between the agent and a closest object of the set of objects.
13. The method of claim 1, wherein the set of simulations comprises repeating a simulation until a predetermined condition is satisfied.
14. The method of claim 13 wherein the predetermined condition is policy-specific.
15. The method of claim 1, wherein the respective score is determined based on multiple outcomes quantified for the set of simulations.
16. The method of claim 1, wherein the agent comprises an autonomous vehicle.
17. The method of claim 1, wherein the set of policies comprises a plurality of policies.
18. The method of claim 17, wherein set of simulations and policy selection are executed in real-time relative to receipt of the set of state estimates.
19. The method of claim 1, wherein the set of objects comprises a plurality of objects in an environment of the agent.
20. The method of claim 1, wherein commanding the agent comprises commanding the agent to traverse through the environment according to the policy.
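As a reading aid, the steps of claim 1 (perturb the state estimates, simulate each policy, score it, select a policy) together with the linear Blame/Progress cost of claims 8 to 10 can be sketched roughly as follows. Everything concrete here is an assumption for illustration: the one-dimensional world, static objects, the policy names and speeds, the weights, the blame formula, and the use of random Gaussian perturbations in place of the gradient-based search the abstract describes.

```python
import random

W_BLAME, W_PROGRESS = 1.0, 0.5   # hypothetical weights of the linear combination


def rollout_cost(speed, agent_x, object_xs, goal_x, steps=20, dt=0.1):
    """Simulate one outcome; return W_BLAME * blame + W_PROGRESS * distance_left."""
    blame = 0.0
    for _ in range(steps):
        agent_x += speed * dt                         # objects are static here (assumption)
        nearest = min(abs(agent_x - ox) for ox in object_xs)
        blame = max(blame, speed / (nearest + 0.1))   # faster and closer means more blame
    distance_left = abs(goal_x - agent_x)             # Progress term: proximity to the goal
    return W_BLAME * blame + W_PROGRESS * distance_left


def score_policy(speed, agent_x, object_xs, goal_x, rng, samples=25, noise=0.3):
    """Perturb the state estimates, simulate each sample, keep the worst cost."""
    worst = float("-inf")
    for _ in range(samples):
        perturbed_agent = agent_x + rng.gauss(0.0, noise)
        perturbed_objects = [ox + rng.gauss(0.0, noise) for ox in object_xs]
        cost = rollout_cost(speed, perturbed_agent, perturbed_objects, goal_x)
        worst = max(worst, cost)
    return worst


def select_policy(policies, agent_x, object_xs, goal_x, seed=0):
    """Score every policy and return (name of lowest worst-case cost, all scores)."""
    rng = random.Random(seed)
    scores = {
        name: score_policy(speed, agent_x, object_xs, goal_x, rng)
        for name, speed in policies.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical policies, each reduced to a target speed.
policies = {"go": 1.5, "slow": 0.5, "stop": 0.0}
chosen, scores = select_policy(policies, agent_x=0.0, object_xs=[1.0, 2.5], goal_x=3.0)
print("selected policy:", chosen)
print("worst-case scores:", scores)
```

Scoring by the worst sampled cost mirrors the risk-aware framing in the abstract. The final claimed step, commanding the agent, would consume `chosen` and is left out of the sketch.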

Assignees

Inventors

Classifications

  • G06N3/02 (Primary)

    Neural networks · CPC title

  • G06N3/008 (Primary)

    based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Diagnosis, testing or measuring; Detecting, analysing or monitoring not otherwise provided for (error detection, error correction or monitoring in digital computers or digital computer components G06F11/00) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12001934B2 cover?
In Multi-Policy Decision-Making (MPDM), many computationally-expensive forward simulations are performed in order to predict the performance of a set of candidate policies. In risk-aware formulations of MPDM, only the worst outcomes affect the decision making process, and efficiently finding these influential outcomes becomes the core challenge. Recently, stochastic gradient optimization algori…
Who is the assignee on this patent?
Univ Michigan Regents
What technology area does this patent fall under?
Primary CPC classification G06N3/02. Mapped technology areas include Physics.
When was this patent published?
Publication date: June 4, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).