Method and apparatus for constructing informative outcomes to guide multi-policy decision making

US12299554B2 · US · B2

Patent metadata
Publication number: US-12299554-B2
Application number: US-202418653211-A
Country: US
Kind code: B2
Filing date: May 2, 2024
Priority date: Mar 17, 2017
Publication date: May 13, 2025
Grant date: May 13, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In Multi-Policy Decision-Making (MPDM), many computationally expensive forward simulations are performed in order to predict the performance of a set of candidate policies. In risk-aware formulations of MPDM, only the worst outcomes affect the decision-making process, and efficiently finding these influential outcomes becomes the core challenge. Recently, stochastic gradient optimization algorithms, using a heuristic function, were shown to be significantly superior to random sampling. In this disclosure, it was shown that accurate gradients can be computed, even through a complex forward simulation, using approaches similar to those in deep networks. The proposed approach finds influential outcomes more reliably, and is faster than earlier methods, allowing one to evaluate more policies while simultaneously eliminating the need to design an easily differentiable heuristic function.
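As a rough illustration of the idea in the abstract, the Python sketch below backpropagates a cost through a toy forward simulation and uses the resulting gradient to search for a high-cost ("influential") initial configuration. Everything in it is an assumption made for illustration and is not taken from the patent: the constant-speed vehicle, the pedestrian that drifts toward a goal, the Gaussian closeness cost, the backtracking gradient ascent, and all constants and function names. It also ignores how likely each initial configuration is.

```python
import math

# Hypothetical constants for a toy scenario (not from the patent).
DT, STEPS, GAIN, SIGMA, SPEED = 0.1, 50, 0.5, 1.0, 2.0

def vehicle_pos(t):
    """Vehicle drives along the x-axis at constant speed."""
    return (SPEED * DT * t, 0.0)

def closeness(p, v):
    """Per-step cost: close to 1 when the pedestrian is near the vehicle."""
    dx, dy = p[0] - v[0], p[1] - v[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * SIGMA ** 2))

def simulate(p0, goal):
    """Forward pass: roll the pedestrian toward its goal, keep the trajectory."""
    traj, p = [p0], p0
    for _ in range(STEPS):
        p = (p[0] + DT * GAIN * (goal[0] - p[0]),
             p[1] + DT * GAIN * (goal[1] - p[1]))
        traj.append(p)
    cost = sum(closeness(q, vehicle_pos(t)) for t, q in enumerate(traj))
    return cost, traj

def cost_gradient(p0, goal):
    """Backward pass: accumulate d(cost)/d(p0) from the last step to the first."""
    cost, traj = simulate(p0, goal)
    decay = 1.0 - DT * GAIN          # Jacobian of one step w.r.t. the position
    adj = (0.0, 0.0)
    for t in range(STEPS, -1, -1):
        p, v = traj[t], vehicle_pos(t)
        c = closeness(p, v)
        gx = -c * (p[0] - v[0]) / SIGMA ** 2   # local gradient of the step cost
        gy = -c * (p[1] - v[1]) / SIGMA ** 2
        adj = (gx + decay * adj[0], gy + decay * adj[1])
    return cost, adj

def find_influential_outcome(p0, goal, iters=50):
    """Gradient ascent with backtracking: move p0 toward higher cost."""
    p, best = p0, simulate(p0, goal)[0]
    for _ in range(iters):
        _, g = cost_gradient(p, goal)
        step = 1.0
        while step > 1e-6:
            cand = (p[0] + step * g[0], p[1] + step * g[1])
            c = simulate(cand, goal)[0]
            if c > best:
                p, best = cand, c
                break
            step *= 0.5
        else:
            break                     # no improving step found, stop early
    return p, best

if __name__ == "__main__":
    start, goal = (3.0, 4.0), (6.0, -4.0)
    print("initial cost:", simulate(start, goal)[0])
    print("influential start, cost:", find_influential_outcome(start, goal))
```

The backward loop is the part that corresponds to backpropagation: each step's local gradient is combined with the gradient flowing back from later steps, so the simulation is differentiated directly and no separate heuristic function is needed.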

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: operating a vehicle according to a first policy; while operating the vehicle according to the first policy, evaluating a set of policy options, comprising: detecting a set of objects in the vehicle's environment; evaluating each of the set of policy options, comprising, for each of the set of policy options: identifying a set of multiple potential outcomes associated with the policy, comprising guiding the set of multiple potential outcomes to a particular outcome category; evaluating the set of multiple potential outcomes to produce a score; selecting a second policy from the set of policy options based on a set of scores comprising the produced score for each policy option; and operating the vehicle according to the second policy; wherein evaluating the set of multiple potential outcomes comprises, for each of the set of policy options, predicting a progress of the vehicle toward a predetermined goal and each of the set of policy options is further evaluated based on a quantified risk of executing the policy option.

2. The method of claim 1, wherein the particular outcome category comprises a category associated with a high risk of collision between one or more of: the vehicle and at least one object of the set of objects; or a first object and a second object of the set of objects.

3. The method of claim 2, wherein guiding the set of multiple potential outcomes to the particular outcome category comprises adjusting input data associated with the set of objects.

4. The method of claim 3, wherein the input data comprises at least one of position or motion information associated with the set of objects.

5. The method of claim 3, wherein the input data comprises a goal associated with the set of objects.

6. The method of claim 3, wherein guiding the set of multiple potential outcomes further comprises applying a backpropagation process.

7. The method of claim 1, wherein evaluating the set of multiple potential outcomes comprises, for each of the set of policy options, predicting a quantified inconvenience to the set of objects in response to executing the policy option, and producing the score based on the quantified inconvenience.

8. The method of claim 7, wherein the quantified inconvenience is calculated based on a set of predicted distances between the vehicle and a closest object of the set of objects.

9. The method of claim 7, wherein each of the set of policy options is further evaluated based on a quantified risk of executing the policy option.

10. A system, comprising: a processing subsystem of a controlled vehicle configured to: evaluate a set of policy options, comprising: detecting a set of objects in the controlled vehicle's environment; evaluating each of the set of policy options, comprising, for each of the set of policy options: identifying a set of multiple potential outcomes associated with the policy, comprising guiding the set of multiple potential outcomes to a particular outcome category; evaluating the set of multiple potential outcomes to produce a score; selecting a policy from the set of policy options based on a set of scores comprising the produced score for each policy option; and a control subsystem configured to implement the selected policy; wherein evaluating the set of multiple potential outcomes comprises, for each of the set of policy options, predicting a quantified inconvenience to the set of objects in response to executing the policy option, and producing the score based on the quantified inconvenience.

11. The system of claim 10, wherein the particular outcome category comprises a category associated with a high risk of collision between one or more of: the controlled vehicle and at least one object of the set of objects; or a first object and a second object of the set of objects.

12. The system of claim 11, wherein guiding the set of multiple potential outcomes to the particular outcome category comprises adjusting input data associated with the set of objects.

13. The system of claim 12, wherein the input data comprises at least one of position or motion information associated with the set of objects.

14. The system of claim 12, wherein the input data comprises a goal associated with the set of objects.

15. The system of claim 10, wherein the quantified inconvenience is calculated based on a set of predicted distances between the controlled vehicle and a closest object of the set of objects.

16. The system of claim 10, wherein each of the set of policy options is further evaluated based on a quantified risk of executing the policy option.

17. The system of claim 10, wherein evaluating the set of multiple potential outcomes comprises, for each of the set of policy options, predicting a progress of the vehicle toward a predetermined goal.

18. The system of claim 10, wherein guiding the set of multiple potential outcomes includes applying a backpropagation process.

19. The system of claim 10 wherein evaluating the set of multiple potential outcomes comprises, for each of the set of policy options, predicting a progress of the vehicle toward a predetermined goal and each of the set of policy options is further evaluated based on a quantified risk of executing the policy option.

20. A method, comprising: operating a vehicle according to a first policy; while operating the vehicle according to the first policy, evaluating a set of policy options, comprising: detecting a set of objects in the vehicle's environment; evaluating each of the set of policy options, comprising, for each of the set of policy options: identifying a set of multiple potential outcomes associated with the policy, comprising guiding the set of multiple potential outcomes to a particular outcome category; evaluating the set of multiple potential outcomes to produce a score; selecting a second policy from the set of policy options based on a set of scores comprising the produced score for each policy option; and operating the vehicle according to the second policy; wherein the particular outcome category comprises a category associated with a high risk of collision between one or more of: the vehicle and at least one object of the set of objects; or a first object and a second object of the set of objects; and; wherein guiding the set of multiple potential outcomes to the particular outcome category comprises adjusting input data associated with the set of objects and applying a backpropagation process.

21. A method, comprising: operating a vehicle according to a first policy; while operating the vehicle according to the first policy, evaluating a set of policy options, comprising: detecting a set of objects in the vehicle's environment; evaluating each of the set of policy options, comprising, for each of the set of policy options: identifying a set of multiple potential outcomes associated with the policy, comprising guiding the set of multiple potential outcomes to a particular outcome category; evaluating the set of multiple potential outcomes to produce a score; selecting a second policy from the set of policy options based on a set of scores comprising the produced score for each policy option; and operating the vehicle according to the second policy; wherein evaluating the set of multiple potential outcomes comprises, for each of the set of policy options, predicting a quantified inconvenience to the set of objects in response to executing the policy option, and produ
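The evaluate-score-select loop that the claims recite can be pictured with a small Python sketch. It is only a reading aid under stated assumptions, not the claimed method: policies are reduced to a constant vehicle speed, objects move with constant velocity, the set of potential outcomes is passed in directly rather than guided toward a risk category, and the names `rollout`, `score_outcome`, `select_policy`, the weights, and the forms chosen for risk and inconvenience are all hypothetical.

```python
import math

def rollout(speed, objects, steps=50, dt=0.1):
    """Simulate one outcome; return (progress, distance to the closest object).

    The vehicle drives along the x-axis at `speed`. Each object is a tuple
    (x, y, vx, vy) moving with constant velocity (an assumed motion model).
    """
    min_d = math.inf
    for t in range(steps + 1):
        veh_x = speed * dt * t
        for ox, oy, ovx, ovy in objects:
            d = math.hypot(ox + ovx * dt * t - veh_x, oy + ovy * dt * t)
            min_d = min(min_d, d)
    return speed * dt * steps, min_d

def score_outcome(progress, min_d, safe_dist=1.0, w_risk=20.0, w_inconv=1.0):
    """Combine progress, risk, and inconvenience into one score (assumed form)."""
    risk = 1.0 if min_d < safe_dist else 0.0   # near-collision indicator
    inconvenience = 1.0 / (1.0 + min_d)        # grows as the vehicle gets closer
    return progress - w_risk * risk - w_inconv * inconvenience

def select_policy(policies, outcomes):
    """Score each policy by its worst outcome, then pick the best-scoring one."""
    scores = {
        name: min(score_outcome(*rollout(speed, objs)) for objs in outcomes)
        for name, speed in policies.items()
    }
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    policies = {"go": 2.0, "slow": 1.0, "stop": 0.0}
    outcomes = [
        [(5.0, 2.5, 0.0, -1.0)],   # pedestrian crosses the vehicle's path
        [(5.0, 10.0, 0.0, 0.0)],   # pedestrian stays far away
    ]
    print(select_policy(policies, outcomes))
```

Taking the minimum over outcomes mirrors the risk-aware framing in the abstract, where only the worst outcomes influence the decision. A policy that looks best in a benign outcome can still lose if one high-risk outcome exists.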

Assignees

  • Univ Michigan Regents

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Diagnosis, testing or measuring; Detecting, analysing or monitoring not otherwise provided for (error detection, error correction or monitoring in digital computers or digital computer components G06F11/00) · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

  • G06N3/008 · Primary

    based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour · CPC title

  • G06N3/02 · Primary

    Neural networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12299554B2 cover?
In Multi-Policy Decision-Making (MPDM), many computationally-expensive forward simulations are performed in order to predict the performance of a set of candidate policies. In risk-aware formulations of MPDM, only the worst outcomes affect the decision making process, and efficiently finding these influential outcomes becomes the core challenge. Recently, stochastic gradient optimization algori…
Who is the assignee on this patent?
Univ Michigan Regents
What technology area does this patent fall under?
Primary CPC classification G06N3/008. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 13, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).