Composing efficient and robust tests to assess artificial agents

US2024256433A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2024256433-A1
Application number: US-202418418835-A
Country: US
Kind code: A1
Filing date: Jan 22, 2024
Priority date: Jan 25, 2023
Publication date: Aug 1, 2024
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Modern reinforcement learning systems produce many high-quality policies throughout the learning process. However, to choose which policy to actually deploy in the real world, they must be tested under an intractable number of environmental conditions. A process, called Robust Population Optimization for a Small Set of Test cases (“RPOSST”), can select a small set of test cases from a larger pool based on a relatively small number of sample evaluations. RPOSST can treat the test case selection problem as a two-player game and can optimize a solution with provable k-of-N robustness, bounding the error relative to a test that used all the test cases in the pool. Empirical results demonstrate that RPOSST finds a small set of test cases that identify high quality policies in a toy one-shot game, poker datasets, and a high-fidelity racing simulator.
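
To make the abstract's notion of "error relative to a test that used all the test cases in the pool" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the patent's implementation: the result matrix `RESULTS`, the uniform average used as the full-test score, and the function names are all hypothetical.

```python
# Hypothetical result matrix: one row per policy, one column per test case.
RESULTS = [
    [1.0, 0.0, 1.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
    [0.0, 1.0, 0.0, 1.0],
]


def full_test_score(row):
    """Score of one policy on the whole pool (uniform average, an assumption)."""
    return sum(row) / len(row)


def small_test_score(row, cases, weights):
    """Weighted score of one policy on the selected test cases only."""
    return sum(w * row[c] for c, w in zip(cases, weights))


def worst_case_error(results, cases, weights):
    """Largest absolute gap between the small test and the full test."""
    return max(
        abs(small_test_score(row, cases, weights) - full_test_score(row))
        for row in results
    )


# Test cases 0 and 1 together reproduce the full-pool score for every policy,
# while test cases 0 and 2 are redundant and misjudge two of the three policies.
print(worst_case_error(RESULTS, [0, 1], [0.5, 0.5]))
print(worst_case_error(RESULTS, [0, 2], [0.5, 0.5]))
```

In this toy matrix, a well-chosen pair of test cases matches the full test exactly, while a poorly chosen pair does not; the selection problem is to find the former from limited evaluations.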

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method for determining a subset of test cases, selected from a set of test cases, that identify candidate deployment policies from a set of policies, comprising: evaluating each tuning policy of a subset of tuning policies, from the set of policies, with each test case, from the set of test cases, to generate a result matrix of test case results; utilizing a two-player game formulation for determining a loss function for each of m test cases sampled on each of N sampled policies from the set of policies; and selecting the subset of test cases based on a plurality of rounds of the two-player game formulation, wherein the subset of test cases are operable for determining the candidate deployment policies.
2. The computer-implemented method of claim 1, wherein the two-player formulation includes: choosing an m-tuple of test cases, from the set of test cases, and weights for each of the m-tuple of test cases; sampling N policies to test and target distributions for each of the N policies from an uncertainty distribution; choosing the k worst policies and respective target distributions that maximize the loss function; and sampling one of the k worst policies and respective target distribution to provide a payoff.
3. The computer-implemented method of claim 1, wherein the selected test case and the tuning policy choice are performed simultaneously.
4. The computer-implemented method of claim 1, wherein the selected test case and the tuning policy choice are performed in sequence.
5. The computer-implemented method of claim 1, wherein the selected test case and the tuning policy choice are performed in sequence and hyperparameter values and constraints are applied, resulting in deterministic behavior.
6. The computer-implemented method of claim 1, wherein the tuning policies are selected from a reinforcement learning process.
7. The computer-implemented method of claim 1, wherein the tuning policies are selected to include a collection of skilled and unskilled policies with random variations.
8. The computer-implemented method of claim 1, wherein the tuning policies are selected with architectural and algorithmic similarities to future development candidate policies.
9. The computer-implemented method of claim 2, wherein the target distribution is based on a fixed uniform distribution over the m-tuple of test cases.
10. The computer-implemented method of claim 2, wherein test case selection is robust against differences between the tuning policies and the candidate deployment policies.
11. The computer-implemented method of claim 2, wherein test case selection is robust against differences between the target distribution used during training and an actual target distribution.
12. The computer-implemented method of claim 1, further comprising determining a weighting for each of the subset of test cases.
13. The computer-implemented method of claim 12, wherein the weighting on each of the subset of test cases is determined by expert guidance.
14. The computer-implemented method of claim 1, wherein the candidate deployment policies are policies for an artificial agent in a competitive racing simulation.
15. A method for selecting policies to use in a racing simulation, comprising: accessing, by a development server, a set of candidate policies, where each policy is a collection of data stored in a policy database and represents at least one behavior for an agent operating a car in a racing simulation; selecting, by the development server, one or more tuning policies from the set of candidate policies; accessing, by the development server, a set of candidate test cases, where each test case is a collection of data stored in a test case database and represents at least one condition in an environment in the racing simulation; selecting, by the development server, one or more test cases from the candidate cases; first reviewing, by the development server, a performance of using the selected test cases with the tuning policies, where the first reviewing includes iteratively using machine learning; selecting, by the development server, one or more test cases as application test cases based on results of the first reviewing; second reviewing, by the development server, performance of using the application test cases with one or more candidate policies; and selecting, by the development server, one or more policies as deployment policies based on the results of the second reviewing.
16. The method of claim 15, where the first reviewing includes using a protagonist and an adversary.
17. The method of claim 15, further comprising sending the deployment policies to a game system.
18. A computer-implemented method for identifying candidate deployment policies from a set of policies, comprising: evaluating each tuning policy of a subset of tuning policies, from the set of policies, with each test case, from a set of test cases, to generate a result matrix of test case results; utilizing a two-player game formulation for determining a loss function for each of m test cases sampled on each of N sampled policies from the set of policies; selecting a subset of test cases based on a plurality of rounds of the two-player game formulation; and sampling the set of policies on the subset of test cases to determine the candidate deployment policies.
19. The computer-implemented method of claim 18, further comprising determining a weighting for each of the subset of test cases.
20. The computer-implemented method of claim 18, wherein the tuning policies are selected to include a collection of skilled and unskilled policies with random variations.

Assignees

Sony Group Corp

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • for test design, e.g. generating new test cases · CPC title

  • G06N20/00 (Primary)

    Machine learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024256433A1 cover?
Modern reinforcement learning systems produce many high-quality policies throughout the learning process. However, to choose which policy to actually deploy in the real world, they must be tested under an intractable number of environmental conditions. A process, called Robust Population Optimization for a Small Set of Test cases (“RPOSST”), can select a small set of test cases from a larger po…
Who is the assignee on this patent?
Sony Group Corp
What technology area does this patent fall under?
Primary CPC classification G06F11/3684. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 1, 2024 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).