Method and system for parallel batch processing of data sets using Gaussian process with batch upper confidence bound

US9342786B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9342786-B2
Application number: US-201313919757-A
Country: US
Kind code: B2
Filing date: Jun 17, 2013
Priority date: Jun 15, 2012
Publication date: May 17, 2016
Grant date: May 17, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and system for selecting a batch of input data from available input data for parallel evaluation by a function is disclosed. The function is modeled as drawn from a Gaussian process. Observations are used to determine a mean and a variance of the modeled function. An upper confidence bound is determined from the determined mean and variance. A decision rule is applied to select input data from the available input data to add to the batch of input data. The selection of the input data is based on a domain-specific time varying parameter. Intermediate observations are hallucinated within the batch. The hallucinated observations are used with the decision rule to select subsequent input data from the available input data for the batch of input data. The input data of the batch is evaluated in parallel with the function. The resulting determined data outputs are stored.
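The batch-selection loop summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the squared-exponential kernel, the fixed `beta`, the `noise_var` value, the rule that a candidate is picked at most once per batch, and all function names are assumptions made for the example. It also assumes that each hallucinated output equals the current posterior mean, so hallucination leaves the mean unchanged and only shrinks the variance.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between rows of A and rows of B (assumed kernel)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_posterior(X_train, y_train, X_cand, noise_var=1e-2):
    """Posterior mean and standard deviation at X_cand for a zero-mean GP prior."""
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_cand)
    solved = np.linalg.solve(K, K_s)             # K^{-1} K_s
    mean = solved.T @ y_train
    var = 1.0 - np.sum(K_s * solved, axis=0)     # prior variance k(x, x) = 1
    return mean, np.sqrt(np.maximum(var, 0.0))

def select_batch(X_obs, y_obs, X_cand, batch_size, beta=4.0):
    """Pick batch_size candidate indices with an upper-confidence-bound rule."""
    # The mean uses real feedback only; it stays fixed while the batch is built.
    mean, _ = gp_posterior(X_obs, y_obs, X_cand)
    X_hall = X_obs.copy()                        # real inputs plus hallucinated ones
    available = np.ones(len(X_cand), dtype=bool)
    chosen = []
    for _ in range(batch_size):
        # The posterior variance depends only on input locations, so dummy
        # outputs are enough to refresh it after each hallucinated point.
        _, std = gp_posterior(X_hall, np.zeros(len(X_hall)), X_cand)
        ucb = np.where(available, mean + np.sqrt(beta) * std, -np.inf)
        idx = int(np.argmax(ucb))
        chosen.append(idx)
        available[idx] = False                   # assumption: no repeats in a batch
        X_hall = np.vstack([X_hall, X_cand[idx:idx + 1]])
    return chosen

# Example: two real observations, then a batch of 4 out of 50 candidates.
X_cand = np.linspace(0.0, 5.0, 50)[:, None]
X_obs = np.array([[1.0], [4.0]])
y_obs = np.array([0.5, -0.3])
batch = select_batch(X_obs, y_obs, X_cand, batch_size=4)
print(batch)
```

The returned indices would then be evaluated in parallel with the real function, and the resulting outputs appended to `X_obs` and `y_obs` before the next batch is chosen.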

First claim

Opening claim text (preview).

What is claimed is:

1. A method of selecting a batch of input data from available input data for parallel evaluation by a function, the method comprising:
    via a controller, modeling the function as drawn from a Gaussian process;
    using observations to determine a mean and a variance of the modeled function;
    determining an upper confidence bound from the determined mean and variance;
    via the controller, applying a decision rule to select input data from the available input data to add to the batch of input data, wherein the selection is based on a domain-specific time varying parameter;
    hallucinating intermediate observations for a plurality of observations not yet available within the batch;
    via the controller, using the hallucinated observations with the decision rule to select subsequent input data from the available input data for the batch of input data via the controller;
    via the controller, evaluating the input data of the batch in parallel with the function; and
    storing the resulting determined data outputs in a memory device.

2. The method of claim 1, wherein the input data relates to peptide binding properties for automated vaccine design, and the data outputs are corresponding binding affinities.

3. The method of claim 1, wherein the input data relates to electrodes for electrical stimulation and locations on a spinal cord, and the data outputs relate to reaction to stimulation.

4. The method of claim 1, wherein the decision rules trades off exploitation with exploration in selecting the input data to add to the batch.

5. The method of claim 1, wherein a regret bound is limited by initializing the modeling with a finite set of observations.

6. The method of claim 1, further comprising selecting a second batch of input data for evaluation with the function after evaluating the input data of the batch, wherein the second batch of input data is selected with feedback from the evaluation of the first batch of data.

7. The method of claim 1, wherein the input data for the batch are determined according to:

    $$X_t = \operatorname*{argmax}_{x \in D} \left[ \mu_{fb[t]}(x) + \beta_t^{1/2} \, \sigma_{t-1}(x) \right]$$

    wherein $X$ is the plurality of data inputs for the batch, $x$ is input data of the batch, $\beta_t$ is the domain-specific time-varying parameter to trade off exploitation and exploration, $\mu_{fb[t]}(x)$ is a posterior mean and $\sigma_{t-1}(x)$ is a standard deviation.

8. The method of claim 7, wherein confidence intervals associated with the domain-specific time-varying parameter, $\beta_t$, contain the true function with high probability.

9. The method of claim 7, wherein variance is not recalculated for the next x in the batch that lies within the argmax.

10. The method of claim 1, wherein the number of data inputs in the batch has a variable length determined by the information gained by the evaluation of the function.

11. The method of claim 1, wherein the number of data inputs in the batch has a variable length determined by on posterior uncertainty.

12. The method of claim 1, wherein the domain-specific time-varying parameter is scheduled according to $\exp(2C)\,\alpha_{fb[t]}$, wherein $C$ is an upper bound on the conditional mutual information gain from the batch and $\alpha_{fb[t]}$ is chosen according to the domain and requirements of a regret bound.

13. The method of claim 1, wherein the domain-specific time-varying parameter is offset by an additive or subtractive value.

14. The method of claim 1, further comprising:
    selecting an initialization set from the available input data by uncertainty sampling without prior feedback;
    obtaining feedback outputs from the initialization set with the function; and
    applying the outputs to the Gaussian process.

15. A system for determining a batch of input data from available input data for parallel evaluation by a function, the system comprising:
    a storage device storing a database including the available input data;
    a controller coupled to the storage device, the controller operable to:
    model the function as drawn from a Gaussian process;
    use observations to determine a mean and a variance of the modeled function;
    determine an upper confidence bound from the determined mean and variance;
    apply a decision rule to select input data from the available input data to add to the batch of input data, wherein the selection is based on a domain-specific time varying parameter;
    hallucinate intermediate observations for a plurality of observations not yet available within the batch;
    select subsequent input data from the available input data for the batch of input data using the hallucinated observations with the decision rule;
    evaluate the input data of the batch in parallel with the function; and
    store the resulting determined data outputs.

16. The system of claim 15, wherein the decision rule trades off exploitation with exploration when selecting the input data for the batch.

17. The system of claim 15, wherein a regret bound is limited by initializing the function model with a finite set of observations.

18. The system of claim 15, wherein the controller is further operable to select a second batch of input data for evaluation with the function, wherein the second batch of input data is selected with feedback from the evaluation of the first batch of data.

19. The system of claim 15, wherein the input data for the batch is determined according to: X t = argmax x ∈ D [
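Claims 10, 12 and 14 mention uncertainty sampling without feedback, a batch length tied to information gain, and scaling the time-varying parameter by exp(2C). The sketch below is one possible reading of how those pieces could fit together, not the procedure described in the patent. The kernel, the `noise_var` value, the stopping rule (stop before the cumulative gain would exceed a budget `C`), and all function names are assumptions. The per-point gain uses the standard Gaussian-process identity: the information gained from a noisy observation at x is ½·log(1 + σ²(x)/noise_var).

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between rows of A and rows of B (assumed kernel)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def posterior_var(X_sel, X_cand, noise_var=1e-2):
    """Posterior variance at X_cand given selected inputs; no outputs are needed."""
    if len(X_sel) == 0:
        return np.ones(len(X_cand))              # prior variance k(x, x) = 1
    K = rbf_kernel(X_sel, X_sel) + noise_var * np.eye(len(X_sel))
    K_s = rbf_kernel(X_sel, X_cand)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return np.maximum(var, 0.0)

def uncertainty_sampling(X_cand, max_points, C, noise_var=1e-2):
    """Greedy maximum-variance picks, stopped when the gain budget C would be exceeded."""
    chosen, gain = [], 0.0
    X_sel = np.empty((0, X_cand.shape[1]))
    while len(chosen) < max_points:
        var = posterior_var(X_sel, X_cand, noise_var)
        idx = int(np.argmax(var))
        step = 0.5 * np.log1p(var[idx] / noise_var)   # gain from this one point
        if chosen and gain + step > C:
            break                                     # assumed variable-length rule
        chosen.append(idx)
        gain += step
        X_sel = np.vstack([X_sel, X_cand[idx:idx + 1]])
    return chosen, gain

def scaled_beta(alpha, C):
    """Scale a base parameter alpha by exp(2C), as in the claim 12 expression."""
    return float(np.exp(2.0 * C) * alpha)

# Example: build an initialization set from 50 candidates under a gain budget of 5.
X_cand = np.linspace(0.0, 5.0, 50)[:, None]
init_set, total_gain = uncertainty_sampling(X_cand, max_points=10, C=5.0)
print(init_set, total_gain, scaled_beta(1.0, 0.5))
```

Because the variance never depends on observed outputs, the whole initialization set can be chosen before any function evaluation takes place, which is what makes selection "without prior feedback" possible.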

Assignees

Inventors

Classifications

  • G06N5/025 (Primary)

    Extracting rules from data · CPC title

  • Inference or reasoning models · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9342786B2 cover?
A method and system for selecting a batch of input data from available input data for parallel evaluation by a function is disclosed. The function is modeled as drawn from a Gaussian process. Observations are used to determine a mean and a variance of the modeled function. An upper confidence bound is determined from the determined mean and variance. A decision rule is applied to select input d…
Who is the assignee on this patent?
California Inst Of Techn
What technology area does this patent fall under?
Primary CPC classification G06N5/025. Mapped technology areas include Physics.
When was this patent published?
Publication date May 17, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).