Method and system for hyperparameter and algorithm selection for mixed integer linear programming problems using representation learning
US-2020193323-A1 · Jun 18, 2020 · US
US11842279B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11842279-B2 |
| Application number | US-202016945625-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 31, 2020 |
| Priority date | Jul 31, 2019 |
| Publication date | Dec 12, 2023 |
| Grant date | Dec 12, 2023 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Certain aspects provide a method for determining a solution to a combinatorial optimization problem, including: determining a plurality of subgraphs, wherein each subgraph of the plurality of subgraphs corresponds to a combinatorial variable of the plurality of combinatorial variables; determining a combinatorial graph based on the plurality of subgraphs; determining evaluation data comprising a set of vertices in the combinatorial graph and evaluations on the set of vertices; fitting a Gaussian process to the evaluation data; determining an acquisition function for vertices in the combinatorial graph using a predictive mean and a predictive variance from the fitted Gaussian process; optimizing the acquisition function on the combinatorial graph to determine a next vertex to evaluate; evaluating the next vertex; updating the evaluation data with a tuple of the next vertex and its evaluation; and determining a solution to the problem, wherein the solution comprises a vertex of the combinatorial graph.
Opening claim text (preview).
What is claimed is: 1. A method for determining optimized neural network architectures, comprising: receiving a plurality of combinatorial variables as input to an optimization process, each respective combinatorial variable of the plurality of combinatorial variables corresponding to a respective hyperparameter for training a neural network architecture; determining a plurality of subgraphs, wherein each subgraph of the plurality of subgraphs corresponds to a combinatorial variable of the plurality of combinatorial variables; determining a combinatorial graph based on the plurality of subgraphs, wherein the combinatorial graph comprises a plurality of vertices; determining evaluation data comprising a set of vertices from the plurality of vertices in the combinatorial graph and evaluations on the set of vertices; fitting a Gaussian process to the evaluation data, wherein the fitted Gaussian process comprises a predictive mean and a predictive variance; determining an acquisition function for vertices in the combinatorial graph using the predictive mean and the predictive variance from the fitted Gaussian process; adjusting the acquisition function on the combinatorial graph to determine a next vertex to evaluate; evaluating the next vertex; updating the evaluation data with a tuple of the next vertex and the evaluation of the next vertex; and providing a vertex of the combinatorial graph, based on the updated evaluation data, as a solution to the optimization process, wherein: the vertex indicates, for each respective combinatorial variable of the plurality of combinatorial variables, a respective value; and the indicated values of the plurality of combinatorial variables are used as the hyperparameters for the neural network architecture. 2. The method of claim 1 , wherein determining a combinatorial graph based on the plurality of subgraphs comprises determining a graph Cartesian product of the plurality of subgraphs. 3. The method of claim 1 , wherein fitting the Gaussian process comprises determining an automatic relevance determination (ARD) diffusion kernel. 4. The method of claim 3 , wherein determining the ARD diffusion kernel comprises: calculating a graph Fourier transform for each sub-graph of the plurality of subgraphs. 5. The method of claim 3 , wherein: the plurality of combinatorial variables comprises m combinatorial variables {C i } i=i . . . m , the plurality of subgraphs comprises m subgraphs {G i } i=1 . . . m , β is a sampled parameter, and the ARD diffusion kernel K=⊗ i=1 m exp(−βL(G i )). 6. The method of claim 3 , wherein: the plurality of combinatorial variables comprises m combinatorial variables {C i } i=1 . . . m , the plurality of subgraphs comprises m subgraphs {G i } i=1 . . . m , β i is a sampled parameter associated with the ith subgraph G i , and the ARD diffusion kernel K=⊗ i=1 m exp(−β i L(G i )). 7. The method of claim 3 , wherein fitting the Gaussian process further comprises using slice sampling. 8. The method of claim 3 , further comprising: using a Horseshoe prior as a scale parameter in the ARD diffusion kernel. 9. The method of claim 1 , wherein the set of vertices comprises a random selection from the plurality of vertices in the combinatorial graph. 10. The method of claim 1 , wherein the solution represents a configuration for learning a neural network. 11. A processing system configured to determine optimized neural network architectures, comprising: a memory comprising computer-executable instructions; one or more processors configured to execute the computer-executable instructions and cause the processing system to: receive a plurality of combinatorial variables as input to an optimization process, each respective combinatorial variable of the plurality of combinatorial variables corresponding to a respective hyperparameter for training a neural network architecture; determine a plurality of subgraphs, wherein each subgraph of the plurality of subgraphs corresponds to a combinatorial variable of the plurality of combinatorial variables; determine a combinatorial graph based on the plurality of subgraphs, wherein the combinatorial graph comprises a plurality of vertices; determine evaluation data comprising a set of vertices from the plurality of vertices in the combinatorial graph and evaluations on the set of vertices; fit a Gaussian process to the evaluation data, wherein the fitted Gaussian process comprises a predictive mean and a predictive variance; determine an acquisition function for vertices in the combinatorial graph using the predictive mean and the predictive variance from the fitted Gaussian process; adjust the acquisition function on the combinatorial graph to determine a next vertex to evaluate; evaluate the next vertex; update the evaluation data with a tuple of the next vertex and the evaluation of the next vertex; and provide a vertex of the combinatorial graph, based on the updated evaluation data, as a solution to the optimization process, wherein: the vertex indicates, for each respective combinatorial variable of the plurality of combinatorial variables, a respective value; and the indicated values of the plurality of combinatorial variables are used as the hyperparameters for the neural network architecture. 12. The processing system of claim 11 , wherein in order to determine a combinatorial graph based on the plurality of subgraphs, the one or more processors are further configured to determine a graph Cartesian product of the plurality of subgraphs. 13. The processing system of claim 11 , wherein in order to fit the Gaussian process, the one or more processors are further configured to determine an automatic relevance determination (ARD) diffusion kernel. 14. The processing system of claim 13 , wherein in order to determine the ARD diffusion kernel, the one or more processors are further configured to calculate a graph Fourier transform for each sub-graph of the plurality of subgraphs. 15. The processing system of claim 13 , wherein: the plurality of combinatorial variables comprises m combinatorial variables {C i } i=1 . . . m , the plurality of subgraphs comprises m subgraphs {G i } i=1 . . . m , β is a sampled parameter, and the ARD diffusion kernel K=⊗ i=1 m exp(−βL(G i )). 16. The processing system of claim 13 , wherein: the plurality of combinatorial variables comprises m combinatorial variables {C i } i=1 . . . m , the plurality of subgraphs comprises m subgraphs {G i } i=1 . . . m , β i is a sampled parameter associated with the ith subgraph G i , and the ARD diffusion kernel K=⊗ i=1 m exp(−β i L(G i )). 17. The processing system of claim 13 , wherein in order to fit the Gaussian process, the one or more processors are further configured to use slice sampling. 18. The processing system of claim 13 , wherein the one or more processors are further configured to use a Horseshoe prior as a scale parameter in the ARD diffusion kernel. 19. The processing system of claim 11 , wherein the set of vertices comprises a random selection from the plurality of vertices in the combinatorial graph. 20. The processing system of claim 11 , wherein the solution represents a configuration for learning a neural network. 21. A non-transitory computer-readable medium comprising computer-executable instructions that, when executed by one or more processors of a processing system, cause the processing system to perform a method for determining optimized neural net
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title
Pre-processing; Data cleansing · CPC title
Validation; Performance evaluation; Active pattern learning techniques · CPC title
Graphical models, e.g. Bayesian networks · CPC title
Probabilistic graphical models, e.g. probabilistic networks · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.