Distributed hyperparameter tuning system for machine learning
US-2018240041-A1 · Aug 23, 2018 · US
US11030529B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11030529-B2 |
| Application number | US-201816219286-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 13, 2018 |
| Priority date | Dec 13, 2017 |
| Publication date | Jun 8, 2021 |
| Grant date | Jun 8, 2021 |
Evolution and coevolution of neural networks via multitask learning is described. The foundation is (1) the original soft ordering, which uses a fixed architecture for the modules and a fixed routing (i.e., network topology) that is shared among all tasks. This architecture is then extended in two ways with CoDeepNEAT: (2) by coevolving the module architectures (CM), and (3) by coevolving both the module architectures and a single shared routing for all tasks (CMSR). An alternative evolutionary process (4) keeps the module architecture fixed but evolves a separate routing for each task during training (CTR). Finally, approaches (2) and (4) are combined into (5), where both modules and task routing are coevolved (CMTR).
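The five variants in the abstract differ only in which parts of the multitask network evolution is allowed to change. The minimal Python sketch below encodes that design space as flags; the `Variant` class and `combine` helper are hypothetical illustrations, not structures from the patent. It checks the abstract's statement that CMTR (5) combines CM (2) and CTR (4).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical summary of what each variant in the abstract evolves."""
    name: str
    evolves_modules: bool   # module architectures coevolved (CoDeepNEAT)
    evolves_routing: bool   # routing (topology) evolved at all
    routing_per_task: bool  # one routing per task rather than one shared routing


VARIANTS = [
    Variant("soft ordering", False, False, False),  # (1) fixed modules, fixed shared routing
    Variant("CM", True, False, False),              # (2) coevolve module architectures
    Variant("CMSR", True, True, False),             # (3) modules + single shared routing
    Variant("CTR", False, True, True),              # (4) fixed modules, per-task routing
    Variant("CMTR", True, True, True),              # (5) modules + per-task routing
]


def combine(a: Variant, b: Variant, name: str) -> Variant:
    """Evolve everything that either input variant evolves."""
    return Variant(
        name,
        a.evolves_modules or b.evolves_modules,
        a.evolves_routing or b.evolves_routing,
        a.routing_per_task or b.routing_per_task,
    )
```

Because the dataclass is frozen, instances compare by value, so `combine(CM, CTR, "CMTR")` can be checked directly against the CMTR entry.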
The invention claimed is:

1. A processor-implemented method for evolving task-specific topologies in a multitask architecture, comprising: establishing a set of shared modules which are shared among each task-specific topology; initializing the shared modules $\{M_k\}_{k=1}^{K}$ with random weights; initializing a champion individual module routing scheme for each task $t$, wherein the $i$th individual for the $t$th task is represented by a tuple $(E_{ti}, G_{ti}, D_{ti})$, and further wherein $E_{ti}$ is an encoder, $G_{ti}$ is a DAG which specifies the individual module routing scheme, and $D_{ti}$ is a decoder, with $E_{ti}$ and $D_{ti}$ initialized with random weights; for each champion individual $(E_{ti}, G_{ti}, D_{ti})$, generating a challenger $(E_{t2}, G_{t2}, D_{t2})$ by mutating the $t$th champion in accordance with a predetermined mutation subprocess; jointly training each champion and challenger for $M$ iterations on a training set of data; evaluating each champion and challenger on a validation set of data to determine an accuracy fitness for each individual champion and challenger for its predetermined task; if a challenger has higher accuracy fitness than the corresponding champion, replacing the champion, wherein $(E_{ti}, G_{ti}, D_{ti}) = (E_{t2}, G_{t2}, D_{t2})$; calculating an average accuracy fitness across all champions for tasks in the multitask architecture; and checkpointing the shared modules when the best average accuracy is achieved.

2. The process according to claim 1, wherein the predetermined mutation subprocess includes: (i) start as a copy of the champion, including learned weights, wherein $(E_{t2}, G_{t2}, D_{t2}) := (E_{ti}, G_{ti}, D_{ti})$; (ii) randomly select a pair of nodes $(u, v)$ from $G_{t2}$ such that $v$ is an ancestor of $u$; (iii) randomly select a module $M_k$ from the shared modules; (iv) add a new node $w$ to $G_{t2}$ with $M_k$ as its function; (v) add new edges $(u, w)$ and $(w, v)$ to $G_{t2}$; (vi) set the scalar weight of $(w, v)$ such that its value after softmax is some $\alpha \in (0, 1)$.

3. The process according to claim 1, wherein the training set of data and the validation set of data are disjoint.

4. The process according to claim 1, wherein $G_{ti}$ is initialized in accordance with a graph initialization policy.

5. The process according to claim 1, wherein a model for an individual is then given by $y_t = \big(D_{ti} \circ R(G_{ti}, \{M_k\}_{k=1}^{K}) \circ E_{ti}\big)(x_t)$, where $R$ indicates application of the shared modules $M_k$ based on the DAG $G_{ti}$.

6. The process according to claim 5, wherein $E_{ti}$ and $D_{ti}$ are selected from a group consisting of neural network functions that are compatible with the set of shared modules.

7. The process according to claim 6, wherein each $E_{ti}$ is an identity transformation layer, and $D_{ti}$ is a fully connected classification layer.

8. The process according to claim 1, wherein $G_{ti}$ is a DAG whose single source node represents the input layer for that task $t$ and whose single sink node represents the output layer, and further wherein all other nodes either point to a module $M_k$ to be applied at that location or to a parameterless adapter layer for ensuring adjacent modules are technically compatible.

9. A processor-implemented method for evolving task-specific topologies and shared modules in a multitask architecture, comprising: initializing a population of modules, randomly selecting modules $m$ from each species in the population, and grouping the selected modules from each species $k$ together into sets of modules $M_k$; providing the sets of modules $M_k$ to a task-specific routing evolution subprocess, wherein the subprocess: establishes a set of shared modules which are shared among each task-specific topology; initializes a champion individual module routing scheme for each task $t$, wherein the $i$th individual for the $t$th task is represented by a tuple $(E_{ti}, G_{ti}, D_{ti})$, and further wherein $E_{ti}$ is an encoder, $G_{ti}$ is a DAG which specifies the individual module routing scheme, and $D_{ti}$ is a decoder, with $E_{ti}$ and $D_{ti}$ initialized with random weights; for each champion individual $(E_{ti}, G_{ti}, D_{ti})$, generates a challenger $(E_{t2}, G_{t2}, D_{t2})$ by mutating the $t$th champion in accordance with a predetermined mutation subprocess; jointly trains each champion and challenger for $M$ iterations on a training set of data; evaluates each champion and challenger on a validation set of data to determine an accuracy fitness for each individual champion and challenger for its predetermined task; if a challenger has higher accuracy fitness than the corresponding champion, replaces the champion, wherein $(E_{ti}, G_{ti}, D_{ti}) = (E_{t2}, G_{t2}, D_{t2})$; calculates an average accuracy fitness across all champions for tasks in the multitask architecture; and checkpoints the shared modules when the best average accuracy fitness is achieved; and attributing the best achieved average accuracy fitness determined from the task-specific routing evolution subprocess to each module $m$ as part of a module evolution subprocess, which further includes applying evolutionary operators to evolve the modules $m$.

10. The process according to claim 9, wherein the predetermined mutation subprocess of the task-specific routing evolution subprocess includes: (i) start as a copy of the champion, including learned weights, wherein $(E_{t2}, G_{t2}, D_{t2}) := (E_{ti}, G_{ti}, D_{ti})$; (ii) randomly select a pair of nodes $(u, v)$ from $G_{t2}$ such that $v$ is an ancestor of $u$; (iii) randomly select a module $M_k$ from the shared modules; (iv) add a new node $w$ to $G_{t2}$ with $M_k$ as its function; (v) add new edges $(u, w)$ and $(w, v)$ to $G_{t2}$; (vi) set the scalar weight of $(w, v)$ such that its value after softmax is some $\alpha \in (0, 1)$.
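The champion/challenger loop of claim 1 and the mutation steps (i) to (vi) of claim 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `Graph`, `Individual`, `mutate`, and `evolve_routings` are hypothetical names, encoder and decoder weights are plain floats, and `train`/`evaluate` are caller-supplied stand-ins for the $M$ training iterations and the validation accuracy fitness. One interpretive assumption is worth flagging. Claim 2 literally picks $v$ as an ancestor of $u$, but if edges run from the source (input) to the sink (output) as claim 8 describes, adding $(u, w)$ and $(w, v)$ for such a pair would close a cycle. This sketch therefore picks $v$ downstream of $u$ so that $G_{t2}$ stays a DAG.

```python
import copy
import math
import random
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Hypothetical routing DAG G_ti: node 0 is the source (input), node 1 the sink (output).

    node_module maps node -> shared-module index k (None for source/sink);
    edges maps (src, dst) -> scalar logit, with data flowing src -> dst (assumption).
    """
    node_module: dict = field(default_factory=lambda: {0: None, 1: None})
    edges: dict = field(default_factory=lambda: {(0, 1): 0.0})

    def downstream(self, start):
        """Nodes reachable from `start` by following edges forward."""
        seen, stack = set(), [start]
        while stack:
            n = stack.pop()
            for a, b in self.edges:
                if a == n and b not in seen:
                    seen.add(b)
                    stack.append(b)
        return seen


@dataclass
class Individual:
    encoder: float  # stand-in for E_ti's weights
    graph: Graph    # G_ti
    decoder: float  # stand-in for D_ti's weights


def mutate(champion, num_modules, alpha, rng):
    """Claim 2 steps (i)-(vi), with v chosen downstream of u (see assumption above)."""
    ch = copy.deepcopy(champion)                    # (i) copy, learned weights included
    g = ch.graph
    pairs = [(u, v) for u in g.node_module for v in g.downstream(u)]
    u, v = rng.choice(pairs)                        # (ii) pick a connected pair
    w = max(g.node_module) + 1
    g.node_module[w] = rng.randrange(num_modules)   # (iii)-(iv) new node applies M_k
    others = sum(math.exp(z) for (a, b), z in g.edges.items() if b == v)
    g.edges[(u, w)] = 0.0                           # (v) add edges (u, w) and (w, v)
    # (vi) choose the logit so softmax over v's incoming edges gives (w, v) weight alpha
    g.edges[(w, v)] = math.log(alpha / (1.0 - alpha) * others)
    return ch


def evolve_routings(tasks, shared_modules, generations, train, evaluate,
                    alpha=0.5, seed=0):
    """Champion/challenger loop of claim 1 with caller-supplied train/evaluate."""
    rng = random.Random(seed)
    champions = {t: Individual(rng.random(), Graph(), rng.random()) for t in tasks}
    best_avg, checkpoint = -math.inf, None
    for _ in range(generations):
        challengers = {t: mutate(champions[t], len(shared_modules), alpha, rng)
                       for t in tasks}
        # Joint training of all champions and challengers (assumed to run M iterations).
        train(list(champions.values()) + list(challengers.values()), shared_modules)
        for t in tasks:  # a strictly fitter challenger replaces its champion
            if evaluate(challengers[t], t) > evaluate(champions[t], t):
                champions[t] = challengers[t]
        avg = sum(evaluate(champions[t], t) for t in tasks) / len(tasks)
        if avg > best_avg:  # checkpoint shared modules at the best average fitness
            best_avg, checkpoint = avg, copy.deepcopy(shared_modules)
    return champions, best_avg, checkpoint
```

Choosing the logit of $(w, v)$ as $\log\big(\tfrac{\alpha}{1-\alpha}\sum e^{z}\big)$ over $v$'s existing incoming logits is one way to make its softmax share exactly $\alpha$ without disturbing the relative weights of the other edges.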
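Claim 5 composes the model as encoder, then routed shared modules, then decoder. The Python sketch below evaluates $R(G_{ti}, \{M_k\})$ by visiting the DAG in topological order using the standard-library `graphlib.TopologicalSorter`. The function names `route` and `model` are hypothetical, and two details are assumptions rather than claim text: a node's input is the softmax-weighted sum of its parents' outputs (consistent with the softmax-normalized edge weights mentioned in claim 2), and nodes without a module (source, sink, and the parameterless adapters of claim 8) act as identity. Modules here are toy scalar functions.

```python
import math
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def route(x, node_module, edges, modules, source=0, sink=1):
    """Sketch of R(G_ti, {M_k}): evaluate the DAG from source to sink.

    node_module: node -> shared-module index k, or None (identity).
    edges:       (src, dst) -> scalar logit; data flows src -> dst.
    """
    preds = {n: [] for n in node_module}
    for a, b in edges:
        preds[b].append(a)
    out = {}
    for n in TopologicalSorter(preds).static_order():  # parents before children
        if n == source:
            out[n] = x
            continue
        logits = [edges[(p, n)] for p in preds[n]]
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        h = sum(e / total * out[p] for e, p in zip(exps, preds[n]))
        k = node_module[n]
        out[n] = h if k is None else modules[k](h)
    return out[sink]


def model(x_t, encoder, decoder, node_module, edges, modules):
    """y_t = (D_ti o R(G_ti, {M_k}) o E_ti)(x_t)."""
    return decoder(route(encoder(x_t), node_module, edges, modules))


# Usage: node 2 applies M_0 (doubling) on a side path from source to sink.
modules = [lambda h: 2.0 * h, lambda h: h + 1.0]  # toy stand-ins for M_0, M_1
node_module = {0: None, 1: None, 2: 0}
edges = {(0, 1): 0.0, (0, 2): 0.0, (2, 1): 0.0}
y = model(3.0, lambda x: x, lambda h: 10.0 * h, node_module, edges, modules)
print(y)  # 45.0: node 2 gives 6.0, the sink averages 3.0 and 6.0, the decoder scales by 10
```

With equal logits the sink weights both inputs by 0.5, which is why the routed value is 4.5 before decoding.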
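Claim 9 wraps the routing search in a module evolution loop: one module is drawn from each species to form the shared set, the routing subprocess returns its best average accuracy fitness, and that fitness is attributed back to every module that took part. The Python sketch below is hypothetical throughout. `attribute_module_fitness` and `select_survivors` are illustrative names, averaging a module's fitness over the assemblies it joined is an assumption (the claim only says the fitness is attributed), and truncation selection is a simple stand-in for the unspecified evolutionary operators.

```python
import random
from collections import defaultdict


def attribute_module_fitness(species, run_routing_evolution, rounds, seed=0):
    """Hypothetical sketch of claim 9's fitness attribution.

    species: list of lists; species[k] holds hashable candidate modules for slot k.
    run_routing_evolution: callable taking the assembled modules and returning the
        best average accuracy fitness from the task-specific routing subprocess.
    """
    rng = random.Random(seed)
    scores = defaultdict(list)
    for _ in range(rounds):
        assembled = [rng.choice(members) for members in species]  # one per species
        fitness = run_routing_evolution(assembled)
        for m in assembled:  # every participating module receives the fitness
            scores[m].append(fitness)
    return {m: sum(f) / len(f) for m, f in scores.items()}


def select_survivors(species, fitness, keep=0.5):
    """Truncation selection per species (a stand-in for the evolutionary operators)."""
    survivors = []
    for members in species:
        ranked = sorted(members, key=lambda m: fitness.get(m, float("-inf")),
                        reverse=True)
        survivors.append(ranked[:max(1, int(len(ranked) * keep))])
    return survivors
```

Averaging over several random assemblies reduces the chance that a module is judged only by the partners it happened to be grouped with.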
Activation functions · CPC title
Combinations of networks · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
Architecture, e.g. interconnection topology · CPC title
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title