Distributed hyperparameter tuning system for machine learning
US-2018240041-A1 · Aug 23, 2018 · US
US12033079B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12033079-B2 |
| Application number | US-201916270681-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 8, 2019 |
| Priority date | Feb 8, 2018 |
| Publication date | Jul 9, 2024 |
| Grant date | Jul 9, 2024 |
Official abstract text for this publication.
A multi-task learning (MTL) process is adapted to the single-task learning (STL) case, i.e., when only a single task is available for training. The process is formalized as pseudo-task augmentation (PTA), in which a single task has multiple distinct decoders projecting the output of the shared structure to task predictions. By training the shared structure to solve the same problem in multiple ways, PTA simulates the effect of training towards distinct but closely related tasks drawn from the same universe. Training dynamics with multiple pseudo-tasks strictly subsume training with just one, and a class of algorithms is introduced for controlling pseudo-tasks in practice.
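The structure the abstract describes, one shared encoder whose output is projected by several distinct decoders onto predictions for the same task, can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class name `PseudoTaskModel`, the single `tanh` layer used as the encoder, the linear softmax decoders, and all sizes are assumptions made for the example.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def init_matrix(rows, cols, rng):
    """Small random weights; the scale 0.1 is an arbitrary choice."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

class PseudoTaskModel:
    """Hypothetical sketch: one shared encoder, several decoders for one task."""

    def __init__(self, n_in, n_hidden, n_classes, n_decoders, seed=0):
        rng = random.Random(seed)
        # Shared structure: a single weight matrix (assumed for brevity).
        self.encoder = init_matrix(n_hidden, n_in, rng)
        # Each pseudo-task gets its own independently initialized decoder.
        self.decoders = [init_matrix(n_classes, n_hidden, rng)
                         for _ in range(n_decoders)]

    def encode(self, x):
        return [math.tanh(v) for v in matvec(self.encoder, x)]

    def predict_all(self, x):
        """Encode once, then let every decoder predict the same task."""
        h = self.encode(x)
        return [softmax(matvec(D, h)) for D in self.decoders]

model = PseudoTaskModel(n_in=4, n_hidden=3, n_classes=2, n_decoders=3)
preds = model.predict_all([1.0, 0.5, -0.5, 2.0])
for k, p in enumerate(preds):
    print(f"decoder {k}: {p}")
```

Each decoder sees the same encoding but starts from different weights, so the three class distributions generally differ. That difference is what lets the shared encoder be trained to solve the same problem in several ways.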
Opening claim text (preview).
The invention claimed is:
1. A neural network-based joint model coupled to memory and running on one or more parallel processors, comprising: an encoder that processes an input and generates an encoding; a plurality of decoders that are grouped into sets of decoders in dependence upon corresponding classification tasks, wherein each of the plurality of decoders respectively receive the encoding as input from the encoder, thereby forming encoder-decoder pairs which operate independently of each other when performing the corresponding classification tasks T, wherein each of the plurality of decoders respectively process the received encoding including processing the encoding through their individual decoder layers and classification layers and produce classification scores for classes defined for the corresponding classification tasks; and a trainer that jointly trains the encoder-decoder pairs over at least one backpropagation iterations to perform the corresponding classification tasks wherein the trainer is further configured to comprise: a forward pass stage that processes training inputs through each single encoder and resulting encodings through each of the plurality of decoders, wherein the plurality of decoders includes decoders for performing different classification tasks, paired with a same single encoder to compute respective activations for each of the training inputs; a backward pass stage, that, over each of the at least one thousand backpropagation iterations; determines gradient data for the each of the plurality of decoders for each of the training inputs in dependence upon a loss function, wherein the loss function is cross entropy that uses either a maximum likelihood objective function, a policy gradient function, or both; averages the gradient data determined for the each of the plurality of decoders; and determines gradient data for the same single encoder by backpropagating the averaged gradient data for all of the plurality of decoders through the same single encoder; an update stage that modifies weights of each single encoder in dependence upon the gradient data determined for the single encoder from the plurality of decoders paired therewith; and a persistence stage that, upon convergence after a final backpropagation iteration, persists in the memory the modified weights of the encoder derived by the training to be applied to future classification tasks; wherein for each classification task T, a single model, including a jointly trained encoder-decoder pair, is selected from the jointly trained multiple encoder-decoder pairs in the joint model.
2. The neural network-based model of claim 1, wherein the encoder is paired with the plurality of decoders D including the plurality of decoders from a same set of decoders and the plurality of decoders from a different set of decoders, wherein each of the plurality of decoders in the same set of decoders performs a first classification task and each of the plurality of decoder in the different set of decoders performs a second classification task.
3. The neural network-based model of any of claim 2, further configured to use a combination of the modified weights of the encoder derived by the training and modified weights of each decoder within a set of decoders derived by the training to perform the corresponding classification tasks on inference inputs, and wherein the inference inputs are processed by the encoder to produce encodings, followed by one of the decoders processing the encodings to output classification scores for classes defined for the corresponding classification tasks.
4. The neural network-based model of claim 2, further configured to use a combination of the modified weights of the encoder derived by the training and modified weights of two or more of the decoders derived by the training to respectively perform two or more of the classification tasks on inference inputs, and wherein the inference inputs are processed by the encoder to produce encodings, followed by the two or more of the decoders respectively processing the encodings to output classification scores for classes defined for the two or more of the classification tasks.
5. The neural network-based model of claim 4, wherein the input, the training inputs, and the inference inputs are selected from a group consisting of image data, text data and genomic data.
6. The neural network-based model of claim 2, wherein each training input is annotated with a plurality of task-specific labels for the corresponding classification tasks.
7. The neural network-based model of any of claim 1, wherein a plurality of training inputs for the corresponding classification tasks are fed in parallel to the encoder as input in each forward pass iteration, and wherein each training input is annotated with a task-specific label for a corresponding classification task.
8. The neural network-based model of claim 1, wherein the encoder is a convolutional neural network (abbreviated CNN) with a plurality of convolution layers arranged in a sequence from lowest to highest.
9. The neural network-based model of claim 1, wherein the encoding is convolution data.
10. The neural network-based model of claim 1, wherein each decoder further comprises at least one decoder layer and at least one classification layer.
11. The neural network-based model of claim 10, wherein each of the numerous decoders is a fully-connected neural network (abbreviated FCNN) and the decoder layer is a fully-connected layer.
12. The neural network-based model of claim 10, wherein the at least one classification layer is a sigmoid classifier.
13. The neural network-based model of claim 10, wherein the at least one classification layer is a softmax classifier.
14. The neural network-based model of claim 1, wherein the encoder is a recurrent neural network (abbreviated RNN), including long short-term memory (LSTM) network or gated recurrent unit (GRU) network.
15. The neural network-based model of claim 1, wherein the encoding is hidden state data.
16. The neural network-based model of claim 1, wherein each decoder is a recurrent neural network (abbreviated RNN), including long short-term memory (LSTM) network or gated recurrent unit (GRU) network.
17. The neural network-based model of claim 1, wherein each decoder is a convolutional neural network (abbreviated CNN) with a plurality of convolution layers arranged in a sequence from lowest to highest.
18. The neural network-based model of claim 1, wherein the encoder is a fully-connected neural network (abbreviated FCNN) with at least one fully-connected layer.
19. The neural network-based model of claim 1, wherein at least some of the decoders are of a first neural network type, at least some of the decoders are of a second neural network type, and at least some of the decoders are of a third neural network type.
20. The neural network-based model of claim 1, wherein at least some of the decoders are convolutional neural networks (abbreviated CNNs) with a plurality of convolution layers arranged in a sequence from lowest to highest, at least some of the decoders are recurrent neural networks (abbreviated RNNs), including long short-term memory (LSTM) networks or gated recurrent unit (GRU) networks, and at least some of the decoders are fully-connected neural networks (abbreviated FCNNs).
21. The neural network-based model of claim 1, further configured to comprise an initializer that initializes the decoders with random weights
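The trainer stages recited in claim 1 (forward pass through one encoder and every decoder, per-decoder gradients from a cross-entropy loss, averaging of those gradients, backpropagation of the average through the encoder, and a weight update) can be illustrated with a small sketch. It rests on several assumptions that are not in the claims: the function name `train_step`, a purely linear encoder, linear softmax decoders, plain gradient descent with learning rate `lr`, and a single training example. Updating the decoder weights in the same step is also an assumption of this sketch.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train_step(W, decoders, x, label, lr=0.1):
    """One forward/backward/update pass; returns mean cross-entropy loss."""
    n_dec = len(decoders)
    # Forward pass: one encoding, shared by every decoder.
    h = matvec(W, x)
    probs = [softmax(matvec(D, h)) for D in decoders]
    loss = sum(-math.log(p[label]) for p in probs) / n_dec

    # Backward pass: cross-entropy gradient w.r.t. logits is (p - onehot).
    dlogits = [[p_c - (1.0 if c == label else 0.0) for c, p_c in enumerate(p)]
               for p in probs]
    # Gradient each decoder sends back to the encoding: D^T @ dlogits.
    dh_each = [[sum(D[c][j] * dl[c] for c in range(len(dl)))
                for j in range(len(h))]
               for D, dl in zip(decoders, dlogits)]
    # Average the decoder gradients before they reach the encoder.
    dh = [sum(g[j] for g in dh_each) / n_dec for j in range(len(h))]

    # Update stage: each decoder uses its own gradient (sketch assumption),
    # while the encoder uses the averaged gradient.
    for D, dl in zip(decoders, dlogits):
        for c in range(len(D)):
            for j in range(len(h)):
                D[c][j] -= lr * dl[c] * h[j]
    for j in range(len(W)):
        for i in range(len(x)):
            W[j][i] -= lr * dh[j] * x[i]
    return loss

rng = random.Random(0)
W = [[rng.gauss(0, 0.1) for _ in range(4)] for _ in range(3)]      # encoder
decoders = [[[rng.gauss(0, 0.1) for _ in range(3)] for _ in range(2)]
            for _ in range(3)]                                     # 3 decoders
x, label = [1.0, 0.5, -0.5, 2.0], 1
losses = [train_step(W, decoders, x, label) for _ in range(50)]
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

Averaging keeps the size of the encoder's update independent of how many decoders are attached. A persistence stage would then store `W` once training has converged, and at inference a single encoder-decoder pair would be selected per task, as the final clause of claim 1 recites.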
Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Supervised learning · CPC title
Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title