Ensembling of neural network models
US-2019130277-A1 · May 2, 2019 · US
US11657265B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11657265-B2 |
| Application number | US-201816191542-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 15, 2018 |
| Priority date | Nov 20, 2017 |
| Publication date | May 23, 2023 |
| Grant date | May 23, 2023 |
Official abstract text for this publication.
Described herein are systems and methods for training first and second neural network models. A system comprises a memory comprising instruction data representing a set of instructions and a processor configured to communicate with the memory and to execute the set of instructions. The set of instructions, when executed by the processor, cause the processor to: set a weight in the second model based on a corresponding weight in the first model; train the second model on a first dataset, wherein the training comprises updating the weight in the second model; and adjust the corresponding weight in the first model based on the updated weight in the second model.
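The three steps in the abstract (set, train, adjust) can be illustrated with a minimal sketch. This is not the patent's implementation: the two-weight linear model, the `train` routine, the learning rate, and every name below are assumptions chosen only to make the flow runnable. The write-back in step 3 uses a plain copy, which is one of the variants recited in the claims.

```python
def predict(model, x):
    # Hypothetical toy model: one "hidden" weight followed by one "output" weight.
    return model["output"] * (model["hidden"] * x)


def train(model, data, lr=0.01, epochs=200):
    """Plain gradient descent on squared error; updates both weights in place."""
    for _ in range(epochs):
        for x, y in data:
            h = model["hidden"] * x
            err = model["output"] * h - y
            # Compute both gradients before changing either weight.
            grad_out = 2 * err * h
            grad_hid = 2 * err * model["output"] * x
            model["output"] -= lr * grad_out
            model["hidden"] -= lr * grad_hid
    return model


def transfer_round(first, second, first_dataset):
    # Step 1: set the weight in the second model from the corresponding
    # weight in the first model (only the shared "hidden" weight here).
    second["hidden"] = first["hidden"]
    # Step 2: train the second model on the first dataset, which updates
    # the weight that was just copied in.
    train(second, first_dataset)
    # Step 3: adjust the corresponding weight in the first model based on
    # the updated weight in the second model (copy variant, an assumption).
    first["hidden"] = second["hidden"]
    return first, second


first = {"hidden": 0.5, "output": 1.0}
second = {"hidden": 0.0, "output": 1.0}
first, second = transfer_round(first, second, [(1.0, 2.0), (2.0, 4.0)])
```

Only the shared weight moves between the models; each model keeps its own output weight, in line with the claims that treat output-layer weights separately.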
Claim text.
The invention claimed is:
1. A system configured for training first and second neural network models, the system comprising: a memory comprising instruction data representing a set of instructions; a processor configured to communicate with the memory and to execute the set of instructions, wherein the set of instructions, when executed by the processor, cause the processor to: set a weight in the second model based on a corresponding weight in the first model; train the second model on a first dataset, wherein the training comprises updating the weight in the second model; and adjust the corresponding weight in the first model based on the updated weight in the second model by applying an increment to a value of the corresponding weight in the first model based on a difference between the corresponding weight in the first model and the weight in the second model.
2. The system as in claim 1, wherein the weight comprises a weight in one of: an input layer of the second model; and a hidden layer of the second model.
3. The system as in claim 1, wherein causing the processor to adjust the corresponding weight in the first model comprises causing the processor to: copy a value of the weight from the second model to the corresponding weight in the first model.
4. The system as in claim 1, wherein causing the processor to adjust the corresponding weight in the first model further comprises causing the processor to: set a weight in an output layer of the first model to an arbitrary value.
5. The system as in claim 1, wherein causing the processor to adjust the corresponding weight in the first model further comprises causing the processor to: maintain a value of at least one weight in an output layer of the first model at the same value.
6. The system as in claim 1, wherein causing the processor to set a weight in the second model comprises causing the processor to: copy a value of a weight from one of: an input layer of the first model; and a hidden layer of the first model, to a corresponding weight in the second model.
7. The system as in claim 1, wherein causing the processor to set a weight in the second model further comprises causing the processor to: set at least one weight in an output layer of the second model to an arbitrary value.
8. The system as in claim 1, wherein the first model comprises one of: an object detection model; and an object localization model; and wherein the second model comprises the other one of: an object detection model; and an object localization model.
9. The system as in claim 1, wherein the first model comprises one of: a model configured to produce a single output; and a model configured to produce a plurality of outputs; and wherein the second model comprises the other one of: a model configured to produce a single output; and a model configured to produce a plurality of outputs.
10. The system as in claim 1, wherein the set of instructions, when executed by the processor, further cause the processor to: adjust a weight in one of: the first model; and the second model; in response to further training of the other one of: the first model; and the second model.
11. The system as in claim 10, wherein the set of instructions, when executed by the processor, cause the processor to repeat the step of adjusting a weight, until one or more of the following criteria are met: the first model and/or the second model reach a threshold accuracy level; the magnitude of an adjustment falls below a threshold magnitude; said weight in the first model and its corresponding weight in the second model converge towards one another within a predefined threshold; and a loss associated with the first model and/or a loss associated with the second model changes by less than a threshold amount between subsequent adjustments.
12. The system as in claim 1, wherein the first model is trained on a second dataset, the first dataset comprising less data than the second dataset, wherein the size of the second dataset alone is insufficient to train the second model to a predefined accuracy with arbitrarily initiated weights.
13. A computer implemented method of training first and second neural network models, the method comprising: setting a weight in the second model based on a corresponding weight in the first model; training the second model on a first dataset, wherein the training comprises updating the weight in the second model; and adjusting the corresponding weight in the first model based on the updated weight in the second model, wherein the first model is trained on a second dataset, the first dataset comprising less data than the second dataset, wherein a size of the second dataset alone is insufficient to train the second model to a predefined accuracy with arbitrarily initiated weights.
14. A non-transitory computer readable medium comprising computer readable code embodied therein, the computer readable code being configured such that, on execution by a computer or processor, the computer or processor: sets a weight in a second model based on a corresponding weight in a first model; trains the second model on a first dataset, wherein the training comprises updating the weight in the second model; and adjusts the corresponding weight in the first model based on the updated weight in the second model, wherein the first model is trained on a second dataset, the first dataset comprising less data than the second dataset, wherein a size of the second dataset alone is insufficient to train the second model to a predefined accuracy with arbitrarily initiated weights.
15. The system as in claim 1, wherein the processor trains one of the first or second neural network models to detect a presence of a particular object in an image, and the processor trains another of the first or second neural network models to measure a length of a particular type of object in an image.
16. The system as in claim 1, wherein at least one of the first model and second model is a partially trained model.
17. The system as in claim 1, wherein the first model is a partially trained model.
18. The system as in claim 1, wherein both the first and the second models are partially trained models, and a model of the first and the second models is trained more than a second of the first and the second models.
19. The system as in claim 1, wherein the corresponding weight in the first model may be adjusted a percentage of the difference between the corresponding weight in the first model and the weight in the second model.
20. The system as in claim 1, wherein the second dataset comprises medical images annotated with x,y coordinates of a center of a bounding box drawn around tissue of interest.
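The increment-based adjustment in claims 1 and 19, and one of the stopping criteria in claim 11, can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the function names, the `fraction` parameter, and the threshold values are hypothetical, and the second model's weight is held fixed between rounds for simplicity, whereas claim 10 ties each adjustment to further training of the other model.

```python
def adjust_toward(first_w, second_w, fraction=0.5):
    """Move first_w toward second_w by a fraction of their difference.

    fraction=1.0 reduces to a plain copy of the second model's weight.
    Returns the adjusted weight and the increment that was applied.
    """
    increment = fraction * (second_w - first_w)
    return first_w + increment, increment


def adjust_until_small(first_w, second_w, fraction=0.5,
                       threshold=1e-3, max_rounds=100):
    """Repeat the adjustment until the increment magnitude drops below threshold.

    Simplification (an assumption): second_w stays fixed between rounds.
    """
    rounds = 0
    while rounds < max_rounds:
        first_w, increment = adjust_toward(first_w, second_w, fraction)
        rounds += 1
        if abs(increment) < threshold:
            break
    return first_w, rounds


half_step, applied = adjust_toward(1.0, 2.0, fraction=0.5)
full_copy, _ = adjust_toward(1.0, 2.0, fraction=1.0)
final_w, rounds_used = adjust_until_small(0.0, 1.0)
```

A fraction below 1.0 leaves the first model partway between its old weight and the second model's weight, so neither model's training is simply overwritten by the other's.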
Learning methods · CPC title
Supervised learning · CPC title
Transfer learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Combinations of networks · CPC title