Partitioned machine learning architecture
US-2021295166-A1 · Sep 23, 2021 · US
US11580376B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11580376-B2 |
| Application number | US-201816002649-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 7, 2018 |
| Priority date | Jun 9, 2017 |
| Publication date | Feb 14, 2023 |
| Grant date | Feb 14, 2023 |
An electronic apparatus is provided. The electronic apparatus includes: a memory storing a trained model including a plurality of layers; and a processor initializing a parameter matrix and a plurality of split variables of a trained model, calculating a new parameter matrix having a block-diagonal matrix for the plurality of split variables and the trained model to minimize a loss function for the trained model, a weight decay regularization term, and an objective function including a split regularization term defined by the parameter matrix and the plurality of split variables, vertically splitting the plurality of layers according to the group based on the computed split parameters, and reconstructing the trained model using the computed new parameter matrix as parameters of the vertically split layers.
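The split-and-reconstruct step in the abstract can be illustrated with a minimal sketch for a single fully connected layer. It rests on assumptions that are not taken from the patent: the split variables are stored as two group-by-index matrices, `p` for the layer's inputs and `q` for its outputs; each index is assigned to the group with the largest split value; and the layer computes `y = x W` with `W` of shape inputs by outputs. All function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Block = Tuple[List[int], List[int], Matrix]  # (input indices, output indices, sub-matrix)


def hard_assign(split: Matrix) -> List[int]:
    """Assign each index (column of a G x N split matrix) to its highest-scoring group."""
    num_groups, n = len(split), len(split[0])
    return [max(range(num_groups), key=lambda g: split[g][k]) for k in range(n)]


def block_diagonal_mask(w: Matrix, in_group: List[int], out_group: List[int]) -> Matrix:
    """Zero every inter-group weight so only intra-group connections remain."""
    return [[w[i][j] if in_group[i] == out_group[j] else 0.0
             for j in range(len(out_group))]
            for i in range(len(in_group))]


def split_layer(w: Matrix, in_group: List[int], out_group: List[int],
                num_groups: int) -> List[Block]:
    """Vertically split one layer into independent per-group sub-layers."""
    blocks = []
    for g in range(num_groups):
        ins = [i for i, gi in enumerate(in_group) if gi == g]
        outs = [j for j, gj in enumerate(out_group) if gj == g]
        blocks.append((ins, outs, [[w[i][j] for j in outs] for i in ins]))
    return blocks


def forward_dense(x: List[float], w: Matrix) -> List[float]:
    """Reference forward pass y = x W over the full (masked) matrix."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def forward_split(x: List[float], blocks: List[Block], num_outputs: int) -> List[float]:
    """Forward pass through the split layer; each group reads and writes only its own indices."""
    y = [0.0] * num_outputs
    for ins, outs, sub in blocks:
        for c, j in enumerate(outs):
            y[j] = sum(x[i] * sub[r][c] for r, i in enumerate(ins))
    return y


if __name__ == "__main__":
    w = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0],
         [9.0, 10.0, 11.0, 12.0], [13.0, 14.0, 15.0, 16.0]]
    p = [[0.9, 0.2, 0.8, 0.1], [0.1, 0.8, 0.2, 0.9]]  # input split variables
    q = [[0.7, 0.3, 0.6, 0.4], [0.3, 0.7, 0.4, 0.6]]  # output split variables
    in_g, out_g = hard_assign(p), hard_assign(q)
    masked = block_diagonal_mask(w, in_g, out_g)
    blocks = split_layer(masked, in_g, out_g, num_groups=2)
    x = [1.0, 1.0, 1.0, 1.0]
    print(forward_dense(x, masked))       # [10.0, 20.0, 14.0, 24.0]
    print(forward_split(x, blocks, 4))    # [10.0, 20.0, 14.0, 24.0]
```

Once rows and columns are reordered by group, the masked matrix is block-diagonal, which is why each group's sub-matrix can be evaluated on its own, for example on a separate processor as claim 5 describes. In this toy case half of the 16 weights survive the mask.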
What is claimed is:

1. A method for optimizing a trained model, the method comprising: initializing a parameter matrix and a plurality of split variables of a trained model configured of a plurality of layers; calculating a new parameter matrix having a block-diagonal matrix for the plurality of split variables and the trained model to minimize a loss function for the trained model, a weight decay regularization term, and an objective function including a split regularization term defined by the parameter matrix and the plurality of split variables; vertically splitting the plurality of layers according to a group based on computed split parameters; and reconstructing the trained model using the calculated new parameter matrix as parameters of the vertically split layers, wherein the split regularization term comprises a group weight regularization term that suppresses an inter-group connection and activates only an intra-group connection, a disjoint group assignment that makes each group be orthogonal to each other, and a balanced group assignment that regularizes against a difference between a size of one group and a size of another group.

2. The method as claimed in claim 1, wherein in the initializing, the parameter matrix is initialized randomly and the plurality of split variables are initialized not to be uniform to each other.

3. The method as claimed in claim 1, wherein in the computing, a stochastic gradient descent method is used so that the objective function is minimized.

4. The method as claimed in claim 1, further comprising: computing a second-order new parameter matrix for the reconstructed trained model to minimize the loss function for the trained model and a second objective function including only the weight decay regularization term, and optimizing the trained model using the computed second-order new parameter matrix as parameters of the vertically split layers.

5. The method as claimed in claim 4, further comprising: parallelizing each of the vertically split layers within the optimized trained model using different processors.

6. An electronic apparatus comprising: a memory configured to store a trained model configured of a plurality of layers; and a processor configured to: initialize a parameter matrix and a plurality of split variables of a trained model, calculate a new parameter matrix having a block-diagonal matrix for the plurality of split variables and the trained model to minimize a loss function for the trained model, a weight decay regularization term, and an objective function including a split regularization term defined by the parameter matrix and the plurality of split variables, vertically split the plurality of layers according to a group based on a computed split parameters, and reconstruct the trained model using the calculated new parameter matrix as parameters of the vertically split layers, wherein the split regularization term comprises a group weight regularization term that suppresses an inter-group connection and activates only an intra-group connection, a disjoint group assignment that makes each group be orthogonal to each other, and a balanced group assignment that regularizes against a difference between a size of one group and a size of another group.

7. The electronic apparatus as claimed in claim 6, wherein the processor is further configured to randomly initialize the parameter matrix and initializes the plurality of split variables not to be uniform to each other.

8. The electronic apparatus as claimed in claim 6, wherein the processor is further configured to use a stochastic gradient descent method to minimize the objective function.

9. The electronic apparatus as claimed in claim 6, wherein the processor is further configured to: compute the second-order new parameter matrix for the reconstructed trained model to minimize the loss function for the trained model and a second objective function including only the weight decay regularization, and optimize the trained model using the computed second-order new parameter matrix as parameters of the vertically split layers.

10. A non-transitory computer readable recording medium including a program for executing a method for optimizing a trained model in an electronic apparatus, wherein the method for optimizing a trained model includes: initializing a parameter matrix and a plurality of split variables of a trained model including a plurality of layers; calculating a new parameter matrix having a block-diagonal matrix for the plurality of split variables and the trained model to minimize a loss function for the trained model, a weight decay regularization term, and an objective function including a split regularization term defined by the parameter matrix and the plurality of split variables; vertically splitting the plurality of layers according to a group based on computed split parameters; and reconstructing the trained model using the calculated new parameter matrix as parameters of the vertically split layers, wherein the split regularization term comprises a group weight regularization term that suppresses an inter-group connection and activates only an intra-group connection, a disjoint group assignment that makes each group be orthogonal to each other, and a balanced group assignment that regularizes against a difference between a size of one group and a size of another group.
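Claim 1 names three components of the split regularization term but does not give their formulas. The sketch below uses assumed forms, chosen only to exhibit the behavior each component is described as having: a group-lasso penalty on inter-group weights (group weight term), pairwise dot products between group-assignment rows (disjoint term), and squared soft group sizes (balanced term). It also shows a random, non-uniform initialization of the split variables in the spirit of claims 2 and 7. These are not the patent's own equations, and names such as `split_regularizer` and the weights `lam_w`, `lam_d`, `lam_b` are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # G x N split variables, or I x J layer weights


def softmax_columns(logits: Matrix) -> Matrix:
    """Normalize a G x N logit matrix so every column is a distribution over groups."""
    num_groups, n = len(logits), len(logits[0])
    out = [[0.0] * n for _ in range(num_groups)]
    for k in range(n):
        m = max(logits[g][k] for g in range(num_groups))  # subtract max for stability
        exps = [math.exp(logits[g][k] - m) for g in range(num_groups)]
        total = sum(exps)
        for g in range(num_groups):
            out[g][k] = exps[g] / total
    return out


def init_split_logits(num_groups: int, n: int, seed: int = 0, scale: float = 0.1) -> Matrix:
    """Random logits so split variables start non-uniform (assumed reading of claim 2).

    A perfectly uniform start is symmetric across groups: every group would
    receive identical gradients, so descent alone could not separate them.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(num_groups)]


def group_weight_term(w: Matrix, p: Matrix, q: Matrix) -> float:
    """Assumed group-lasso penalty on inter-group weights; intra-group weights cost nothing."""
    rows, cols = len(w), len(w[0])
    total = 0.0
    for g in range(len(p)):
        # Weights from inputs in group g to outputs outside group g.
        leaving = sum((p[g][i] * (1 - q[g][j]) * w[i][j]) ** 2
                      for i in range(rows) for j in range(cols))
        # Weights from inputs outside group g to outputs in group g.
        entering = sum(((1 - p[g][i]) * q[g][j] * w[i][j]) ** 2
                       for i in range(rows) for j in range(cols))
        total += math.sqrt(leaving) + math.sqrt(entering)
    return total


def disjoint_term(s: Matrix) -> float:
    """Sum of pairwise dot products of group rows; for nonnegative s, zero iff rows are orthogonal."""
    n = len(s[0])
    return sum(sum(s[a][k] * s[b][k] for k in range(n))
               for a in range(len(s)) for b in range(a + 1, len(s)))


def balance_term(s: Matrix) -> float:
    """Sum of squared soft group sizes; smallest when all groups are equally sized."""
    return sum(sum(row) ** 2 for row in s)


def split_regularizer(w: Matrix, p: Matrix, q: Matrix,
                      lam_w: float = 1.0, lam_d: float = 1.0, lam_b: float = 1.0) -> float:
    """Hypothetical weighted sum of the three split-regularization components."""
    return (lam_w * group_weight_term(w, p, q)
            + lam_d * (disjoint_term(p) + disjoint_term(q))
            + lam_b * (balance_term(p) + balance_term(q)))
```

In the method of claims 1 and 3 such a term would be added to the task loss and the weight decay term and minimized by stochastic gradient descent; the sketch only evaluates it. The balanced term behaves as intended because softmax-normalized split variables make the soft group sizes sum to the number of indices, so their sum of squares is smallest when the sizes are equal: a hard 2 + 2 split of four indices scores 8, while a 3 + 1 split scores 10.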
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title
Distributed learning, e.g. federated learning · CPC title
Supervised learning · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title