Electronic apparatus and method for optimizing trained model

US11580376B2 · US · B2

Patent metadata
Publication number: US-11580376-B2
Application number: US-201816002649-A
Country: US
Kind code: B2
Filing date: Jun 7, 2018
Priority date: Jun 9, 2017
Publication date: Feb 14, 2023
Grant date: Feb 14, 2023

Abstract

Official abstract text for this publication.

An electronic apparatus is provided. The electronic apparatus includes: a memory storing a trained model including a plurality of layers; and a processor initializing a parameter matrix and a plurality of split variables of a trained model, calculating a new parameter matrix having a block-diagonal matrix for the plurality of split variables and the trained model to minimize a loss function for the trained model, a weight decay regularization term, and an objective function including a split regularization term defined by the parameter matrix and the plurality of split variables, vertically splitting the plurality of layers according to the group based on the computed split parameters and reconstruct the trained model using the computed new parameter matrix as parameters of the vertically split layers.
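
The objective described in the abstract can be written compactly. The following is only a sketch under assumed notation: the symbols W (parameter matrix), P and Q (split variables), λ (weight decay strength), and Ω (split regularization term) are illustrative choices and do not appear on this page, and the squared-norm form of the weight decay term is an assumption.

```latex
\min_{W,\,P,\,Q}\;
  \underbrace{\mathcal{L}(W;\, X, y)}_{\text{loss function}}
  \;+\; \underbrace{\lambda \,\lVert W \rVert_2^2}_{\text{weight decay regularization}}
  \;+\; \underbrace{\Omega(W, P, Q)}_{\text{split regularization}}
```

Under this reading, the minimizer W is driven toward a block-diagonal structure whose blocks correspond to the groups encoded by the split variables.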

First claim

Opening claim text (preview).

What is claimed is:

1. A method for optimizing a trained model, the method comprising: initializing a parameter matrix and a plurality of split variables of a trained model configured of a plurality of layers; calculating a new parameter matrix having a block-diagonal matrix for the plurality of split variables and the trained model to minimize a loss function for the trained model, a weight decay regularization term, and an objective function including a split regularization term defined by the parameter matrix and the plurality of split variables; vertically splitting the plurality of layers according to a group based on computed split parameters; and reconstructing the trained model using the calculated new parameter matrix as parameters of the vertically split layers, wherein the split regularization term comprises a group weight regularization term that suppresses an inter-group connection and activates only an intra-group connection, a disjoint group assignment that makes each group be orthogonal to each other, and a balanced group assignment that regularizes against a difference between a size of one group and a size of another group.

2. The method as claimed in claim 1, wherein in the initializing, the parameter matrix is initialized randomly and the plurality of split variables are initialized not to be uniform to each other.

3. The method as claimed in claim 1, wherein in the computing, a stochastic gradient descent method is used so that the objective function is minimized.

4. The method as claimed in claim 1, further comprising: computing a second-order new parameter matrix for the reconstructed trained model to minimize the loss function for the trained model and a second objective function including only the weight decay regularization term, and optimizing the trained model using the computed second-order new parameter matrix as parameters of the vertically split layers.

5. The method as claimed in claim 4, further comprising: parallelizing each of the vertically split layers within the optimized trained model using different processors.

6. An electronic apparatus comprising: a memory configured to store a trained model configured of a plurality of layers; and a processor configured to: initialize a parameter matrix and a plurality of split variables of a trained model, calculate a new parameter matrix having a block-diagonal matrix for the plurality of split variables and the trained model to minimize a loss function for the trained model, a weight decay regularization term, and an objective function including a split regularization term defined by the parameter matrix and the plurality of split variables, vertically split the plurality of layers according to a group based on a computed split parameters, and reconstruct the trained model using the calculated new parameter matrix as parameters of the vertically split layers, wherein the split regularization term comprises a group weight regularization term that suppresses an inter-group connection and activates only an intra-group connection, a disjoint group assignment that makes each group be orthogonal to each other, and a balanced group assignment that regularizes against a difference between a size of one group and a size of another group.

7. The electronic apparatus as claimed in claim 6, wherein the processor is further configured to randomly initialize the parameter matrix and initializes the plurality of split variables not to be uniform to each other.

8. The electronic apparatus as claimed in claim 6, wherein the processor is further configured to use a stochastic gradient descent method to minimize the objective function.

9. The electronic apparatus as claimed in claim 6, wherein the processor is further configured to: compute the second-order new parameter matrix for the reconstructed trained model to minimize the loss function for the trained model and a second objective function including only the weight decay regularization, and optimize the trained model using the computed second-order new parameter matrix as parameters of the vertically split layers.

10. A non-transitory computer readable recording medium including a program for executing a method for optimizing a trained model in an electronic apparatus, wherein the method for optimizing a trained model includes: initializing a parameter matrix and a plurality of split variables of a trained model including a plurality of layers; calculating a new parameter matrix having a block-diagonal matrix for the plurality of split variables and the trained model to minimize a loss function for the trained model, a weight decay regularization term, and an objective function including a split regularization term defined by the parameter matrix and the plurality of split variables; vertically splitting the plurality of layers according to a group based on computed split parameters; and reconstructing the trained model using the calculated new parameter matrix as parameters of the vertically split layers, wherein the split regularization term comprises a group weight regularization term that suppresses an inter-group connection and activates only an intra-group connection, a disjoint group assignment that makes each group be orthogonal to each other, and a balanced group assignment that regularizes against a difference between a size of one group and a size of another group.
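
To make the three parts of the split regularization term in the independent claims more concrete, here is a minimal NumPy sketch. It is one plausible reading, not the patent's own formulation: the claims describe the terms only qualitatively, so the exact formulas, the function names `split_regularizer` and `split_blocks`, and the array layout (W of shape inputs × outputs, P and Q holding one row of assignment weights per group) are all assumptions made for illustration.

```python
import numpy as np


def split_regularizer(W, P, Q):
    """Return (group_weight, disjoint, balanced) penalties.

    Assumed shapes (hypothetical, not from the patent text):
      W: (D_in, D_out) parameter matrix of one layer
      P: (G, D_in)     split variables assigning input features to G groups
      Q: (G, D_out)    split variables assigning output units to G groups
    """
    G = P.shape[0]

    # Group weight term: penalize weights that connect a unit inside
    # group g to a unit outside it (inter-group connections).
    group_weight = 0.0
    for g in range(G):
        into_g = (1.0 - P[g])[:, None] * W * Q[g][None, :]
        out_of_g = P[g][:, None] * W * (1.0 - Q[g])[None, :]
        group_weight += np.linalg.norm(into_g, axis=1).sum()
        group_weight += np.linalg.norm(out_of_g, axis=0).sum()

    # Disjoint assignment term: inner products between the assignment
    # vectors of different groups are zero when the groups are orthogonal.
    disjoint = 0.0
    for a in range(G):
        for b in range(a + 1, G):
            disjoint += P[a] @ P[b] + Q[a] @ Q[b]

    # Balanced assignment term: a sum of squared group sizes is smallest
    # when all groups have the same size.
    balanced = sum(P[g].sum() ** 2 + Q[g].sum() ** 2 for g in range(G))

    return group_weight, disjoint, balanced


def split_blocks(W, P, Q):
    """Cut W into one sub-matrix per group using hard (argmax) assignments."""
    in_group = P.argmax(axis=0)
    out_group = Q.argmax(axis=0)
    return [
        W[np.ix_(in_group == g, out_group == g)]
        for g in range(P.shape[0])
    ]
```

With one-hot assignments and a W that is already block-diagonal, the first two penalties vanish, and `split_blocks` returns the diagonal blocks that would serve as parameters of the vertically split layers. In an actual training loop these penalties would be added to the loss and weight decay terms and minimized together, for example with stochastic gradient descent as in claims 3 and 8.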

Assignees

Korea Advanced Inst Sci & Tech

Inventors

Classifications

  • G06N3/082 · Primary

    modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

  • Distributed learning, e.g. federated learning · CPC title

  • Supervised learning · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11580376B2 cover?
An electronic apparatus is provided. The electronic apparatus includes: a memory storing a trained model including a plurality of layers; and a processor initializing a parameter matrix and a plurality of split variables of a trained model, calculating a new parameter matrix having a block-diagonal matrix for the plurality of split variables and the trained model to minimize a loss function for…
Who is the assignee on this patent?
Korea Advanced Inst Sci & Tech
What technology area does this patent fall under?
Primary CPC classification G06N3/082. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 14, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).