Electronic apparatus and method for optimizing trained model

US11580376B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11580376-B2
Application number	US-201816002649-A
Country	US
Kind code	B2
Filing date	Jun 7, 2018
Priority date	Jun 9, 2017
Publication date	Feb 14, 2023
Grant date	Feb 14, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An electronic apparatus is provided. The electronic apparatus includes: a memory storing a trained model including a plurality of layers; and a processor initializing a parameter matrix and a plurality of split variables of a trained model, calculating a new parameter matrix having a block-diagonal matrix for the plurality of split variables and the trained model to minimize a loss function for the trained model, a weight decay regularization term, and an objective function including a split regularization term defined by the parameter matrix and the plurality of split variables, vertically splitting the plurality of layers according to the group based on the computed split parameters and reconstruct the trained model using the computed new parameter matrix as parameters of the vertically split layers.
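As a rough illustration of what "vertically splitting" a layer along a block-diagonal parameter matrix can mean, the NumPy sketch below assumes a single fully connected layer whose input and output units have already been assigned to groups. The function names `split_layer` and `forward_split`, the group-index arrays, and the toy shapes are assumptions made for this example and do not come from the patent text.

```python
import numpy as np

def split_layer(W, in_groups, out_groups, num_groups):
    """Cut W (in_dim x out_dim) into one sub-matrix per group.

    in_groups[i] / out_groups[j] hold the (hypothetical) group index of
    input unit i / output unit j. Only intra-group weights are kept.
    """
    blocks = []
    for g in range(num_groups):
        rows = np.flatnonzero(in_groups == g)
        cols = np.flatnonzero(out_groups == g)
        blocks.append((rows, cols, W[np.ix_(rows, cols)]))
    return blocks

def forward_split(x, blocks, out_dim):
    """Run each group's sub-layer independently and scatter the outputs."""
    y = np.zeros((x.shape[0], out_dim))
    for rows, cols, W_g in blocks:
        y[:, cols] = x[:, rows] @ W_g
    return y

# Toy layer: 4 inputs, 4 outputs, 2 groups.
in_groups = np.array([0, 1, 0, 1])
out_groups = np.array([1, 0, 0, 1])
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
# Zero the inter-group weights so W is block-diagonal up to a permutation.
W = W * (in_groups[:, None] == out_groups[None, :])

blocks = split_layer(W, in_groups, out_groups, num_groups=2)
x = rng.normal(size=(3, 4))
y_full = x @ W                                   # original dense layer
y_split = forward_split(x, blocks, out_dim=4)    # two independent branches
```

When every inter-group weight is zero, the split branches reproduce the dense layer's output while storing only the intra-group blocks, and each branch could in principle run on a separate processor.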

First claim

Opening claim text (preview).

What is claimed is:

1. A method for optimizing a trained model, the method comprising: initializing a parameter matrix and a plurality of split variables of a trained model configured of a plurality of layers; calculating a new parameter matrix having a block-diagonal matrix for the plurality of split variables and the trained model to minimize a loss function for the trained model, a weight decay regularization term, and an objective function including a split regularization term defined by the parameter matrix and the plurality of split variables; vertically splitting the plurality of layers according to a group based on computed split parameters; and reconstructing the trained model using the calculated new parameter matrix as parameters of the vertically split layers, wherein the split regularization term comprises a group weight regularization term that suppresses an inter-group connection and activates only an intra-group connection, a disjoint group assignment that makes each group be orthogonal to each other, and a balanced group assignment that regularizes against a difference between a size of one group and a size of another group.

2. The method as claimed in claim 1, wherein in the initializing, the parameter matrix is initialized randomly and the plurality of split variables are initialized not to be uniform to each other.

3. The method as claimed in claim 1, wherein in the computing, a stochastic gradient descent method is used so that the objective function is minimized.

4. The method as claimed in claim 1, further comprising: computing a second-order new parameter matrix for the reconstructed trained model to minimize the loss function for the trained model and a second objective function including only the weight decay regularization term, and optimizing the trained model using the computed second-order new parameter matrix as parameters of the vertically split layers.

5. The method as claimed in claim 4, further comprising: parallelizing each of the vertically split layers within the optimized trained model using different processors.

6. An electronic apparatus comprising: a memory configured to store a trained model configured of a plurality of layers; and a processor configured to: initialize a parameter matrix and a plurality of split variables of a trained model, calculate a new parameter matrix having a block-diagonal matrix for the plurality of split variables and the trained model to minimize a loss function for the trained model, a weight decay regularization term, and an objective function including a split regularization term defined by the parameter matrix and the plurality of split variables, vertically split the plurality of layers according to a group based on a computed split parameters, and reconstruct the trained model using the calculated new parameter matrix as parameters of the vertically split layers, wherein the split regularization term comprises a group weight regularization term that suppresses an inter-group connection and activates only an intra-group connection, a disjoint group assignment that makes each group be orthogonal to each other, and a balanced group assignment that regularizes against a difference between a size of one group and a size of another group.

7. The electronic apparatus as claimed in claim 6, wherein the processor is further configured to randomly initialize the parameter matrix and initializes the plurality of split variables not to be uniform to each other.

8. The electronic apparatus as claimed in claim 6, wherein the processor is further configured to use a stochastic gradient descent method to minimize the objective function.

9. The electronic apparatus as claimed in claim 6, wherein the processor is further configured to: compute the second-order new parameter matrix for the reconstructed trained model to minimize the loss function for the trained model and a second objective function including only the weight decay regularization, and optimize the trained model using the computed second-order new parameter matrix as parameters of the vertically split layers.

10. A non-transitory computer readable recording medium including a program for executing a method for optimizing a trained model in an electronic apparatus, wherein the method for optimizing a trained model includes: initializing a parameter matrix and a plurality of split variables of a trained model including a plurality of layers; calculating a new parameter matrix having a block-diagonal matrix for the plurality of split variables and the trained model to minimize a loss function for the trained model, a weight decay regularization term, and an objective function including a split regularization term defined by the parameter matrix and the plurality of split variables; vertically splitting the plurality of layers according to a group based on computed split parameters; and reconstructing the trained model using the calculated new parameter matrix as parameters of the vertically split layers, wherein the split regularization term comprises a group weight regularization term that suppresses an inter-group connection and activates only an intra-group connection, a disjoint group assignment that makes each group be orthogonal to each other, and a balanced group assignment that regularizes against a difference between a size of one group and a size of another group.
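The claims name three components of the split regularization term but this page does not reproduce their formulas. The sketch below shows one plausible way such terms could be written for a single weight matrix, assuming soft group assignments obtained by a softmax over per-unit logits. The names `split_terms`, `hard_logits`, `alpha`, and `beta`, and the exact form of each term, are assumptions for illustration and should not be read as the patent's definitions.

```python
import numpy as np

def softmax(z, axis=0):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def split_terms(W, alpha, beta):
    """Return (group-weight, disjoint, balance) terms for one layer.

    W: (in_dim, out_dim) weights.
    alpha: (G, in_dim) and beta: (G, out_dim) assignment logits (assumed).
    """
    P = softmax(alpha, axis=0)   # soft group membership of input units
    Q = softmax(beta, axis=0)    # soft group membership of output units
    r_w = 0.0
    for g in range(P.shape[0]):
        # Weights that leave group g on the input side or the output side.
        inter_rows = (1.0 - P[g])[:, None] * W * Q[g][None, :]
        inter_cols = P[g][:, None] * W * (1.0 - Q[g])[None, :]
        r_w += np.linalg.norm(inter_rows, axis=1).sum()
        r_w += np.linalg.norm(inter_cols, axis=0).sum()

    def overlap(M):
        # Sum of dot products between distinct group indicator vectors.
        gram = M @ M.T
        return (gram.sum() - np.trace(gram)) / 2.0

    r_d = overlap(P) + overlap(Q)
    # Sum of squared group sizes; smallest when groups are equally sized.
    r_e = (P.sum(axis=1) ** 2).sum() + (Q.sum(axis=1) ** 2).sum()
    return r_w, r_d, r_e

def hard_logits(groups, num_groups, scale=50.0):
    """Logits whose softmax is nearly one-hot for the given group indices."""
    return scale * np.eye(num_groups)[:, groups]

groups = np.array([0, 0, 1, 1])
alpha = beta = hard_logits(groups, 2)
W_block = np.kron(np.eye(2), np.ones((2, 2)))   # block-diagonal weights
W_dense = np.ones((4, 4))                       # every unit connected

r_w_block, r_d, r_e = split_terms(W_block, alpha, beta)
r_w_dense, _, _ = split_terms(W_dense, alpha, beta)
_, _, r_e_unbalanced = split_terms(
    W_block, hard_logits(np.array([0, 0, 0, 1]), 2), beta)
```

In this toy setup the group-weight term is close to zero for the block-diagonal matrix and clearly positive for the dense one, the disjoint term vanishes for near one-hot assignments, and the balance term grows when one group takes three of the four input units. A full objective would add these terms, each scaled by its own coefficient, to the task loss and the weight decay term.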

Assignees

Korea Advanced Inst Sci & Tech

Inventors

Classifications

G06N3/082 · Primary
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title
G06N3/098
Distributed learning, e.g. federated learning · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/0495
Quantised networks; Sparse networks; Compressed networks · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 64564215

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11580376B2 cover?: An electronic apparatus is provided. The electronic apparatus includes: a memory storing a trained model including a plurality of layers; and a processor initializing a parameter matrix and a plurality of split variables of a trained model, calculating a new parameter matrix having a block-diagonal matrix for the plurality of split variables and the trained model to minimize a loss function for…
Who is the assignee on this patent?: Korea Advanced Inst Sci & Tech
What technology area does this patent fall under?: Primary CPC classification G06N3/082. Mapped technology areas include Physics.
When was this patent published?: Publication date Feb 14, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Partitioned machine learning architecture

Hardware neural network conversion method, computing device, compiling method and neural network software and hardware collaboration system

System and method for designing efficient super resolution deep convolutional neural networks by cascade network training, cascade network trimming, and dilated convolutions

Learning a Vector Representation for Unique Identification Codes

Neuromorphic hardware for neuronal computation and non-neuronal computation