What technology area does this patent fall under?

Primary CPC classification G06N3/082. Mapped technology areas include Physics.

When was this patent published?

Publication date May 27, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method and device for deep neural network compression

US12314857B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12314857-B2
Application number	US-202117241572-A
Country	US
Kind code	B2
Filing date	Apr 27, 2021
Priority date	May 15, 2020
Publication date	May 27, 2025
Grant date	May 27, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for deep neural network compression is provided. The method includes: using at least one weight of a deep neural network (DNN), setting a value of a P parameter, and combining every P weights in groups, and perform branch pruning and retraining, so that only one of each group has a non-zero weight, and the remaining weights are 0, wherein the remaining weights are evenly divided into branches to adjust a compression rate of the DNN and to adjust a reduction rate of the DNN.
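The grouping and pruning step in the abstract, combined with the loop parameter L and the thresholds T1 to TL described in the first claim on this page, can be sketched roughly as follows. This is a minimal illustration and not the patented implementation. Several details are assumptions that the page does not state: weights are compared to the threshold by magnitude, the threshold schedule is linear, retraining is a caller-supplied hook (`retrain` is a hypothetical name), and a final keep-largest pass guarantees a single survivor per group.

```python
from typing import Callable, List, Optional


def linear_thresholds(t1: float, tl: float, loops: int) -> List[float]:
    """Thresholds rising from t1 to tl over `loops` loops (linear spacing is an assumption)."""
    if loops < 1:
        raise ValueError("loops must be at least 1")
    if loops == 1:
        return [tl]
    step = (tl - t1) / (loops - 1)
    return [t1 + i * step for i in range(loops)]


def prune_group(group: List[float], threshold: float) -> List[float]:
    """Zero every weight whose magnitude does not exceed the threshold.

    If no weight in the group exceeds the threshold, keep only the
    largest-magnitude weight and zero the rest.
    """
    kept = [w if abs(w) > threshold else 0.0 for w in group]
    if not any(kept):
        largest = max(range(len(group)), key=lambda i: abs(group[i]))
        kept = [w if i == largest else 0.0 for i, w in enumerate(group)]
    return kept


def branch_prune(
    weights: List[float],
    p: int,
    thresholds: List[float],
    retrain: Optional[Callable[[List[float]], List[float]]] = None,
) -> List[float]:
    """Prune groups of P weights once per loop, optionally retraining after each loop."""
    if p < 1 or len(weights) % p != 0:
        raise ValueError("the number of weights must be a positive multiple of p")
    current = list(weights)
    for t in thresholds:
        pruned: List[float] = []
        for start in range(0, len(current), p):
            pruned.extend(prune_group(current[start:start + p], t))
        # Hypothetical hook: a real retraining step would update the surviving
        # weights while keeping the pruned positions at zero.
        current = retrain(pruned) if retrain else pruned
    # Assumption: force exactly one survivor per group after the last loop.
    final: List[float] = []
    for start in range(0, len(current), p):
        final.extend(prune_group(current[start:start + p], float("inf")))
    return final


if __name__ == "__main__":
    demo = [0.9, -0.1, 0.05, 0.3, 0.02, 0.01, -0.03, 0.015]
    print(branch_prune(demo, p=4, thresholds=linear_thresholds(0.1, 0.5, 3)))
```

With P weights per group and one survivor in each, roughly 1/P of the original weights remain non-zero, which is how the choice of P adjusts the compression rate.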

First claim

Opening claim text (preview).

What is claimed is:

1. A method for deep neural network compression, comprising: obtaining, by a processor, at least one weight of a deep neural network (DNN), setting a value of a P parameter, and combining every P weights in groups, wherein P is a number, the at least one weight is placed between two network layers corresponding to an input layer and an output layer adjacent to each other in the deep neural network, the processor comprising input layer storage units having nodes, wherein the processor further comprises a multiplexer, wherein input values of the nodes of the input layer are multiplied at the multiplexer by respective weights in one of the groups to generate first output values, the first output values accumulated into a final output value which is equal to an output value of one of plural nodes of the output layer; setting, by the processor, a loop parameter L so that a branch pruning and retraining is performed on each group of P weights from the first loop to the Lth loop, setting a threshold T corresponding to each loop, wherein the threshold T increases gradually from a first threshold T1 to an L threshold TL, the first threshold T1 is used at the beginning of the first loop, and the L threshold TL is used in the Lth loop; for each loop, setting, by the processor, the weights less than the threshold T to 0 and retaining the weights greater than the threshold T when obtaining the weights of the same group in the same loop according to the threshold T, wherein when all of the weights of the same group in the same loop are less than the threshold T, only the largest weight is retained, and remaining weights are set to 0; and performing, by the processor, the branch pruning and retraining until the Lth loop, so that only one weight of each group is non-zero and the remaining weights are 0.

2. The method for deep neural network compression as claimed in claim 1, wherein the DNN is a fully connected structure in which each node of the input layer is connected to each node of the output layer.

3. The method for deep neural network compression as claimed in claim 1, wherein the DNN is a combination of multiple network layers, each group of network layers of the DNN is adjacently connected by the input layer of the same group of network layers and the output layer of the same group of network layers, and the output layer of the previous group of network layers is the input layer of the next group of network layers.

4. The method for deep neural network compression as claimed in claim 3, wherein packet formats of the weight have 16-bit, 12-bit and 8-bit formats.

5. The method for deep neural network compression as claimed in claim 4, wherein a binary fixed-point number extraction calculation is performed on the output value of the nodes of the output layer, wherein the number of digits before the decimal point of the output value of the nodes of the output layer plus the number of digits after the decimal point of the output value of the nodes of the output layer is 16 bits.

6. The method for deep neural network compression as claimed in claim 5, wherein a binary fixed-point number extraction calculation is performed on the output value of the nodes of the output layer, wherein the number of digits before the decimal point of the output value plus at least 1 bit increased by adjustment, and plus the number of digits after the decimal point of the output value of the nodes of the output layer is 16 bits.

7. The method for deep neural network compression as claimed in claim 5, wherein among the bits of the packet formats of the weight, at least one high bit counted from the highest bit is an address index, which is used to store an address corresponding to the input values of nodes of the input layer; wherein the method further comprises: inputting, by a select terminal of the multiplexer of the processor, the address index; inputting, by an input of the multiplexer of the processor, the weights of the same group; selecting, by the multiplexer of the processor, the input values of nodes of the input layer corresponding to the address index; and outputting, by an output of the multiplexer of the processor, the input values of the nodes of the input layer corresponding to the address index.

8. The method for deep neural network compression as claimed in claim 7, wherein among the bits of the packet formats of the weight, remaining bits are bits for storing the weights except for the address index represented by at least one high bit counted from the highest bit.

9. The method for deep neural network compression as claimed in claim 7, wherein an address generator of the processor connected with a weight memory and the input layer storage units is used as an address counter, and the bits of the packet formats of the weight are stored in the weight memory.

10. A device for deep neural network compression, comprising: a processor, configured to execute the following tasks: obtaining at least one weight of a deep neural network (DNN), setting a value of a P parameter, and combining every P weights in groups, wherein P is a number, the at least one weight is placed between two network layers corresponding to an input layer and an output layer adjacent to each other in the deep neural network, the processor comprising input layer storage units having nodes, wherein the processor further comprises a multiplexer, wherein input values of the nodes of the input layer are multiplied at the multiplexer by respective weights in one of the groups to generate first output values; the first output values accumulated into a final output value which is equal to an output value of one of plural nodes of the output layer; setting a loop parameter L so that a branch pruning and retraining is performed on each group of P weights from the first loop to the Lth loop, setting a threshold T corresponding to each loop, wherein the threshold T increases gradually from a first threshold T1 to an L threshold TL, the first threshold T1 is used at the beginning of the first loop, and the L threshold TL is used in the Lth loop; for each loop, setting the weights less than the threshold T to 0 and retaining the weights greater than the threshold T when obtaining the weights of the same group in the same loop according to the threshold T, wherein when all of the weights of the same group in the same loop are less than the threshold T, only the largest weight is retained, and remaining weights are set to 0; and performing the branch pruning and retraining until the Lth loop of the last loop, so that only one weight of each group is non-zero and the remaining weights are 0.

11. The device for deep neural network compression as claimed in claim 10, wherein the DNN is a fully connected structure in which each node of the input layer is connected to each node of the output layer.

12. The device for deep neural network compression as claimed in claim 10, wherein the DNN is a combination of multiple network layers, each group of network layers of the DNN is adjacently connected by the input layer of the same group of network layers and the output layer of the same group of network layers, and the output layer of the previous group of network layers is the input layer of the next group of network layers.

13. The device for deep neural network compression as claimed in claim 12, wherein packet formats of the weight have 16-bit, 12-bit and 8-bit formats.

14. The device for deep neural network compression as claimed in claim 13, wherein a binary fixed-point number extraction calculation is performed on the output value of the nodes of the output layer, wherein the
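The packed weight format of claims 7 and 8 (an address index in the high bits, the weight in the remaining bits) and the multiplexer-based accumulation of claim 1 might look like the sketch below. It is an illustration only. The claims do not give the index width, the signed encoding, or any function names, so `index_bits=2` (matching a group size of P = 4), the two's-complement weight field, and the helpers `pack_word`, `unpack_word`, and `output_node` are all assumptions made here.

```python
from typing import List, Tuple


def pack_word(index: int, weight: int, word_bits: int = 16, index_bits: int = 2) -> int:
    """Pack an address index (high bits) and a signed integer weight (low bits) into one word."""
    value_bits = word_bits - index_bits
    lowest, highest = -(1 << (value_bits - 1)), (1 << (value_bits - 1)) - 1
    if not 0 <= index < (1 << index_bits):
        raise ValueError("index does not fit in the index field")
    if not lowest <= weight <= highest:
        raise ValueError("weight does not fit in the weight field")
    # Assumption: the weight field is stored as two's complement.
    return (index << value_bits) | (weight & ((1 << value_bits) - 1))


def unpack_word(word: int, word_bits: int = 16, index_bits: int = 2) -> Tuple[int, int]:
    """Split a packed word back into (address index, signed weight)."""
    value_bits = word_bits - index_bits
    index = word >> value_bits
    raw = word & ((1 << value_bits) - 1)
    if raw >= 1 << (value_bits - 1):
        raw -= 1 << value_bits  # undo two's complement for negative weights
    return index, raw


def output_node(inputs: List[int], words: List[int], p: int,
                word_bits: int = 16, index_bits: int = 2) -> int:
    """Accumulate one output node from one packed word per group of P inputs."""
    if len(inputs) != len(words) * p:
        raise ValueError("expected exactly one packed word per group of p inputs")
    total = 0
    for group, word in enumerate(words):
        index, weight = unpack_word(word, word_bits, index_bits)
        selected = inputs[group * p + index]  # the multiplexer's select step
        total += selected * weight
    return total


if __name__ == "__main__":
    inputs = [1, 2, 3, 4, 5, 6, 7, 8]
    words = [pack_word(2, 10), pack_word(1, -3)]
    print(output_node(inputs, words, p=4))
```

Because only one weight per group survives pruning, each group needs a single multiply per output node, and the index field tells the multiplexer which input value that surviving weight applies to.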

Assignees

Acer Inc

Inventors

Classifications

G06N3/0499
Feedforward networks · CPC title
G06N3/0495
Quantised networks; Sparse networks; Compressed networks · CPC title
G06N3/082 · Primary
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title
G06N3/04
Architecture, e.g. interconnection topology · CPC title
G06F7/491
Computations with decimal numbers {radix 12 or 20. (G06F7/4824 takes precedence)} · CPC title

Patent family

Related publications grouped by family.

View patent family 75825580

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12314857B2 cover?: A method for deep neural network compression is provided. The method includes: using at least one weight of a deep neural network (DNN), setting a value of a P parameter, and combining every P weights in groups, and perform branch pruning and retraining, so that only one of each group has a non-zero weight, and the remaining weights are 0, wherein the remaining weights are evenly divided into b…
Who is the assignee on this patent?: Acer Inc

Related patents

Electronic apparatus and control method thereof

Sparsity constraints and knowledge distillation based learning of sparser and compressed neural networks

Automatic thresholds for neural network pruning and retraining

Multi-level state detecting system and method

Model compression and fine-tuning
