Method and device for deep neural network compression

US12314857B2 · US · B2

Patent metadata
Publication number: US-12314857-B2
Application number: US-202117241572-A
Country: US
Kind code: B2
Filing date: Apr 27, 2021
Priority date: May 15, 2020
Publication date: May 27, 2025
Grant date: May 27, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for deep neural network compression is provided. The method includes: using at least one weight of a deep neural network (DNN), setting a value of a P parameter, and combining every P weights in groups, and perform branch pruning and retraining, so that only one of each group has a non-zero weight, and the remaining weights are 0, wherein the remaining weights are evenly divided into branches to adjust a compression rate of the DNN and to adjust a reduction rate of the DNN.
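The grouping and pruning described in the abstract, together with the threshold schedule recited in claim 1 on this page, can be sketched roughly as follows. This is an illustrative reading, not the patented implementation. The function name `prune_groups`, the optional `retrain` hook, the use of absolute values when comparing against the threshold, the treatment of a weight exactly equal to the threshold as pruned, and the final tie-break that keeps the largest survivor are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Sequence


def prune_groups(
    weights: Sequence[float],
    p: int,
    thresholds: Sequence[float],
    retrain: Optional[Callable[[List[float]], List[float]]] = None,
) -> List[float]:
    """Prune a flat weight list so each group of p weights keeps one non-zero entry.

    thresholds holds T1..TL in increasing order, one per loop.
    retrain is a hypothetical hook standing in for the retraining step;
    it is expected to leave already-pruned weights at 0.
    """
    if len(weights) % p != 0:
        raise ValueError("number of weights must be a multiple of p")
    w = list(weights)

    for t in thresholds:  # loops 1..L
        for start in range(0, len(w), p):
            group = w[start:start + p]
            # Assumption: compare magnitudes, since weights can be negative.
            keep = [abs(x) > t for x in group]
            if not any(keep):
                # All weights fall below T: retain only the largest one.
                best = max(range(p), key=lambda i: abs(group[i]))
                keep = [i == best for i in range(p)]
            for i in range(p):
                if not keep[i]:
                    w[start + i] = 0.0
        if retrain is not None:
            w = retrain(w)

    # Assumption: if several weights in a group still exceed TL,
    # keep the largest so that exactly one survives per group.
    for start in range(0, len(w), p):
        group = w[start:start + p]
        best = max(range(p), key=lambda i: abs(group[i]))
        for i in range(p):
            if i != best:
                w[start + i] = 0.0
    return w


pruned = prune_groups([0.9, 0.1, -0.4, 0.05, 0.02, 0.03, -0.01, 0.04], p=4,
                      thresholds=[0.2, 0.5])
print(pruned)
```

With P = 4, each group of four weights ends up with one non-zero entry, so one quarter of the original weights remain. A larger P removes a larger share of the weights.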

First claim

Opening claim text (preview).

What is claimed is:

1. A method for deep neural network compression, comprising: obtaining, by a processor, at least one weight of a deep neural network (DNN), setting a value of a P parameter, and combining every P weights in groups, wherein P is a number, the at least one weight is placed between two network layers corresponding to an input layer and an output layer adjacent to each other in the deep neural network, the processor comprising input layer storage units having nodes, wherein the processor further comprises a multiplexer, wherein input values of the nodes of the input layer are multiplied at the multiplexer by respective weights in one of the groups to generate first output values, the first output values accumulated into a final output value which is equal to an output value of one of plural nodes of the output layer; setting, by the processor, a loop parameter L so that a branch pruning and retraining is performed on each group of P weights from the first loop to the Lth loop, setting a threshold T corresponding to each loop, wherein the threshold T increases gradually from a first threshold T1 to an L threshold TL, the first threshold T1 is used at the beginning of the first loop, and the L threshold TL is used in the Lth loop; for each loop, setting, by the processor, the weights less than the threshold T to 0 and retaining the weights greater than the threshold T when obtaining the weights of the same group in the same loop according to the threshold T, wherein when all of the weights of the same group in the same loop are less than the threshold T, only the largest weight is retained, and remaining weights are set to 0; and performing, by the processor, the branch pruning and retraining until the Lth loop, so that only one weight of each group is non-zero and the remaining weights are 0.

2. The method for deep neural network compression as claimed in claim 1, wherein the DNN is a fully connected structure in which each node of the input layer is connected to each node of the output layer.

3. The method for deep neural network compression as claimed in claim 1, wherein the DNN is a combination of multiple network layers, each group of network layers of the DNN is adjacently connected by the input layer of the same group of network layers and the output layer of the same group of network layers, and the output layer of the previous group of network layers is the input layer of the next group of network layers.

4. The method for deep neural network compression as claimed in claim 3, wherein packet formats of the weight have 16-bit, 12-bit and 8-bit formats.

5. The method for deep neural network compression as claimed in claim 4, wherein a binary fixed-point number extraction calculation is performed on the output value of the nodes of the output layer, wherein the number of digits before the decimal point of the output value of the nodes of the output layer plus the number of digits after the decimal point of the output value of the nodes of the output layer is 16 bits.

6. The method for deep neural network compression as claimed in claim 5, wherein a binary fixed-point number extraction calculation is performed on the output value of the nodes of the output layer, wherein the number of digits before the decimal point of the output value plus at least 1 bit increased by adjustment, and plus the number of digits after the decimal point of the output value of the nodes of the output layer is 16 bits.

7. The method for deep neural network compression as claimed in claim 5, wherein among the bits of the packet formats of the weight, at least one high bit counted from the highest bit is an address index, which is used to store an address corresponding to the input values of nodes of the input layer; wherein the method further comprises: inputting, by a select terminal of the multiplexer of the processor, the address index; inputting, by an input of the multiplexer of the processor, the weights of the same group; selecting, by the multiplexer of the processor, the input values of nodes of the input layer corresponding to the address index; and outputting, by an output of the multiplexer of the processor, the input values of the nodes of the input layer corresponding to the address index.

8. The method for deep neural network compression as claimed in claim 7, wherein among the bits of the packet formats of the weight, remaining bits are bits for storing the weights except for the address index represented by at least one high bit counted from the highest bit.

9. The method for deep neural network compression as claimed in claim 7, wherein an address generator of the processor connected with a weight memory and the input layer storage units is used as an address counter, and the bits of the packet formats of the weight are stored in the weight memory.

10. A device for deep neural network compression, comprising: a processor, configured to execute the following tasks: obtaining at least one weight of a deep neural network (DNN), setting a value of a P parameter, and combining every P weights in groups, wherein P is a number, the at least one weight is placed between two network layers corresponding to an input layer and an output layer adjacent to each other in the deep neural network, the processor comprising input layer storage units having nodes, wherein the processor further comprises a multiplexer, wherein input values of the nodes of the input layer are multiplied at the multiplexer by respective weights in one of the groups to generate first output values; the first output values accumulated into a final output value which is equal to an output value of one of plural nodes of the output layer; setting a loop parameter L so that a branch pruning and retraining is performed on each group of P weights from the first loop to the Lth loop, setting a threshold T corresponding to each loop, wherein the threshold T increases gradually from a first threshold T1 to an L threshold TL, the first threshold T1 is used at the beginning of the first loop, and the L threshold TL is used in the Lth loop; for each loop, setting the weights less than the threshold T to 0 and retaining the weights greater than the threshold T when obtaining the weights of the same group in the same loop according to the threshold T, wherein when all of the weights of the same group in the same loop are less than the threshold T, only the largest weight is retained, and remaining weights are set to 0; and performing the branch pruning and retraining until the Lth loop of the last loop, so that only one weight of each group is non-zero and the remaining weights are 0.

11. The device for deep neural network compression as claimed in claim 10, wherein the DNN is a fully connected structure in which each node of the input layer is connected to each node of the output layer.

12. The device for deep neural network compression as claimed in claim 10, wherein the DNN is a combination of multiple network layers, each group of network layers of the DNN is adjacently connected by the input layer of the same group of network layers and the output layer of the same group of network layers, and the output layer of the previous group of network layers is the input layer of the next group of network layers.

13. The device for deep neural network compression as claimed in claim 12, wherein packet formats of the weight have 16-bit, 12-bit and 8-bit formats.

14. The device for deep neural network compression as claimed in claim 13, wherein a binary fixed-point number extraction calculation is performed on the output value of the nodes of the output layer, wherein the
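Claims 7 and 8 describe a packed weight word in which the high bits hold an address index and the remaining bits hold the weight, with a multiplexer selecting the matching input value. The sketch below shows one way such a word could be packed, unpacked, and used to compute an output node. It is a hedged illustration only: the helper names `pack`, `unpack`, and `output_node`, the choice of 2 index bits for groups of 4, and the two's-complement integer encoding of the weight are assumptions not stated in the claim text.

```python
from typing import List, Sequence, Tuple


def pack(index: int, weight_q: int, index_bits: int, word_bits: int = 16) -> int:
    """Pack an address index (high bits) and a signed quantized weight (low bits)."""
    weight_bits = word_bits - index_bits
    if not 0 <= index < (1 << index_bits):
        raise ValueError("index does not fit in index_bits")
    lo, hi = -(1 << (weight_bits - 1)), (1 << (weight_bits - 1)) - 1
    if not lo <= weight_q <= hi:
        raise ValueError("weight does not fit in the remaining bits")
    # Assumption: the weight is stored in two's complement.
    return (index << weight_bits) | (weight_q & ((1 << weight_bits) - 1))


def unpack(word: int, index_bits: int, word_bits: int = 16) -> Tuple[int, int]:
    """Recover (address index, signed weight) from a packed word."""
    weight_bits = word_bits - index_bits
    index = word >> weight_bits
    raw = word & ((1 << weight_bits) - 1)
    if raw >= 1 << (weight_bits - 1):  # sign-extend negative weights
        raw -= 1 << weight_bits
    return index, raw


def output_node(words: Sequence[int], inputs: Sequence[int], p: int,
                index_bits: int, word_bits: int = 16) -> int:
    """Accumulate one output node from one packed word per group of p inputs."""
    acc = 0
    for g, word in enumerate(words):
        index, weight_q = unpack(word, index_bits, word_bits)
        # Selecting one of p inputs by index plays the multiplexer's role here.
        x = inputs[g * p + index]
        acc += x * weight_q
    return acc


inputs: List[int] = [1, 2, 3, 4, 5, 6, 7, 8]
words = [pack(0, 90, index_bits=2), pack(3, -4, index_bits=2)]
print(output_node(words, inputs, p=4, index_bits=2))
```

In this reading, each group of P inputs costs one stored word and one multiply-accumulate, instead of P of each in the unpruned layer.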

Assignees

Acer Inc

Inventors

Classifications

  • Feedforward networks · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • G06N3/082 · Primary

    modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

  • Architecture, e.g. interconnection topology · CPC title

  • Computations with decimal numbers {radix 12 or 20. (G06F7/4824 takes precedence)} · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12314857B2 cover?
A method for deep neural network compression is provided. The method includes: using at least one weight of a deep neural network (DNN), setting a value of a P parameter, and combining every P weights in groups, and perform branch pruning and retraining, so that only one of each group has a non-zero weight, and the remaining weights are 0, wherein the remaining weights are evenly divided into b…
Who is the assignee on this patent?
Acer Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/082. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 27, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).