Cascaded computing for convolutional neural networks
US-2019244100-A1 · Aug 8, 2019 · US
US10614798B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10614798-B2 |
| Application number | US-201716321097-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 27, 2017 |
| Priority date | Jul 29, 2016 |
| Publication date | Apr 7, 2020 |
| Grant date | Apr 7, 2020 |
Official abstract text for this publication.
Aspects disclosed in the detailed description include memory compression in a deep neural network (DNN). To support a DNN application, a fully connected weight matrix associated with a hidden layer(s) of the DNN is divided into a plurality of weight blocks to generate a weight block matrix with a first number of rows and a second number of columns. A selected number of weight blocks are randomly designated as active weight blocks in each of the first number of rows and updated exclusively during DNN training. The weight block matrix is compressed to generate a sparsified weight block matrix including exclusively active weight blocks. The second number of columns is compressed to reduce memory footprint and computation power, while the first number of rows is retained to maintain accuracy of the DNN, thus providing the DNN in an efficient hardware implementation without sacrificing accuracy of the DNN application.
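The block-level sparsification described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the patented implementation: the function names (`designate_active_blocks`, `apply_mask`, `compress`, `matvec_compressed`), the toy 8×8 matrix, the fixed seeds, and the convention that rows of the weight matrix correspond to outputs (y = W·x) are all assumptions made for the example.

```python
import random

def designate_active_blocks(n_block_rows, n_block_cols, n_active, seed=0):
    """Randomly pick n_active block-column indices in every block row."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_block_cols), n_active))
            for _ in range(n_block_rows)]

def apply_mask(weights, block, index):
    """Zero every weight that lies outside an active block."""
    masked = []
    for r, row in enumerate(weights):
        active = set(index[r // block])
        masked.append([w if (c // block) in active else 0.0
                       for c, w in enumerate(row)])
    return masked

def compress(weights, block, index):
    """Keep only active blocks: columns shrink, the row count is retained."""
    packed = []
    for r, row in enumerate(weights):
        new_row = []
        for bc in index[r // block]:
            new_row.extend(row[bc * block:(bc + 1) * block])
        packed.append(new_row)
    return packed

def matvec_compressed(packed, block, index, x):
    """Compute y = W x from the packed blocks, using the block index
    to find which slice of x each stored block multiplies."""
    y = []
    for r, row in enumerate(packed):
        acc = 0.0
        for k, bc in enumerate(index[r // block]):
            for j in range(block):
                acc += row[k * block + j] * x[bc * block + j]
        y.append(acc)
    return y

def matvec_dense(weights, x):
    """Reference dense product used only to check the packed result."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

# Toy usage: 8x8 matrix, 2x2 blocks (4x4 block grid), 1 active block per row.
size, block, n_active = 8, 2, 1
grid = size // block
rng = random.Random(1)
dense = [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
x = [rng.uniform(-1, 1) for _ in range(size)]

index = designate_active_blocks(grid, grid, n_active)
masked = apply_mask(dense, block, index)    # what training would update
packed = compress(masked, block, index)     # what would be stored
y_dense = matvec_dense(masked, x)
y_packed = matvec_compressed(packed, block, index, x)
```

In this sketch the packed matrix keeps all 8 rows but only 2 of the 8 columns, and the block index is the only extra data needed to reproduce the masked dense product.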
Opening claim text (preview).
What is claimed is:
1. A method for reducing memory requirement of a deep neural network (DNN) comprising: dividing a fully connected weight matrix associated with at least one hidden layer of the DNN into a plurality of weight blocks of a determined block size to generate a weight block matrix having a first number of rows and a second number of columns; randomly designating a selected number of active weight blocks in each of the first number of rows based on a target sparsity ratio; updating exclusively the selected number of active weight blocks in each of the first number of rows during a DNN training for a DNN application; compressing the weight block matrix to generate a sparsified weight block matrix comprising exclusively the selected number of active weight blocks in each of the first number of rows during the DNN training; storing the sparsified weight block matrix for the DNN application upon completion of the DNN training, wherein the sparsified weight block matrix is configured to be used directly for DNN classification; and performing at least one hierarchical coarse-grain sparsification (HCGS) iteration based on the sparsified weight block matrix of the DNN; wherein performing the at least one HCGS iteration comprises: for each active weight block in the sparsified weight block matrix: dividing the active weight block into a plurality of HCGS weight blocks of a determined HCGS block size smaller than the determined block size; and randomly designating a selected number of active HCGS weight blocks based on an HCGS target sparsity ratio; compressing the sparsified weight block matrix to generate a hierarchical sparsified weight block matrix comprising exclusively the selected number of active HCGS weight blocks in each active weight block in the sparsified weight block matrix; generating an HCGS weight block index identifying the selected number of active HCGS weight blocks in each active weight block in the sparsified weight block matrix; and storing the HCGS weight block index for the DNN application.
2. The method of claim 1 further comprising reducing precision of each weight in each of the selected number of active weight blocks from a floating-point precision to a fixed-point precision for the DNN classification.
3. The method of claim 1 further comprising: generating a binary connection coefficient identifying each non-zero weight and each zero weight in each of the selected number of active weight blocks in each of the first number of rows; and updating exclusively the non-zero weight identified by the binary connection coefficient during the DNN training.
4. The method of claim 1 further comprising: generating a weight block index identifying each of the selected number of active weight blocks in the sparsified weight block matrix; and storing the weight block index with the sparsified weight block matrix for the DNN application.
5. The method of claim 1 further comprising: supporting a keyword detection task based on the DNN comprising an input layer, an output layer, and two hidden layers each having five hundred twelve (512) neurons; dividing the fully connected weight matrix into sixty-four (64) 64-by-64 (64×64) weight blocks to generate the weight block matrix having eight rows and eight columns; randomly designating two active 64×64 weight blocks in each of the eight rows; updating the two active 64×64 weight blocks in each of the eight rows of the weight block matrix during the DNN training; and compressing the weight block matrix to generate the sparsified weight block matrix comprising exclusively the two active 64×64 weight blocks in each of the eight rows of the weight block matrix.
6. The method of claim 1 further comprising: supporting a speech recognition task based on the DNN comprising an input layer, an output layer, and four hidden layers each having one thousand twenty-four (1024) neurons; dividing the fully connected weight matrix into two hundred fifty-six (256) sixty-four-by-sixty-four (64×64) weight blocks to generate the weight block matrix having sixteen rows and sixteen columns; randomly designating four active 64×64 weight blocks in each of the sixteen rows; updating the four active 64×64 weight blocks in each of the sixteen rows during the DNN training; and compressing the weight block matrix to generate the sparsified weight block matrix comprising exclusively the four active 64×64 weight blocks in each of the sixteen rows of the weight block matrix.
7. The method of claim 6 further comprising: for each active 64×64 weight block in the sparsified weight block matrix: dividing the active 64×64 weight block into sixteen (16) 16-by-16 (16×16) weight blocks and organizing the 16 16×16 weight blocks into four rows and four columns; and randomly designating two active 16×16 weight blocks in each of the four rows; and compressing the sparsified weight block matrix to generate a hierarchical sparsified weight block matrix comprising exclusively the active 16×16 weight blocks.
8. A non-transitory computer-readable medium comprising software with instructions configured to: divide a fully connected weight matrix associated with at least one hidden layer of a deep neural network (DNN) into a plurality of weight blocks of a determined block size to generate a weight block matrix having a first number of rows and a second number of columns; randomly designate a selected number of active weight blocks in each of the first number of rows based on a target sparsity ratio; update exclusively the selected number of active weight blocks in each of the first number of rows during a DNN training for a DNN application; compress the weight block matrix to generate a sparsified weight block matrix comprising exclusively the selected number of active weight blocks in each of the first number of rows during the DNN training; store the sparsified weight block matrix for the DNN application upon completion of the DNN training, wherein the sparsified weight block matrix is configured to be used directly for DNN classification; and perform at least one hierarchical coarse-grain sparsification (HCGS) iteration based on the sparsified weight block matrix, comprising: for each active weight block in the sparsified weight block matrix: dividing the active weight block into a plurality of HCGS weight blocks of a determined HCGS block size smaller than the determined block size; and randomly designating a selected number of active HCGS weight blocks based on an HCGS target sparsity ratio; compressing the sparsified weight block matrix to generate a hierarchical sparsified weight block matrix comprising exclusively the selected number of active HCGS weight blocks in each active weight block in the sparsified weight block matrix; generating an HCGS weight block index identifying the selected number of active HCGS weight blocks in each active weight block in the sparsified weight block matrix; and storing the HCGS weight block index for the DNN application.
9. The non-transitory computer-readable medium of claim 8 wherein the software with instructions is further configured to reduce precision of each weight in each of the selected number of active weight blocks from a floating-point precision to a fixed-point precision for the DNN classification.
10. The non-transitory computer-readable medium of claim 8 wherein the software with instructions is further configured to: generate a binary connection coefficient identifying each non-zero weight and each zero weight in each of the selected number of active weight blocks in each of the first number of rows; and update exclusively the non-zero weight identified by the binary connection
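To make the hierarchical step concrete, the following sketch builds the two index levels using the sizes recited in claims 6 and 7 (a 1024×1024 matrix, 64×64 blocks with four active per row, then 16×16 sub-blocks with two active per row of each active block) and counts how many weights survive. The helper names (`designate`, `hcgs_index`, `retained_weights`), the dictionary layout of the sub-block index, and the fixed seed are illustrative assumptions; the claims do not prescribe a data layout.

```python
import random

def designate(n_rows, n_cols, n_active, rng):
    """Randomly choose n_active column positions in each of n_rows rows."""
    return [sorted(rng.sample(range(n_cols), n_active)) for _ in range(n_rows)]

def hcgs_index(size, block, n_active, sub_block, sub_active, seed=0):
    """Build a two-level index: active blocks, then active sub-blocks
    inside every active block (one HCGS iteration)."""
    rng = random.Random(seed)
    grid = size // block                 # block rows/cols at the top level
    top = designate(grid, grid, n_active, rng)
    sub_grid = block // sub_block        # sub-block rows/cols per block
    sub = {}
    for br, cols in enumerate(top):
        for bc in cols:
            # Key is (block row, block column); assumed layout for this sketch.
            sub[(br, bc)] = designate(sub_grid, sub_grid, sub_active, rng)
    return top, sub

def retained_weights(sub, sub_block):
    """Count weights kept after the hierarchical step."""
    n_sub_blocks = sum(len(cols) for rows in sub.values() for cols in rows)
    return n_sub_blocks * sub_block * sub_block

# Sizes taken from claims 6 and 7.
size, block, n_active = 1024, 64, 4
sub_block, sub_active = 16, 2

top, sub = hcgs_index(size, block, n_active, sub_block, sub_active)
kept = retained_weights(sub, sub_block)
ratio = (size * size) / kept
```

With these sizes the top level keeps 4 of 16 blocks per row and the second level keeps 2 of 4 sub-blocks per row, so `kept` is 131,072 of 1,048,576 weights and `ratio` is 8.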
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Execution procedure of a spoken command · CPC title
Memory allocation or algorithm optimisation to reduce hardware requirements · CPC title
using artificial neural networks · CPC title
Physics · mapped topic