Efficient neural networks with elaborate matrix structures in machine learning environments
Pre-grant publication: US-2020234137-A1 · Jul 23, 2020 · US
Granted patent: US11651223B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11651223-B2 |
| Application number | US-201816151886-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 4, 2018 |
| Priority date | Oct 27, 2017 |
| Publication date | May 16, 2023 |
| Grant date | May 16, 2023 |
Abstract
Described herein are systems and methods to prune deep neural network models, reducing the overall memory and compute requirements of these models. It is demonstrated that, by using block pruning and group lasso regularization combined with pruning during training, block-sparse recurrent neural networks (RNNs) can be built that are as accurate as dense baseline models. Two different approaches are disclosed to induce block sparsity in neural network models: pruning blocks of weights in a layer, and using group lasso regularization to create blocks of weights with zeros. Using these techniques, it is demonstrated that block-sparse RNNs with high sparsity can be created with a small loss in accuracy. Compared to unstructured sparsity, block-sparse RNNs eliminate overheads related to data storage and irregular memory accesses while increasing hardware efficiency.
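The two block-sparsity approaches in the abstract can be illustrated with a short NumPy sketch. It is not the patent's implementation. The block shape, the use of the largest absolute value in a block as its "representative weight", and the function names `split_into_blocks`, `block_prune` and `group_lasso_penalty` are assumptions made for illustration. The group lasso term here is the sum of per-block ℓ2 norms scaled by a constant `lam`; during training it would be added to the task loss.

```python
import numpy as np


def split_into_blocks(w: np.ndarray, block_shape: tuple) -> np.ndarray:
    """Hypothetical helper: view a 2-D matrix as (row_blocks, col_blocks, bh, bw).

    Assumes the matrix dimensions are divisible by the block shape.
    """
    rows, cols = w.shape
    bh, bw = block_shape
    if rows % bh or cols % bw:
        raise ValueError("matrix shape must be divisible by block shape")
    return w.reshape(rows // bh, bh, cols // bw, bw).transpose(0, 2, 1, 3)


def block_prune(w: np.ndarray, block_shape: tuple, threshold: float):
    """Zero every block whose representative weight is below `threshold`.

    Assumption: the representative weight is the block's max absolute value.
    Returns the pruned matrix and the per-block keep mask (1 = kept).
    """
    blocks = split_into_blocks(w, block_shape)
    rep = np.abs(blocks).max(axis=(2, 3))       # one value per block
    keep = (rep >= threshold).astype(w.dtype)   # 0 marks a pruned block
    bh, bw = block_shape
    # Expand the per-block mask back to one entry per weight.
    mask = np.kron(keep, np.ones((bh, bw), dtype=w.dtype))
    return w * mask, keep


def group_lasso_penalty(w: np.ndarray, block_shape: tuple, lam: float) -> float:
    """Group lasso term: lam times the sum of the L2 norms of all blocks."""
    blocks = split_into_blocks(w, block_shape)
    return float(lam * np.sqrt((blocks ** 2).sum(axis=(2, 3))).sum())


# Illustrative usage with made-up weights and 2x2 blocks.
w = np.array([[0.1, -0.2, 3.0, 0.5],
              [0.05, 0.1, -0.4, 2.0]])
pruned, keep = block_prune(w, (2, 2), threshold=1.0)
penalty = group_lasso_penalty(w, (2, 2), lam=1.0)
```

With these values the left 2×2 block (largest magnitude 0.2) falls below the threshold and is zeroed as a unit, while the right block (largest magnitude 3.0) survives intact. Because whole blocks are removed together, the remaining nonzeros stay in dense tiles, which is the property the abstract credits for avoiding the storage and irregular-access overheads of unstructured sparsity.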
Claims
What is claimed is:

1. A computer-implemented method for computer learning, the method comprising: dividing at least one weight matrix of a neural network model into a plurality of blocks with each block comprising a plurality of elements, the block comprising a block size associated with data-path size of one or more processors running the neural network model; and pruning a neural network model in a training process to reduce parameter numbers of the neural network model for reduced memory and computation requirements, pruning the neural network model comprising at least one from the following steps: applying block pruning by setting all weights in a block to zeros in response to a representative weight representing the block being below a threshold; or applying a group lasso regularization for each block by adding a loss term proportional to the ℓ2 norm of the block to drive one or more blocks towards zeros.
2. The computer-implemented method of claim 1 wherein the threshold is initially set as zero for an initial training iteration or iterations to allow the weights in the at least one weight matrix to progress.
3. The computer-implemented method of claim 2 wherein the threshold has a start slope starting at a start iteration after the initial training iteration or iterations.
4. The computer-implemented method of claim 3 wherein the start slope is determined by one or more hyper-parameters including the number of elements in each block.
5. The computer-implemented method of claim 3 wherein the at least one weight matrix comprises one or more recurrent weight matrices for one or more recurrent layers within the neural network model, and one or more non-recurrent weight matrices for one or more fully connected layers within the neural network model, and wherein one or more hyper-parameters are the same for each type of weight matrix.
6. The computer-implemented method of claim 3 wherein the threshold has a ramp slope starting at a ramp iteration, which occurs after the start iteration.
7. The computer-implemented method of claim 6 wherein the ramp slope is a multiple of the start slope.
8. The computer-implemented method of claim 1 wherein the proportion to the ℓ2 norm of the block is constant during the training process.
9. The computer-implemented method of claim 1 wherein the group lasso regularization is applied in combination with block pruning.
10. A system for computer learning, the system comprising: a neural network model to implement one or more computer learning tasks, the neural network model comprising one or more recurrent layers and one or more non-recurrent layers; and one or more processors configured to train the neural network model in a training process to reduce parameter numbers of the neural network model for reduced memory and computation requirements, the training process comprising: dividing at least one weight matrix of the neural network model into a plurality of blocks of a same block size with each block comprising a plurality of elements, the block size being associated with structure of the one or more processors running the neural network model; pruning the neural network model by at least one of the following: applying block pruning by setting all weights in a block to zeros in response to a representative weight representing the block being below a threshold; or applying a weight regularization for each block by adding a loss term proportional to a norm of the block to drive one or more blocks towards zeros; and stopping pruning the neural network model when a desired percentage of block sparsity or a predetermined percentage of total training iterations is reached.
11. The system of claim 10 wherein the norm of the block is an ℓ2 norm of the block, and the weight regularization is a group lasso regularization.
12. The system of claim 11 wherein the proportion to the ℓ2 norm of the block is constant during the training of the neural network model.
13. The system of claim 10 wherein the threshold is initially set as zero for an initial set of one or more training iterations to allow weights in the at least one weight matrix to progress, and the threshold then monotonically grows to prune the at least one weight matrix.
14. The system of claim 13 wherein the threshold monotonically grows with a slope associated with the block size.
15. A non-transitory computer-readable medium or media comprising one or more sequences of instructions which, when executed by one or more processors, cause steps to be performed comprising: dividing each of a plurality of weight matrices in a neural network model into a plurality of blocks with each block comprising a plurality of elements and comprising a block size that is associated with data-path size of one or more processors running the neural network model; and pruning the neural network model to reduce parameter numbers of the neural network model for reduced memory and computation requirements by implementing at least one of the following: applying block pruning by setting all weights in a block to zeros in response to a representative weight representing the block being below a threshold; or applying a weight regularization for each block by adding a loss term proportional to a norm of the block to drive one or more blocks towards zeros.
16. The non-transitory computer-readable medium or media of claim 15 wherein the threshold is initially set as zero for an initial training iteration or training iterations to allow weights in at least one weight matrix to grow, and the threshold then monotonically grows to prune the at least one weight matrix.
17. The non-transitory computer-readable medium or media of claim 16 wherein the threshold monotonically grows with a slope associated with the block size.
18. The non-transitory computer-readable medium or media of claim 15 wherein the norm of the block is an ℓ2 norm of the block, and wherein the weight regularization is a group lasso regularization.
19. The non-transitory computer-readable medium or media of claim 18 wherein the group lasso regularization is applied in combination with block pruning.
20. The non-transitory computer-readable medium or media of claim 15 wherein the steps further comprise: stopping pruning the neural network model when a desired percentage of block sparsity or a percentage of total training iterations is reached.
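Several dependent claims (2 to 7, 13, 14, 16, 17 and 20) describe how the pruning threshold evolves. It stays at zero for the initial iterations, then grows monotonically with a start slope, switches at a ramp iteration to a ramp slope that is a multiple of the start slope, and pruning stops once a target block sparsity or a fraction of total training iterations is reached. The sketch below encodes one piecewise-linear reading of that schedule. The function and parameter names (`pruning_threshold`, `should_prune`, `start_itr`, `ramp_itr`, `start_slope`, `ramp_multiplier`, `stop_fraction`) are hypothetical, and because the claims do not give the formula that ties the start slope to the block size, `start_slope` is simply taken as an input.

```python
def pruning_threshold(itr: int, start_itr: int, ramp_itr: int,
                      start_slope: float, ramp_multiplier: float) -> float:
    """Hypothetical piecewise-linear threshold schedule.

    itr < start_itr             -> 0.0 (weights are left free to grow)
    start_itr <= itr < ramp_itr -> grows linearly with start_slope
    itr >= ramp_itr             -> grows with start_slope * ramp_multiplier
    Non-negative slopes keep the threshold monotonically non-decreasing.
    """
    if ramp_itr < start_itr:
        raise ValueError("ramp_itr must not precede start_itr")
    if itr < start_itr:
        return 0.0
    if itr < ramp_itr:
        return start_slope * (itr - start_itr)
    # Value reached at the ramp point, then continue with the steeper slope.
    at_ramp = start_slope * (ramp_itr - start_itr)
    return at_ramp + start_slope * ramp_multiplier * (itr - ramp_itr)


def should_prune(itr: int, total_itrs: int, block_sparsity: float,
                 target_sparsity: float, stop_fraction: float) -> bool:
    """Hypothetical stopping rule: keep pruning only while neither the target
    block sparsity nor the given fraction of total iterations is reached."""
    reached_sparsity = block_sparsity >= target_sparsity
    reached_itr_limit = itr >= stop_fraction * total_itrs
    return not (reached_sparsity or reached_itr_limit)


if __name__ == "__main__":
    # Illustrative settings only; none of these values come from the patent.
    for itr in (0, 1000, 1500, 2000, 3000):
        print(itr, pruning_threshold(itr, start_itr=1000, ramp_itr=2000,
                                     start_slope=1e-4, ramp_multiplier=1.5))
```

Under these illustrative settings the threshold is 0 until iteration 1000, rises by 1e-4 per iteration until 2000, and then by 1.5e-4 per iteration. In a training loop, one would recompute the threshold each iteration and apply block pruning only while `should_prune` returns True.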
CPC classification titles:
- Characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- Quantised networks; Sparse networks; Compressed networks
- Supervised learning
- Modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- Recurrent networks, e.g. Hopfield networks