Systems and methods for block-sparse recurrent neural networks

US11651223B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11651223-B2
Application number	US-201816151886-A
Country	US
Kind code	B2
Filing date	Oct 4, 2018
Priority date	Oct 27, 2017
Publication date	May 16, 2023
Grant date	May 16, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein are systems and methods to prune deep neural network models in reducing the overall memory and compute requirements of these models. It is demonstrated that using block pruning and group lasso combined with pruning during training, block-sparse recurrent neural networks (RNNs) may be built as accurate as dense baseline models. Two different approaches are disclosed to induce block sparsity in neural network models: pruning blocks of weights in a layer and using group lasso regularization to create blocks of weights with zeros. Using these techniques, it is demonstrated that block-sparse RNNs with high sparsity can be created with small loss in accuracy. Block-sparse RNNs eliminate overheads related to data storage and irregular memory accesses while increasing hardware efficiency compared to unstructured sparsity.
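The two approaches named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the patent's implementation. The function names, the choice of the maximum magnitude as each block's representative weight, and the penalty coefficient `lam` are assumptions made for illustration only.

```python
import math


def iter_blocks(rows, cols, bh, bw):
    """Yield the (row, col) cell list of every bh x bw block in a rows x cols matrix."""
    for r in range(0, rows, bh):
        for c in range(0, cols, bw):
            yield [(i, j)
                   for i in range(r, min(r + bh, rows))
                   for j in range(c, min(c + bw, cols))]


def block_prune(W, bh, bw, threshold):
    """Return a copy of W in which every block whose representative weight
    is below `threshold` is set to zero.

    Assumption: the representative weight is the largest magnitude in the block.
    """
    out = [row[:] for row in W]
    for cells in iter_blocks(len(W), len(W[0]), bh, bw):
        representative = max(abs(W[i][j]) for i, j in cells)
        if representative < threshold:
            for i, j in cells:
                out[i][j] = 0.0
    return out


def group_lasso_penalty(W, bh, bw, lam):
    """Return lam times the sum of the l2 norms of all blocks of W.

    Adding this term to the training loss pushes whole blocks towards zero.
    """
    total = 0.0
    for cells in iter_blocks(len(W), len(W[0]), bh, bw):
        total += math.sqrt(sum(W[i][j] ** 2 for i, j in cells))
    return lam * total


# A 2 x 4 matrix split into two 2 x 2 blocks.
W = [[0.1, -0.2, 3.0, 4.0],
     [0.05, 0.1, 0.0, 0.0]]
pruned = block_prune(W, 2, 2, threshold=0.5)  # the left block is zeroed
penalty = group_lasso_penalty(pruned, 2, 2, lam=0.1)
```

In this sketch the left block is removed because its largest magnitude (0.2) is below the threshold, while the right block survives and contributes its ℓ2 norm to the penalty.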

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for computer learning, the method comprising: dividing at least one weight matrix of a neural network model into a plurality of blocks with each block comprising a plurality of elements, the block comprising a block size associated with data-path size of one or more processors running the neural network model; and pruning a neural network model in a training process to reduce parameter numbers of the neural network model for reduced memory and computation requirements, pruning the neural network model comprising at least one from the following steps: applying block pruning by setting all weights in a block to zeros in response to a representative weight representing the block being below a threshold; or applying a group lasso regularization for each block by adding a loss term proportional to ℓ2 norm of the block to drive one or more blocks towards zeros.

2. The computer-implemented method of claim 1 wherein the threshold is initially set as zero for an initial training iteration or iterations to allow the weights in the at least one weight matrix to progress.

3. The computer-implemented method of claim 2 wherein the threshold has a start slope starting at a start iteration after the initial training iteration or iterations.

4. The computer-implemented method of claim 3 the start slope is determined by one or more hyper-parameters including the number of elements in each block.

5. The computer-implemented method of claim 3 wherein the at least one weight matrix comprises one or more recurrent weight matrices for one or more recurrent layers within the neural network model, and one or more non-recurrent weight matrices for one or more fully connected layer within the neural network model, and wherein one or more hyper-parameters are the same for each type of weight matrices.

6. The computer-implemented method of claim 3 wherein the threshold has a ramp slope starting at a ramp iteration, which occurs after the start iteration.

7. The computer-implemented method of claim 6 wherein the ramp slope is a multiple of the start slope.

8. The computer-implemented method of claim 1 wherein the proportion to the ℓ2 norm of the block is constant during the training process.

9. The computer-implemented method of claim 1 wherein the group lasso regularization is applied in combination with block pruning.

10. A system for computer learning, the system comprising: a neural network model to implement one or more computer learning tasks, the neural network model network comprises one or more recurrent layers and one or more non-recurrent layers; and one or more processors configured to train the neural network model in a training process to reduce parameter numbers of the neural network model for reduced memory and computation requirements, the training process comprising: dividing at least one weight matrix of the neural network model into a plurality of blocks of a same block size with each block comprising a plurality of elements, the block size is associated with structure of the one or more processors running the neural network model; pruning the neural network model by at least one of the following: applying block pruning by setting all weights in a block to zeros in response to a representative weight representing the block being below a threshold; or applying a weight regularization for each block by adding a loss term proportional to a norm of the block to drive one or more blocks towards zeros; and stopping pruning the neural network model when a desired percentage of block sparsity or a predetermined percentage of a total training iterations is reached.

11. The system of claim 10 wherein the norm of the block is an ℓ2 norm of the block, the weight regularization is a group lasso regularization.

12. The system of claim 11 wherein the proportion to the ℓ2 norm of the block is constant during the training of the neural network model.

13. The system of claim 10 wherein the threshold is initially set as zero for an initial set of one or more training iterations to allow weights in the at least one weight matrix to progress, the threshold then monotonically grows to prune the at least one weight matrix.

14. The system of claim 13 wherein the threshold monotonically grows with a slope associated with the block size.

15. A non-transitory computer-readable medium or media comprising one or more sequences of instructions which, when executed by one or more processors, causes the steps to be performed comprising: dividing each of a plurality of weight matrices in a neural network model into a plurality of blocks with each block comprising a plurality of elements and comprising a block size that is associated with data-path size of one or more processors running the neural network model; and pruning the neural network model to reduce parameter numbers of the neural network model for reduced memory and computation requirements by implementing at least one of the following: applying block pruning by setting all weights in a block to zeros in response to a representative weight representing the block being below a threshold; or applying a weight regularization for each block by adding a loss term proportional to a norm of the block to drive one or more blocks towards zeros.

16. The non-transitory computer-readable medium or media of claim 15 wherein the threshold is initially set as zero for an initial training iteration or training iterations to allow weights in at least one weight matrix to grow, the threshold then monotonically grows to prune the at least one weight matrix.

17. The non-transitory computer-readable medium or media of claim 16 wherein the threshold monotonically grows with a slope associated with the block size.

18. The non-transitory computer-readable medium or media of claim 15 wherein the norm of the block is an ℓ2 norm of the block, and wherein the weight regularization is a group lasso regularization.

19. The non-transitory computer-readable medium or media of claim 18 wherein the group lasso regularization is applied in combination with block pruning.

20. The non-transitory computer-readable medium or media of claim 15 wherein the steps further comprising: stopping pruning the neural network model when a desired percentage of block sparsity or a percentage of total training iterations is reached.
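The threshold behavior recited in the dependent claims (zero at first, then a start slope, then a steeper ramp slope that is a multiple of the start slope, with pruning eventually stopping) could be sketched as a piecewise-linear schedule. This is one possible reading, not the claimed method itself. All parameter names are hypothetical, and holding the threshold constant after `end_itr` is an assumption about what "stopping pruning" means.

```python
def pruning_threshold(itr, start_itr, ramp_itr, end_itr, start_slope, ramp_multiple):
    """Return the pruning threshold at training iteration `itr`.

    Assumed schedule (all names are illustrative):
      itr < start_itr              -> 0, so weights can develop unpruned
      start_itr <= itr < ramp_itr  -> grows linearly with start_slope
      ramp_itr <= itr <= end_itr   -> grows with ramp_multiple * start_slope
      itr > end_itr                -> held at its end_itr value (assumption)
    """
    if itr < start_itr:
        return 0.0
    itr = min(itr, end_itr)  # freeze the threshold once pruning has stopped
    if itr < ramp_itr:
        return start_slope * (itr - start_itr)
    ramp_slope = ramp_multiple * start_slope
    return start_slope * (ramp_itr - start_itr) + ramp_slope * (itr - ramp_itr)


# Hypothetical hyper-parameters: pruning starts at iteration 10,
# ramps at 20, and stops at 30.
schedule = [pruning_threshold(i, 10, 20, 30, 0.5, 2.0) for i in range(0, 41)]
```

With these assumed values the threshold never decreases from one iteration to the next, which matches the monotonic growth described in claims 13 and 16.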

Assignees

Baidu USA LLC

Inventors

Classifications

G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06N3/0495
Quantised networks; Sparse networks; Compressed networks · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/082 (Primary)
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title
G06N3/044 (Primary)
Recurrent networks, e.g. Hopfield networks · CPC title

Patent family

Related publications grouped by family.

View patent family 66244074

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11651223B2 cover?: Described herein are systems and methods to prune deep neural network models in reducing the overall memory and compute requirements of these models. It is demonstrated that using block pruning and group lasso combined with pruning during training, block-sparse recurrent neural networks (RNNs) may be built as accurate as dense baseline models. Two different approaches are disclosed to induce bl…
Who is the assignee on this patent?: Baidu USA LLC
What technology area does this patent fall under?: Primary CPC classification G06N3/082. Mapped technology areas include Physics.
When was this patent published?: Publication date May 16, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Efficient neural networks with elaborate matrix structures in machine learning environments

Memory compression in a deep neural network

Augmenting neural networks with sparsely-accessed external memory

Tri-configuration neural network unit

Accelerated tr-l-bfgs algorithm for neural network

Automatic tuning of artificial neural networks
