What technology area does this patent fall under?

Primary CPC classification G10L15/16. Mapped technology areas include Physics.

When was this patent published?

Publication date Apr 7, 2020 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Memory compression in a deep neural network

US10614798B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10614798-B2
Application number	US-201716321097-A
Country	US
Kind code	B2
Filing date	Jul 27, 2017
Priority date	Jul 29, 2016
Publication date	Apr 7, 2020
Grant date	Apr 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Aspects disclosed in the detailed description include memory compression in a deep neural network (DNN). To support a DNN application, a fully connected weight matrix associated with a hidden layer(s) of the DNN is divided into a plurality of weight blocks to generate a weight block matrix with a first number of rows and a second number of columns. A selected number of weight blocks are randomly designated as active weight blocks in each of the first number of rows and updated exclusively during DNN training. The weight block matrix is compressed to generate a sparsified weight block matrix including exclusively active weight blocks. The second number of columns is compressed to reduce memory footprint and computation power, while the first number of rows is retained to maintain accuracy of the DNN, thus providing the DNN in an efficient hardware implementation without sacrificing accuracy of the DNN application.
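The block selection and column compression described in the abstract can be sketched in a few lines of plain Python. This is only an illustration, not the patented implementation: the function names `designate_active_blocks` and `compress` are hypothetical, the weights are random placeholders, and the sizes (512 neurons, 64×64 blocks, two active blocks per row) are borrowed from claim 5 of this publication.

```python
import random

def designate_active_blocks(block_rows, block_cols, n_active, rng):
    """For each block-row, randomly pick n_active distinct block-columns."""
    return [sorted(rng.sample(range(block_cols), n_active))
            for _ in range(block_rows)]

def compress(weights, active, block):
    """Keep only active blocks: the row count is retained, columns shrink."""
    out = []
    for i, row in enumerate(weights):
        cols = active[i // block]           # active block-columns for this row
        out.append([row[c * block + j] for c in cols for j in range(block)])
    return out

rng = random.Random(0)                      # fixed seed, for repeatability only
n, block, n_active = 512, 64, 2             # sizes taken from claim 5
W = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]  # placeholder weights

active = designate_active_blocks(n // block, n // block, n_active, rng)
W_sparse = compress(W, active, block)

print(len(W_sparse), len(W_sparse[0]))      # 512 128
```

In this sketch the `active` list plays the role of a block index: it records which block-columns survive in each block-row, so a compressed row can be mapped back to its original column positions. During training, only the weights inside the active blocks would be updated.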

First claim

Opening claim text (preview).

What is claimed is:

1. A method for reducing memory requirement of a deep neural network (DNN) comprising: dividing a fully connected weight matrix associated with at least one hidden layer of the DNN into a plurality of weight blocks of a determined block size to generate a weight block matrix having a first number of rows and a second number of columns; randomly designating a selected number of active weight blocks in each of the first number of rows based on a target sparsity ratio; updating exclusively the selected number of active weight blocks in each of the first number of rows during a DNN training for a DNN application; compressing the weight block matrix to generate a sparsified weight block matrix comprising exclusively the selected number of active weight blocks in each of the first number of rows during the DNN training; storing the sparsified weight block matrix for the DNN application upon completion of the DNN training, wherein the sparsified weight block matrix is configured to be used directly for DNN classification; and performing at least one hierarchical coarse-grain sparsification (HCGS) iteration based on the sparsified weight block matrix of the DNN; wherein performing the at least one HCGS iteration comprises: for each active weight block in the sparsified weight block matrix: dividing the active weight block into a plurality of HCGS weight blocks of a determined HCGS block size smaller than the determined block size; and randomly designating a selected number of active HCGS weight blocks based on an HCGS target sparsity ratio; compressing the sparsified weight block matrix to generate a hierarchical sparsified weight block matrix comprising exclusively the selected number of active HCGS weight blocks in each active weight block in the sparsified weight block matrix; generating an HCGS weight block index identifying the selected number of active HCGS weight blocks in each active weight block in the sparsified weight block matrix; and storing the HCGS weight block index for the DNN application.

2. The method of claim 1 further comprising reducing precision of each weight in each of the selected number of active weight blocks from a floating-point precision to a fixed-point precision for the DNN classification.

3. The method of claim 1 further comprising: generating a binary connection coefficient identifying each non-zero weight and each zero weight in each of the selected number of active weight blocks in each of the first number of rows; and updating exclusively the non-zero weight identified by the binary connection coefficient during the DNN training.

4. The method of claim 1 further comprising: generating a weight block index identifying each of the selected number of active weight blocks in the sparsified weight block matrix; and storing the weight block index with the sparsified weight block matrix for the DNN application.

5. The method of claim 1 further comprising: supporting a keyword detection task based on the DNN comprising an input layer, an output layer, and two hidden layers each having five hundred twelve (512) neurons; dividing the fully connected weight matrix into sixty-four (64) 64-by-64 (64×64) weight blocks to generate the weight block matrix having eight rows and eight columns; randomly designating two active 64×64 weight blocks in each of the eight rows; updating the two active 64×64 weight blocks in each of the eight rows of the weight block matrix during the DNN training; and compressing the weight block matrix to generate the sparsified weight block matrix comprising exclusively the two active 64×64 weight blocks in each of the eight rows of the weight block matrix.

6. The method of claim 1 further comprising: supporting a speech recognition task based on the DNN comprising an input layer, an output layer, and four hidden layers each having one thousand twenty-four (1024) neurons; dividing the fully connected weight matrix into two hundred fifty-six (256) sixty-four-by-sixty-four (64×64) weight blocks to generate the weight block matrix having sixteen rows and sixteen columns; randomly designating four active 64×64 weight blocks in each of the sixteen rows; updating the four active 64×64 weight blocks in each of the sixteen rows during the DNN training; and compressing the weight block matrix to generate the sparsified weight block matrix comprising exclusively the four active 64×64 weight blocks in each of the sixteen rows of the weight block matrix.

7. The method of claim 6 further comprising: for each active 64×64 weight block in the sparsified weight block matrix: dividing the active 64×64 weight block into sixteen (16) 16-by-16 (16×16) weight blocks and organizing the 16 16×16 weight blocks into four rows and four columns; and randomly designating two active 16×16 weight blocks in each of the four rows; and compressing the sparsified weight block matrix to generate a hierarchical sparsified weight block matrix comprising exclusively the active 16×16 weight blocks.

8. A non-transitory computer-readable medium comprising software with instructions configured to: divide a fully connected weight matrix associated with at least one hidden layer of a deep neural network (DNN) into a plurality of weight blocks of a determined block size to generate a weight block matrix having a first number of rows and a second number of columns; randomly designate a selected number of active weight blocks in each of the first number of rows based on a target sparsity ratio; update exclusively the selected number of active weight blocks in each of the first number of rows during a DNN training for a DNN application; compress the weight block matrix to generate a sparsified weight block matrix comprising exclusively the selected number of active weight blocks in each of the first number of rows during the DNN training; store the sparsified weight block matrix for the DNN application upon completion of the DNN training, wherein the sparsified weight block matrix is configured to be used directly for DNN classification; and perform at least one hierarchical coarse-grain sparsification (HCGS) iteration based on the sparsified weight block matrix, comprising: for each active weight block in the sparsified weight block matrix: dividing the active weight block into a plurality of HCGS weight blocks of a determined HCGS block size smaller than the determined block size; and randomly designating a selected number of active HCGS weight blocks based on an HCGS target sparsity ratio; compressing the sparsified weight block matrix to generate a hierarchical sparsified weight block matrix comprising exclusively the selected number of active HCGS weight blocks in each active weight block in the sparsified weight block matrix; generating an HCGS weight block index identifying the selected number of active HCGS weight blocks in each active weight block in the sparsified weight block matrix; and storing the HCGS weight block index for the DNN application.

9. The non-transitory computer-readable medium of claim 8 wherein the software with instructions is further configured to reduce precision of each weight in each of the selected number of active weight blocks from a floating-point precision to a fixed-point precision for the DNN classification.

10. The non-transitory computer-readable medium of claim 8 wherein the software with instructions is further configured to: generate a binary connection coefficient identifying each non-zero weight and each zero weight in each of the selected number of active weight blocks in each of the first number of rows; and update exclusively the non-zero weight identified by the binary connection
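To make the numbers in claims 6 and 7 concrete, the following sketch runs one hierarchical coarse-grain sparsification (HCGS) pass and counts the weights that remain at each level. It is a minimal illustration under stated assumptions: the helper `pick_active` and the index layouts `level1` and `level2` are hypothetical, and only the block sizes and active-block counts come from the claim text.

```python
import random

def pick_active(block_rows, block_cols, n_active, rng):
    """Randomly choose n_active distinct block-columns in each block-row."""
    return [sorted(rng.sample(range(block_cols), n_active))
            for _ in range(block_rows)]

rng = random.Random(0)                      # fixed seed, for repeatability only
n = 1024                                    # neurons per hidden layer (claim 6)

# Level 1: 16 x 16 grid of 64x64 blocks, four active blocks per block-row.
level1 = pick_active(n // 64, n // 64, 4, rng)

# Level 2 (HCGS): inside every active 64x64 block, a 4 x 4 grid of 16x16
# blocks with two active per row (claim 7). Keyed by (block_row, block_col).
level2 = {(r, c): pick_active(64 // 16, 64 // 16, 2, rng)
          for r, cols in enumerate(level1) for c in cols}

dense = n * n
after_l1 = sum(len(cols) for cols in level1) * 64 * 64
after_l2 = sum(len(cols) for sub in level2.values() for cols in sub) * 16 * 16

print(dense, after_l1, after_l2)            # 1048576 262144 131072
print(dense // after_l1, dense // after_l2) # 4 8
```

Under these settings the first level keeps one quarter of the weights and the HCGS pass halves that again, for an 8× reduction in stored weights per layer. The two index structures stand in for the weight block index of claim 4 and the HCGS weight block index of claim 1; their actual storage format is not described on this page.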

Assignees

Univ Arizona State

Inventors

Classifications

G10L15/22
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
G10L2015/223
Execution procedure of a spoken command · CPC title
G10L15/285
Memory allocation or algorithm optimisation to reduce hardware requirements · CPC title
G10L15/16 · Primary
using artificial neural networks · CPC title
G06N3/0472
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 61016789

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10614798B2 cover?: Aspects disclosed in the detailed description include memory compression in a deep neural network (DNN). To support a DNN application, a fully connected weight matrix associated with a hidden layer(s) of the DNN is divided into a plurality of weight blocks to generate a weight block matrix with a first number of rows and a second number of columns. A selected number of weight blocks are randoml…
Who is the assignee on this patent?: Univ Arizona State