Signal Processing System and Method
US-2020110989-A1 · Apr 9, 2020 · US
US11816563B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11816563-B2 |
| Application number | US-201916409487-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 10, 2019 |
| Priority date | Jan 17, 2019 |
| Publication date | Nov 14, 2023 |
| Grant date | Nov 14, 2023 |
Official abstract text for this publication.
A method of storing a sparse weight matrix for a trained artificial neural network in a circuit including a series of clusters. The method includes partitioning the sparse weight matrix into at least one first sub-block and at least one second sub-block. The first sub-block includes only zero-value weights and the second sub-block includes non-zero value weights. The method also includes assigning the non-zero value weights in the at least one second sub-block to at least one cluster of the series of clusters of the circuit. The circuit is configured to perform matrix-vector-multiplication (MVM) between the non-zero value weights of the at least one second sub-block and an input vector during an inference process utilizing the artificial neural network. The sub-blocks containing all zero elements are power gated, thereby reducing overall energy consumption for inference.
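The partition-and-skip idea in the abstract can be illustrated with a minimal sketch. It assumes fixed-size square tiles and plain Python lists; the function names (`partition_blocks`, `blockwise_mvm`, `dense_mvm`) and the example matrix are hypothetical and are not taken from the patent. All-zero tiles are set aside (in hardware these would be the power-gated sub-blocks), and the matrix-vector product is computed only over the tiles that contain non-zero weights.

```python
def partition_blocks(weights, block):
    """Split a square matrix into block x block tiles.

    Returns (zero_tiles, nonzero_tiles): zero_tiles holds the (row, col)
    origin of every all-zero tile; nonzero_tiles holds (row, col, tile).
    """
    n = len(weights)
    zero_tiles, nonzero_tiles = [], []
    for r in range(0, n, block):
        for c in range(0, n, block):
            tile = [row[c:c + block] for row in weights[r:r + block]]
            if any(v != 0 for row in tile for v in row):
                nonzero_tiles.append((r, c, tile))
            else:
                zero_tiles.append((r, c))  # candidate for power gating
    return zero_tiles, nonzero_tiles


def blockwise_mvm(nonzero_tiles, x, n):
    """Matrix-vector product that touches only the non-zero tiles."""
    y = [0.0] * n
    for r, c, tile in nonzero_tiles:
        for i, row in enumerate(tile):
            y[r + i] += sum(w * x[c + j] for j, w in enumerate(row))
    return y


def dense_mvm(weights, x):
    """Reference product over the full matrix, used only for comparison."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Hypothetical 4x4 sparse weight matrix, tiled into 2x2 sub-blocks.
W = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 5, 6],
]
x = [1, 1, 2, 2]

zero_tiles, nonzero_tiles = partition_blocks(W, 2)
y = blockwise_mvm(nonzero_tiles, x, len(W))
```

Here two of the four tiles are all zero and never enter the computation, yet `y` matches the dense product. The patent partitions based on the arrangement of zeros rather than a fixed grid, so this uniform tiling is a simplification.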
Opening claim text (preview).
What is claimed is:
1. A method of storing a sparse weight matrix for a trained artificial neural network in a circuit comprising a plurality of clusters, the method comprising: partitioning the sparse weight matrix, based on an arrangement of zero-value weights and non-zero value weights in the sparse weight matrix, into at least one first sub-block and at least one second sub-block, the at least one first sub-block comprising a plurality of weight values, the plurality of weight values of the at least one first sub-block containing only zero-value weights and the at least one second sub-block comprising non-zero value weights; and assigning the non-zero value weights in the at least one second sub-block to at least one cluster of the plurality of clusters of the circuit, wherein the circuit is configured to perform matrix-vector-multiplication (MVM) between the non-zero value weights of the at least one second sub-block and an input vector.
2. The method of claim 1, further comprising identifying clusters of the plurality of clusters that were not assigned at least one non-zero value weight during the assigning the non-zero value weights.
3. The method of claim 2, further comprising completely cutting off power to the clusters that were not assigned at least one non-zero value weight.
4. The method of claim 1, wherein each cluster of the plurality of clusters comprises an array of memristors.
5. The method of claim 4, wherein the memristors are selected from the group consisting of resistive random access memory (RRAM), conductive-bridging random access memory (CBRAM), phase-change memory (PCM), a ferroelectric field effect transistor (FerroFET), spin-transfer torque random access memory (STT RAM), and combinations thereof.
6. The method of claim 4, wherein the assigning the non-zero value weights comprises setting, utilizing a plurality of selectors connected in series to the memristors, a resistance of each of the memristors.
7. The method of claim 1, wherein: the sparse weight matrix has a size of 512×512, the at least one first sub-block has a size of 256×256, and the at least one second sub-block has a size selected from the group consisting of 128×128, 64×64, and 32×32.
8. The method of claim 1, wherein the partitioning the sparse weight matrix comprises comparing, recursively, a size of the at least one second sub-block to a size of a smallest cluster of the plurality of clusters.
9. The method of claim 8, wherein, if the size of the at least one second sub-block is equal to the size of the smallest cluster, the method further comprises: calculating a first energy cost of processing the non-zero value weights utilizing an unblocked element cluster comprising an unblocked element buffer and at least one digital arithmetic logic unit; calculating a second energy cost of processing the non-zero value weights with the smallest cluster; determining a lower energy cost among the first energy cost and the second energy cost; and assigning the non-zero value weights to the unblocked element cluster or the smallest cluster depending on the lower energy cost.
10. The method of claim 8, wherein, if the size of the at least one second sub-block is larger than the size of the smallest cluster, the method further comprises: sub-partitioning the at least one second sub-block into a plurality of sub-regions having sizes matching sizes of a first plurality of clusters of the plurality of clusters; calculating a first total energy cost of processing the non-zero value weights of each of the plurality of sub-regions with the first plurality of clusters; calculating a second total energy cost of processing the non-zero value weights of the second sub-block with a single cluster having a same size as the second sub-block; determining a lower total energy cost among the first total energy cost and the second total energy cost; and assigning the non-zero value weights of the plurality of sub-regions to the first plurality of clusters or assigning the non-zero value weights of the at least one second sub-block to the single cluster depending on the lower total energy cost.
11. A system for performing inference with an artificial neural network having a sparse weight matrix, the system comprising: a network-on-chip comprising a plurality of clusters, each cluster of the plurality of clusters comprising an array of memristor crossbars; a processor; and a non-transitory computer-readable storage medium having instructions stored therein, which, when executed by the processor, cause the processor to: partition the sparse weight matrix, based on an arrangement of zero-value weights and non-zero value weights in the sparse weight matrix, into at least one first sub-block and at least one second sub-block, the at least one first sub-block comprising a plurality of weight values, the plurality of weight values of the at least one first sub-block containing only zero-value weights and the at least one second sub-block comprising non-zero value weights; and assign the non-zero value weights in the at least one second sub-block to at least one cluster of the plurality of clusters of the network-on-chip, wherein the network-on-chip is configured to perform matrix-vector-multiplication (MVM) between the non-zero value weights of the at least one second sub-block and an input vector.
12. The system of claim 11, wherein the instructions, when executed by the processor, further cause the processor to identify clusters of the plurality of clusters that were not assigned at least one non-zero value weight.
13. The system of claim 12, wherein the instructions, when executed by the processor, further cause the processor to completely cut off power to the clusters that were not assigned at least one non-zero value weight.
14. The system of claim 11, wherein each memristor of the array of memristor crossbars is selected from the group consisting of resistive random access memory (RRAM), conductive-bridging random access memory (CBRAM), phase-change memory (PCM), a ferroelectric field effect transistor (FerroFET), and spin-transfer torque random access memory (STT RAM).
15. The system of claim 11, wherein the network-on-chip further comprises a plurality of selectors connected in series to the array of memristor crossbars, and wherein the instructions, when executed by the processor, further cause the processor to assign the non-zero value weights by setting a resistance of the memristor crossbars utilizing the selectors.
16. The system of claim 11, wherein: the sparse weight matrix has a size of 512×512, the at least one first sub-block has a size of 256×256, and the at least one second sub-block has a size selected from the group consisting of 128×128, 64×64, and 32×32.
17. The system of claim 11, wherein the instructions, when executed by the processor, further cause the processor to compare, recursively, a size of the at least one second sub-block to a size of a smallest cluster of the plurality of clusters.
18. The system of claim 17, wherein, if the size of the at least one second sub-block is equal to the size of the smallest cluster, the instructions further cause the processor to: calculate a first energy cost of processing the non-zero value weights utilizing an unblocked element cluster comprising an unblocked element buffer and at least one digital arithmetic logic unit; calculate a second energy cost of processing the non-zero value weights with the smallest cluster; determine a lower energy cost among the first energy cost and the s
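The recursive size comparison and energy-cost choices described in the claims can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented procedure: the quadrant split, the function names (`count_nonzero`, `cluster_energy`, `digital_energy`, `plan`), and the energy formulas are all hypothetical, with costs in arbitrary units. An all-zero region is marked as gated; a region the size of the smallest cluster goes to whichever of the digital ("unblocked") path or the smallest cluster is cheaper; a larger region is either split into sub-regions or mapped to a single cluster of its own size, whichever has the lower total cost.

```python
def count_nonzero(w, r, c, size):
    """Number of non-zero weights in the square region at (r, c)."""
    return sum(
        1
        for i in range(r, r + size)
        for j in range(c, c + size)
        if w[i][j] != 0
    )


# Hypothetical energy model in arbitrary units (an assumption, not from the patent).
def cluster_energy(size):
    return size * size   # assume crossbar cost scales with array area


def digital_energy(nnz):
    return 3 * nnz       # assume a fixed cost per non-zero weight on the ALU path


def plan(w, r, c, size, smallest):
    """Return (energy, assignments) for the square region at (r, c)."""
    nnz = count_nonzero(w, r, c, size)
    if nnz == 0:
        # Only zero-value weights: nothing to store, region can be gated.
        return 0, [("gated", r, c, size)]
    if size == smallest:
        # Compare the digital path against the smallest cluster.
        analog, digital = cluster_energy(size), digital_energy(nnz)
        if digital < analog:
            return digital, [("unblocked", r, c, size)]
        return analog, [("cluster", r, c, size)]
    # Larger than the smallest cluster: cost of splitting into four quadrants...
    half = size // 2
    split_energy, split_plan = 0, []
    for dr in (0, half):
        for dc in (0, half):
            e, p = plan(w, r + dr, c + dc, half, smallest)
            split_energy += e
            split_plan += p
    # ...versus one cluster of the same size as the whole region.
    single = cluster_energy(size)
    if single <= split_energy:
        return single, [("cluster", r, c, size)]
    return split_energy, split_plan


# Hypothetical 4x4 matrix; the smallest available cluster is assumed to be 2x2.
W = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 7],
]
energy, assignments = plan(W, 0, 0, 4, 2)
```

With these made-up costs, the dense top-left quadrant is mapped to a 2×2 cluster, the two empty quadrants are gated, and the single stray weight in the bottom-right quadrant is cheaper to handle on the digital path, for a total of 7 units against 16 for one 4×4 cluster. A real mapping would need measured energy figures for each cluster size and for the digital unit.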
Quantised networks; Sparse networks; Compressed networks · CPC title
Analogue means · CPC title
Learning methods · CPC title
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
Architecture, e.g. interconnection topology · CPC title