Method of enabling sparse neural networks on memresistive accelerators

US11816563B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11816563-B2
Application number  US-201916409487-A
Country             US
Kind code           B2
Filing date         May 10, 2019
Priority date       Jan 17, 2019
Publication date    Nov 14, 2023
Grant date          Nov 14, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of storing a sparse weight matrix for a trained artificial neural network in a circuit including a series of clusters. The method includes partitioning the sparse weight matrix into at least one first sub-block and at least one second sub-block. The first sub-block includes only zero-value weights and the second sub-block includes non-zero value weights. The method also includes assigning the non-zero value weights in the at least one second sub-block to at least one cluster of the series of clusters of the circuit. The circuit is configured to perform matrix-vector-multiplication (MVM) between the non-zero value weights of the at least one second sub-block and an input vector during an inference process utilizing the artificial neural network. The sub-blocks containing all zero elements are power gated, thereby reducing overall energy consumption for inference.
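The partitioning and assignment steps summarized in the abstract can be illustrated with a minimal Python sketch. It assumes fixed-size square tiles, a toy 4×4 weight matrix, and hypothetical helper names (`partition`, `sparse_mvm`); none of these come from the patent. All-zero tiles are simply skipped, which stands in for the power-gated clusters.

```python
# Illustrative sketch only: tile size, names, and data are assumptions,
# not taken from the patent.

def partition(matrix, tile):
    """Split a square matrix into tile x tile sub-blocks.

    Returns (zero_blocks, nonzero_blocks). Sub-blocks are identified by the
    (row, col) offset of their top-left corner.
    """
    n = len(matrix)
    zero_blocks, nonzero_blocks = [], {}
    for r in range(0, n, tile):
        for c in range(0, n, tile):
            block = [row[c:c + tile] for row in matrix[r:r + tile]]
            if any(v != 0 for row in block for v in row):
                nonzero_blocks[(r, c)] = block   # would be assigned to a cluster
            else:
                zero_blocks.append((r, c))       # candidate for power gating
    return zero_blocks, nonzero_blocks


def sparse_mvm(nonzero_blocks, x, n):
    """Matrix-vector multiplication using only the non-zero sub-blocks."""
    y = [0] * n
    for (r, c), block in nonzero_blocks.items():
        for i, row in enumerate(block):
            y[r + i] += sum(w * x[c + j] for j, w in enumerate(row))
    return y


W = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 5],
]
x = [1, 1, 1, 1]

zero_blocks, nonzero_blocks = partition(W, tile=2)
y_sparse = sparse_mvm(nonzero_blocks, x, n=len(W))
y_dense = [sum(w * xi for w, xi in zip(row, x)) for row in W]
```

In this toy case two of the four tiles contain only zeros, and the product computed from the remaining two tiles matches the dense result.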

First claim

Opening claim text (preview).

What is claimed is:

1. A method of storing a sparse weight matrix for a trained artificial neural network in a circuit comprising a plurality of clusters, the method comprising: partitioning the sparse weight matrix, based on an arrangement of zero-value weights and non-zero value weights in the sparse weight matrix, into at least one first sub-block and at least one second sub-block, the at least one first sub-block comprising a plurality of weight values, the plurality of weight values of the at least one first sub-block containing only zero-value weights and the at least one second sub-block comprising non-zero value weights; and assigning the non-zero value weights in the at least one second sub-block to at least one cluster of the plurality of clusters of the circuit, wherein the circuit is configured to perform matrix-vector-multiplication (MVM) between the non-zero value weights of the at least one second sub-block and an input vector.

2. The method of claim 1, further comprising identifying clusters of the plurality of clusters that were not assigned at least one non-zero value weight during the assigning the non-zero value weights.

3. The method of claim 2, further comprising completely cutting off power to the clusters that were not assigned at least one non-zero value weight.

4. The method of claim 1, wherein each cluster of the plurality of clusters comprises an array of memristors.

5. The method of claim 4, wherein the memristors are selected from the group consisting of resistive random access memory (RRAM), conductive-bridging random access memory (CBRAM), phase-change memory (PCM), a ferroelectric field effect transistor (FerroFET), spin-transfer torque random access memory (STT RAM), and combinations thereof.

6. The method of claim 4, wherein the assigning the non-zero value weights comprises setting, utilizing a plurality of selectors connected in series to the memristors, a resistance of each of the memristors.

7. The method of claim 1, wherein: the sparse weight matrix has a size of 512×512, the at least one first sub-block has a size of 256×256, and the at least one second sub-block has a size selected from the group consisting of 128×128, 64×64, and 32×32.

8. The method of claim 1, wherein the partitioning the sparse weight matrix comprises comparing, recursively, a size of the at least one second sub-block to a size of a smallest cluster of the plurality of clusters.

9. The method of claim 8, wherein, if the size of the at least one second sub-block is equal to the size of the smallest cluster, the method further comprises: calculating a first energy cost of processing the non-zero value weights utilizing an unblocked element cluster comprising an unblocked element buffer and at least one digital arithmetic logic unit; calculating a second energy cost of processing the non-zero value weights with the smallest cluster; determining a lower energy cost among the first energy cost and the second energy cost; and assigning the non-zero value weights to the unblocked element cluster or the smallest cluster depending on the lower energy cost.

10. The method of claim 8, wherein, if the size of the at least one second sub-block is larger than the size of the smallest cluster, the method further comprises: sub-partitioning the at least one second sub-block into a plurality of sub-regions having sizes matching sizes of a first plurality of clusters of the plurality of clusters; calculating a first total energy cost of processing the non-zero value weights of each of the plurality of sub-regions with the first plurality of clusters; calculating a second total energy cost of processing the non-zero value weights of the second sub-block with a single cluster having a same size as the second sub-block; determining a lower total energy cost among the first total energy cost and the second total energy cost; and assigning the non-zero value weights of the plurality of sub-regions to the first plurality of clusters or assigning the non-zero value weights of the at least one second sub-block to the single cluster depending on the lower total energy cost.

11. A system for performing inference with an artificial neural network having a sparse weight matrix, the system comprising: a network-on-chip comprising a plurality of clusters, each cluster of the plurality of clusters comprising an array of memristor crossbars; a processor; and a non-transitory computer-readable storage medium having instructions stored therein, which, when executed by the processor, cause the processor to: partition the sparse weight matrix, based on an arrangement of zero-value weights and non-zero value weights in the sparse weight matrix, into at least one first sub-block and at least one second sub-block, the at least one first sub-block comprising a plurality of weight values, the plurality of weight values of the at least one first sub-block containing only zero-value weights and the at least one second sub-block comprising non-zero value weights; and assign the non-zero value weights in the at least one second sub-block to at least one cluster of the plurality of clusters of the network-on-chip, wherein the network-on-chip is configured to perform matrix-vector-multiplication (MVM) between the non-zero value weights of the at least one second sub-block and an input vector.

12. The system of claim 11, wherein the instructions, when executed by the processor, further cause the processor to identify clusters of the plurality of clusters that were not assigned at least one non-zero value weight.

13. The system of claim 12, wherein the instructions, when executed by the processor, further cause the processor to completely cut off power to the clusters that were not assigned at least one non-zero value weight.

14. The system of claim 11, wherein each memristor of the array of memristor crossbars is selected from the group consisting of resistive random access memory (RRAM), conductive-bridging random access memory (CBRAM), phase-change memory (PCM), a ferroelectric field effect transistor (FerroFET), and spin-transfer torque random access memory (STT RAM).

15. The system of claim 11, wherein the network-on-chip further comprises a plurality of selectors connected in series to the array of memristor crossbars, and wherein the instructions, when executed by the processor, further cause the processor to assign the non-zero value weights by setting a resistance of the memristor crossbars utilizing the selectors.

16. The system of claim 11, wherein: the sparse weight matrix has a size of 512×512, the at least one first sub-block has a size of 256×256, and the at least one second sub-block has a size selected from the group consisting of 128×128, 64×64, and 32×32.

17. The system of claim 11, wherein the instructions, when executed by the processor, further cause the processor to compare, recursively, a size of the at least one second sub-block to a size of a smallest cluster of the plurality of clusters.

18. The system of claim 17, wherein, if the size of the at least one second sub-block is equal to the size of the smallest cluster, the instructions further cause the processor to: calculate a first energy cost of processing the non-zero value weights utilizing an unblocked element cluster comprising an unblocked element buffer and at least one digital arithmetic logic unit; calculate a second energy cost of processing the non-zero value weights with the smallest cluster; determine a lower energy cost among the first energy cost and the s…
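The recursive size comparison and energy-cost decisions in claims 8 to 10 can be sketched roughly as follows. This is one possible reading, not the patented implementation: the quadrant split, the function names (`assign`, `crossbar_cost`, `count_nonzero`), and both energy formulas are hypothetical and chosen only to make the comparison concrete.

```python
# Illustrative sketch only: the quadrant split and both energy formulas are
# assumptions made for this example, not values or methods from the patent.

DIGITAL_COST_PER_WEIGHT = 3   # hypothetical cost per non-zero weight on the digital ALU path


def crossbar_cost(size):
    """Hypothetical energy of one size x size crossbar cluster."""
    return size * size


def count_nonzero(matrix, r, c, size):
    """Count non-zero weights in the size x size sub-block at (r, c)."""
    return sum(1 for row in matrix[r:r + size] for v in row[c:c + size] if v != 0)


def assign(matrix, r, c, size, smallest):
    """Return (energy, placements) for the sub-block at (r, c)."""
    nnz = count_nonzero(matrix, r, c, size)
    if nnz == 0:
        return 0, []   # all-zero sub-block: nothing assigned, cluster can be gated
    single = (crossbar_cost(size), [("crossbar", r, c, size)])
    if size == smallest:
        # Compare the digital path with the smallest crossbar cluster.
        digital = (DIGITAL_COST_PER_WEIGHT * nnz, [("digital", r, c, size)])
        return min(single, digital, key=lambda option: option[0])
    # Larger than the smallest cluster: compare one cluster of the same size
    # against the total cost of the recursively placed quadrants.
    half = size // 2
    split_cost, split_places = 0, []
    for dr in (0, half):
        for dc in (0, half):
            cost, places = assign(matrix, r + dr, c + dc, half, smallest)
            split_cost += cost
            split_places += places
    return min(single, (split_cost, split_places), key=lambda option: option[0])


W = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 5],
]
energy, placements = assign(W, 0, 0, size=4, smallest=2)

dense = [[1] * 4 for _ in range(4)]
dense_energy, dense_placements = assign(dense, 0, 0, size=4, smallest=2)
```

Under these assumed costs, the sparse example is split: the dense top-left quadrant goes to a small crossbar, the single stray weight goes to the digital path, and the two empty quadrants receive nothing. The fully dense matrix stays on one cluster of its own size because splitting offers no saving.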

Assignees

Samsung Electronics Co Ltd

Classifications

  • Quantised networks; Sparse networks; Compressed networks

  • G06N3/065 (Primary): Analogue means

  • G06N3/08 (Primary): Learning methods

  • Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)}

  • G06N3/04 (Primary): Architecture, e.g. interconnection topology

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11816563B2 cover?
A method of storing a sparse weight matrix for a trained artificial neural network in a circuit including a series of clusters. The method includes partitioning the sparse weight matrix into at least one first sub-block and at least one second sub-block. The first sub-block includes only zero-value weights and the second sub-block includes non-zero value weights. The method also includes assign…
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/065. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 14, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).