US11726950B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11726950-B2 |
| Application number | US-201916586975-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 28, 2019 |
| Priority date | Sep 28, 2019 |
| Publication date | Aug 15, 2023 |
| Grant date | Aug 15, 2023 |
Official abstract text for this publication.
A compute near memory (CNM) convolution accelerator enables a convolutional neural network (CNN) to use dedicated acceleration to achieve efficient in-place convolution operations with less impact on memory and energy consumption. A 2D convolution operation is reformulated as 1D row-wise convolution. The 1D row-wise convolution enables the CNM convolution accelerator to process input activations row-by-row, while using the weights one-by-one. Lightweight access circuits provide the ability to stream both weights and input rows as vectors to MAC units, which in turn enables modules of the CNM convolution accelerator to implement convolution for both [1×1] and chosen [n×n] sized filters.
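
The row-wise reformulation described in the abstract can be illustrated in software. The sketch below is not the patented circuit; it is a minimal pure-Python model, assuming a single channel, a square filter, no padding, and the cross-correlation form of convolution commonly used in CNNs. The function names `conv2d_direct` and `conv2d_rowwise` are hypothetical. The second function streams one input row at a time and applies one scalar weight at a time across the whole row, which is the access pattern the abstract attributes to the MAC units.

```python
def conv2d_direct(x, w, stride=1):
    """Reference 2D convolution (cross-correlation form, no padding)."""
    n = len(w)                                  # assumes a square n x n filter
    out_h = (len(x) - n) // stride + 1
    out_w = (len(x[0]) - n) // stride + 1
    out = [[0] * out_w for _ in range(out_h)]
    for oy in range(out_h):
        for ox in range(out_w):
            for ky in range(n):
                for kx in range(n):
                    out[oy][ox] += w[ky][kx] * x[oy * stride + ky][ox * stride + kx]
    return out


def conv2d_rowwise(x, w, stride=1):
    """Same result, built from 1D row-wise passes with one weight at a time."""
    n = len(w)
    out_h = (len(x) - n) // stride + 1
    out_w = (len(x[0]) - n) // stride + 1
    out = [[0] * out_w for _ in range(out_h)]
    for oy in range(out_h):
        for ky in range(n):                     # one filter row per pass
            row = x[oy * stride + ky]           # one input row, treated as a vector
            for kx in range(n):                 # weights consumed one by one
                weight = w[ky][kx]
                # Vector multiply-accumulate: every output position in the row
                # is updated with the same scalar weight.
                for ox in range(out_w):
                    out[oy][ox] += weight * row[ox * stride + kx]
    return out
```

Both functions perform the same multiply-accumulate operations in a different order, so they return identical results. A [1×1] filter reduces the row-wise version to a single scaled pass over each input row.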
Opening claim text (preview).
What is claimed is:
1. An integrated circuit comprising: a memory to store one or more channels of a same filter row of a filter, each channel of the same filter row to be stored contiguously in the memory, row by row, in a channel-wise order; an input buffer to receive one or more channels of input row vectors of input activations streamed to the input buffer, row by row, in the channel-wise order; circuitry, including: a multiplexer circuit to select a selected weight from a stored filter row, and a multiplexer array to access the input activations from the input buffer based on a stride input and a weight position of the selected weight; and at least one array of multiply and accumulate (MAC) units coupled to the circuitry, the at least one array of MAC units to compute, from the selected weight and the input activations, a partial sum for a convolution; and wherein the circuitry enables access to the memory and the input buffer by the at least one array of MAC units to accelerate the convolution.
2. The integrated circuit of claim 1, wherein: the stride input is a number applied in the circuitry to shift access to an input row vector of the input activations streamed to the input buffer by buffer positions of the input buffer equal to the number; and the weight position is relative to weight positions of neighboring weights of the stored filter row from which the selected weight was selected.
3. The integrated circuit of claim 2, further comprising an output buffer to store the partial sum computed by the at least one array of MAC units.
4. The integrated circuit of claim 3, wherein a width of the output buffer is coordinated with a width of the input buffer, the width of the output buffer equal to a number of the MAC units in the at least one array of MAC units.
5. The integrated circuit of claim 1, wherein the circuitry and the at least one array of MAC units comprise a compute near memory (CNM) circuit block of a CNM accelerator, the integrated circuit further comprising a systolic array of CNM circuit blocks arranged to accumulate partial sums computed by respective arrays of MAC units in the systolic array of CNM circuit blocks into an output feature map representing the convolution.
6. The integrated circuit of claim 5, wherein the one or more channels of input row vectors of input activations streamed to the input buffer are reused in each CNM circuit block in the systolic array of CNM circuit blocks, row by row, in the channel-wise order.
7. The integrated circuit of claim 5, wherein one or more channels of same filter rows are distributed to the systolic array of CNM circuit blocks in row-wise order, the distributed filter rows to be stored contiguously in the memory of each CNM circuit block, row by row, in the channel-wise order.
8. The integrated circuit of claim 1, wherein the memory includes any of a static random access memory (SRAM) and a register file (RF).
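
The data path recited in claims 1 through 4 can be modeled loosely in software. The sketch below is an illustration under stated assumptions, not the claimed circuit: the function names `mac_array_step` and `row_partial_sums` are hypothetical, and the index mapping `lane * stride + weight_pos` is one plausible reading of how the stride input and weight position combine, which the claims do not specify. Each MAC lane corresponds to one output buffer position, mirroring the claim 4 statement that the output buffer width equals the number of MAC units.

```python
def mac_array_step(input_buffer, filter_row, weight_pos, stride, partial_sums):
    """Apply one selected weight across all MAC lanes (hypothetical model)."""
    weight = filter_row[weight_pos]            # models the weight multiplexer
    for lane in range(len(partial_sums)):      # one lane per MAC unit
        # Models the multiplexer array: the stride shifts the access between
        # lanes, and the weight position offsets it within the filter row.
        # This mapping is an assumption made for illustration.
        idx = lane * stride + weight_pos
        partial_sums[lane] += weight * input_buffer[idx]
    return partial_sums


def row_partial_sums(input_buffer, filter_row, stride, num_macs):
    """Accumulate partial sums for one input row and one stored filter row."""
    needed = (num_macs - 1) * stride + len(filter_row)
    if len(input_buffer) < needed:
        raise ValueError("input buffer too narrow for this stride and filter row")
    partial_sums = [0] * num_macs              # models the output buffer
    for weight_pos in range(len(filter_row)):  # weights used one by one
        mac_array_step(input_buffer, filter_row, weight_pos, stride, partial_sums)
    return partial_sums
```

With an input row of `[1, 2, 3, 4, 5, 6, 7]`, a filter row of `[1, 2, 3]`, a stride of 2, and three MAC lanes, the model returns `[14, 26, 38]`. Summing such per-row results over all filter rows and channels would give one row of the output feature map, which is the accumulation role claim 5 assigns to the systolic array of CNM circuit blocks.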
Convolutional networks [CNN, ConvNet] · CPC title
Systolic arrays · CPC title
Multidimensional correlation or convolution · CPC title
using electronic means · CPC title
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title