Compute near memory convolution accelerator

US11726950B2 · US · B2

Patent metadata
Publication number: US-11726950-B2
Application number: US-201916586975-A
Country: US
Kind code: B2
Filing date: Sep 28, 2019
Priority date: Sep 28, 2019
Publication date: Aug 15, 2023
Grant date: Aug 15, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A compute near memory (CNM) convolution accelerator enables a convolutional neural network (CNN) to use dedicated acceleration to achieve efficient in-place convolution operations with less impact on memory and energy consumption. A 2D convolution operation is reformulated as 1D row-wise convolution. The 1D row-wise convolution enables the CNM convolution accelerator to process input activations row-by-row, while using the weights one-by-one. Lightweight access circuits provide the ability to stream both weights and input rows as vectors to MAC units, which in turn enables modules of the CNM convolution accelerator to implement convolution for both [1×1] and chosen [n×n] sized filters.
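To make the reformulation in the abstract concrete, here is a minimal software sketch of a 2D convolution computed as a sum of 1D row-wise convolutions, with each weight applied in turn to a whole input row. It is an illustration only, not the patented hardware. The function names (`conv1d_row`, `conv2d_rowwise`, `conv2d_direct`) are hypothetical, and the sketch assumes a single channel, stride 1, no padding, and the CNN convention in which the filter is not flipped.

```python
def conv1d_row(input_row, filter_row, out_width, partial):
    """Accumulate a 1D convolution of one input row with one filter row.

    Weights are used one by one: each weight multiplies a shifted view
    of the input row and the products are added into `partial` in place.
    """
    for k, weight in enumerate(filter_row):      # one weight at a time
        for j in range(out_width):               # one lane per output column
            partial[j] += weight * input_row[j + k]
    return partial


def conv2d_rowwise(image, kernel):
    """2D convolution assembled from 1D row-wise passes."""
    out_h = len(image) - len(kernel) + 1
    out_w = len(image[0]) - len(kernel[0]) + 1
    output = []
    for r in range(out_h):
        acc = [0] * out_w
        for fr, filter_row in enumerate(kernel):  # filter row fr meets input row r + fr
            conv1d_row(image[r + fr], filter_row, out_w, acc)
        output.append(acc)
    return output


def conv2d_direct(image, kernel):
    """Conventional nested-loop 2D convolution, used only as a reference."""
    n_rows, n_cols = len(kernel), len(kernel[0])
    out_h = len(image) - n_rows + 1
    out_w = len(image[0]) - n_cols + 1
    return [[sum(kernel[a][b] * image[r + a][c + b]
                 for a in range(n_rows) for b in range(n_cols))
             for c in range(out_w)]
            for r in range(out_h)]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
result = conv2d_rowwise(image, kernel)   # → [[-8, -8], [-8, -8]]
```

Both functions produce the same output. The row-wise version only ever touches one input row and one filter row at a time, which is the property the abstract relies on for streaming rows to the MAC units.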

First claim

Opening claim text (preview).

What is claimed is:

1. An integrated circuit comprising: a memory to store one or more channels of a same filter row of a filter, each channel of the same filter row to be stored contiguously in the memory, row by row, in a channel-wise order; an input buffer to receive one or more channels of input row vectors of input activations streamed to the input buffer, row by row, in the channel-wise order; circuitry, including: a multiplexer circuit to select a selected weight from a stored filter row, and a multiplexer array to access the input activations from the input buffer based on a stride input and a weight position of the selected weight; and at least one array of multiply and accumulate (MAC) units coupled to the circuitry, the at least one array of MAC units to compute, from the selected weight and the input activations, a partial sum for a convolution; and wherein the circuitry enables access to the memory and the input buffer by the at least one array of MAC units to accelerate the convolution.

2. The integrated circuit of claim 1, wherein: the stride input is a number applied in the circuitry to shift access to an input row vector of the input activations streamed to the input buffer by buffer positions of the input buffer equal to the number; and the weight position is relative to weight positions of neighboring weights of the stored filter row from which the selected weight was selected.

3. The integrated circuit of claim 2, further comprising an output buffer to store the partial sum computed by the at least one array of MAC units.

4. The integrated circuit of claim 3, wherein a width of the output buffer is coordinated with a width of the input buffer, the width of the output buffer equal to a number of the MAC units in the at least one array of MAC units.

5. The integrated circuit of claim 1, wherein the circuitry and the at least one array of MAC units comprise a compute near memory (CNM) circuit block of a CNM accelerator, the integrated circuit further comprising a systolic array of CNM circuit blocks arranged to accumulate partial sums computed by respective arrays of MAC units in the systolic array of CNM circuit blocks into an output feature map representing the convolution.

6. The integrated circuit of claim 5, wherein the one or more channels of input row vectors of input activations streamed to the input buffer are reused in each CNM circuit block in the systolic array of CNM circuit blocks, row by row, in the channel-wise order.

7. The integrated circuit of claim 5, wherein one or more channels of same filter rows are distributed to the systolic array of CNM circuit blocks in row-wise order, the distributed filter rows to be stored contiguously in the memory of each CNM circuit block, row by row, in the channel-wise order.

8. The integrated circuit of claim 1, wherein the memory includes any of a static random access memory (SRAM) and a register file (RF).

Assignees

  • Intel Corp

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Systolic arrays · CPC title

  • Multidimensional correlation or convolution · CPC title

  • using electronic means · CPC title

  • G06F7/5443 (Primary)

    Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11726950B2 cover?
A compute near memory (CNM) convolution accelerator enables a convolutional neural network (CNN) to use dedicated acceleration to achieve efficient in-place convolution operations with less impact on memory and energy consumption. A 2D convolution operation is reformulated as 1D row-wise convolution. The 1D row-wise convolution enables the CNM convolution accelerator to process input activation…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F15/8046. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 15, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).