Compute in/near memory (CIM) circuit architecture for unified matrix-matrix and matrix-vector computations

US11347477B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11347477-B2
Application number  US-201916586648-A
Country             US
Kind code           B2
Filing date         Sep 27, 2019
Priority date       Sep 27, 2019
Publication date    May 31, 2022
Grant date          May 31, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A memory circuit includes a number (X) of multiply-accumulate (MAC) circuits that are dynamically configurable. The MAC circuits can either compute an output based on computations of X elements of the input vector with the weight vector, or to compute the output based on computations of a single element of the input vector with the weight vector, with each element having a one bit or multibit length. A first memory can hold the input vector having a width of X elements and a second memory can store the weight vector. The MAC circuits include a MAC array on chip with the first memory.
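The two configurations can be pictured with a small behavioral model. The sketch below is only an illustration: it follows the more specific wording of claim 1 (X input elements with one weight element, or one input element with X weight elements), and the names `Mode` and `mac_step` are hypothetical, not taken from the patent. It models one accumulate step of X MAC circuits in plain Python and says nothing about the actual circuit, bit widths, or timing.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in for the control signal that selects a configuration."""
    MATRIX_MATRIX = 0  # X input elements, one shared weight element
    MATRIX_VECTOR = 1  # one shared input element, X weight elements


def mac_step(acc, inputs, weights, mode):
    """Model one multiply-accumulate step across X MAC circuits.

    acc     -- list of X running sums, one per MAC circuit
    inputs  -- X elements (MATRIX_MATRIX) or 1 element (MATRIX_VECTOR)
    weights -- 1 element (MATRIX_MATRIX) or X elements (MATRIX_VECTOR)
    Returns a new list of X accumulators.
    """
    x = len(acc)
    if mode is Mode.MATRIX_MATRIX:
        wide, single = inputs, weights
    elif mode is Mode.MATRIX_VECTOR:
        wide, single = weights, inputs
    else:
        raise ValueError("unknown mode")
    if len(wide) != x or len(single) != 1:
        raise ValueError("operand widths do not match the selected mode")
    # Every MAC circuit multiplies its own element by the shared element.
    return [a + v * single[0] for a, v in zip(acc, wide)]


# Usage with X = 4, switching configuration between steps.
acc = [0, 0, 0, 0]
acc = mac_step(acc, [1, 2, 3, 4], [10], Mode.MATRIX_MATRIX)  # [10, 20, 30, 40]
acc = mac_step(acc, [2], [1, 2, 3, 4], Mode.MATRIX_VECTOR)   # [12, 24, 36, 48]
print(acc)
```

The same accumulators serve both modes; only which operand is X elements wide and which is shared changes, which is the sense in which the abstract calls the circuits dynamically configurable.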

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a first memory to provide an input vector having a width of X elements, where X is an integer; and a multiply-accumulate (MAC) array on chip with the first memory, the MAC array including a second memory to store a weight matrix; and X MAC circuits to dynamically switch between a first configuration and a second configuration in response to a control signal, wherein in the first configuration, the X MAC circuits are to perform a matrix-matrix computation with the X elements of the input vector and a single element of the weight matrix; and wherein in the second configuration, the X MAC circuits are to perform a matrix-vector computation with a single element of the input vector and X elements of the weight matrix.

2. The apparatus of claim 1, wherein the first memory comprises a static random access memory (SRAM).

3. The apparatus of claim 1, wherein the second memory comprises a register file.

4. The apparatus of claim 1, wherein the second memory comprises a static random access memory (SRAM).

5. The apparatus of claim 1, wherein the MAC array is on a common memory die with the first memory, wherein the first memory is a cache memory for a processor.

6. The apparatus of claim 1, wherein the MAC array is on a common memory die with the first memory, wherein the first memory is a scratchpad memory for a processor.

7. The apparatus of claim 1, wherein the MAC array is within a system on a chip with the first memory, wherein the first memory is a cache memory for a processor.

8. The apparatus of claim 1, wherein the MAC array is within a system on a chip with the first memory, wherein the first memory is a scratchpad memory for a processor.

9. The apparatus of claim 1, the MAC array further comprising: a multiplexer (mux) to provide alternate paths between the first memory and the MAC array; and a mux controller to control the mux to select between the alternate paths.

10. The apparatus of claim 9, wherein the mux controller is to control the mux for one input vector element to all X MAC circuits for a one-dimensional (1D) matrix-vector (M×V) computation.

11. The apparatus of claim 9, wherein the mux controller is to control the mux for X different input vector elements to the X MAC circuits, respectively, for a two-dimensional (2D) matrix-matrix (M×M) computation.

12. A system, comprising: a scratchpad memory of a processing unit to provide an input vector having a width of X elements, where X is an integer; and a hardware accelerator coupled to the scratchpad memory of the processing unit, including compute near memory (CNM) circuitry having a multiply-accumulate (MAC) array, the MAC array including a local memory to store a weight matrix; and X MAC circuits to dynamically switch between a first configuration and a second configuration in response to a control signal, wherein in the first configuration, the X MAC circuits are to perform a matrix-matrix computation with the X elements of the input vector and a single element of the weight matrix; and wherein in the second configuration, the X MAC circuits are to perform a matrix-vector computation with a single element of the input vector and X elements of the weight matrix.

13. The system of claim 12, wherein the scratchpad memory comprises a static random access memory (SRAM).

14. The system of claim 12, wherein the local memory comprises a register file.

15. The system of claim 12, wherein the local memory comprises a static random access memory (SRAM).

16. The system of claim 12, wherein the hardware accelerator is integrated on a common memory die with the scratchpad memory.

17. The system of claim 12, wherein the hardware accelerator is integrated on a system on a chip with the scratchpad memory.

18. The system of claim 12, the MAC array further comprising: a multiplexer (mux) to provide alternate paths between the scratchpad memory and the MAC array; and a mux controller to control the mux to select between the alternate paths.

19. The system of claim 18, wherein the mux controller is to control the mux for one input vector element to all X MAC circuits of the MAC array for a one-dimensional (1D) matrix-vector (M×V) computation.

20. The system of claim 18, wherein the mux controller is to control the mux for X different input vector elements to the X MAC circuits of the MAC array, respectively, for a two-dimensional (2D) matrix-matrix (M×M) computation.

21. The system of claim 12, wherein: the processing unit comprises a multicore host processor device; the system further comprises a display communicatively coupled to a host processor; the system further comprises a network interface communicatively coupled to a host processor; or the system further comprises a battery to power the system.
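The mux paths in claims 9 to 11 and 18 to 20 can be sketched in software as a routing choice in front of the MAC circuits. The following is a minimal sketch under stated assumptions: `route_inputs`, `matvec`, and `matmul` are hypothetical names, the loop order and the way weight elements are fetched are guesses made for illustration, and the claims do not specify this dataflow. The broadcast path feeds one input element to all X lanes (1D M×V); the direct path feeds element i to lane i (2D M×M).

```python
def route_inputs(input_vector, broadcast, index=0):
    """Hypothetical mux: return the values presented to the X MAC circuits."""
    if broadcast:
        # 1D M x V path: one input element fans out to every MAC circuit.
        return [input_vector[index]] * len(input_vector)
    # 2D M x M path: MAC circuit i receives input element i.
    return list(input_vector)


def matvec(weights, vector):
    """y = W . v for an X-by-X weight matrix, assuming the broadcast path."""
    x = len(vector)
    acc = [0] * x
    for k in range(x):
        lanes = route_inputs(vector, broadcast=True, index=k)
        column = [weights[i][k] for i in range(x)]  # X weight elements
        acc = [a + lane * w for a, lane, w in zip(acc, lanes, column)]
    return acc


def matmul(weights, inputs):
    """C = W . A, where each row of A is one X-wide input vector.

    Assumes the direct path: each step pairs X input elements with a
    single weight element shared by all MAC circuits.
    """
    x = len(inputs[0])
    out = []
    for w_row in weights:
        acc = [0] * x
        for k, w in enumerate(w_row):
            lanes = route_inputs(inputs[k], broadcast=False)
            acc = [a + lane * w for a, lane in zip(acc, lanes)]
        out.append(acc)
    return out


W = [[1, 2], [3, 4]]
print(matvec(W, [5, 6]))            # [17, 39]
print(matmul(W, [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Both functions reuse the same X accumulators and differ only in what the mux presents to the lanes, which mirrors how the claims tie each mux path to one of the two computation types.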

Assignees

Intel Corp

Inventors

Classifications

  • Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • Vector or matrix data · CPC title

  • Performance improvement · CPC title

  • with multilevel cache hierarchies · CPC title

  • Globally asynchronous, locally synchronous, e.g. network on chip · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11347477B2 cover?
A memory circuit includes a number (X) of multiply-accumulate (MAC) circuits that are dynamically configurable. The MAC circuits can either compute an output based on computations of X elements of the input vector with the weight vector, or to compute the output based on computations of a single element of the input vector with the weight vector, with each element having a one bit or multibit l…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F7/5443. Mapped technology areas include Physics.
When was this patent published?
Publication date May 31, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).