Training of artificial neural networks
US-2020364577-A1 · Nov 19, 2020 · US
US11347477B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11347477-B2 |
| Application number | US-201916586648-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 27, 2019 |
| Priority date | Sep 27, 2019 |
| Publication date | May 31, 2022 |
| Grant date | May 31, 2022 |
Abstract:
A memory circuit includes a number (X) of multiply-accumulate (MAC) circuits that are dynamically configurable. The MAC circuits can either compute an output based on computations of X elements of the input vector with the weight vector, or compute the output based on computations of a single element of the input vector with the weight vector, each element having a one-bit or multibit length. A first memory can hold the input vector having a width of X elements, and a second memory can store the weight vector. The MAC circuits include a MAC array on chip with the first memory.
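A minimal behavioral sketch of the two configurations described in the abstract and claim 1 may help. It assumes integer operands and models each of the X MAC circuits as a running accumulator; the names `MacConfig` and `mac_cycle` are hypothetical, since the patent does not define a software interface, and the sketch ignores timing, bit widths, and the control signal itself.

```python
from enum import Enum
from typing import List


class MacConfig(Enum):
    """Hypothetical names for the two configurations of the X MAC circuits."""
    MATRIX_MATRIX = "MxM"  # X input elements times one shared weight element
    MATRIX_VECTOR = "MxV"  # one shared input element times X weight elements


def mac_cycle(acc: List[int], config: MacConfig,
              inputs: List[int], weights: List[int]) -> List[int]:
    """Advance X accumulators (one per MAC circuit) by one multiply-accumulate step."""
    x = len(acc)
    if config is MacConfig.MATRIX_MATRIX:
        if len(inputs) != x or len(weights) != 1:
            raise ValueError("MxM expects X input elements and 1 weight element")
        # Each MAC circuit j multiplies its own input element by the shared weight.
        return [a + i * weights[0] for a, i in zip(acc, inputs)]
    if len(inputs) != 1 or len(weights) != x:
        raise ValueError("MxV expects 1 input element and X weight elements")
    # The single input element is shared; each MAC circuit j uses its own weight.
    return [a + inputs[0] * w for a, w in zip(acc, weights)]


# Example with X = 4 accumulators starting at zero.
acc0 = [0, 0, 0, 0]
print(mac_cycle(acc0, MacConfig.MATRIX_MATRIX, [1, 2, 3, 4], [10]))  # [10, 20, 30, 40]
print(mac_cycle(acc0, MacConfig.MATRIX_VECTOR, [5], [1, 2, 3, 4]))   # [5, 10, 15, 20]
```

In both configurations the same X accumulators do the work; only the operand pairing changes, which is what lets one array serve both computation types.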
What is claimed is:

1. An apparatus comprising: a first memory to provide an input vector having a width of X elements, where X is an integer; and a multiply-accumulate (MAC) array on chip with the first memory, the MAC array including a second memory to store a weight matrix; and X MAC circuits to dynamically switch between a first configuration and a second configuration in response to a control signal, wherein in the first configuration, the X MAC circuits are to perform a matrix-matrix computation with the X elements of the input vector and a single element of the weight matrix; and wherein in the second configuration, the X MAC circuits are to perform a matrix-vector computation with a single element of the input vector and X elements of the weight matrix.
2. The apparatus of claim 1, wherein the first memory comprises a static random access memory (SRAM).
3. The apparatus of claim 1, wherein the second memory comprises a register file.
4. The apparatus of claim 1, wherein the second memory comprises a static random access memory (SRAM).
5. The apparatus of claim 1, wherein the MAC array is on a common memory die with the first memory, wherein the first memory is a cache memory for a processor.
6. The apparatus of claim 1, wherein the MAC array is on a common memory die with the first memory, wherein the first memory is a scratchpad memory for a processor.
7. The apparatus of claim 1, wherein the MAC array is within a system on a chip with the first memory, wherein the first memory is a cache memory for a processor.
8. The apparatus of claim 1, wherein the MAC array is within a system on a chip with the first memory, wherein the first memory is a scratchpad memory for a processor.
9. The apparatus of claim 1, the MAC array further comprising: a multiplexer (mux) to provide alternate paths between the first memory and the MAC array; and a mux controller to control the mux to select between the alternate paths.
10. The apparatus of claim 9, wherein the mux controller is to control the mux for one input vector element to all X MAC circuits for a one-dimensional (1D) matrix-vector (M×V) computation.
11. The apparatus of claim 9, wherein the mux controller is to control the mux for X different input vector elements to the X MAC circuits, respectively, for a two-dimensional (2D) matrix-matrix (M×M) computation.
12. A system, comprising: a scratchpad memory of a processing unit to provide an input vector having a width of X elements, where X is an integer; and a hardware accelerator coupled to the scratchpad memory of the processing unit, including compute near memory (CNM) circuitry having a multiply-accumulate (MAC) array, the MAC array including a local memory to store a weight matrix; and X MAC circuits to dynamically switch between a first configuration and a second configuration in response to a control signal, wherein in the first configuration, the X MAC circuits are to perform a matrix-matrix computation with the X elements of the input vector and a single element of the weight matrix; and wherein in the second configuration, the X MAC circuits are to perform a matrix-vector computation with a single element of the input vector and X elements of the weight matrix.
13. The system of claim 12, wherein the scratchpad memory comprises a static random access memory (SRAM).
14. The system of claim 12, wherein the local memory comprises a register file.
15. The system of claim 12, wherein the local memory comprises a static random access memory (SRAM).
16. The system of claim 12, wherein the hardware accelerator is integrated on a common memory die with the scratchpad memory.
17. The system of claim 12, wherein the hardware accelerator is integrated on a system on a chip with the scratchpad memory.
18. The system of claim 12, the MAC array further comprising: a multiplexer (mux) to provide alternate paths between the scratchpad memory and the MAC array; and a mux controller to control the mux to select between the alternate paths.
19. The system of claim 18, wherein the mux controller is to control the mux for one input vector element to all X MAC circuits of the MAC array for a one-dimensional (1D) matrix-vector (M×V) computation.
20. The system of claim 18, wherein the mux controller is to control the mux for X different input vector elements to the X MAC circuits of the MAC array, respectively, for a two-dimensional (2D) matrix-matrix (M×M) computation.
21. The system of claim 12, wherein: the processing unit comprises a multicore host processor device; the system further comprises a display communicatively coupled to a host processor; the system further comprises a network interface communicatively coupled to a host processor; or the system further comprises a battery to power the system.
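Claims 9 to 11 and 18 to 20 add a mux and mux controller that either route one input vector element to all X MAC circuits (1D M×V) or route X different input vector elements to them (2D M×M). The following is a minimal sketch, under stated assumptions, of how full products could be assembled from repeated cycles in each mode. `mux_select`, `matvec`, `matmul`, and the mode strings are hypothetical names. The assumption that, in M×M mode, the X-wide input row holds element k of X different input vectors (a batch) is an interpretation and is not spelled out in the claims.

```python
from typing import List

Matrix = List[List[int]]


def mux_select(input_row: List[int], mode: str, k: int) -> List[int]:
    """Hypothetical mux: route one X-wide row of the first memory to the X MACs."""
    if mode == "1D_MxV":
        # One input vector element (index k) goes to all X MAC circuits.
        return [input_row[k]] * len(input_row)
    if mode == "2D_MxM":
        # X different input vector elements, one per MAC circuit (k is unused).
        return list(input_row)
    raise ValueError(f"unknown mode: {mode}")


def matvec(w: Matrix, v: List[int]) -> List[int]:
    """y = W v using X MACs; assumes W is X x X and v has width X."""
    x = len(v)
    acc = [0] * x
    for k in range(x):  # one cycle per input element
        lane_inputs = mux_select(v, "1D_MxV", k)
        lane_weights = [w[i][k] for i in range(x)]  # X weight elements (column k)
        acc = [a + i * wt for a, i, wt in zip(acc, lane_inputs, lane_weights)]
    return acc


def matmul(a: Matrix, w: Matrix) -> Matrix:
    """Y = A W using X MACs; assumes A is X x N (X input vectors) and W is N x M."""
    x, n, m = len(a), len(w), len(w[0])
    y = [[0] * m for _ in range(x)]
    for col in range(m):  # one pass per output column
        acc = [0] * x
        for k in range(n):
            row = [a[j][k] for j in range(x)]  # assumed X-wide input row
            lane_inputs = mux_select(row, "2D_MxM", k)
            shared_weight = w[k][col]  # single weight element for all X MACs
            acc = [s + i * shared_weight for s, i in zip(acc, lane_inputs)]
        for j in range(x):
            y[j][col] = acc[j]
    return y


print(matvec([[1, 2], [3, 4]], [5, 6]))            # [17, 39]
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

In this sketch the M×M mode reuses one weight element across all X MAC circuits per cycle, while the M×V mode fetches X weight elements per cycle; that difference in weight reuse is one plausible motivation for switching configurations, although the claims themselves only recite the switching.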
CPC titles:
- Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)}
- Vector or matrix data
- Performance improvement
- with multilevel cache hierarchies
- Globally asynchronous, locally synchronous, e.g. network on chip