Supporting vector multiply add with double accumulator access in a graphics environment
US-2024103810-A1 · Mar 28, 2024 · US
US2023205729A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2023205729-A1 |
| Application number | US-202218146048-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 23, 2022 |
| Priority date | Dec 28, 2021 |
| Publication date | Jun 29, 2023 |
| Grant date | — |
Official abstract text for this publication.
A systolic neural CPU (SNCPU) including a two-dimensional systolic array of reconfigurable processing elements (PE's) fuses a conventional CPU with a convolutional neural network (CNN) accelerator in four phases of operation: row-CPU, column-accelerator, column-CPU, and row-accelerator. The SNCPU cycles through the four phases to avoid costly data movement across cores, reduce overhead, and reduce latency. The PE's communicate bidirectionally with neighboring PE's and memory units at an outer edge of the array. A row of PE's is configurable into a first deep neural network (DNN) accumulator at a first time and configurable into a first CPU pipeline at a second time. A column of PE's is configurable into a second DNN accumulator at a third time and configurable into a second CPU pipeline at a fourth time.
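The four-phase rotation described in the abstract can be illustrated with a short sketch. This is only a model of the phase ordering stated above, not the patented control logic; the names `Phase`, `PHASE_ORDER`, `is_cpu_phase`, and `schedule` are hypothetical and chosen for illustration.

```python
from enum import Enum
from itertools import cycle, islice


class Phase(Enum):
    """The four phases of operation named in the abstract."""
    ROW_CPU = "row-CPU"
    COLUMN_ACCELERATOR = "column-accelerator"
    COLUMN_CPU = "column-CPU"
    ROW_ACCELERATOR = "row-accelerator"


# Order in which the abstract lists the phases.
PHASE_ORDER = [
    Phase.ROW_CPU,
    Phase.COLUMN_ACCELERATOR,
    Phase.COLUMN_CPU,
    Phase.ROW_ACCELERATOR,
]


def is_cpu_phase(phase: Phase) -> bool:
    """True when the array acts as CPU pipelines rather than as a DNN accelerator."""
    return phase in (Phase.ROW_CPU, Phase.COLUMN_CPU)


def schedule(num_steps: int) -> list:
    """Return the first num_steps phases of the repeating four-phase cycle."""
    return list(islice(cycle(PHASE_ORDER), num_steps))
```

In this model, CPU and accelerator phases alternate, and the fifth step wraps around to row-CPU again.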
Opening claim text (preview).
What is claimed is:

1. An electronic circuit comprising: a two-dimensional systolic array of reconfigurable processing elements, the reconfigurable processing elements communicatively coupled with neighboring reconfigurable processing elements for bidirectional communications; a plurality of memory units communicatively coupled with corresponding reconfigurable processing units at an outer edge of the two-dimensional systolic array, the plurality of memory units configured for bidirectional communications with the corresponding reconfigurable processing units; a plurality of accumulator modules, each of the plurality of accumulator modules communicatively coupled with at least one reconfigurable processing element and memory unit of a row or column of the two-dimensional systolic array; and a plurality of instruction caches, each of the plurality of instruction caches communicatively coupled with a reconfigurable processing element at an edge of a row or a column of the two-dimensional systolic array; wherein a row of the two-dimensional systolic array of reconfigurable processing elements is configurable into a first deep neural network (DNN) accumulator at a first time and configurable into a first CPU pipeline at a second time; and wherein a column of the two-dimensional systolic array of reconfigurable processing elements is configurable into a second deep neural network (DNN) accumulator at a third time and configurable into a second CPU pipeline at a fourth time.

2. The electronic circuit of claim 1, wherein the first CPU pipeline and the second CPU pipeline each comprise: a first reconfigurable processing element configured into a program counter (PC) register; a second and a third reconfigurable processing element configured together into an instruction fetch (IF) stage; a fourth and a fifth reconfigurable processing element each configured into a different instruction decoder for an instruction decoding (ID) stage; a sixth reconfigurable processing element configured into an arithmetic logic unit (ALU) of an execution (EX) stage; a seventh reconfigurable processing element configured into a branch unit of the EX stage; an eighth reconfigurable processing element configured into Boolean logic for functions of the ALU of the EX stage; a ninth reconfigurable processing element configured into a memory register (MEM) stage; and a tenth reconfigurable processing element configured into a write-back (WB) stage.

3. The electronic circuit of claim 2, wherein the first CPU pipeline and the second CPU pipeline each further comprise a register file (RF) communicatively coupled to receive data from the reconfigurable processing elements of the ID stage and the WB stage, and to send data to the EX stage.

4. The electronic circuit of claim 2, wherein the first CPU pipeline and the second CPU pipeline are each configured as a RISC-V CPU pipeline.

5. The electronic circuit of claim 1, further comprising: a row level-two memory unit communicatively coupled with a first subset of the plurality of memory units at one end of the rows of the two-dimensional systolic array of reconfigurable processing elements, the row level-two memory unit configured for bidirectional communications with the first subset of the plurality of memory units; and a column level-two memory unit communicatively coupled with a second subset of the plurality of memory units at one end of the columns of the two-dimensional systolic array of reconfigurable processing elements, the column level-two memory unit configured for bidirectional communications with the second subset of the plurality of memory units.

6. The electronic circuit of claim 1, further comprising a plurality of register files, each of the plurality of register files communicatively coupled with at least one of the plurality of reconfigurable processing elements of a corresponding row or column of the two-dimensional systolic array, the plurality of register files configured for bidirectional communications with the corresponding reconfigurable processing units.

7. The electronic circuit of claim 1, wherein the plurality of accumulator modules are configured to provide, when their respective row or column of the two-dimensional systolic array of reconfigurable processing elements is configured as a DNN accumulator, single instruction, multiple data (SIMD) support for at least one function selected from group consisting of pooling, rectified linear unit (ReLU) functionality, and accumulation.

8. The electronic circuit of claim 1, wherein the two-dimensional systolic array of reconfigurable processing elements comprises ten reconfigurable processing elements in each dimension.

9. The electronic circuit of claim 1, wherein the two-dimensional systolic array of reconfigurable processing elements is reconfigurable into four modes: a row-CPU mode wherein each row of the two-dimensional systolic array includes a RISC-V pipeline core that processes data from a left column toward a rightmost column of the two-dimensional systolic array and stores results in the right subset of the plurality of memory elements on the right side of the rows of the two-dimensional systolic array; a column-accelerator mode wherein data flows from a right subset of the plurality of memory elements on a right side of the rows leftward toward a leftmost column of the two-dimensional systolic array in an activation process and data accumulates downward toward a bottom subset of the plurality of memory elements on a bottom side of the columns in an accumulation process; a column-CPU mode wherein each column of the two-dimensional systolic array includes a RISC-V pipeline core that processes data from a top row toward a bottom row of the two-dimensional systolic array and stores results in the bottom subset of the plurality of memory elements on the bottom side of the columns of the two-dimensional systolic array; and a row-accelerator mode wherein data flows from the bottom subset of the plurality of memory elements on a bottom side of the columns upward toward a topmost row of the two-dimensional systolic array in an activation process and data accumulates rightward toward the right subset of the plurality of memory elements on a right side of the rows in an accumulation process.

10. The electronic circuit of claim 9, further comprising a control circuit configured to cause the electronic circuit to cycle through the row-CPU mode, the column-accelerator mode, the column-CPU mode, and the row-accelerator mode continuously until all neural network layers of the electronic circuit have finished eliminating intermediate data transfer across the processing elements.

11. A method of performing deep neural network processing and computing processing in a two-dimensional systolic array of reconfigurable processing elements, the reconfigurable processing elements communicatively coupled with neighboring reconfigurable processing elements for bidirectional communications, the method comprising: configuring the two-dimensional systolic array of reconfigurable processing elements into a row-CPU mode wherein the two-dimensional systolic array is configured to perform standard computations on input data; receiving input data by the two-dimensional systolic array configured into row-CPU mode; performing standard computations on the input data by the two-dimensional systolic array; saving results of the row-CPU computations in memory elements local to the two-dimensional systolic array; configuring the two-dimensional systolic array of reconfigurable processing elements into a column-accelerator mode wherein the two-dimensional systolic array is configured t
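
The data-flow directions recited in claim 9 suggest why cycling through the modes can avoid intermediate data transfer: each CPU mode stores its results in the same edge memories from which the following accelerator mode reads its activations. The sketch below encodes only what claim 9 states. The `Mode` record, its field names, and `handoffs_without_copy` are hypothetical illustration names, and the input source of the CPU modes is left as `None` because the claim does not specify it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mode:
    name: str
    kind: str                         # "cpu" or "accelerator"
    result_edge: str                  # edge memories where results or accumulations land
    activation_source: Optional[str]  # edge memories activations are read from;
                                      # None for CPU modes (not stated in claim 9)


# Directions taken from the wording of claim 9.
MODES = [
    Mode("row-CPU", "cpu", "right", None),
    Mode("column-accelerator", "accelerator", "bottom", "right"),
    Mode("column-CPU", "cpu", "bottom", None),
    Mode("row-accelerator", "accelerator", "right", "bottom"),
]


def handoffs_without_copy(modes):
    """List (producer, consumer) pairs where the next mode reads the edge the current mode wrote."""
    pairs = []
    # Pair each mode with its successor, wrapping around at the end of the cycle.
    for cur, nxt in zip(modes, modes[1:] + modes[:1]):
        if nxt.activation_source is not None and cur.result_edge == nxt.activation_source:
            pairs.append((cur.name, nxt.name))
    return pairs
```

Under these assumptions, `handoffs_without_copy(MODES)` pairs row-CPU with column-accelerator (shared right-side memories) and column-CPU with row-accelerator (shared bottom-side memories).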
the resource being a machine, e.g. CPUs, Servers, Terminals · CPC title
Systolic arrays · CPC title
using electronic means · CPC title
Convolutional networks [CNN, ConvNet] · CPC title