Systolic neural CPU processor

US2023205729A1 · US · A1

Patent metadata
Publication number: US-2023205729-A1
Application number: US-202218146048-A
Country: US
Kind code: A1
Filing date: Dec 23, 2022
Priority date: Dec 28, 2021
Publication date: Jun 29, 2023
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A systolic neural CPU (SNCPU) including a two-dimensional systolic array of reconfigurable processing elements (PE's) fuses a conventional CPU with a convolutional neural network (CNN) accelerator in four phases of operation: row-CPU, column-accelerator, column-CPU, and row-accelerator. The SNCPU cycles through the four phases to avoid costly data movement across cores, reduce overhead, and reduce latency. The PE's communicate bidirectionally with neighboring PE's and memory units at an outer edge of the array. A row of PE's is configurable into a first deep neural network (DNN) accumulator at a first time and configurable into a first CPU pipeline at a second time. A column of PE's is configurable into a second DNN accumulator at a third time and configurable into a second CPU pipeline at a fourth time.
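The four-phase cycle in the abstract can be sketched as a small schedule. This is only an illustrative model, not the patented control logic: the memory sides are taken from the wording of claim 9 further down this page, and the names `Phase`, `PHASES`, `schedule`, and `handoffs_stay_local` are hypothetical. It checks that each accelerator phase reads its activations from the same edge memory where the preceding phase left its results, which is the property the abstract credits with avoiding data movement across cores.

```python
from dataclasses import dataclass
from itertools import cycle, islice
from typing import Optional


@dataclass(frozen=True)
class Phase:
    name: str
    reads_from: Optional[str]  # edge memory an accelerator phase reads activations from
    writes_to: str             # edge memory where the phase leaves its results


# Sides follow claim 9; the claim names no read side for the CPU phases,
# so those are left as None (an assumption of this sketch).
PHASES = (
    Phase("row-CPU", None, "right"),
    Phase("column-accelerator", "right", "bottom"),
    Phase("column-CPU", None, "bottom"),
    Phase("row-accelerator", "bottom", "right"),
)


def schedule(num_steps):
    """Return the first num_steps phases of the repeating four-phase cycle."""
    return list(islice(cycle(PHASES), num_steps))


def handoffs_stay_local(steps):
    """True if every accelerator phase reads where the previous phase wrote."""
    return all(
        cur.reads_from is None or cur.reads_from == prev.writes_to
        for prev, cur in zip(steps, steps[1:])
    )


steps = schedule(8)  # two full passes through the cycle
print([p.name for p in steps])
print(handoffs_stay_local(steps))
```

Swapping the order of two phases, for example running column-CPU directly before column-accelerator, makes the check fail, because the accelerator would then expect its inputs on a side the previous phase did not write to.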

First claim

Opening claim text (preview).

What is claimed is:

1. An electronic circuit comprising: a two-dimensional systolic array of reconfigurable processing elements, the reconfigurable processing elements communicatively coupled with neighboring reconfigurable processing elements for bidirectional communications; a plurality of memory units communicatively coupled with corresponding reconfigurable processing units at an outer edge of the two-dimensional systolic array, the plurality of memory units configured for bidirectional communications with the corresponding reconfigurable processing units; a plurality of accumulator modules, each of the plurality of accumulator modules communicatively coupled with at least one reconfigurable processing element and memory unit of a row or column of the two-dimensional systolic array; and a plurality of instruction caches, each of the plurality of instruction caches communicatively coupled with a reconfigurable processing element at an edge of a row or a column of the two-dimensional systolic array; wherein a row of the two-dimensional systolic array of reconfigurable processing elements is configurable into a first deep neural network (DNN) accumulator at a first time and configurable into a first CPU pipeline at a second time; and wherein a column of the two-dimensional systolic array of reconfigurable processing elements is configurable into a second deep neural network (DNN) accumulator at a third time and configurable into a second CPU pipeline at a fourth time.

2. The electronic circuit of claim 1, wherein the first CPU pipeline and the second CPU pipeline each comprise: a first reconfigurable processing element configured into a program counter (PC) register; a second and a third reconfigurable processing element configured together into an instruction fetch (IF) stage; a fourth and a fifth reconfigurable processing element each configured into a different instruction decoder for an instruction decoding (ID) stage; a sixth reconfigurable processing element configured into an arithmetic logic unit (ALU) of an execution (EX) stage; a seventh reconfigurable processing element configured into a branch unit of the EX stage; an eighth reconfigurable processing element configured into Boolean logic for functions of the ALU of the EX stage; a ninth reconfigurable processing element configured into a memory register (MEM) stage; and a tenth reconfigurable processing element configured into a write-back (WB) stage.

3. The electronic circuit of claim 2, wherein the first CPU pipeline and the second CPU pipeline each further comprise a register file (RF) communicatively coupled to receive data from the reconfigurable processing elements of the ID stage and the WB stage, and to send data to the EX stage.

4. The electronic circuit of claim 2, wherein the first CPU pipeline and the second CPU pipeline are each configured as a RISC-V CPU pipeline.

5. The electronic circuit of claim 1, further comprising: a row level-two memory unit communicatively coupled with a first subset of the plurality of memory units at one end of the rows of the two-dimensional systolic array of reconfigurable processing elements, the row level-two memory unit configured for bidirectional communications with the first subset of the plurality of memory units; and a column level-two memory unit communicatively coupled with a second subset of the plurality of memory units at one end of the columns of the two-dimensional systolic array of reconfigurable processing elements, the column level-two memory unit configured for bidirectional communications with the second subset of the plurality of memory units.

6. The electronic circuit of claim 1, further comprising a plurality of register files, each of the plurality of register files communicatively coupled with at least one of the plurality of reconfigurable processing elements of a corresponding row or column of the two-dimensional systolic array, the plurality of register files configured for bidirectional communications with the corresponding reconfigurable processing units.

7. The electronic circuit of claim 1, wherein the plurality of accumulator modules are configured to provide, when their respective row or column of the two-dimensional systolic array of reconfigurable processing elements is configured as a DNN accumulator, single instruction, multiple data (SIMD) support for at least one function selected from group consisting of pooling, rectified linear unit (ReLU) functionality, and accumulation.

8. The electronic circuit of claim 1, wherein the two-dimensional systolic array of reconfigurable processing elements comprises ten reconfigurable processing elements in each dimension.

9. The electronic circuit of claim 1, wherein the two-dimensional systolic array of reconfigurable processing elements is reconfigurable into four modes: a row-CPU mode wherein each row of the two-dimensional systolic array includes a RISC-V pipeline core that processes data from a left column toward a rightmost column of the two-dimensional systolic array and stores results in the right subset of the plurality of memory elements on the right side of the rows of the two-dimensional systolic array; a column-accelerator mode wherein data flows from a right subset of the plurality of memory elements on a right side of the rows leftward toward a leftmost column of the two-dimensional systolic array in an activation process and data accumulates downward toward a bottom subset of the plurality of memory elements on a bottom side of the columns in an accumulation process; a column-CPU mode wherein each column of the two-dimensional systolic array includes a RISC-V pipeline core that processes data from a top row toward a bottom row of the two-dimensional systolic array and stores results in the bottom subset of the plurality of memory elements on the bottom side of the columns of the two-dimensional systolic array; and a row-accelerator mode wherein data flows from the bottom subset of the plurality of memory elements on a bottom side of the columns upward toward a topmost row of the two-dimensional systolic array in an activation process and data accumulates rightward toward the right subset of the plurality of memory elements on a right side of the rows in an accumulation process.

10. The electronic circuit of claim 9, further comprising a control circuit configured to cause the electronic circuit to cycle through the row-CPU mode, the column-accelerator mode, the column-CPU mode, and the row-accelerator mode continuously until all neural network layers of the electronic circuit have finished eliminating intermediate data transfer across the processing elements.

11. A method of performing deep neural network processing and computing processing in a two-dimensional systolic array of reconfigurable processing elements, the reconfigurable processing elements communicatively coupled with neighboring reconfigurable processing elements for bidirectional communications, the method comprising: configuring the two-dimensional systolic array of reconfigurable processing elements into a row-CPU mode wherein the two-dimensional systolic array is configured to perform standard computations on input data; receiving input data by the two-dimensional systolic array configured into row-CPU mode; performing standard computations on the input data by the two-dimensional systolic array; saving results of the row-CPU computations in memory elements local to the two-dimensional systolic array; configuring the two-dimensional systolic array of reconfigurable processing elements into a column-accelerator mode wherein the two-dimensional systolic array is configured t…
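As a reading aid for claim 2, the assignment of the first through tenth processing elements to pipeline stages can be written out as a table. The sketch below is illustrative only: the names `PIPELINE_ROLES`, `ARRAY_DIM`, and `stage_counts` are hypothetical, the role descriptions paraphrase the claim, and treating the claim's ordinal order as the physical order along a row or column is an assumption. It also checks that the ten roles fill exactly one row or column of the ten-by-ten array recited in claim 8.

```python
from collections import Counter

# (stage, role) for the first through tenth PE, in the order claim 2 lists them.
# Mapping this order onto physical positions in a row is an assumption.
PIPELINE_ROLES = (
    ("PC", "program counter register"),
    ("IF", "instruction fetch (shared with the next PE)"),
    ("IF", "instruction fetch (shared with the previous PE)"),
    ("ID", "instruction decoder, first of two different decoders"),
    ("ID", "instruction decoder, second of two different decoders"),
    ("EX", "arithmetic logic unit"),
    ("EX", "branch unit"),
    ("EX", "Boolean logic for ALU functions"),
    ("MEM", "memory register stage"),
    ("WB", "write-back stage"),
)

ARRAY_DIM = 10  # PEs per dimension, per claim 8


def stage_counts(roles):
    """Count how many PEs each pipeline stage occupies."""
    return Counter(stage for stage, _ in roles)


counts = stage_counts(PIPELINE_ROLES)
print(dict(counts))
print(len(PIPELINE_ROLES) == ARRAY_DIM)
```

Laid out this way, the execution stage is the widest part of the pipeline at three PEs, while the program counter, memory, and write-back stages take one PE each.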

Assignees

Univ Northwestern

Inventors

Classifications

  • the resource being a machine, e.g. CPUs, Servers, Terminals · CPC title

  • Systolic arrays · CPC title

  • using electronic means · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2023205729A1 cover?
A systolic neural CPU (SNCPU) including a two-dimensional systolic array of reconfigurable processing elements (PE's) fuses a conventional CPU with a convolutional neural network (CNN) accelerator in four phases of operation: row-CPU, column-accelerator, column-CPU, and row-accelerator. The SNCPU cycles through the four phases to avoid costly data movement across cores, reduce overhead, and red…
Who is the assignee on this patent?
Univ Northwestern
What technology area does this patent fall under?
Primary CPC classification G06F15/8046. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 29, 2023 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).