Multiple accumulate busses in a systolic array

US12182064B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12182064-B2
Application number: US-202318446357-A
Country: US
Kind code: B2
Filing date: Aug 8, 2023
Priority date: Jun 29, 2020
Publication date: Dec 31, 2024
Grant date: Dec 31, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations, provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
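The data flow described in the abstract can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the patented circuit: the function names, the round-robin assignment of rows to buses (row r uses bus r mod num_buses), and the final addition of the per-bus sums are illustrative choices that the abstract does not specify.

```python
def column_partial_sums(inputs, weights, num_buses):
    """Return the final partial sum carried on each columnar bus.

    Assumption (not from the abstract): the processing element at row r
    sits on bus r % num_buses and forwards its output partial sum to the
    element at row r + num_buses, skipping the elements in between.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    if num_buses < 1:
        raise ValueError("num_buses must be at least 1")

    bus_sums = [0] * num_buses  # each bus starts with a zero partial sum
    for row, (x, w) in enumerate(zip(inputs, weights)):
        bus = row % num_buses
        # Multiply-accumulate: add the product to the partial sum that
        # arrived on this element's own bus only.
        bus_sums[bus] += x * w
    return bus_sums


def column_output(inputs, weights, num_buses):
    """Combine the per-bus sums (assumed final step) into one column result."""
    return sum(column_partial_sums(inputs, weights, num_buses))


if __name__ == "__main__":
    xs = [1, 2, 3, 4]
    ws = [5, 6, 7, 8]
    print(column_partial_sums(xs, ws, num_buses=2))  # [26, 44]
    print(column_output(xs, ws, num_buses=2))        # 70
```

With two buses, each accumulation chain in this model is half as long as the single-bus chain, which is one way to read the abstract's point about parallelization: the same column result is reached through shorter, independent chains.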

First claim

Opening claim text (preview).

What is claimed is:

1. A systolic processor comprising: a systolic array of processing elements arranged in rows and columns; wherein the systolic array of processing elements is divided into a plurality of sub-arrays of processing elements, each sub-array of the plurality of sub-arrays including at least one of 1) a plurality of non-consecutive columns of the columns of the systolic array, or 2) a plurality of non-consecutive rows of the rows of the systolic array, an individual processing element within a sub-array of the plurality of sub-arrays configured to: perform one or more operations on an input, and provide an output to a subsequent processing element within the same sub-array, wherein the subsequent processing element is separated within the systolic array from the individual processing element by one or more intervening processing elements corresponding to a different sub-array of the plurality of sub-arrays.

2. The systolic processor of claim 1, wherein a number of the one or more intervening processing elements is equal to one less than a number of buses of the systolic array.

3. The systolic processor of claim 1, wherein a first processing element within a first sub-array of the plurality of sub-arrays provides an output to a second processing element within the first sub-array that is separated within the systolic array from the first processing element by n intervening processing elements, and wherein the second processing element provides a second output to a third processing element within the first sub-array that is separated within the systolic array from the second processing element by n intervening processing elements.

4. The systolic processor of claim 1, wherein the individual processing element corresponds to at least one of a first row or a first column of the systolic array, wherein the subsequent processing element corresponds to at least one of a third row or a third column of the systolic array, wherein the first row and the third row are separated within the rows of the systolic array by at least a second row of the systolic array, and wherein the first column and the third column are separated within the columns of the systolic array by at least a second column of the systolic array.

5. The systolic processor of claim 1, wherein the systolic array includes a plurality of busses.

6. The systolic processor of claim 1, wherein a number of the plurality of sub-arrays is equal to a number of buses of the systolic array.

7. The systolic processor of claim 1, wherein the subsequent processing element is separated within the systolic array from the individual processing element by the one or more intervening processing elements based on the distance that a bus of the systolic array can be driven in a clock cycle.

8. The systolic processor of claim 1, wherein the one or more operations comprise: receiving an input data element and a weight; multiplying the input data element and the weight to generate a product; and adding an input partial sum to the product to generate an output partial sum.

9. The systolic processor of claim 8, wherein the output comprises at least one of the input data element, the weight, or the output partial sum.

10. The systolic processor of claim 1, wherein a first processing element within a first sub-array of the plurality of sub-arrays and a second processing element within a second sub-array of the plurality of sub-arrays are configured to receive a respective input in a first systolic interval.

11. The systolic processor of claim 1, wherein a first processing element within a first sub-array of the plurality of sub-arrays is configured to receive a respective input in a first systolic interval, and wherein a second processing element within the first sub-array is configured to receive a respective input in a second systolic interval, wherein the first systolic interval and the second systolic interval are separated within systolic intervals by at least one intervening systolic interval.

12. A systolic circuit comprising: a systolic array of processing elements arranged in rows and columns, wherein the systolic array of processing elements is divided into a plurality of sub-arrays of processing elements, and wherein an initial processing element within a first sub-array of the plurality of sub-arrays is adjacent within the systolic array to an initial processing element within a second sub-array of the plurality of sub-arrays, the initial processing element within the first sub-array configured to: perform one or more operations on an input, and provide an output to a subsequent processing element within the first sub-array, wherein the subsequent processing element is separated within the systolic array from the initial processing element within the first sub-array by at least the initial processing element within the second sub-array.

13. The systolic circuit of claim 12, wherein each sub-array of the plurality of sub-arrays including at least one of 1) a plurality of non-consecutive columns of the columns of the systolic array, or 2) a plurality of non-consecutive rows of the rows of the systolic array.

14. The systolic circuit of claim 12, wherein the initial processing element within the first sub-array is separated within the systolic array from each other processing element within the first sub-array by at least one intervening processing element.

15. The systolic circuit of claim 12, wherein at least one of: each row of the rows of the systolic array includes a plurality of row-oriented buses; or each column of the columns of the systolic array includes a plurality of columnar buses.

16. The systolic circuit of claim 12, wherein each sub-array of the plurality of sub-arrays includes: a plurality of consecutive columns of the columns of the systolic array; and a plurality of non-consecutive rows of the rows of the systolic array.

17. The systolic circuit of claim 12, wherein each sub-array of the plurality of sub-arrays includes: a plurality of consecutive rows of the rows of the systolic array; and a plurality of non-consecutive columns of the columns of the systolic array.

18. A method comprising: receiving first data corresponding to a first sub-array of a systolic array and second data corresponding to a second sub-array of the systolic array; passing the first data to a first processing element within the first sub-array; passing the second data to a second processing element within the second sub-array; at the first processing element: performing a first set of operations on the first data to generate an output; and providing the output to a third processing element within the first sub-array, wherein the third processing element is separated within the systolic array from the first processing element by at least the second processing element.

19. The method of claim 18, wherein passing the first data to the first processing element comprises: passing the first data through one or more delay registers to the first processing element.

20. The method of claim 18, wherein the first sub-array comprises a first plurality of non-consecutive columns of the systolic array and a first plurality of non-consecutive rows of the systolic array, wherein the second sub-array comprises a second plurality of non-consecutive columns of the systolic array and a second plurality of non-consecutive rows of the systolic array, wherein at least a first column of the second plurality of non-consecutive columns is interleaved withi
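As a hedged illustration of how claims 2, 3, and 6 relate sub-arrays, buses, and intervening processing elements, the sketch below models a single column in which rows are interleaved across sub-arrays. The helper names and the rule "row r belongs to sub-array r mod num_buses" are assumptions made for this example only; the claims require non-consecutive rows or columns but do not prescribe this particular mapping.

```python
def sub_array_of(row, num_buses):
    """Sub-array index of a row (assumed mapping: row modulo bus count)."""
    return row % num_buses


def next_in_sub_array(row, num_buses, num_rows):
    """Row of the subsequent element in the same sub-array, or None at the end."""
    nxt = row + num_buses
    return nxt if nxt < num_rows else None


def intervening_rows(row, num_buses, num_rows):
    """Rows lying between an element and its successor in the same sub-array."""
    nxt = next_in_sub_array(row, num_buses, num_rows)
    if nxt is None:
        return []
    return list(range(row + 1, nxt))


if __name__ == "__main__":
    num_buses, num_rows = 3, 8
    for row in range(num_rows):
        print(
            row,
            sub_array_of(row, num_buses),
            next_in_sub_array(row, num_buses, num_rows),
            intervening_rows(row, num_buses, num_rows),
        )
```

Under this assumed mapping, every hop skips num_buses - 1 elements (compare claim 2), the skip distance is the same at each hop (compare claim 3), and the number of sub-arrays equals the number of buses (compare claim 6).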

Assignees

Inventors

Classifications

  • in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination · CPC title

  • controlled in tandem, e.g. multiplier-accumulator · CPC title

  • G06F7/5443 · Primary

    Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title

  • Arithmetic instructions · CPC title

  • in parallel-parallel fashion, i.e. both operands being entered in parallel (G06F7/533 takes precedence) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12182064B2 cover?
Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arit…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F7/5443. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 31, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).