Systolic array accelerator systems and methods

US11003619B2 · US · B2

Patent metadata
Publication number: US-11003619-B2
Application number: US-201916283795-A
Country: US
Kind code: B2
Filing date: Feb 24, 2019
Priority date: Feb 24, 2019
Publication date: May 11, 2021
Grant date: May 11, 2021

Abstract

The present disclosure is directed to systems and methods for decomposing systolic array circuitry to provide a plurality of N×N systolic sub-array circuits, apportioning a first tensor or array into a plurality of N×M first input arrays, and apportioning a second tensor or array into a plurality of M×N second input arrays. Systolic array control circuitry transfers corresponding ones of the first input arrays and second input arrays to a respective one of the plurality of N×N systolic sub-array circuits. As the elements included in the first input array and the elements included in the second input array are transferred to the systolic sub-array, the systolic sub-array performs one or more mathematical operations using the first and the second input arrays. The systems and methods beneficially improve the usage of the systolic array circuitry thereby advantageously reducing the number of clock cycles needed to perform a given number of calculations.
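The decomposition described in the abstract relies on a standard identity: a matrix product equals the sum of the products of matching column slabs of the first operand and row slabs of the second. The Python sketch below illustrates only that arithmetic, under stated assumptions: the first tensor is taken to be exactly N rows by K columns and the second K rows by N columns, with K divisible by M; each corresponding N×M / M×N pair is handled by an ordinary software multiply standing in for one N×N systolic sub-array; and all helper names (`apportion_first`, `apportion_second`, `subarray_product`, `combine_results`, `decomposed_matmul`) are hypothetical, not taken from the patent.

```python
from typing import List

Matrix = List[List[float]]


def apportion_first(a: Matrix, n: int, m: int) -> List[Matrix]:
    """Split an n x K first tensor into K // m column slabs of shape n x m."""
    k = len(a[0])
    if len(a) != n or k % m != 0:
        raise ValueError("sketch assumes an n x K tensor with K divisible by m")
    return [[row[s:s + m] for row in a] for s in range(0, k, m)]


def apportion_second(b: Matrix, n: int, m: int) -> List[Matrix]:
    """Split a K x n second tensor into K // m row slabs of shape m x n."""
    k = len(b)
    if len(b[0]) != n or k % m != 0:
        raise ValueError("sketch assumes a K x n tensor with K divisible by m")
    return [b[s:s + m] for s in range(0, k, m)]


def subarray_product(x: Matrix, y: Matrix) -> Matrix:
    """Software stand-in for one N x N sub-array: n x n product of an n x m and m x n pair."""
    n, m = len(x), len(y)
    return [[sum(x[i][t] * y[t][j] for t in range(m)) for j in range(n)]
            for i in range(n)]


def combine_results(results: List[Matrix]) -> Matrix:
    """Sum corresponding elements of the n x n partial results (compare claims 4, 11, 18)."""
    n = len(results[0])
    return [[sum(r[i][j] for r in results) for j in range(n)] for i in range(n)]


def decomposed_matmul(a: Matrix, b: Matrix, n: int, m: int) -> Matrix:
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    firsts = apportion_first(a, n, m)
    seconds = apportion_second(b, n, m)
    # Hypothetical mapping: pair s would go to sub-array s; here the pairs run sequentially.
    partials = [subarray_product(x, y) for x, y in zip(firsts, seconds)]
    return combine_results(partials)
```

With N = 2 and M = 1 (the 2×2 sub-array, 2×1 and 1×2 input case of claim 5), each partial result is an outer product, and summing them reproduces the ordinary product: `decomposed_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]], n=2, m=1)` returns `[[58, 64], [139, 154]]`.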

Claims

What is claimed:

1. A system, comprising: systolic array circuitry; and systolic array control circuitry to: decompose the systolic array circuitry into a plurality of N×N systolic sub-arrays; and apportion a first input tensor into a first plurality of N×M input arrays and a second input tensor into a second plurality of M×N input arrays; and wherein each respective one of at least a portion of the plurality of N×N systolic sub-arrays is to perform at least one mathematical operation to provide a respective one of a plurality of N×N results using corresponding ones of the N×M input arrays included in the first plurality of N×M input arrays and the M×N input arrays included in the second plurality of M×N input arrays.

2. The system of claim 1, the systolic array control circuitry to further: combine the plurality of N×N results to provide one or more N×N output tensors.

3. The system of claim 2, the systolic array control circuitry to further: cause a transfer of the one or more N×N output tensors to memory circuitry.

4. The system of claim 1: wherein the at least one mathematical operation includes a multiplication operation; and the systolic array control circuitry to further: sum corresponding elements in each of the plurality of N×N results to provide one or more N×N output tensors.

5. The system of claim 1: wherein the plurality of N×N systolic sub-arrays comprise a plurality of 2×2 systolic sub arrays; wherein the first plurality of N×M input arrays includes a plurality of 2×1 input arrays; and wherein the second plurality of M×N input arrays includes a plurality of 1×2 input arrays.

6. The system of claim 1, the systolic array control circuitry to further: cause a transfer of the first input tensor from memory circuitry; and cause a transfer of the second input tensor from the memory circuitry.

7. A non-transitory storage device that includes instructions that, when executed by systolic array control circuitry, cause the systolic array control circuitry to: decompose a systolic array circuitry into a plurality of N×N systolic sub-arrays; apportion a first input tensor into a first plurality of N×M input arrays and a second input tensor into a second plurality of M×N input arrays; and wherein each respective one of at least a portion of the plurality of N×N systolic sub-arrays is to perform at least one mathematical operation to provide a respective one of a plurality of N×N results using corresponding ones of the N×M input arrays included in the first plurality of N×M input arrays and the M×N input arrays included in the second plurality of M×N input arrays.

8. The non-transitory storage device of claim 7 wherein the instructions further cause the systolic array control circuitry to: combine the plurality of N×N results to provide one or more N×N output tensors.

9. The non-transitory storage device of claim 8 wherein the instructions further cause the systolic array control circuitry to: cause a transfer of the one or more N×N output tensors to memory circuitry.

10. The non-transitory storage device of claim 7 wherein the instructions further cause the systolic array control circuitry to: transfer corresponding elements of the first plurality of N×M input arrays and the second plurality of M×N input arrays to the plurality of N×N systolic sub-arrays to cause performance of a multiplication operation to provide a respective one of the plurality of N×N results.

11. The non-transitory storage device of claim 10 wherein the instructions further cause the systolic array control circuitry to: sum corresponding elements in each of the plurality of N×N results to provide the one or more N×N output tensors.

12. The non-transitory storage device of claim 7: wherein the instructions that cause the systolic array control circuitry to decompose the systolic array circuitry into a plurality of N×N systolic sub-arrays further cause the systolic array control circuitry to: decompose the systolic array circuitry into a plurality of 2×2 systolic sub-arrays; wherein the instructions that cause the systolic array control circuitry to apportion the first input tensor into the first plurality of N×M input arrays further cause the systolic array control circuitry to: apportion the first input tensor into a first plurality of 2×1 input arrays; and wherein the instructions that cause the systolic array control circuitry to apportion the second input tensor into the second plurality of M×N input arrays further cause the systolic array control circuitry to: apportion the second input tensor into a second plurality of 1×2 input arrays.

13. The non-transitory storage device of claim 10 wherein the instructions further cause the systolic array control circuitry to: cause a transfer of the first input tensor from memory circuitry; and cause a transfer of the second input tensor from the memory circuitry.

14. A method, comprising: decomposing, by systolic array control circuitry, a systolic array circuitry into a plurality of N×N systolic sub-arrays; apportioning, by the systolic array control circuitry, a first input tensor into a first plurality of N×M input arrays and a second input tensor into a second plurality of M×N input arrays; and for each respective one of at least a portion of the plurality of N×N systolic sub-arrays, performing at least one mathematical operation to provide a respective one of a plurality of N×N results using corresponding ones of the N×M input arrays included in the first plurality of N×M input arrays and the M×N input arrays included in the second plurality of M×N input arrays.

15. The method of claim 14, further comprising: combining, by the systolic array control circuitry, the plurality of N×N results to provide one or more N×N output tensors.

16. The method of claim 15, further comprising: transferring, by the systolic array control circuitry, the one or more N×N output tensors to memory circuitry.

17. The method of claim 14 wherein performing the at least one mathematical operation to provide the plurality of N×N results comprises: causing the systolic array circuitry to perform a multiplication operation to provide the plurality of N×N results.

18. The method of claim 17, further comprising: summing corresponding elements in each of the plurality of N×N results to provide one or more N×N output tensors.

19. The method of claim 14: wherein decomposing the systolic array circuitry into the plurality of N×N systolic sub-arrays comprises: decomposing, by the systolic array control circuitry, the systolic array circuitry into a plurality of 2×2 systolic sub-arrays; wherein apportioning the first input tensor into the first plurality of N×M input arrays comprises: apportioning, by the systolic array control circuitry, the first input tensor into a first plurality of 2×1 input arrays; and wherein apportioning the second input tensor into the second plurality of M×N input arrays comprises apportioning, by the systolic array control circuitry, the second input tensor into a second plurality of 1×2 input arrays.

20. The method of claim 14, further comprising: transferring the first input tensor from memory circuitry coupled to the systolic array circuitry; and transferring the second input tensor from the memory circuitry.

21. A system, comprising: means for decomposing a systolic array circuitry into a plurality of N×N systolic sub-arrays; means for apportioning a first input tensor int
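The claims do not say how data moves inside each N×N sub-array. Output-stationary operation is a common systolic arrangement, so the Python sketch below assumes that dataflow purely for illustration: it models one sub-array cycle by cycle as the elements of a corresponding N×M and M×N pair are transferred into it, in the spirit of the abstract's description. The function name `systolic_subarray`, the skewed injection schedule and the zero "bubble" padding are assumptions, not details from the patent.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def systolic_subarray(x: Matrix, y: Matrix) -> Tuple[Matrix, int]:
    """Cycle-level model of one output-stationary N x N systolic sub-array (assumed dataflow).

    x (N x M) enters from the left edge, row i delayed by i cycles;
    y (M x N) enters from the top edge, column j delayed by j cycles.
    Each processing element (PE) multiply-accumulates the operands it holds,
    then forwards the left operand rightward and the top operand downward.
    Returns the N x N accumulators and the number of cycles simulated.
    """
    n, m = len(x), len(y)
    acc = [[0] * n for _ in range(n)]
    a_reg = [[0] * n for _ in range(n)]  # operands moving right (0 = bubble)
    b_reg = [[0] * n for _ in range(n)]  # operands moving down (0 = bubble)
    # The last pair (k = m - 1) reaches PE (n-1, n-1) on cycle m - 1 + 2(n - 1).
    cycles = m + 2 * (n - 1)
    for t in range(cycles):
        # Build next-cycle registers from this cycle's values (all PEs shift together).
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:
                    k = t - i  # skewed injection on the left edge
                    new_a[i][j] = x[i][k] if 0 <= k < m else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:
                    k = t - j  # skewed injection on the top edge
                    new_b[i][j] = y[k][j] if 0 <= k < m else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        # Every PE performs one multiply-accumulate per cycle; bubbles add 0.
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, cycles
```

For the claim 5 case (a 2×2 sub-array fed a 2×1 and a 1×2 input array), `systolic_subarray([[1], [4]], [[7, 8]])` returns `([[7, 8], [28, 32]], 3)`: a single outer product, finished after M + 2(N − 1) = 3 cycles because of the pipeline skew in this assumed design.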

Assignees

Intel Corp
Classifications (CPC titles)

  • single instruction multiple data [SIMD] multiprocessors

  • G06F17/16 (primary)

    Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)}

  • Two dimensional arrays, e.g. mesh, torus

  • Systolic arrays

  • using electronic means

Frequently asked questions

Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F15/8007. Mapped technology areas include Physics.
When was this patent published?
May 11, 2021 (kind code B2).