Register file for systolic array

US12346694B2 · US · B2

Patent metadata
Publication number: US-12346694-B2
Application number: US-202117304794-A
Country: US
Kind code: B2
Filing date: Jun 25, 2021
Priority date: Jun 25, 2021
Publication date: Jul 1, 2025
Grant date: Jul 1, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processing apparatus includes a general-purpose parallel processing engine including a set of multiple processing elements including a single precision floating-point unit, a double precision floating point unit, and an integer unit; a matrix accelerator including one or more systolic arrays; a first register file coupled with a first read control circuit, wherein the first read control circuit couples with the set of multiple processing elements and the matrix accelerator to arbitrate read requests to the first register file from the set of multiple processing elements and the matrix accelerator; and a second register file coupled with a second read control circuit, wherein the second read control circuit couples with the matrix accelerator to arbitrate read requests to the second register file from the matrix accelerator and limit access to the second register file by the set of multiple processing elements.
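The access rules in the abstract can be illustrated with a small software model. This is a minimal sketch, not the patented circuit: it assumes one read port per register file (one grant per cycle), round-robin priority among permitted requesters, and hypothetical unit names ("fp32", "fp64", "int", "math", "matrix"). The abstract does not say which arbitration policy is used. The first file accepts reads from every processing element and the matrix accelerator; the second accepts the matrix accelerator plus the subset that claims 3 and 4 name (integer unit and math unit), and refuses everyone else.

```python
from typing import List, Optional, Sequence, Tuple

# A read request: (requester name, register index). Names are hypothetical.
Request = Tuple[str, int]


class ReadControl:
    """Toy model of a read control circuit in front of one register file.

    Assumptions (not stated in the patent): a single read port, so at most one
    read is granted per cycle; round-robin priority among permitted requesters;
    at most one request per requester per cycle.
    """

    def __init__(self, registers: Sequence[int], requesters: Sequence[str]):
        self.registers = list(registers)
        self.order = list(requesters)         # round-robin order, also the access list
        self.allowed = frozenset(requesters)
        self.next_idx = 0                     # index in self.order with top priority

    def arbitrate(self, requests: Sequence[Request]) -> Tuple[Optional[Tuple[str, int]], List[Request]]:
        # Requests from units without access are refused outright ("limit access").
        rejected = [r for r in requests if r[0] not in self.allowed]
        pending = {name: reg for name, reg in requests if name in self.allowed}
        n = len(self.order)
        for offset in range(n):
            name = self.order[(self.next_idx + offset) % n]
            if name in pending:
                # The winner drops to lowest priority on the next cycle.
                self.next_idx = (self.order.index(name) + 1) % n
                return (name, self.registers[pending[name]]), rejected
        return None, rejected


# Hypothetical unit names for the processing elements named in the claims.
PE_UNITS = ["fp32", "fp64", "int", "math"]

# First register file: shared by every processing element and the matrix accelerator.
first_rf = ReadControl(registers=range(100, 108), requesters=PE_UNITS + ["matrix"])
# Second register file: matrix accelerator plus the subset allowed by claims 3-4.
second_rf = ReadControl(registers=range(200, 208), requesters=["matrix", "int", "math"])

granted, rejected = second_rf.arbitrate([("fp32", 0), ("matrix", 3), ("int", 1)])
print(granted, rejected)  # ('matrix', 203) [('fp32', 0)]
```

Repeating the same call on `second_rf` grants `("int", 201)`, because round-robin priority has rotated past the matrix accelerator. Round-robin is only one plausible policy; a fixed-priority scheme favoring the matrix accelerator would fit the abstract equally well.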

First claim

Opening claim text (preview).

What is claimed is:

1. A processing apparatus including: a general-purpose parallel processing engine comprising: a set of multiple processing elements including a single precision floating-point unit, a double precision floating point unit, and an integer unit; a matrix accelerator including one or more systolic arrays; a first register file coupled with a first read control circuit, wherein the first read control circuit couples with the set of multiple processing elements and the matrix accelerator to arbitrate read requests to the first register file from the set of multiple processing elements and the matrix accelerator; and a second register file coupled with a second read control circuit, wherein the second read control circuit couples with the matrix accelerator to arbitrate read requests to the second register file from the matrix accelerator and limit access to the second register file by the set of multiple processing elements.

2. The processing apparatus as in claim 1, wherein the second read control circuit couples with a subset of the set of multiple processing elements and is additionally configured to arbitrate read requests to the second register file from the subset of the set of multiple processing elements, wherein the subset of the set of multiple processing elements includes less than all processing elements in the set of multiple processing elements.

3. The processing apparatus as in claim 2, wherein the subset of the set of multiple processing elements includes the integer unit.

4. The processing apparatus as in claim 3, wherein the set of multiple processing elements additionally include a math unit to perform a transcendental math operation and the subset of the set of multiple processing elements includes the math unit.

5. The processing apparatus as in claim 1, wherein the matrix accelerator is to: receive a command to execute an instruction to perform a matrix operation on a set of input data; read a first set of input data from the first register file; read a second set of input data from the second register file; perform operations associated with the instruction according to the command; and write output of the operations to a register file selected from one of the first register file or the second register file.

6. The processing apparatus as in claim 5, further comprising a write control circuit coupled with the matrix accelerator, the set of multiple processing elements, the first register file, and the second register file, wherein the write control circuit is to arbitrate write requests to the first register file and the second register file and the matrix accelerator is to write output of the operations via the write control circuit.

7. The processing apparatus as in claim 5, wherein the operations associated with the instruction include one or more dot product operations.

8. The processing apparatus as in claim 7, wherein to read the second set of input data includes to read a first set of matrix elements associated with a first matrix and a second set of matrix elements associated with a second matrix.

9. The processing apparatus as in claim 7, wherein to read the first set of input data includes to read a value to add to a dot product computed by a pipeline stage of the one or more systolic arrays.

10. A method comprising: executing general-purpose parallel processing via a general-purpose parallel processing engine including a set of multiple processing elements including a single precision floating-point unit, a double precision floating-point unit, and an integer unit; performing a matrix operation using a matrix accelerator including a systolic array; arbitrating read requests from the set of multiple processing elements and the matrix accelerator to a first register file via a first read control circuit; and arbitrating read requests from the matrix accelerator to a second register file via a second read control circuit, wherein the second read control circuit limits access to the second register file by the set of multiple processing elements.

11. The method as in claim 10, further comprising performing the matrix operation via one or more pipeline stages of the systolic array within the matrix accelerator.

12. The method as in claim 11, wherein performing the matrix operation includes performing a dot product operation on a first set of input data read from the first register file and a second set of input data read from the second register file via the one or more pipeline stages of the systolic array.

13. The method as in claim 12, wherein reading the second set of input data includes reading a first set of matrix elements associated with a first matrix and a second set of matrix elements associated with a second matrix.

14. The method as in claim 13, wherein reading the first set of input data includes reading a value to add to a dot product computed by the dot product operation.

15. A system comprising: a memory device; and a graphics processor coupled to the memory device, the graphics processor comprising a general-purpose parallel processing engine including: a set of multiple processing elements including a single precision floating-point unit, a double precision floating point unit, and an integer unit; a matrix accelerator including one or more systolic arrays; a first register file coupled with a first read control circuit, wherein the first read control circuit couples with the set of multiple processing elements and the matrix accelerator to arbitrate read requests to the first register file from the set of multiple processing elements and the matrix accelerator; and a second register file coupled with a second read control circuit, wherein the second read control circuit couples with the matrix accelerator to arbitrate read requests to the second register file from the matrix accelerator and limit access to the second register file by the set of multiple processing elements.

16. The system as in claim 15, wherein the second read control circuit couples with a subset of the set of multiple processing elements and is additionally configured to arbitrate read requests to the second register file from the subset of the set of multiple processing elements, wherein the subset of the set of multiple processing elements includes less than all processing elements in the set of multiple processing elements.

17. The system as in claim 16, wherein the set of multiple processing elements additionally include a math unit to perform a transcendental math operation and the subset of the set of multiple processing elements includes the math unit and the integer unit.

18. The system as in claim 15, wherein the matrix accelerator is to: receive a command to execute an instruction to perform a matrix operation on a set of input data; read a first set of input data from the first register file; read a second set of input data from the second register file; perform operations associated with the instruction according to the command; and write output of the operations to a register file selected from one of the first register file or the second register file.

19. The system as in claim 18, further comprising a write control circuit coupled with the matrix accelerator, the set of multiple processing elements, the first register file, and the second register file, wherein the write control circuit is to arbitrate write requests to the first register file and the second register file and the matrix accelerator is to write output of the operations via the writ…

Assignees

Intel Corp

Classifications (CPC group titles)

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations

  • using a mask

  • Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)}

  • Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)}

  • organised in groups of units sharing resources, e.g. clusters

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12346694B2 cover?
A processing apparatus in which a general-purpose parallel processing engine (single precision floating-point, double precision floating-point, and integer units) shares hardware with a matrix accelerator built from one or more systolic arrays. It has two register files: reads to the first are arbitrated among both the processing elements and the matrix accelerator, while reads to the second are arbitrated for the matrix accelerator and access by the processing elements is limited.
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30036. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 1, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).