US11243880B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11243880-B1 |
| Application number | US-201816132243-A |
| Country | US |
| Kind code | B1 |
| Filing date | Sep 14, 2018 |
| Priority date | Sep 15, 2017 |
| Publication date | Feb 8, 2022 |
| Grant date | Feb 8, 2022 |
Official abstract text for this publication.
A processor having a functional slice architecture is divided into a plurality of functional units (“tiles”) organized into a plurality of slices. Each slice is configured to perform specific functions within the processor, which may include memory slices (MEM) for storing operand data, and arithmetic logic slices for performing operations on received operand data. The tiles of the processor are configured to stream operand data across a first dimension, and receive instructions across a second dimension orthogonal to the first dimension. The timing of data and instruction flows is configured such that corresponding data and instructions are received at each tile with a predetermined temporal relationship, allowing operand data to be transmitted between the slices of the processor without any accompanying metadata. Instead, each slice is able to determine what operations to perform on received data based upon the timing at which the data is received.
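The timing idea in the abstract can be illustrated with a toy model. The sketch below is not taken from the patent: the names `simulate` and `OPS`, the two sample operations, and the assumption that an instruction advances exactly one tile per cycle are all hypothetical. It only shows how a tile could pick an operation from the arrival cycle alone, with operand values carrying no tags.

```python
# Toy model (hypothetical, not the patented design): each tile decides what to
# do with an operand purely from the cycle on which the operand arrives.

# Assumed sample operations; the patent does not name these.
OPS = {"double": lambda x: 2 * x, "negate": lambda x: -x}


def simulate(num_tiles, program, streams, offset=1):
    """Pair bare operands with instructions using timing only.

    program: list of (issue_cycle, op_name) issued at the end of a slice.
    streams: list of (arrival_cycle, values); values[i] reaches tile i at
             arrival_cycle + i and carries no metadata.
    offset:  assumed fixed delay between instruction and data at each tile.
    """
    # Assumption: an instruction issued at cycle c reaches tile i at c + i,
    # so the tile expects the matching operand at c + i + offset.
    expected = [dict() for _ in range(num_tiles)]
    for issue_cycle, op_name in program:
        for i in range(num_tiles):
            expected[i][issue_cycle + i + offset] = op_name

    outputs = []
    for arrival_cycle, values in streams:
        row = []
        for i, value in enumerate(values):
            op_name = expected[i].get(arrival_cycle + i)
            # No instruction scheduled for this cycle: pass the value through.
            row.append(OPS[op_name](value) if op_name else value)
        outputs.append(row)
    return outputs


if __name__ == "__main__":
    program = [(0, "double"), (1, "negate")]
    streams = [(1, [1, 2, 3]), (2, [1, 2, 3]), (5, [1, 2, 3])]
    print(simulate(3, program, streams))
```

In this sketch the schedule built inside `simulate` plays the role of the compiler: because arrival times are fixed ahead of time, the operand lists never need an opcode or destination field.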
Opening claim text (preview).
What is claimed is:

1. A memory system of a processor comprising: a first stream register configured to transport data in a first direction to a first plurality of stream registers; a second stream register configured to transport data in a second direction opposite the first direction to a second plurality of stream registers; a plurality of memory tiles comprising at least a first and second memory tile, each memory tile comprising a respective memory for storing data and routing circuitry; and wherein the routing circuitry of the plurality of memory tiles couples the first and second memory tiles to the first and second stream registers, and wherein the first memory tile is selectively configured to write data from the first stream register to its respective memory or to read data out to another memory tile of the plurality of memory tiles or to a subsequent stream register of the first plurality of stream registers, and the second memory tile is selectively configured to write data from the second stream register to its respective memory or to read data out to another memory tile of the plurality of memory tiles or a subsequent stream register of the second plurality of stream registers, in accordance with instructions received at the plurality of memory tiles, the instructions received separately from data and having a predetermined timing.
2. The memory system of claim 1, wherein the first plurality of stream registers form a first lane connecting the plurality of memory tiles, and the second plurality of stream registers form a second lane connecting the plurality of memory tiles.
3. The memory system of claim 1, wherein each of the plurality of memory tiles is located on a respective memory slice, each memory slice comprising multiple memory tiles arranged linearly along a second dimension that is orthogonal to a first dimension corresponding to the first and second directions.
4. The memory system of claim 3, further comprising an instruction control circuit located at a first end of each of the respective memory slices, the instruction control circuit configured to propagate instructions from an instruction buffer to memory tiles of the memory slice along the second dimension, such that each memory tile receives data and instructions via separate dimensions.
5. The memory system of claim 4, wherein the first memory tile processes an instruction by: receiving, at the first memory tile, an instruction from an instruction control circuit of a memory slice that the memory tile is located on, the instruction propagated via another memory tile of the memory slice; receiving, at the first memory tile, data read via the first stream register, the data received at the memory tile with a predetermined temporal offset relative to receipt of the instruction as scheduled by a compiler; and processing the received instruction using the received data or data retrieved from a memory address within the memory tile specified by the received instruction.
6. The memory system of claim 5, wherein the first memory tile is configured to select between data received via the first stream register and data stored within the memory of the first memory tile to be read out to a subsequent memory tile of the plurality of memory tiles, based upon the received instruction.
7. The memory system of claim 1, wherein the first memory tile is able to write data from the first stream register during a same cycle that the second memory tile write data from the second stream register.
8. The memory system of claim 1, wherein the plurality of memory tiles comprise four memory tiles positioned between the first stream register and the second stream register, wherein the first memory tile is adjacent to the first stream register, and the second memory tile is adjacent to the second stream register.
9. The memory system of claim 1, wherein the first stream register is located in a first stream register file, and the second stream register is located in a second stream register file, wherein each stream register file comprises a plurality of stream registers each configured to transport data to subsequent stream registers of other stream register files in the first direction or the second direction.
10. The memory system of claim 9, wherein the subsequent stream register of the first plurality of stream registers is located in the second stream register file, and the subsequent stream register of the second plurality of stream registers is located in the first stream register file.
11. The memory system of claim 1, wherein the first memory tile is further selectively configurable to pass data to the subsequent stream register of the second plurality of stream registers, and the second memory tile is further selectively configurable to pass data to the subsequent stream register of the first plurality of stream registers.
12. The memory system of claim 1, wherein the first memory tiles is configurable to write data from the second stream register to its respective memory passed via the routing circuitry of the second memory tile, and the second memory tiles is configurable to write data from the first stream register to its respective memory passed via the routing circuitry of the first memory tile.
13. The memory system of claim 1, wherein the first and second plurality of stream registers are architecturally visible to a compiler associated with the processor.
14. A memory system of a processor comprising: a first stream register configured to transport data in a first direction to a first plurality of stream registers; a second stream register configured to transport data in a second direction opposite the first direction to a second plurality of stream registers; a first memory tile and a second memory tile, each comprising a respective memory for storing data and routing circuitry coupling the first and second memory tiles to the first and second stream registers respectively; and wherein the first memory tile is selectively configured to write data from the first stream register to its respective memory or to read data out to the second memory tile or to a subsequent stream register of the first plurality of stream registers, and the second memory tile is selectively configured to write data from the second stream register to its respective memory or to read data out to the first memory tile or a subsequent stream register of the second plurality of stream registers, in accordance with instructions received at the first and second memory tiles, the instructions received separately from data and having a predetermined timing.
15. The memory system of claim 14, further comprising at least one additional memory tile positioned between the first memory tile and the second memory tile.
16. The memory system of claim 14, wherein the first memory tile is adjacent to the first stream register, and the second memory tile is adjacent to the second stream register.
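As a reading aid for the independent claims, the following is a minimal sketch of two memory tiles attached to stream registers that move data in opposite directions. Everything here is an illustrative assumption rather than the claimed circuitry: the `MemoryTile` class, the `("write" | "read" | "pass", address)` instruction tuples, the `run` helper, and the choice to let a written value also continue down the lane are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryTile:
    """Hypothetical tile: a small memory plus a routing choice per instruction."""
    memory: dict = field(default_factory=dict)

    def step(self, instr, incoming):
        """Apply one instruction; return the value sent to the next stream register."""
        op, addr = instr
        if op == "write":
            # Store the value arriving from the stream register.
            self.memory[addr] = incoming
            # Assumption: the written value also continues along the lane.
            return incoming
        if op == "read":
            # Drive stored data out instead of the incoming stream value.
            return self.memory[addr]
        # "pass": forward the stream value untouched.
        return incoming


def run(first_tile, second_tile, first_lane, second_lane, first_prog, second_prog):
    """Step both tiles once per cycle; the two lanes flow in opposite directions."""
    first_out, second_out = [], []
    for cycle in range(len(first_prog)):
        # Both tiles act in the same cycle, each on its own lane (cf. claim 7).
        first_out.append(first_tile.step(first_prog[cycle], first_lane[cycle]))
        second_out.append(second_tile.step(second_prog[cycle], second_lane[cycle]))
    return first_out, second_out


if __name__ == "__main__":
    a, b = MemoryTile(), MemoryTile()
    outs = run(
        a, b,
        first_lane=[10, None], second_lane=[20, None],
        first_prog=[("write", 0), ("read", 0)],
        second_prog=[("write", 5), ("read", 5)],
    )
    print(outs, a.memory, b.memory)
```

The instruction lists are supplied separately from the lane data, mirroring the claim language that instructions are "received separately from data"; the per-cycle indexing stands in for the predetermined timing.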
Two dimensional, e.g. mesh, torus · CPC title
from multiple instruction streams, e.g. multistreaming · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
Machine learning · CPC title
with multidimensional access, e.g. row/column, matrix · CPC title