Hardware apparatuses and methods to prefetch a multidimensional block of elements from a multidimensional array
US-2016188337-A1 · Jun 30, 2016 · US
US12106100B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12106100-B2 |
| Application number | US-201716487421-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 1, 2017 |
| Priority date | Mar 20, 2017 |
| Publication date | Oct 1, 2024 |
| Grant date | Oct 1, 2024 |
Official abstract text for this publication.
Embodiments detailed herein relate to matrix (tile) operations. For example, the following are discussed: decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile is a set of 2-dimensional registers.
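The configuration step described in the abstract can be pictured with a small software model. This is only a sketch under stated assumptions: the description layout (one count byte followed by a rows byte and a columns byte per tile) and the names `TileConfig`, `TileState`, and `configure_tiles` are hypothetical and are not taken from the patent text. It shows a description being read from a memory address to set per-tile row and column counts, without loading any tile data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TileConfig:
    """Shape of one tile (hypothetical model, not the patent's encoding)."""
    rows: int
    cols: int


@dataclass
class TileState:
    """Configured tiles plus their zero-initialised storage."""
    configs: List[TileConfig] = field(default_factory=list)
    data: List[List[List[int]]] = field(default_factory=list)


def configure_tiles(memory: bytes, address: int) -> TileState:
    """Model a tile configuration instruction with a memory-address operand.

    Assumed description layout at `address` (an illustration only):
      byte 0            number of tiles described
      bytes 1.., pairs  rows, cols for each tile
    """
    count = memory[address]
    configs = []
    for i in range(count):
        rows = memory[address + 1 + 2 * i]
        cols = memory[address + 2 + 2 * i]
        configs.append(TileConfig(rows, cols))
    # Storage is sized from the description but left zero-filled:
    # configuring tiles does not load data into them.
    data = [[[0] * c.cols for _ in range(c.rows)] for c in configs]
    return TileState(configs, data)


# Usage: a description for two tiles (4x8 and 16x2) placed at address 1.
memory = bytes([0xFF, 2, 4, 8, 16, 2])
state = configure_tiles(memory, 1)
```

Each tile gets its own row and column count, which mirrors the claim language about independently describing rows and columns per two-dimensional data structure.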
Opening claim text (preview).
We claim:
1. An apparatus comprising: matrix operations circuitry to execute one or more decoded matrix operation instructions on data stored in two-dimensional data structures; storage to store the two-dimensional data structures according to a to be loaded configuration, the to be loaded configuration to at least independently describe a number of rows and a number of columns per two-dimensional data structure, wherein the configuration is to be loaded in response to execution of a single matrix usage configuration instruction, wherein the single matrix usage configuration instruction is to not load data to be stored in a two-dimensional data structure; and execution circuitry to execute the single matrix usage configuration instruction and to support a plurality of instructions to perform a computational operation after the execution of the single matrix usage configuration instruction.
2. The apparatus of claim 1, wherein the storage is a plurality of packed data registers and the two-dimensional data structures are overlaid on at least a subset of two of the plurality of packed data registers.
3. The apparatus of claim 1, wherein the storage is a plurality of packed data registers and memory, and the two-dimensional data structures are overlaid on at least a subset of two of the plurality of packed data registers and memory.
4. The apparatus of claim 1, wherein the matrix operations circuitry is a plurality of chained fused multiply accumulate circuits.
5. The apparatus of claim 4, wherein each of the chained fused multiply accumulate circuits is to include storage for a portion of a two-dimensional data structure that the fused multiply accumulate circuit is to operate on.
6. The apparatus of claim 1, wherein the matrix operations circuitry supports element matrix multiply, subtract, and add instructions.
7. The apparatus of claim 1, wherein the matrix operations circuitry supports dot product and multiply accumulate operations.
8. The apparatus of claim 1, wherein the matrix operations circuitry supports matrix transpose and diagonal operations.
9. A system comprising: a host processor including execution circuitry to support a single matrix usage configuration instruction to configure a matrix operations accelerator and a plurality of instructions to cause the matrix operations accelerator to perform a computational operation after the matrix operations accelerator has been configured by an execution of the single matrix usage configuration instruction; and the matrix operations accelerator coupled to the host processor, wherein the matrix operations accelerator is to perform matrix operations on two-dimensional data structures using a computational grid based on commands received from the host processor, wherein the two-dimensional data structures are to be configured according to a to be loaded configuration, the to be loaded configuration to at least describe a number of rows and a number of columns per two-dimensional data structure of the two-dimensional data structures, wherein the configuration is to be loaded in response to the single matrix usage configuration instruction and the single matrix usage configuration instruction is to not load data to be stored in a two-dimensional data structure.
10. The system of claim 9, wherein the matrix operations accelerator further comprises a plurality of data buffers to buffer matrix data in two-dimensional data structures.
11. The system of claim 10, wherein the computational grid is to house at least one of the buffered matrix data from the plurality of data buffers during a matrix manipulation operation.
12. The system of claim 10, wherein the data buffers are a plurality of registers.
13. The system of claim 12, wherein the plurality of registers are a plurality of packed data registers and the two-dimensional data structures are overlaid on at least two of the plurality of packed data registers.
14. The system of claim 12, wherein the two-dimensional data structures are to be configured to use a plurality of packed data registers and memory.
15. The system of claim 9, wherein the matrix operations accelerator comprises a plurality of chained fused multiply add circuits.
16. The system of claim 15, wherein each of the chained fused multiply add circuits is to include storage for a portion of a two-dimensional data structure that the fused multiply add circuit is to operate on.
17. The system of claim 9, further comprising a coherent memory interface coupled to the matrix operations accelerator and host processor to provide access to shared memory between the host processor and matrix operations accelerator.
18. The apparatus of claim 1, wherein the to be loaded configuration is to be loaded in response to a tile configuration instruction.
19. The apparatus of claim 1, wherein the to be loaded configuration is to include restart information.
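As a rough illustration of the computational operation that follows configuration, the sketch below models a tile multiply-accumulate built from a chain of multiply-accumulate steps, loosely in the spirit of the chained fused multiply accumulate circuits recited in the claims. The function names `fma` and `tile_matmul_accumulate` are hypothetical, the tiles are plain nested lists assumed to be non-empty, and the software `fma` does not reproduce the single-rounding behaviour of a hardware fused unit. It is a model of the arithmetic only, not of the claimed circuitry.

```python
def fma(acc, a, b):
    """One multiply-accumulate step: acc + a * b.

    A hardware fused unit rounds once; this plain expression is an
    approximation of that behaviour (exact for integers).
    """
    return acc + a * b


def tile_matmul_accumulate(c, a, b):
    """Return C + A @ B for tiles given as non-empty lists of rows.

    Shapes must agree with each other, as they would have to agree with
    a previously loaded tile configuration: A is M x K, B is K x N,
    and C is M x N.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(r) != inner for r in a):
        raise ValueError("columns of A must equal rows of B")
    if any(len(r) != cols for r in b):
        raise ValueError("B must be rectangular")
    if len(c) != rows or any(len(r) != cols for r in c):
        raise ValueError("C must be rows(A) x cols(B)")

    out = [row[:] for row in c]  # leave the input accumulator tile intact
    for i in range(rows):
        for j in range(cols):
            acc = out[i][j]
            # Each k step feeds its result to the next, like a chain
            # of multiply-accumulate units.
            for k in range(inner):
                acc = fma(acc, a[i][k], b[k][j])
            out[i][j] = acc
    return out


# Usage: accumulate a 2x2 product into an identity tile.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 0], [0, 1]]
result = tile_matmul_accumulate(c, a, b)  # [[20, 22], [43, 51]]
```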
Image or video data · CPC title
Vector or matrix data · CPC title
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title
with multidimensional access, e.g. row/column, matrix · CPC title
Recovery, e.g. branch miss-prediction, exception handling (error detection or correction G06F11/00) · CPC title