Systems, methods, and apparatuses for matrix operations

US12106100B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12106100-B2
Application number: US-201716487421-A
Country: US
Kind code: B2
Filing date: Jul 1, 2017
Priority date: Mar 20, 2017
Publication date: Oct 1, 2024
Grant date: Oct 1, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile a set of 2-dimensional registers are discussed.
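The flow in the abstract (an instruction carrying an opcode and a memory address, whose execution sets a tile configuration from a description read at that address) can be sketched roughly as follows. This is only an illustrative model: the byte layout of the description, and the names `TileShape`, `read_tile_description`, and `Processor`, are assumptions made up for this example and are not taken from the patent.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical description layout (an assumption, NOT the patent's format):
#   byte 0       : number of tiles described (n)
#   then n pairs : rows (uint16, little-endian), columns (uint16, little-endian)


@dataclass(frozen=True)
class TileShape:
    rows: int
    cols: int


def read_tile_description(memory: bytes, address: int) -> List[TileShape]:
    """Read the hypothetical tile description starting at `address`."""
    count = memory[address]
    shapes = []
    offset = address + 1
    for _ in range(count):
        rows, cols = struct.unpack_from("<HH", memory, offset)
        shapes.append(TileShape(rows, cols))
        offset += 4
    return shapes


class Processor:
    """Toy processor state holding only the current tile configuration."""

    def __init__(self) -> None:
        self.tile_config: Optional[List[TileShape]] = None

    def execute_tile_config(self, memory: bytes, address: int) -> None:
        # Models executing the decoded instruction: the memory-address
        # operand points at the description, which becomes the configuration.
        self.tile_config = read_tile_description(memory, address)


# Usage: place a two-tile description at address 8, then "execute" the instruction.
memory = bytearray(64)
struct.pack_into("<BHHHH", memory, 8, 2, 16, 64, 8, 32)

cpu = Processor()
cpu.execute_tile_config(bytes(memory), 8)
print(cpu.tile_config)
```

The point of the sketch is that the instruction itself carries only an address; the tile shapes live in memory and are picked up when the instruction executes.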

First claim

Opening claim text (preview).

We claim:

1. An apparatus comprising: matrix operations circuitry to execute one or more decoded matrix operation instructions on data stored in two-dimensional data structures; storage to store the two-dimensional data structures according to a to be loaded configuration, the to be loaded configuration to at least independently describe a number of rows and a number of columns per two-dimensional data structure, wherein the configuration is to be loaded in response to execution of a single matrix usage configuration instruction, wherein the single matrix usage configuration instruction is to not load data to be stored in a two-dimensional data structure; and execution circuitry to execute the single matrix usage configuration instruction and to support a plurality of instructions to perform a computational operation after the execution of the single matrix usage configuration instruction.
2. The apparatus of claim 1, wherein the storage is a plurality of packed data registers and the two-dimensional data structures are overlaid on at least a subset of two of the plurality of packed data registers.
3. The apparatus of claim 1, wherein the storage is a plurality of packed data registers and memory, and the two-dimensional data structures are overlaid on at least a subset of two of the plurality of packed data registers and memory.
4. The apparatus of claim 1, wherein the matrix operations circuitry is a plurality of chained fused multiply accumulate circuits.
5. The apparatus of claim 4, wherein each of the chained fused multiply accumulate circuits is to include storage for a portion of a two-dimensional data structure that the fused multiply accumulate circuit is to operate on.
6. The apparatus of claim 1, wherein the matrix operations circuitry supports element matrix multiply, subtract, and add instructions.
7. The apparatus of claim 1, wherein the matrix operations circuitry supports dot product and multiply accumulate operations.
8. The apparatus of claim 1, wherein the matrix operations circuitry supports matrix transpose and diagonal operations.
9. A system comprising: a host processor including execution circuitry to support a single matrix usage configuration instruction to configure a matrix operations accelerator and a plurality of instructions to cause the matrix operations accelerator to perform a computational operation after the matrix operations accelerator has been configured by an execution of the single matrix usage configuration instruction; and the matrix operations accelerator coupled to the host processor, wherein the matrix operations accelerator is to perform matrix operations on two-dimensional data structures using a computational grid based on commands received from the host processor, wherein the two-dimensional data structures are to be configured according to a to be loaded configuration, the to be loaded configuration to at least describe a number of rows and a number of columns per two-dimensional data structure of the two-dimensional data structures, wherein the configuration is to be loaded in response to the single matrix usage configuration instruction and the single matrix usage configuration instruction is to not load data to be stored in a two-dimensional data structure.
10. The system of claim 9, wherein the matrix operations accelerator further comprises a plurality of data buffers to buffer matrix data in two-dimensional data structures.
11. The system of claim 10, wherein the computational grid is to house at least one of the buffered matrix data from the plurality of data buffers during a matrix manipulation operation.
12. The system of claim 10, wherein the data buffers are a plurality of registers.
13. The system of claim 12, wherein the plurality of registers are a plurality of packed data registers and the two-dimensional data structures are overlaid on at least two of the plurality of packed data registers.
14. The system of claim 12, wherein the two-dimensional data structures are to be configured to use a plurality of packed data registers and memory.
15. The system of claim 9, wherein the matrix operations accelerator comprises a plurality of chained fused multiply add circuits.
16. The system of claim 15, wherein each of the chained fused multiply add circuits is to include storage for a portion of a two-dimensional data structure that the fused multiply add circuit is to operate on.
17. The system of claim 9, further comprising a coherent memory interface coupled to the matrix operations accelerator and host processor to provide access to shared memory between the host processor and matrix operations accelerator.
18. The apparatus of claim 1, wherein the to be loaded configuration is to be loaded in response to a tile configuration instruction.
19. The apparatus of claim 1, wherein the to be loaded configuration is to include restart information.
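As a reading aid for claims 1 and 7, the split between configuring the two-dimensional data structures and later computing on them can be modeled in a few lines. This is a minimal software sketch under stated assumptions, not the claimed hardware: the class `TileStorage` and its methods are hypothetical names, and zero-filling tiles at configuration time is an assumption of this model rather than something the claims state.

```python
class TileStorage:
    """Toy model: tiles must be configured before any compute instruction."""

    def __init__(self):
        self.tiles = None  # None means "not yet configured"

    def configure(self, shapes):
        # Models the single configuration instruction: it fixes rows and
        # columns independently per tile and loads NO matrix data.
        # Zero-filling here is an assumption of this sketch.
        for rows, cols in shapes:
            if rows <= 0 or cols <= 0:
                raise ValueError("rows and columns must be positive")
        self.tiles = [[[0] * cols for _ in range(rows)] for rows, cols in shapes]

    def _require_config(self):
        if self.tiles is None:
            raise RuntimeError("tiles not configured")

    def load(self, index, data):
        # A separate instruction brings data in; it must match the configured shape.
        self._require_config()
        tile = self.tiles[index]
        if len(data) != len(tile) or any(len(r) != len(tile[0]) for r in data):
            raise ValueError("data shape does not match configured tile shape")
        self.tiles[index] = [list(r) for r in data]

    def multiply_accumulate(self, dst, a, b):
        # Computes tiles[dst] += tiles[a] @ tiles[b], one dot product per element.
        self._require_config()
        A, B, C = self.tiles[a], self.tiles[b], self.tiles[dst]
        if len(A[0]) != len(B) or len(C) != len(A) or len(C[0]) != len(B[0]):
            raise ValueError("tile shapes are incompatible")
        for i in range(len(A)):
            for j in range(len(B[0])):
                acc = C[i][j]
                for k in range(len(B)):
                    acc += A[i][k] * B[k][j]
                C[i][j] = acc


# Usage: configure three tiles (2x2 accumulator, 2x3 and 3x2 inputs), then compute.
storage = TileStorage()
storage.configure([(2, 2), (2, 3), (3, 2)])
storage.load(1, [[1, 2, 3], [4, 5, 6]])
storage.load(2, [[7, 8], [9, 10], [11, 12]])
storage.multiply_accumulate(0, 1, 2)
print(storage.tiles[0])  # [[58, 64], [139, 154]]
```

Keeping configuration separate from data loading means the shapes are set once and the subsequent compute instructions only need to name tiles, which mirrors the ordering the claims describe.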

Assignees

Intel Corp

Inventors

Classifications

  • Image or video data

  • Vector or matrix data

  • Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00)

  • with multidimensional access, e.g. row/column, matrix

  • Recovery, e.g. branch miss-prediction, exception handling (error detection or correction G06F11/00)

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12106100B2 cover?
Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile a set of 2-d…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/30036. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 1, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).