What technology area does this patent fall under?

Primary CPC classification G06F9/30036. Mapped technology areas include Physics.

When was this patent published?

Publication date Dec 31, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems, methods, and apparatuses for tile load, multiplication and accumulation

US12182571B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12182571-B2
Application number	US-202318100194-A
Country	US
Kind code	B2
Filing date	Jan 23, 2023
Priority date	Mar 20, 2017
Publication date	Dec 31, 2024
Grant date	Dec 31, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in the form of decode circuitry to decode an instruction having fields for an opcode, a destination matrix operand identifier, and source memory information, and execution circuitry to execute the decoded instruction to load groups of strided data elements from memory into configured rows of the identified destination matrix operand to memory.
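The strided load described in the abstract can be pictured with a small software model. The sketch below is an illustration under stated assumptions, not the patented circuitry: memory is modelled as a flat Python list, and `load_tile`, `base`, `stride`, `rows`, and `cols` are hypothetical names chosen for this example rather than fields defined by the patent. Each configured row of the destination tile receives one group of consecutive elements, and successive groups start `stride` elements apart.

```python
def load_tile(memory, base, stride, rows, cols):
    """Copy `rows` groups of `cols` elements from flat `memory` into a tile.

    Group r starts at index base + r * stride. All parameter names are
    illustrative assumptions, not operand names taken from the patent.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be positive")
    end = base + (rows - 1) * stride + cols
    if base < 0 or stride < 0 or end > len(memory):
        raise IndexError("tile load would read outside memory")
    return [
        memory[base + r * stride: base + r * stride + cols]
        for r in range(rows)
    ]


# A 4x6 row-major matrix stored flat; load the 2x3 tile whose top-left
# element sits at row 1, column 2 of that matrix.
matrix_cols = 6
memory = list(range(24))
tile = load_tile(memory, base=1 * matrix_cols + 2, stride=matrix_cols,
                 rows=2, cols=3)
print(tile)  # [[8, 9, 10], [14, 15, 16]]
```

Setting `stride` to the width of the enclosing matrix is what lets a tile be cut out of a larger row-major array without copying the array first.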

First claim

Opening claim text (preview).

We claim:

1. An apparatus comprising: a memory interface; matrix processing circuitry coupled to a memory via the memory interface, the matrix processing circuitry to execute instructions to perform matrix multiplication operations with a first source matrix comprising a first plurality of data elements and a second source matrix comprising a second plurality of data elements, wherein the first source matrix comprises a first plurality of matrix tiles and the second source matrix comprises a second plurality of matrix tiles, each matrix tile in the first plurality of matrix tiles comprising a subset of non-overlapping data elements of the first plurality of data elements and each matrix tile in the second plurality of matrix tiles comprising a subset of non-overlapping data elements of the second plurality of data elements; and a first plurality of vector registers to store a first tile comprising a first subset of non-overlapping data elements of the first plurality of data elements, a second plurality of vector registers to store a second tile comprising a second subset of non-overlapping data elements of the second plurality of data elements, and a third plurality of vector registers to store a result matrix tile comprising a plurality of result data elements; the matrix processing circuitry to multiply each data element of the first subset of non-overlapping data elements with a corresponding data element of the second subset of non-overlapping data elements to generate a corresponding plurality of products, and to add one or more products of the corresponding plurality of products to a corresponding accumulation data element to generate a corresponding result data element of the plurality of result data elements of the result matrix tile, wherein tile usage of the matrix processing circuitry is to be configured by an execution of a configuration instruction prior to the matrix processing circuitry to multiply each data element of the first subset of non-overlapping data elements with a corresponding data element of the second subset of non-overlapping data elements to generate a corresponding plurality of products, and to add one or more products of the corresponding plurality of products to a corresponding accumulation data element to generate a corresponding result data element of the plurality of result data elements of the result matrix tile, wherein tile usage at least includes to configure the matrix processing circuitry to handle particular tile dimensions as determined from a configuration accessed by the execution of the configuration instruction.

2. The apparatus of claim 1 wherein the matrix processing circuitry is to execute one or more load instructions to load the first subset of non-overlapping data elements of the first plurality of data elements from memory into the first plurality of vector registers and to load the second subset of non-overlapping data elements of the second plurality of data elements from memory into the second plurality of vector registers.

3. The apparatus of claim 2 further comprising: decode circuitry to decode the one or more load instructions to load the first subset of non-overlapping data elements of the first plurality of data elements from memory into the first plurality of vector registers and to load the second subset of non-overlapping data elements of the second plurality of data elements from memory into the second plurality of vector registers, each load instruction including a first operand to specify a corresponding subset of the first or second plurality of vector registers.

4. The apparatus of claim 2, wherein the one or more load instructions are to load 64-bit data elements from memory locations generated using a base and an index into the first plurality of vector registers and the second plurality of vector registers.

5. The apparatus of claim 1 wherein the first subset of non-overlapping data elements of the first plurality of data elements are to be stored in the first plurality of vector registers in column-major order and the second subset of non-overlapping data elements of the second plurality of data elements are to be stored in the second plurality of vector registers in row-major order.

6. The apparatus of claim 1 wherein each data element of the first subset of non-overlapping data elements and each data element of the second subset of non-overlapping data elements comprises a first size and each data element of the result data elements comprises a second size which is at least twice the first size.

7. The apparatus of claim 6 wherein the first and second sizes are specified in at least one opcode executed by the matrix processing circuitry.

8. The apparatus of claim 7, wherein the size of each data element of the result data elements is a doubleword.

9. The apparatus of claim 8, wherein the size of each data element of the first subset of non-overlapping data elements and each data element of the second subset of non-overlapping data elements comprises a word.

10. The apparatus of claim 9 wherein each data element of the first subset of non-overlapping data elements and each data element of the second subset of non-overlapping data elements comprises a half-precision floating-point value.

11. A system comprising: a memory interface; a plurality of cores coupled to the memory interface, one or more cores of the plurality of cores to execute program code to schedule matrix multiplication operations; matrix processing circuitry coupled to a memory via the memory interface, the matrix processing circuitry to execute instructions to perform the matrix multiplication operations with a first source matrix comprising a first plurality of data elements and a second source matrix comprising a second plurality of data elements, wherein the first source matrix comprises a first plurality of matrix tiles and the second source matrix comprises a second plurality of matrix tiles, each matrix tile in the first plurality of matrix tiles comprising a subset of non-overlapping data elements of the first plurality of data elements and each matrix tile in the second plurality of matrix tiles comprising a subset of non-overlapping data elements of the second plurality of data elements; and a first plurality of vector registers to store a first tile comprising a first subset of non-overlapping data elements of the first plurality of data elements, a second plurality of vector registers to store a second tile comprising a second subset of non-overlapping data elements of the second plurality of data elements, and a third plurality of vector registers to store a result matrix tile comprising a plurality of result data elements; the matrix processing circuitry to multiply each data element of the first subset of non-overlapping data elements with a corresponding data element of the second subset of non-overlapping data elements to generate a corresponding plurality of products, and to add one or more products of the corresponding plurality of products to a corresponding accumulation data element to generate a corresponding result data element of the plurality of result data elements of the result matrix tile, wherein tile usage of the matrix processing circuitry is to be configured by an execution of a configuration instruction prior to the matrix processing circuitry to multiply each data element of the first subset of non-overlapping data elements with a corresponding data element of the second subset of non-overlapping data elements to generate a corresponding plurality of products, and to add one or more products of the corresponding plurality of products to a corresponding accumulation data element to generate a corresponding result data element
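The claims describe configuring tile dimensions first, then multiplying non-overlapping tiles of two source matrices and adding the products into an accumulating result tile. A minimal software sketch of that flow follows. It is an illustration only and makes several assumptions: `TileConfig`, `tile_multiply_accumulate`, `get_tile`, and `tiled_matmul` are hypothetical names, nested Python lists stand in for vector registers, and element widths (the word and doubleword sizes of claims 6 to 9) are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileConfig:
    """Hypothetical stand-in for state set by a configuration instruction."""
    rows: int   # rows of the first-source tile and the result tile
    inner: int  # columns of the first source, rows of the second source
    cols: int   # columns of the second-source tile and the result tile


def tile_multiply_accumulate(cfg, a_tile, b_tile, acc_tile):
    """Add the product a_tile x b_tile into acc_tile, in place."""
    for i in range(cfg.rows):
        for j in range(cfg.cols):
            for k in range(cfg.inner):
                acc_tile[i][j] += a_tile[i][k] * b_tile[k][j]


def get_tile(matrix, row0, col0, rows, cols):
    """Copy one non-overlapping tile out of a matrix given as nested lists."""
    return [row[col0:col0 + cols] for row in matrix[row0:row0 + rows]]


def tiled_matmul(a, b, cfg):
    """Multiply a (m x k) by b (k x n) one configured tile at a time."""
    m, k, n = len(a), len(b), len(b[0])
    if len(a[0]) != k:
        raise ValueError("inner dimensions of a and b do not match")
    if m % cfg.rows or k % cfg.inner or n % cfg.cols:
        raise ValueError("matrix sizes must be multiples of the tile sizes")
    result = [[0] * n for _ in range(m)]
    for i0 in range(0, m, cfg.rows):
        for j0 in range(0, n, cfg.cols):
            # One result tile accumulates over every tile pair along k.
            acc = [[0] * cfg.cols for _ in range(cfg.rows)]
            for k0 in range(0, k, cfg.inner):
                a_tile = get_tile(a, i0, k0, cfg.rows, cfg.inner)
                b_tile = get_tile(b, k0, j0, cfg.inner, cfg.cols)
                tile_multiply_accumulate(cfg, a_tile, b_tile, acc)
            for r in range(cfg.rows):
                result[i0 + r][j0:j0 + cfg.cols] = acc[r]
    return result


cfg = TileConfig(rows=2, inner=2, cols=2)  # configuration happens first
a = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
b = [[1, 0],
     [0, 1],
     [1, 0],
     [0, 1]]
print(tiled_matmul(a, b, cfg))  # [[4, 6], [12, 14]]
```

Here the 2x4 and 4x2 sources are each split into two 2x2 tiles, and the single 2x2 result tile accumulates two tile products, which mirrors the accumulate step in claim 1.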

Assignees

Intel Corp

Inventors

Classifications

G06F2212/455: Image or video data
G06F2212/454: Vector or matrix data
G06F7/5443: Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00)
G06F12/0207: with multidimensional access, e.g. row/column, matrix
G06F9/3861: Recovery, e.g. branch miss-prediction, exception handling (error detection or correction G06F11/00)
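A CPC symbol encodes a hierarchy: section letter, two-digit class, subclass letter, main group, and subgroup. The page does not say how its "technology areas" are derived, but section G of the CPC scheme is titled Physics, which is consistent with the area listed for the primary classification G06F9/30036. The sketch below splits a symbol into those parts; `parse_cpc` and the returned field names are hypothetical, and the section titles are abbreviated.

```python
import re

# Abbreviated titles of the CPC sections (first letter of every symbol).
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General Tagging",
}

# section, class digits, subclass letter, main group, "/", subgroup
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


def parse_cpc(symbol):
    """Split a CPC symbol such as 'G06F9/30036' into its hierarchy levels."""
    match = CPC_PATTERN.match(symbol.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a recognisable CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    return {
        "section": section,
        "section_title": CPC_SECTIONS[section],
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": main_group,
        "subgroup": subgroup,
    }


info = parse_cpc("G06F9/30036")
print(info["section_title"], info["subclass"], info["main_group"])
# Physics G06F 9
```

Every classification listed for this patent shares the subclass G06F, so all of them fall under the same section.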

Patent family

Related publications grouped by family.

View patent family 63584598

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12182571B2 cover?: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in the form of decode circuitry to decode an instruction having fields for an opcode, a destination matrix operand identifier, and source memory information, and execution circuitry to execute the decoded instruction to l…
Who is the assignee on this patent?: Intel Corp