Systems, methods, and apparatuses for tile store

US12536020B2 · US · B2

Patent metadata
Publication number: US-12536020-B2
Application number: US-202418432317-A
Country: US
Kind code: B2
Filing date: Feb 5, 2024
Priority date: Mar 20, 2017
Publication date: Jan 27, 2026
Grant date: Jan 27, 2026

Abstract

Official abstract text for this publication.

Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information.

First claim

Opening claim text (preview).

We claim:
1. A processor comprising: a register to store a configuration value for tiles; decode circuitry to decode an instruction, the instruction having a field to identify a plurality of non-consecutive vector registers, a field to identify a register to store a base, and a field to identify a register to store an index, wherein the plurality of non-consecutive vector registers are to store 64-bit data elements, and wherein the plurality of non-consecutive vector registers are to have a single-instruction, multiple-data (SIMD) dimension based on the configuration value for the tiles; and execution circuitry coupled with the decode circuitry, the execution circuitry to perform operations corresponding to the instruction, including to store the 64-bit data elements from the plurality of non-consecutive vector registers to destination memory locations generated using the base and the index.
2. The processor of claim 1, wherein the plurality of non-consecutive vector registers is a configured number of non-consecutive vector registers.
3. The processor of claim 1, wherein the plurality of non-consecutive vector registers is four non-consecutive vector registers.
4. The processor of claim 1, further comprising a register to indicate whether the tiles are configured for use.
5. The processor of claim 1, wherein the decode circuitry is to decode a second instruction, and further comprising execution circuitry to perform operations corresponding to the second instruction, including to configure the tiles for use.
6. The processor of claim 1, wherein the instruction is to indicate a stride.
7. The processor of claim 6, wherein the plurality of non-consecutive vector registers correspond to different rows delineated based on the stride.
8. The processor of claim 1, wherein the processor has a reduced instruction set computing (RISC) architecture.
9. The processor of claim 1, wherein the processor is a central processing unit (CPU), and wherein the CPU further comprises a reorder buffer and register renaming circuitry.
10. The processor of claim 1, further comprising a register to indicate whether the tiles are configured for use, wherein the processor is a central processing unit (CPU), wherein the CPU further comprises a reorder buffer and register renaming circuitry, and wherein the plurality of non-consecutive vector registers is four non-consecutive vector registers.
11. The processor of claim 10, wherein the plurality of non-consecutive vector registers is a configured number of non-consecutive vector registers, and wherein the instruction is to indicate a stride.
12. The processor of claim 11, wherein the decode circuitry is to decode a second instruction, and further comprising execution circuitry to perform operations corresponding to the second instruction, including to configure the tiles for use, and wherein the plurality of non-consecutive vector registers correspond to different rows delineated based on the stride.
13. The processor of claim 10, wherein the decode circuitry is to decode a second instruction, and further comprising execution circuitry to perform operations corresponding to the second instruction, including to configure the tiles for use, and wherein the plurality of non-consecutive vector registers correspond to different rows delineated based on a stride indicated by the instruction.
14. A method comprising: storing a configuration value for tiles in a register; decoding an instruction, the instruction having a field identifying a plurality of non-consecutive vector registers, a field identifying a register to store a base, and a field identifying a register to store an index, wherein the plurality of non-consecutive vector registers store 64-bit data elements, and wherein the plurality of non-consecutive vector registers have a single-instruction, multiple-data (SIMD) dimension based on the configuration value for the tiles; and performing operations corresponding to the instruction, including storing the 64-bit data elements from the plurality of non-consecutive vector registers to destination memory locations generated using the base and the index.
15. The method of claim 14, wherein storing comprises storing the 64-bit data elements from four non-consecutive vector registers to the destination memory locations generated using the base and the index, and further comprising accessing a register to determine whether the tiles are configured for use.
16. The method of claim 14, further comprising: configuring a number of the plurality of non-consecutive vector registers; decoding a second instruction; and performing operations corresponding to the second instruction, including configuring the tiles for use.
17. The method of claim 14, further comprising determining a stride from the instruction.
18. A system on a chip (SoC) comprising: a memory controller; and a processor coupled with the memory controller, the processor comprising: a register to store a configuration value for tiles; decode circuitry to decode an instruction, the instruction having a field to identify a plurality of non-consecutive vector registers, a field to identify a register to store a base, and a field to identify a register to store an index, wherein the plurality of non-consecutive vector registers are to store 64-bit data elements, and wherein the plurality of non-consecutive vector registers are to have a single-instruction, multiple-data (SIMD) dimension based on the configuration value for the tiles; and execution circuitry coupled with the decode circuitry, the execution circuitry to perform operations corresponding to the instruction, including to store the 64-bit data elements from the plurality of non-consecutive vector registers to destination memory locations generated using the base and the index.
19. The SoC of claim 18, wherein the plurality of non-consecutive vector registers is a configured number of non-consecutive vector registers, and further comprising a register to indicate whether the tiles are configured for use.
20. The SoC of claim 18, wherein the plurality of non-consecutive vector registers is four non-consecutive vector registers, and wherein the decode circuitry is to decode a second instruction, and further comprising execution circuitry to perform operations corresponding to the second instruction, including to configure the tiles for use.
21. The SoC of claim 18, wherein the instruction is to indicate a stride, and wherein the plurality of non-consecutive vector registers correspond to different rows delineated based on the stride.
22. The SoC of claim 18, further comprising a register to indicate whether the tiles are configured for use, wherein the processor is a central processing unit (CPU), wherein the CPU further comprises a reorder buffer and register renaming circuitry, and wherein the plurality of non-consecutive vector registers is four non-consecutive vector registers.
23. A non-transitory machine-readable storage medium storing instructions that, when executed by a machine, are to cause the machine to perform operations, including to: store a configuration value for tiles in a register; decode an instruction, the instruction having a field to identify a plurality of non-consecutive vector registers, a field to identify a register to store a base, and a field to identify a register to store an index, wherein the plurality
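As a rough illustration of the store behavior recited in claim 1, the following Python sketch models memory as a byte array and copies 64-bit elements from several non-consecutive vector registers into it. The claims say only that destination addresses are "generated using the base and the index"; treating the index as a per-row byte stride is an assumption made here for illustration, as are the little-endian layout and every name in the code (tile_store, vregs, reg_ids, cols). It is not the patent's implementation.

```python
import struct

ELEM_BYTES = 8  # the claims recite 64-bit data elements


def tile_store(memory, vregs, reg_ids, cols, base, index):
    """Software model of a tile store (hypothetical helper).

    memory  -- bytearray standing in for destination memory
    vregs   -- dict mapping register number -> list of 64-bit unsigned ints
    reg_ids -- the (possibly non-consecutive) registers holding the tile rows
    cols    -- elements stored per row, i.e. the configured SIMD dimension
    base    -- byte address of the first row (value of the base register)
    index   -- ASSUMPTION: byte stride between rows (value of the index register)
    """
    for row, reg_id in enumerate(reg_ids):
        row_addr = base + row * index  # assumed address formula
        for col in range(cols):
            addr = row_addr + col * ELEM_BYTES
            # "<Q" packs one little-endian unsigned 64-bit value
            struct.pack_into("<Q", memory, addr, vregs[reg_id][col])


# Four non-consecutive registers, with only three elements per row configured.
memory = bytearray(256)
vregs = {
    0: [1, 2, 3, 99],
    5: [4, 5, 6, 99],
    9: [7, 8, 9, 99],
    14: [10, 11, 12, 99],
}
tile_store(memory, vregs, [0, 5, 9, 14], cols=3, base=16, index=64)
print(struct.unpack_from("<3Q", memory, 16 + 64))  # second row as stored
```

In this model the fourth element of each register is never written, because the configured SIMD dimension (cols) limits how many elements each row contributes. That is one way to read "a SIMD dimension based on the configuration value for the tiles".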

Assignees

Intel Corp

Inventors

Classifications

  • Vector or matrix data · CPC title

  • Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution · CPC title

  • Decoding for concurrent execution · CPC title

  • using decoder, e.g. decoder per instruction set, adaptable or programmable decoders · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12536020B2 cover?
Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instructi…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/3001. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 27, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).