Apparatus and method for complex matrix transpose and multiply

US2025045022A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2025045022-A1
Application number  US-202418927764-A
Country             US
Kind code           A1
Filing date         Oct 25, 2024
Priority date       Dec 23, 2020
Publication date    Feb 6, 2025
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Complex matrix transpose and multiply operations are described. One embodiment comprises: a decoder to decode multiplication and transpose instructions; execution circuitry to execute a first complex matrix transpose and multiplication instruction, the execution circuitry comprising transpose hardware logic to transpose at least one source matrix, parallel multiplication circuitry to multiply real values from a first plurality of real and imaginary values with corresponding real values from a second plurality of real and imaginary values to generate a first plurality of real products, to multiply imaginary values from the first plurality of real and imaginary values with corresponding imaginary values from the second plurality of real and imaginary values to generate a second plurality of real products; and addition/subtraction circuitry to subtract each real product in the second plurality of real products from a corresponding real product in the first plurality to produce a corresponding real value in a result matrix.
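
The subtraction step in the abstract matches the standard product of two complex numbers. As a worked identity for a single pair of values, using the symbols a, b, c, d introduced here purely for illustration (they do not appear in the patent text):

```latex
\begin{aligned}
x &= a + b\,i, \qquad y = c + d\,i \\
x\,y &= (ac - bd) + (ad + bc)\,i \\
\operatorname{Re}(x\,y) &= \underbrace{ac}_{\text{first real product}} - \underbrace{bd}_{\text{second real product}}
\end{aligned}
```

For example, (1 + 2i)(3 + 4i) = (3 - 8) + (4 + 6)i = -5 + 10i, so the real value written to the result is 3 - 8 = -5.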

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a memory controller to couple to a system memory; a processor coupled to the memory controller, the processor comprising a plurality of cores to process a first plurality of instructions; a matrix operations accelerator coupled to the memory controller, the matrix operations accelerator to process a second plurality of instructions different from the first plurality of instructions, the matrix operations accelerator comprising: a decoder configured to decode a first instruction and a second instruction, the first instruction including a first source operand to indicate a first complex source matrix comprising a first plurality of complex values, a second source operand to indicate a second complex source matrix comprising a second plurality of complex values, and a first destination operand to identify a real result matrix, the second instruction including a third source operand to identify the first complex source matrix, a fourth source operand to identify the second complex source matrix, and a second destination operand to identify locations of imaginary values in an imaginary result matrix; execution circuitry configured to execute the first and second instructions, the execution circuitry comprising: circuitry to transpose the first complex source matrix to generate a transposed complex matrix comprising the first plurality of complex values, each complex value comprising a real component and an imaginary component; parallel multiplication circuitry to, in parallel: multiply real values from the first plurality of complex values with corresponding real values from the second plurality of complex values to generate a first plurality of real products, multiply imaginary values from the first plurality of complex values with corresponding imaginary values from the second plurality of complex values to generate a second plurality of real products, multiply imaginary values from the first plurality of complex values from the transposed complex matrix with corresponding real values from the second plurality of complex values to generate a first plurality of imaginary products, and multiply the real values from the first plurality of complex values from the transposed complex matrix with corresponding imaginary values from the second plurality of complex values to generate a second plurality of imaginary products; and addition/subtraction circuitry to, in parallel: subtract each real product in the second plurality of real products from a corresponding real product in the first plurality of real products to produce a corresponding real value in the real result matrix, and add each imaginary product in the first plurality of imaginary products and a corresponding imaginary product in the second plurality of imaginary products to produce a corresponding imaginary value in the imaginary result matrix.

2. The system of claim 1, wherein transposing the first complex source matrix comprises switching indices of the rows and columns of the first complex source matrix.

3. The system of claim 1, further comprising: a storage device to store program code including the first and second plurality of instructions; an interconnect to couple the storage device to the memory controller; an input-output (IO) interface to couple IO devices to the interconnect; and a network interface coupled to the interconnect to provide communication over a network.

4. The system of claim 1, wherein the complex values in the first plurality of complex values and the second plurality of complex values comprise 32-bit floating-point values, each with a 16-bit real component and a 16-bit imaginary component, and wherein the first and second result matrices comprise 32-bit floating point values.

5. The system of claim 4, wherein the execution circuitry is to convert each 16-bit real component to a 32-bit real value and is to convert each 16-bit imaginary component to a 32-bit real value, wherein the parallel multiplication circuitry is to: multiply 32-bit real values converted from the first plurality of complex values with corresponding 32-bit real values converted from the second plurality of complex values to generate a first plurality of real products, multiply 32-bit imaginary values converted from the first plurality of complex values with corresponding 32-bit imaginary values converted from the second plurality of complex values to generate a second plurality of real products, multiply 32-bit imaginary values converted from the first plurality of complex values from the transposed complex matrix with corresponding 32-bit real values converted from the second plurality of complex values to generate a first plurality of imaginary products, and multiply the 32-bit real values converted from the first plurality of complex values from the transposed complex matrix with corresponding 32-bit imaginary values converted from the second plurality of complex values to generate a second plurality of imaginary products.

6. The system of claim 1, wherein the parallel multiplication circuitry comprises a plurality of multipliers to perform a first plurality of parallel multiplications of at least a portion of the real and imaginary values in the first source matrix with the corresponding real and imaginary values, respectively, in the second source matrix to generate the first and second plurality of real products.

7. The system of claim 1, wherein the plurality of multipliers are to concurrently perform a second plurality of parallel multiplications of at least a portion of the real and imaginary values in the first source matrix with the corresponding imaginary and real values, respectively, in the second source matrix to generate the first and second plurality of imaginary products.

8. The system of claim 1, wherein the matrix operations accelerator is integrated on a separate die in a same package as the processor.

9. The system of claim 1, wherein the matrix operations accelerator is integrated on a same die as the processor.

10. A system, comprising: a memory controller to couple to a system memory; a processor coupled to the memory controller, the processor comprising a plurality of cores to process a first plurality of instructions; a matrix operations accelerator coupled to the memory controller, the matrix operations accelerator to process a second plurality of instructions different from the first plurality of instructions, the matrix operations accelerator comprising: a decoder to decode a first instruction including a first source operand to identify a first complex source matrix comprising a first plurality of complex values, a second source operand to identify a second source matrix comprising a second plurality of complex values, and a first destination operand to identify a result matrix, each of the first and second plurality of complex values including a 16-bit real value and a 16-bit imaginary value; execution circuitry to execute the first instruction, the execution circuitry to convert each 16-bit real value to a 32-bit real value and to convert each 16-bit imaginary value to a 32-bit imaginary value, the execution circuitry comprising: circuitry to transpose the first complex source matrix to generate a transposed complex matrix comprising the first plurality of complex values; parallel multiplication circuitry to: multiply each 32-bit real value from the first plurality of complex values with a corresponding 32-bit imaginary value from the second plurality of complex values to generate a first plurality of imaginary products, and multiply each 32-bit imaginary value from the first plurality of complex values with a corresp…
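
The data flow recited in claims 1, 2, 4 and 5 can be sketched in software. The following is a minimal illustration only, not the claimed hardware: the function names (`widen`, `tcmm_real`, `tcmm_imag`), the list-of-(real, imaginary)-tuples layout, and the summation of products along the shared dimension are all assumptions made here. The claim preview describes the products and the add/subtract step but not an accumulation, and hardware rounding behavior is not modeled.

```python
import struct

def widen(x):
    """Round x to IEEE half precision, then widen it to single precision
    (models the 16-bit to 32-bit conversion of claim 5)."""
    half = struct.unpack("<e", struct.pack("<e", float(x)))[0]
    return struct.unpack("<f", struct.pack("<f", half))[0]

def transpose(m):
    """Switch the row and column indices of a matrix (as in claim 2)."""
    return [list(col) for col in zip(*m)]

def _transpose_multiply(a, b, combine):
    """Sum combine(a_T[i][k], b[k][j]) over k for every (i, j).
    The summation is an assumption, as in an ordinary matrix product."""
    a_t = transpose(a)
    if len(a_t[0]) != len(b):
        raise ValueError("inner dimensions of transpose(a) and b differ")
    return [[sum(combine(a_t[i][k], b[k][j]) for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a_t))]

def _real_term(x, y):
    # First real product minus second real product: re*re - im*im.
    xr, xi, yr, yi = map(widen, (*x, *y))
    return xr * yr - xi * yi

def _imag_term(x, y):
    # First imaginary product plus second imaginary product: im*re + re*im.
    xr, xi, yr, yi = map(widen, (*x, *y))
    return xi * yr + xr * yi

def tcmm_real(a, b):
    """Hypothetical stand-in for the 'first instruction': real result matrix."""
    return _transpose_multiply(a, b, _real_term)

def tcmm_imag(a, b):
    """Hypothetical stand-in for the 'second instruction': imaginary result matrix."""
    return _transpose_multiply(a, b, _imag_term)

# Each element is a (real, imaginary) pair.
a = [[(1, 2), (3, 4)], [(5, 6), (7, 8)]]
b = [[(1, 0), (0, 1)], [(2, -1), (1, 1)]]
real = tcmm_real(a, b)   # [[17.0, -3.0], [25.0, -5.0]]
imag = tcmm_imag(a, b)   # [[9.0, 12.0], [13.0, 18.0]]
```

Splitting the work into two functions mirrors the two-instruction structure of claim 1, where both instructions read the same pair of source matrices but write separate real and imaginary result matrices.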

Assignees

Inventors

Classifications

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations

  • using a mask

  • Instruction analysis, e.g. decoding, instruction word fields

  • Arithmetic instructions

  • in parallel-parallel fashion, i.e. both operands being entered in parallel (G06F7/533 takes precedence)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025045022A1 cover?
Complex matrix transpose and multiply operations are described. One embodiment comprises: a decoder to decode multiplication and transpose instructions; execution circuitry to execute a first complex matrix transpose and multiplication instruction, the execution circuitry comprising transpose hardware logic to transpose at least one source matrix, parallel multiplication circuitry to multiply real values…
Who is the assignee on this patent?
Adelman Menachem, Valentine Robert, Towner Daniel, and 2 more
What technology area does this patent fall under?
Primary CPC classification G06F7/78. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 6, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).