US2025045022A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025045022-A1 |
| Application number | US-202418927764-A |
| Country | US |
| Kind code | A1 |
| Filing date | Oct 25, 2024 |
| Priority date | Dec 23, 2020 |
| Publication date | Feb 6, 2025 |
| Grant date | — |
Official abstract text for this publication.
Complex matrix transpose and multiply operations are described. One embodiment comprises: a decoder to decode multiplication and transpose instructions; execution circuitry to execute a first complex matrix transpose and multiplication instruction, the execution circuitry comprising transpose hardware logic to transpose at least one source matrix; parallel multiplication circuitry to multiply real values from a first plurality of real and imaginary values with corresponding real values from a second plurality of real and imaginary values to generate a first plurality of real products, and to multiply imaginary values from the first plurality of real and imaginary values with corresponding imaginary values from the second plurality of real and imaginary values to generate a second plurality of real products; and addition/subtraction circuitry to subtract each real product in the second plurality of real products from a corresponding real product in the first plurality to produce a corresponding real value in a result matrix.
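The arithmetic in the abstract can be illustrated in software. The following is a minimal Python sketch, not the patented circuitry: it transposes the first source matrix, forms the two sets of real products, and subtracts the second from the first. The function names, the representation of each complex value as a `(real, imag)` tuple, and the summation of per-pair results over the inner dimension (as in a conventional matrix product) are assumptions made for illustration; the abstract itself describes only the per-pair products and the subtraction.

```python
def transpose(matrix):
    """Swap the row and column indices of a matrix stored as a list of rows."""
    return [list(column) for column in zip(*matrix)]


def real_part_of_transposed_product(a, b):
    """Real result matrix of transpose(a) x b (hypothetical helper).

    a, b: lists of rows; every element is a (real, imag) tuple.
    """
    at = transpose(a)
    rows, inner, cols = len(at), len(b), len(b[0])
    result = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                a_re, a_im = at[i][k]
                b_re, b_im = b[k][j]
                first_real_product = a_re * b_re    # real x real
                second_real_product = a_im * b_im   # imaginary x imaginary
                # Assumption: per-pair results are accumulated over k.
                result[i][j] += first_real_product - second_real_product
    return result


# A 1x2 source becomes 2x1 after the transpose, so the result is 2x2.
a = [[(1.0, 2.0), (3.0, 4.0)]]
b = [[(5.0, 6.0), (7.0, 8.0)]]
real = real_part_of_transposed_product(a, b)
```

In hardware the products would be formed in parallel; the nested loops here only make the data flow explicit.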
Opening claim text (preview).
What is claimed is:
1. A system, comprising: a memory controller to couple to a system memory; a processor coupled to the memory controller, the processor comprising a plurality of cores to process a first plurality of instructions; a matrix operations accelerator coupled to the memory controller, the matrix operations accelerator to process a second plurality of instructions different from the first plurality of instructions, the matrix operations accelerator comprising: a decoder configured to decode a first instruction and a second instruction, the first instruction including a first source operand to indicate a first complex source matrix comprising a first plurality of complex values, a second source operand to indicate a second complex source matrix comprising a second plurality of complex values, and a first destination operand to identify a real result matrix, the second instruction including a third source operand to identify the first complex source matrix, a fourth source operand to identify the second complex source matrix, and a second destination operand to identify locations of imaginary values in an imaginary result matrix; execution circuitry configured to execute the first and second instructions, the execution circuitry comprising: circuitry to transpose the first complex source matrix to generate a transposed complex matrix comprising the first plurality of complex values, each complex value comprising a real component and an imaginary component; parallel multiplication circuitry to, in parallel: multiply real values from the first plurality of complex values with corresponding real values from the second plurality of complex values to generate a first plurality of real products, multiply imaginary values from the first plurality of complex values with corresponding imaginary values from the second plurality of complex values to generate a second plurality of real products, multiply imaginary values from the first plurality of complex values from the transposed complex matrix with corresponding real values from the second plurality of complex values to generate a first plurality of imaginary products, and multiply the real values from the first plurality of complex values from the transposed complex matrix with corresponding imaginary values from the second plurality of complex values to generate a second plurality of imaginary products; and addition/subtraction circuitry to, in parallel: subtract each real product in the second plurality of real products from a corresponding real product in the first plurality of real products to produce a corresponding real value in the real result matrix, and add each imaginary product in the first plurality of imaginary products and a corresponding imaginary product in the second plurality of imaginary products to produce a corresponding imaginary value in the imaginary result matrix.
2. The system of claim 1, wherein transposing the first complex source matrix comprises switching indices of the rows and columns of the first complex source matrix.
3. The system of claim 1, further comprising: a storage device to store program code including the first and second plurality of instructions; an interconnect to couple the storage device to the memory controller; an input-output (IO) interface to couple IO devices to the interconnect; and a network interface coupled to the interconnect to provide communication over a network.
4. The system of claim 1, wherein the complex values in the first plurality of complex values and the second plurality of complex values comprise 32-bit floating-point values, each with a 16-bit real component and a 16-bit imaginary component, and wherein the first and second result matrices comprise 32-bit floating point values.
5. The system of claim 4, wherein the execution circuitry is to convert each 16-bit real component to a 32-bit real value and is to convert each 16-bit imaginary component to a 32-bit real value, wherein the parallel multiplication circuitry is to: multiply 32-bit real values converted from the first plurality of complex values with corresponding 32-bit real values converted from the second plurality of complex values to generate a first plurality of real products, multiply 32-bit imaginary values converted from the first plurality of complex values with corresponding 32-bit imaginary values converted from the second plurality of complex values to generate a second plurality of real products, multiply 32-bit imaginary values converted from the first plurality of complex values from the transposed complex matrix with corresponding 32-bit real values converted from the second plurality of complex values to generate a first plurality of imaginary products, and multiply the 32-bit real values converted from the first plurality of complex values from the transposed complex matrix with corresponding 32-bit imaginary values converted from the second plurality of complex values to generate a second plurality of imaginary products.
6. The system of claim 1 wherein the parallel multiplication circuitry comprises a plurality of multipliers to perform a first plurality of parallel multiplications of at least a portion of the real and imaginary values in the first source matrix with the corresponding real and imaginary values, respectively, in the second source matrix to generate the first and second plurality of real products.
7. The system of claim 1, wherein the plurality of multipliers are to concurrently perform a second plurality of parallel multiplications of at least a portion of the real and imaginary values in the first source matrix with the corresponding imaginary and real values, respectively, in the second source matrix to generate the first and second plurality of imaginary products.
8. The system of claim 1, wherein the matrix operations accelerator is integrated on a separate die in a same package as the processor.
9. The system of claim 1, wherein the matrix operations accelerator is integrated on a same die as the processor.
10. A system, comprising: a memory controller to couple to a system memory; a processor coupled to the memory controller, the processor comprising a plurality of cores to process a first plurality of instructions; a matrix operations accelerator coupled to the memory controller, the matrix operations accelerator to process a second plurality of instructions different from the first plurality of instructions, the matrix operations accelerator comprising: a decoder to decode a first instruction including a first source operand to identify a first complex source matrix comprising a first plurality of complex values, a second source operand to identify a second source matrix comprising a second plurality of complex values, and a first destination operand to identify a result matrix, each of the first and second plurality of complex values including a 16-bit real value and a 16-bit imaginary value; execution circuitry to execute the first instruction, the execution circuitry to convert each 16-bit real value to a 32-bit real value and to convert each 16-bit imaginary value to a 32-bit imaginary value, the execution circuitry comprising: circuitry to transpose the first complex source matrix to generate a transposed complex matrix comprising the first plurality of complex values; parallel multiplication circuitry to: multiply each 32-bit real value from the first plurality of complex values with a corresponding 32-bit imaginary value from the second plurality of complex values to generate a first plurality of imaginary products, and multiply each 32-bit imaginary value from the first plurality of complex values with a corresp
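The 16-bit to 32-bit widening recited in claims 4, 5 and 10 can be sketched for a single pair of complex values. This is an illustrative Python model under stated assumptions, not the claimed hardware: the 16-bit components are taken to be IEEE 754 binary16 (the claims say only "16-bit"), the real half is assumed to be stored first in little-endian order, and all function names are hypothetical. It uses only the standard `struct` module, whose `e` and `f` format codes denote binary16 and binary32.

```python
import struct


def pack_complex32(re, im):
    """Pack one complex value as two binary16 halves, real first (4 bytes)."""
    return struct.pack("<ee", re, im)


def widen(word):
    """Split a packed complex value and widen both 16-bit halves.

    Every binary16 value is exactly representable in binary32, so the
    widening step loses no information.
    """
    re16, im16 = struct.unpack("<ee", word)
    return re16, im16


def to_fp32(x):
    """Round a Python float to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def multiply_pair(x_word, y_word):
    """Return (real, imag) of x * y, with each step rounded to binary32."""
    x_re, x_im = widen(x_word)
    y_re, y_im = widen(y_word)
    # Real result: (real x real) minus (imaginary x imaginary).
    real = to_fp32(to_fp32(x_re * y_re) - to_fp32(x_im * y_im))
    # Imaginary result: (imaginary x real) plus (real x imaginary).
    imag = to_fp32(to_fp32(x_im * y_re) + to_fp32(x_re * y_im))
    return real, imag


x = pack_complex32(1.5, -2.0)
y = pack_complex32(0.5, 4.0)
real, imag = multiply_pair(x, y)
```

In the claims, the real and imaginary results are written by two separate instructions to two separate result matrices; returning both from one function here is a simplification.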
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
using a mask · CPC title
Instruction analysis, e.g. decoding, instruction word fields · CPC title
Arithmetic instructions · CPC title
in parallel-parallel fashion, i.e. both operands being entered in parallel (G06F7/533 takes precedence) · CPC title