US9778909B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9778909-B2 |
| Application number | US-201615332721-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 24, 2016 |
| Priority date | Jun 29, 2012 |
| Publication date | Oct 3, 2017 |
| Grant date | Oct 3, 2017 |
Official abstract text for this publication.
Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier. Final results are selected respectively using overflow detection.
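The numerical distinction behind "double rounded" can be illustrated with a small sketch. It is not taken from the patent: the function names are hypothetical, and Python's `fractions.Fraction` stands in for exact arithmetic so that a single-rounded (conventional fused) result can be compared with the result of rounding the product first and then rounding the sum, which is the behavior the abstract describes.

```python
from fractions import Fraction


def double_rounded_multiply_add(a: float, b: float, c: float) -> float:
    """Round the product to double precision, then round the sum (two roundings)."""
    product = a * b          # first rounding: the product is rounded to a double
    return product + c       # second rounding: the sum is rounded to a double


def single_rounded_multiply_add(a: float, b: float, c: float) -> float:
    """Reference for comparison: compute a*b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # exact rational arithmetic
    return float(exact)      # one correctly rounded conversion back to a double


# Operands chosen (as an illustration) so that the two behaviors differ:
# a * b = 1 - 2**-104 exactly, which rounds to 1.0 in double precision.
a = 1.0 + 2.0**-52
b = 1.0 - 2.0**-52
c = -1.0

print(double_rounded_multiply_add(a, b, c))  # product rounds to 1.0, so the sum is 0.0
print(single_rounded_multiply_add(a, b, c))  # exact result -2**-104 survives
```

The double-rounded result matches what a separate multiply followed by a separate add would produce, which is why combining the two operations this way preserves the numerical behavior of the original instruction sequence.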
Opening claim text (preview).
What is claimed is:
1. An apparatus comprising: a floating-point (FP) multiplier circuit to multiply a first operand multiplicand mantissa by a second operand multiplier mantissa to generate a product; a FP alignment circuit to align a third operand mantissa according to the product of the FP multiplier circuit; an overflow detection circuit to detect an overflow condition in the product of the FP multiplier circuit; a first FP adder circuit to add together the aligned third operand mantissa and the product of the FP multiplier circuit using a first rounding input to generate a first sum or difference based on an assumption that the overflow condition in the product of the FP multiplier circuit was not detected; a second FP adder circuit to add together the aligned third operand mantissa and the product of the FP multiplier circuit using a second rounding input to generate a second sum or difference based on an assumption that the overflow condition in the product of the FP multiplier circuit was detected; and a multiplexer circuit to select between the second sum or difference and the first sum or difference based on the overflow detection circuit detecting or not detecting the overflow condition, respectively, in the product of the FP multiplier circuit.
2. The apparatus of claim 1, wherein the first operand multiplicand, the second operand multiplier, and the third operand are single instruction multiple data (SIMD) vector registers.
3. The apparatus of claim 2, wherein data elements of the first operand multiplicand, the second operand multiplier, and the third operand are 64-bit FP data elements.
4. The apparatus of claim 2, wherein data elements of the first operand multiplicand, the second operand multiplier, and the third operand are either 32-bit FP data elements or 16-bit FP data elements.
5. The apparatus of claim 1, wherein the first operand multiplicand, the second operand multiplier, and the third operand are scalar FP registers.
6. The apparatus of claim 5, wherein the scalar FP registers are on a FP stack.
7. A processor comprising: one or more vector registers each comprising a plurality of data fields to store values of vector elements; a decode stage to decode a single instruction multiple data (SIMD) double-rounded combined floating-point (FP) multiply-add or multiply-subtract instruction specifying: a destination operand of the one or more vector registers, a first operand multiplicand of the one or more vector registers, a size of the vector elements, a second operand multiplier of the one or more vector registers, and a third operand of the one or more vector registers; a SIMD FP multiply-adder comprising: a floating-point (FP) multiplier stage to multiply a plurality of mantissas of the first operand multiplicand with a plurality of respective mantissas of the second operand multiplier to generate a plurality of respective products; a FP alignment stage to align a plurality of respective mantissas of the third operand according to the respective products of the FP multiplier stage; an overflow detection stage to detect overflow conditions in the respective products of the FP multiplier stage; a first FP adder stage to add together the plurality of respective aligned mantissas of the third operand and the respective products of the FP multiplier stage using a first set of rounding inputs to generate a first plurality of respective sums or differences based on an assumption that overflow conditions in the respective products of the FP multiplier circuit were not detected; a second FP adder stage to add together the plurality of respective aligned mantissas of the third operand and the respective products of the FP multiplier stage using a second set of rounding inputs to generate a second plurality of respective sums or differences based on an assumption that overflow conditions in the respective products of the FP multiplier circuit were detected; and a multiplexer stage to select between the second sums or differences and the first sums or differences, of the respective pluralities, based on the overflow detection stage detecting or not detecting respective overflow conditions in the respective products of the FP multiplier stage.
8. The processor of claim 7, wherein data elements of the first operand multiplicand, the second operand multiplier, and the third operand are 64-bit FP data elements.
9. The processor of claim 7, wherein data elements of the first operand multiplicand, the second operand multiplier, and the third operand are 32-bit FP data elements.
10. The processor of claim 7, wherein said double-rounded combined floating-point (FP) multiply-add or multiply-subtract instruction is generated by processor instruction set architecture (ISA) translation logic.
11. The processor of claim 10, wherein the double-rounded combined FP multiply-add or multiply-subtract instruction is stored as an ISA macro-instruction in an instruction cache.
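The structure recited in claim 1 (two adders working under opposite overflow assumptions, with a multiplexer picking the valid one) can be loosely modeled in software. The following is a behavioral sketch only, not the claimed circuit: the function name, the `p`-bit integer mantissa encoding, the round-half-up rounding constant, and the omission of exponent handling and final-sum rounding are all simplifying assumptions made for illustration.

```python
def speculative_round_then_add(a_mant: int, b_mant: int,
                               aligned_addend: int, p: int) -> int:
    """Behavioral model: round a mantissa product to p bits, then add an addend.

    Assumptions (illustrative, not from the patent text):
      * a_mant and b_mant are normalized p-bit integers in [2**(p-1), 2**p).
      * aligned_addend is already expressed in units of the product's LSB.
      * Rounding is round-half-up, injected as a constant before truncation.
    """
    product = a_mant * b_mant  # 2p-1 or 2p significant bits

    # Overflow here means the product needs 2p bits (its top bit is set),
    # so one more low-order bit must be rounded away.
    overflow = (product >> (2 * p - 1)) & 1

    def adder_path(bits_dropped: int) -> int:
        rounding_input = 1 << (bits_dropped - 1)         # half of the kept LSB
        rounded = ((product + rounding_input) >> bits_dropped) << bits_dropped
        return rounded + aligned_addend

    # Both sums are computed regardless of the overflow outcome.
    sum_no_overflow = adder_path(p - 1)   # first adder: assumes no overflow
    sum_overflow = adder_path(p)          # second adder: assumes overflow

    # Multiplexer: keep the sum whose assumption turned out to be true.
    return sum_overflow if overflow else sum_no_overflow


# 4-bit mantissas: 9 * 9 = 81 fits in 7 bits (no overflow) and rounds to 80.
print(speculative_round_then_add(9, 9, -16, 4))
# 15 * 15 = 225 needs 8 bits (overflow) and rounds to 224.
print(speculative_round_then_add(15, 15, 0, 4))
```

Computing both candidate sums up front means the selection does not have to wait for the product's overflow status before the addition starts, which is the apparent motivation for the dual-adder arrangement in the claim.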
Special implementations · CPC title
Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)} · CPC title
Mantissa overflow or underflow in handling floating-point numbers · CPC title
Multiplying · CPC title
Overflow or underflow · CPC title