Double rounded combined floating-point multiply and add

US9778909B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9778909-B2
Application number  US-201615332721-A
Country             US
Kind code           B2
Filing date         Oct 24, 2016
Priority date       Jun 29, 2012
Publication date    Oct 3, 2017
Grant date          Oct 3, 2017

Abstract

Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier. Final results are selected respectively using overflow detection.
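
The difference between the double-rounded behavior described in the abstract and a conventional single-rounded fused multiply-add can be seen numerically. The following is a minimal sketch, not the patented hardware: it assumes IEEE 754 binary64 Python floats with round-to-nearest-even, and the function names and operand values are hypothetical, chosen only to make the two results differ. Exact rational arithmetic stands in for a single-rounded fused operation.

```python
from fractions import Fraction


def double_rounded_multiply_add(a, b, c):
    """Round the product to a double first, then round the sum (two roundings)."""
    product = a * b          # first rounding happens here
    return product + c       # second rounding happens here


def single_rounded_multiply_add(a, b, c):
    """Compute a*b + c exactly, then round once when converting back to float."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)      # the only rounding


# Hypothetical operands: the exact product is 1 - 2**-54, which lies halfway
# between two doubles and rounds to 1.0 under round-to-nearest-even.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

print(double_rounded_multiply_add(a, b, c))   # 0.0
print(single_rounded_multiply_add(a, b, c))   # -2**-54, about -5.55e-17
```

The double-rounded result matches what a separate multiply followed by a separate add would produce, which is the property a combined operation needs if it is to replace such a pair without changing program results.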

First claim

What is claimed is:

1. An apparatus comprising:
    a floating-point (FP) multiplier circuit to multiply a first operand multiplicand mantissa by a second operand multiplier mantissa to generate a product;
    a FP alignment circuit to align a third operand mantissa according to the product of the FP multiplier circuit;
    an overflow detection circuit to detect an overflow condition in the product of the FP multiplier circuit;
    a first FP adder circuit to add together the aligned third operand mantissa and the product of the FP multiplier circuit using a first rounding input to generate a first sum or difference based on an assumption that the overflow condition in the product of the FP multiplier circuit was not detected;
    a second FP adder circuit to add together the aligned third operand mantissa and the product of the FP multiplier circuit using a second rounding input to generate a second sum or difference based on an assumption that the overflow condition in the product of the FP multiplier circuit was detected; and
    a multiplexer circuit to select between the second sum or difference and the first sum or difference based on the overflow detection circuit detecting or not detecting the overflow condition, respectively, in the product of the FP multiplier circuit.

2. The apparatus of claim 1, wherein the first operand multiplicand, the second operand multiplier, and the third operand are single instruction multiple data (SIMD) vector registers.

3. The apparatus of claim 2, wherein data elements of the first operand multiplicand, the second operand multiplier, and the third operand are 64-bit FP data elements.

4. The apparatus of claim 2, wherein data elements of the first operand multiplicand, the second operand multiplier, and the third operand are either 32-bit FP data elements or 16-bit FP data elements.

5. The apparatus of claim 1, wherein the first operand multiplicand, the second operand multiplier, and the third operand are scalar FP registers.

6. The apparatus of claim 5, wherein the scalar FP registers are on a FP stack.

7. A processor comprising:
    one or more vector registers each comprising a plurality of data fields to store values of vector elements;
    a decode stage to decode a single instruction multiple data (SIMD) double-rounded combined floating-point (FP) multiply-add or multiply-subtract instruction specifying: a destination operand of the one or more vector registers, a first operand multiplicand of the one or more vector registers, a size of the vector elements, a second operand multiplier of the one or more vector registers, and a third operand of the one or more vector registers;
    a SIMD FP multiply-adder comprising:
        a floating-point (FP) multiplier stage to multiply a plurality of mantissas of the first operand multiplicand with a plurality of respective mantissas of the second operand multiplier to generate a plurality of respective products;
        a FP alignment stage to align a plurality of respective mantissas of the third operand according to the respective products of the FP multiplier stage;
        an overflow detection stage to detect overflow conditions in the respective products of the FP multiplier stage;
        a first FP adder stage to add together the plurality of respective aligned mantissas of the third operand and the respective products of the FP multiplier stage using a first set of rounding inputs to generate a first plurality of respective sums or differences based on an assumption that overflow conditions in the respective products of the FP multiplier circuit were not detected;
        a second FP adder stage to add together the plurality of respective aligned mantissas of the third operand and the respective products of the FP multiplier stage using a second set of rounding inputs to generate a second plurality of respective sums or differences based on an assumption that overflow conditions in the respective products of the FP multiplier circuit were detected; and
        a multiplexer stage to select between the second sums or differences and the first sums or differences, of the respective pluralities, based on the overflow detection stage detecting or not detecting respective overflow conditions in the respective products of the FP multiplier stage.

8. The processor of claim 7, wherein data elements of the first operand multiplicand, the second operand multiplier, and the third operand are 64-bit FP data elements.

9. The processor of claim 7, wherein data elements of the first operand multiplicand, the second operand multiplier, and the third operand are 32-bit FP data elements.

10. The processor of claim 7, wherein said double-rounded combined floating-point (FP) multiply-add or multiply-subtract instruction is generated by processor instruction set architecture (ISA) translation logic.

11. The processor of claim 10, wherein the double-rounded combined FP multiply-add or multiply-subtract instruction is stored as an ISA macro-instruction in an instruction cache.
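
The two-adder-plus-multiplexer structure recited in claim 1 can be illustrated with a toy integer model. This is a sketch under stated assumptions, not the claimed circuit: mantissas are 4-bit unsigned integers, the rounding input is a simple round-half-up injection (the claims do not specify the rounding mode), the aligned addend is assumed to have no bits below either rounding point, and signs, exponents, and the final rounding of the sum are omitted. All names and widths are hypothetical.

```python
P_BITS = 4  # toy mantissa width (assumption; real formats use 11, 24 or 53 bits)


def dual_path_multiply_add(m1, m2, c_aligned, p=P_BITS):
    """Speculatively compute both sums, then select one by product overflow."""
    lo, hi = 1 << (p - 1), 1 << p
    assert lo <= m1 < hi and lo <= m2 < hi, "mantissas must be normalized"
    assert c_aligned >= 0 and c_aligned % (1 << p) == 0, "addend has low bits"

    product = m1 * m2                           # 2p-1 or 2p significant bits
    overflow = bool(product >> (2 * p - 1))     # top bit set: product is in [2, 4)

    # First adder: rounding input placed for the no-overflow case (drop p-1 bits).
    drop0 = p - 1
    sum_no_ovf = ((product + c_aligned + (1 << (drop0 - 1))) >> drop0) << drop0

    # Second adder: rounding input placed for the overflow case (drop p bits).
    drop1 = p
    sum_ovf = ((product + c_aligned + (1 << (drop1 - 1))) >> drop1) << drop1

    # Multiplexer: pick the sum whose assumption turned out to be true.
    return (sum_ovf if overflow else sum_no_ovf), overflow


def reference_round_then_add(m1, m2, c_aligned, p=P_BITS):
    """Sequential reference: detect overflow, round the product, then add."""
    product = m1 * m2
    drop = p if product >= (1 << (2 * p - 1)) else p - 1
    rounded = ((product + (1 << (drop - 1))) >> drop) << drop
    return rounded + c_aligned


print(dual_path_multiply_add(0b1011, 0b1101, 16))  # (160, True)
print(dual_path_multiply_add(0b1000, 0b1000, 0))   # (64, False)
```

In this model both sums are available as soon as the product is, so the overflow check only has to steer the final selection instead of delaying the addition. Under the stated assumptions the selected sum equals the sequential round-then-add reference for every pair of normalized 4-bit mantissas.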

Assignees

Intel Corp

Inventors

Classifications

  • Special implementations · CPC title

  • G06F7/483 (Primary)

    Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)} · CPC title

  • Mantissa overflow or underflow in handling floating-point numbers · CPC title

  • G06F7/4876 (Primary)

    Multiplying · CPC title

  • Overflow or underflow · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F7/483. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 3, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).