Instructions for fused multiply-add operations with variable precision input operands
US-2019042242-A1 · Feb 7, 2019 · US
US11455142B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11455142-B2 |
| Application number | US-201916432358-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 5, 2019 |
| Priority date | Jun 5, 2019 |
| Publication date | Sep 27, 2022 |
| Grant date | Sep 27, 2022 |
Official abstract text for this publication.
Embodiments for implementing a fused multiply-multiply-accumulate (“FMMA”) unit by one or more processors in a computing system. Mantissas for two products, an exponent difference of the two products serving as an alignment shift amount for a product of the two products having a smallest exponent, and an alignment shift amount for an addend relative to an alternative product of the two products having a larger exponent may be determined in parallel. The addend may be aligned relative to the alternative product having the larger exponent. The product having the smallest exponent may be aligned relative to the alternative product having the larger exponent according to the alignment shift amount.
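The alignment flow in the abstract can be illustrated with a small software model of R = A×B + C×D + E. This is only a sketch under stated assumptions, not the patented circuit: the function name `fmma_align_sketch`, the `(sign, exponent, mantissa)` integer operand encoding, and the `guard_bits` parameter are all hypothetical, the "parallel" determinations are simply computed one after another, and the pre-shifted mantissas and select signal recited in the claims are not modeled.

```python
from fractions import Fraction


def _signed(sign, magnitude):
    """Apply a sign bit (0 = positive, 1 = negative) to a magnitude."""
    return -magnitude if sign else magnitude


def _shift_right(magnitude, amount):
    """Shift right by `amount`, dropping low bits; a negative amount shifts left."""
    return magnitude >> amount if amount >= 0 else magnitude << -amount


def fmma_align_sketch(a, b, c, d, e, guard_bits=16):
    """Model A*B + C*D + E with explicit alignment to the larger product.

    Each operand is a hypothetical (sign, exponent, mantissa) triple whose
    value is (-1)**sign * mantissa * 2**exponent. `guard_bits` is an assumed
    number of extra low-order bits kept while aligning.
    """
    (sa, ea, ma), (sb, eb, mb) = a, b
    (sc, ec, mc), (sd, ed, md) = c, d
    se, ee, me = e

    # Product signs, exponents, and mantissas.
    p1 = (sa ^ sb, ea + eb, ma * mb)
    p2 = (sc ^ sd, ec + ed, mc * md)

    # The product with the larger exponent is the alignment reference.
    big, small = (p1, p2) if p1[1] >= p2[1] else (p2, p1)

    # Exponent difference of the products = shift for the smaller product.
    product_shift = big[1] - small[1]
    # Shift for the addend relative to the larger product (may be negative).
    addend_shift = big[1] - ee

    # Align both the addend and the smaller product to the larger product.
    big_m = big[2] << guard_bits
    small_m = _shift_right(small[2] << guard_bits, product_shift)
    addend_m = _shift_right(me << guard_bits, addend_shift)

    # Add or subtract the aligned mantissas according to the signs.
    total = _signed(big[0], big_m) + _signed(small[0], small_m) + _signed(se, addend_m)

    # Exact value of the intermediate sum (normalizing and rounding omitted).
    return Fraction(total) * Fraction(2) ** (big[1] - guard_bits)


# 3*2 + 1.25*(-1) + 0.5
operands = ((0, 0, 3), (0, 1, 1), (0, -2, 5), (1, 0, 1), (0, -1, 1))
print(fmma_align_sketch(*operands))                # 21/4
print(fmma_align_sketch(*operands, guard_bits=0))  # 6
```

With `guard_bits=0` the smaller product and the addend are shifted out entirely, which shows in miniature why an alignment datapath has to choose how many bits to retain and how many to discard.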
Opening claim text (preview).
The invention claimed is:
1. A method, by one or more processors, for implementing a fused multiply-multiply-accumulate (FMMA) operation in a computing environment, comprising: receiving, by the one or more processors, an instruction stored in a memory, wherein the instruction contains at least two operands of mixed bit-precision formats; and executing the instruction, wherein, when executing the instruction, the one or more processors implement a FMMA unit to perform an internal rounding operation associated with floating point arithmetic of the instruction by performing each of: determining by multiplier circuitry within the FMMA unit, in parallel, mantissas for two products, an exponent difference of the two products serving as an alignment shift amount for a product of the two products having a smallest exponent, and an alignment shift amount for an addend relative to an alternative product of the two products having a larger exponent, wherein the mantissas are pre-shifted prior to aligning the addend and the product relative to the alternative product, and wherein the addend and the product having the smallest exponent are aligned prior to receiving a select signal indicating to a selector to select between one of the pre-shifted mantissas when performing the alignment of the addend and the product relative to the alternative product; aligning, by aligning circuitry within the FMMA unit, the addend relative to the alternative product having the larger exponent; and aligning, by the aligning circuitry, the product having the smallest exponent relative to the alternative product having the larger exponent according to the alignment shift amount for the product of the two products having the smallest exponent.
2. The method of claim 1, further including adding or subtracting the mantissas of the two products according to a sign of the addend and the two products.
3. The method of claim 1, further including retaining a selected number of bits while discarding an alternative number of bits of the product for aligning the product having the smallest exponent relative to the alternative product having the larger exponent.
4. The method of claim 1, further including retaining a selected number of bits while discarding an alternative number of bits of the addend for aligning the addend relative to the alternative product having the larger exponent.
5. The method of claim 1, further including normalizing and rounding an intermediate summation or difference of aligned mantissas for each of the two products and the aligned addend to a targeted precision.
6. The method of claim 1, further including: performing a mixed-precision FMMA operation by using one or more inputs, one or more outputs, or a combination thereof in a selected format; or performing a hybrid-fused FMMA operation by enabling a very low precision format (VLP) operand to use a plurality of formats.
7. The method of claim 1, wherein the FMMA unit implements both a half-precision fused multiple add (FMA) operation and a very low precision format (VLP) FMMA operation, wherein the VLP is a format using less than sixteen bits comprising a sign bit, exponent bits (e), and mantissa bits (m), and the FMMA unit is selectively configured to perform the FMA operation or the FMMA operation.
8. A system for implementing a fused multiply-multiply-accumulate (FMMA) operation in a computing environment, comprising: one or more hardware memory storing executable instructions; one or more hardware processors; and a FMMA unit implemented within the one or more hardware processors, wherein the one or more hardware processors are configured to: receive, by the one or more hardware processors, one of the executable instructions stored in the one or more memory, wherein the instruction contains at least two operands of mixed bit-precision formats; and execute the one of the executable instructions by implementing the FMMA unit to perform an internal rounding operation associated with floating point arithmetic performing each of: determining by multiplier circuitry within the FMMA unit, in parallel, mantissas for two products, an exponent difference of the two products serving as an alignment shift amount for a product of the two products having a smallest exponent, and an alignment shift amount for an addend relative to an alternative product of the two products having a larger exponent, wherein the mantissas are pre-shifted prior to aligning the addend and the product relative to the alternative product, and wherein the addend and the product having the smallest exponent are aligned prior to receiving a select signal indicating to a selector to select between one of the pre-shifted mantissas when performing the alignment of the addend and the product relative to the alternative product; aligning, by aligning circuitry within the FMMA unit, the addend relative to the alternative product having the larger exponent; and aligning, by the aligning circuitry, the product having the smallest exponent relative to the alternative product having the larger exponent according to the alignment shift amount for the product of the two products having the smallest exponent.
9. The system of claim 8, wherein the executable instructions further add or subtract the mantissas of the two products according to a sign of the addend and the two products.
10. The system of claim 8, wherein the executable instructions further retain a selected number of bits while discarding an alternative number of bits of the product for aligning the product having the smallest exponent relative to the alternative product having the larger exponent.
11. The system of claim 8, wherein the executable instructions further retain a selected number of bits while discarding an alternative number of bits of the addend for aligning the addend relative to the alternative product having the larger exponent.
12. The system of claim 8, wherein the executable instructions further normalize and round an intermediate summation or difference of aligned mantissas for each of the two products and the aligned addend to a targeted precision.
13. The system of claim 8, wherein the executable instructions further: perform a mixed-precision FMMA operation by using one or more inputs, one or more outputs, or a combination thereof in a selected format; or perform a hybrid-fused FMMA operation by enabling a very low precision format (VLP) operand to use a plurality of formats.
14. The system of claim 8, wherein the FMMA unit implements both a half-precision fused multiple add (FMA) operation and a very low precision format (VLP) FMMA operation, wherein the VLP is a format using less than sixteen bits comprising a sign bit, exponent bits (e), and mantissa bits (m), and the FMMA unit is selectively configured to perform the FMA operation or the FMMA operation.
15. A computer program product for, by a processor, implementing a fused multiply-multiply-accumulate (FMMA) operation in a computing environment, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising: an executable portion that receives, by the processor, an instruction stored in a memory, wherein the instruction contains at least two operands of mixed bit-precision formats; and an executable portion that executes the instruction, wherein, when executing the instruction, the one or more processors implement a FMMA unit to perform an internal rounding operation associated with floating point arithmetic of the instruc
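Claims 7 and 14 describe the very low precision (VLP) format only as fewer than sixteen bits made up of a sign bit, exponent bits (e), and mantissa bits (m). The sketch below shows how such a bit pattern might be decoded in software. Everything beyond that three-field layout is an assumption for illustration and is not taken from the claims: the function name `decode_vlp`, the field order (sign, then exponent, then mantissa), the IEEE-style default bias, the implicit leading one, and the absence of subnormals, infinities, and NaNs.

```python
from fractions import Fraction


def decode_vlp(bits, e_bits, m_bits, bias=None):
    """Decode a hypothetical (1 + e_bits + m_bits)-bit VLP pattern to an exact value."""
    width = 1 + e_bits + m_bits
    if width >= 16:
        raise ValueError("a VLP format uses fewer than sixteen bits")
    if not 0 <= bits < (1 << width):
        raise ValueError("bit pattern does not fit the format width")
    if bias is None:
        # Assumed IEEE-style bias; the claims do not specify one.
        bias = (1 << (e_bits - 1)) - 1

    # Assumed field order: [sign | exponent | mantissa], most significant first.
    sign = bits >> (e_bits + m_bits)
    exponent_field = (bits >> m_bits) & ((1 << e_bits) - 1)
    fraction_field = bits & ((1 << m_bits) - 1)

    # Assumed implicit leading one; special values are not modeled.
    mantissa = (1 << m_bits) | fraction_field
    value = Fraction(mantissa, 1 << m_bits) * Fraction(2) ** (exponent_field - bias)
    return -value if sign else value


# An assumed 8-bit layout with e = 4 and m = 3.
print(decode_vlp(0b0_0111_000, e_bits=4, m_bits=3))  # 1
print(decode_vlp(0b1_1000_100, e_bits=4, m_bits=3))  # -3
```

Keeping `e_bits` and `m_bits` as parameters reflects the claim language, which leaves the split between exponent and mantissa bits open rather than fixing one layout.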
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title
Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)} · CPC title