Ultra-low precision floating-point fused multiply-accumulate unit

US11455142B2 · US · B2

Patent metadata
Publication number: US-11455142-B2
Application number: US-201916432358-A
Country: US
Kind code: B2
Filing date: Jun 5, 2019
Priority date: Jun 5, 2019
Publication date: Sep 27, 2022
Grant date: Sep 27, 2022

Abstract

Embodiments for implementing a fused multiply-multiply-accumulate (“FMMA”) unit by one or more processors in a computing system. Mantissas for two products, an exponent difference of the two products serving as an alignment shift amount for a product of the two products having a smallest exponent, and an alignment shift amount for an addend relative to an alternative product of the two products having a larger exponent may be determined in parallel. The addend may be aligned relative to the alternative product having the larger exponent. The product having the smallest exponent may be aligned relative to the alternative product having the larger exponent according to the alignment shift amount.
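The abstract's central point is that both alignment shift amounts depend only on exponents, so they can be formed while the mantissa multipliers are still running. A minimal Python sketch of that exponent path follows, assuming unbiased integer exponents for multiplicands a, b, c, d and addend z. The function name, the `Shifts` tuple and its sign convention are hypothetical illustrations, not taken from the patent.

```python
from typing import NamedTuple


class Shifts(NamedTuple):
    larger: int         # 0 if a*b has the larger exponent, 1 if c*d does
    product_shift: int  # right shift applied to the product with the smaller exponent
    addend_shift: int   # positions the addend sits below the larger product
                        # (negative means the addend's exponent is larger)


def alignment_shifts(ea: int, eb: int, ec: int, ed: int, ez: int) -> Shifts:
    """Shift amounts for a*b + c*d + z, computed from exponents only.

    Product exponents are sums of operand exponents, so nothing here waits
    on the mantissa multiplications; in hardware it can run alongside them.
    """
    e0, e1 = ea + eb, ec + ed       # exponents of the two products
    d0, d1 = e0 - e1, e1 - e0       # both differences formed up front
    larger = 0 if d0 >= 0 else 1    # sign of the difference acts as the select
    product_shift = d0 if larger == 0 else d1
    e_max = e0 if larger == 0 else e1
    return Shifts(larger, product_shift, e_max - ez)
```

For example, operand exponents (3, 2) and (1, 1) give product exponents 5 and 2, so the c·d product is shifted right by 3 and an addend with exponent 4 sits one position below the larger product: `alignment_shifts(3, 2, 1, 1, 4)` returns `Shifts(larger=0, product_shift=3, addend_shift=1)`. Forming both differences before the comparison resolves loosely mirrors the claim language about pre-shifted mantissas chosen by a select signal; the actual circuit may differ.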

Claims

The invention claimed is:

1. A method, by one or more processors, for implementing a fused multiply-multiply-accumulate (FMMA) operation in a computing environment, comprising:
   receiving, by the one or more processors, an instruction stored in a memory, wherein the instruction contains at least two operands of mixed bit-precision formats; and
   executing the instruction, wherein, when executing the instruction, the one or more processors implement a FMMA unit to perform an internal rounding operation associated with floating point arithmetic of the instruction by performing each of:
   determining by multiplier circuitry within the FMMA unit, in parallel, mantissas for two products, an exponent difference of the two products serving as an alignment shift amount for a product of the two products having a smallest exponent, and an alignment shift amount for an addend relative to an alternative product of the two products having a larger exponent, wherein the mantissas are pre-shifted prior to aligning the addend and the product relative to the alternative product, and wherein the addend and the product having the smallest exponent are aligned prior to receiving a select signal indicating to a selector to select between one of the pre-shifted mantissas when performing the alignment of the addend and the product relative to the alternative product;
   aligning, by aligning circuitry within the FMMA unit, the addend relative to the alternative product having the larger exponent; and
   aligning, by the aligning circuitry, the product having the smallest exponent relative to the alternative product having the larger exponent according to the alignment shift amount for the product of the two products having the smallest exponent.

2. The method of claim 1, further including adding or subtracting the mantissas of the two products according to a sign of the addend and the two products.

3. The method of claim 1, further including retaining a selected number of bits while discarding an alternative number of bits of the product for aligning the product having the smallest exponent relative to the alternative product having the larger exponent.

4. The method of claim 1, further including retaining a selected number of bits while discarding an alternative number of bits of the addend for aligning the addend relative to the alternative product having the larger exponent.

5. The method of claim 1, further including normalizing and rounding an intermediate summation or difference of aligned mantissas for each of the two products and the aligned addend to a targeted precision.

6. The method of claim 1, further including: performing a mixed-precision FMMA operation by using one or more inputs, one or more outputs, or a combination thereof in a selected format; or performing a hybrid-fused FMMA operation by enabling a very low precision format (VLP) operand to use a plurality of formats.

7. The method of claim 1, wherein the FMMA unit implements both a half-precision fused multiple add (FMA) operation and a very low precision format (VLP) FMMA operation, wherein the VLP is a format using less than sixteen bits comprising a sign bit, exponent bits (e), and mantissa bits (m), and the FMMA unit is selectively configured to perform the FMA operation or the FMMA operation.

8. A system for implementing a fused multiply-multiply-accumulate (FMMA) operation in a computing environment, comprising:
   one or more hardware memory storing executable instructions;
   one or more hardware processors; and
   a FMMA unit implemented within the one or more hardware processors, wherein the one or more hardware processors are configured to:
      receive, by the one or more hardware processors, one of the executable instructions stored in the one or more memory, wherein the instruction contains at least two operands of mixed bit-precision formats; and
      execute the one of the executable instructions by implementing the FMMA unit to perform an internal rounding operation associated with floating point arithmetic performing each of:
      determining by multiplier circuitry within the FMMA unit, in parallel, mantissas for two products, an exponent difference of the two products serving as an alignment shift amount for a product of the two products having a smallest exponent, and an alignment shift amount for an addend relative to an alternative product of the two products having a larger exponent, wherein the mantissas are pre-shifted prior to aligning the addend and the product relative to the alternative product, and wherein the addend and the product having the smallest exponent are aligned prior to receiving a select signal indicating to a selector to select between one of the pre-shifted mantissas when performing the alignment of the addend and the product relative to the alternative product;
      aligning, by aligning circuitry within the FMMA unit, the addend relative to the alternative product having the larger exponent; and
      aligning, by the aligning circuitry, the product having the smallest exponent relative to the alternative product having the larger exponent according to the alignment shift amount for the product of the two products having the smallest exponent.

9. The system of claim 8, wherein the executable instructions further add or subtract the mantissas of the two products according to a sign of the addend and the two products.

10. The system of claim 8, wherein the executable instructions further retain a selected number of bits while discarding an alternative number of bits of the product for aligning the product having the smallest exponent relative to the alternative product having the larger exponent.

11. The system of claim 8, wherein the executable instructions further retain a selected number of bits while discarding an alternative number of bits of the addend for aligning the addend relative to the alternative product having the larger exponent.

12. The system of claim 8, wherein the executable instructions further normalize and round an intermediate summation or difference of aligned mantissas for each of the two products and the aligned addend to a targeted precision.

13. The system of claim 8, wherein the executable instructions further: perform a mixed-precision FMMA operation by using one or more inputs, one or more outputs, or a combination thereof in a selected format; or perform a hybrid-fused FMMA operation by enabling a very low precision format (VLP) operand to use a plurality of formats.

14. The system of claim 8, wherein the FMMA unit implements both a half-precision fused multiple add (FMA) operation and a very low precision format (VLP) FMMA operation, wherein the VLP is a format using less than sixteen bits comprising a sign bit, exponent bits (e), and mantissa bits (m), and the FMMA unit is selectively configured to perform the FMA operation or the FMMA operation.

15. A computer program product for, by a processor, implementing a fused multiply-multiply-accumulate (FMMA) operation in a computing environment, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
   an executable portion that receives, by the processor, an instruction stored in a memory, wherein the instruction contains at least two operands of mixed bit-precision formats; and
   an executable portion that executes the instruction, wherein, when executing the instruction, the one or more processors implement a FMMA unit to perform an internal rounding operation associated with floating point arithmetic of the instruc…
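Claims 2 through 7 add sign-dependent addition, discarding bits during alignment, normalization with rounding to a target precision, and a very low precision (VLP) format of fewer than sixteen bits. The Python sketch below strings those steps together in software to show the data flow; it is not IBM's circuit. The VLP layout (1 sign bit, e exponent bits, m mantissa bits, IEEE-style bias 2^(e-1)-1, subnormals, no Inf/NaN or overflow handling), the half-precision-style addend, the `guard` window, round-half-to-even, and every function name are assumptions chosen for illustration.

```python
from fractions import Fraction
from typing import NamedTuple


class Term(NamedTuple):
    sign: int  # 0 = positive, 1 = negative
    sig: int   # integer significand
    lsb: int   # value = (-1)**sign * sig * 2**lsb


def decode(bits: int, e: int, m: int) -> Term:
    """Decode a hypothetical 1+e+m-bit IEEE-style format (no Inf/NaN handling)."""
    sign = (bits >> (e + m)) & 1
    field = (bits >> m) & ((1 << e) - 1)
    frac = bits & ((1 << m) - 1)
    bias = (1 << (e - 1)) - 1
    if field == 0:  # zero or subnormal: no hidden bit, exponent 1 - bias
        return Term(sign, frac, 1 - bias - m)
    return Term(sign, (1 << m) | frac, field - bias - m)


def mul(x: Term, y: Term) -> Term:
    # Signs XOR, significands multiply, exponents add (exponent path is independent).
    return Term(x.sign ^ y.sign, x.sig * y.sig, x.lsb + y.lsb)


def align(t: Term, ref_lsb: int, guard: int) -> int:
    """Signed significand of t on the grid 2**(ref_lsb - guard).

    Bits below the grid are discarded (truncation of the magnitude),
    in the spirit of claims 3 and 4.
    """
    shift = t.lsb - (ref_lsb - guard)
    mag = t.sig << shift if shift >= 0 else t.sig >> -shift
    return -mag if t.sign else mag


def round_nearest_even(mag: int, scale: int, p: int) -> tuple[int, int]:
    """Normalize and round mag * 2**scale to at most p significant bits."""
    excess = mag.bit_length() - p
    if excess <= 0:
        return mag, scale
    q, r = mag >> excess, mag & ((1 << excess) - 1)
    half = 1 << (excess - 1)
    if r > half or (r == half and q & 1):
        q += 1
    if q.bit_length() > p:  # rounding carried out to exactly 2**p
        q, excess = q >> 1, excess + 1
    return q, scale + excess


def fmma(a: int, b: int, c: int, d: int, z: int,
         fmt=(4, 3), z_fmt=(5, 10), guard: int = 8, p: int = 11) -> Fraction:
    """Sketch of a*b + c*d + z with a single final rounding to p bits."""
    p0 = mul(decode(a, *fmt), decode(b, *fmt))
    p1 = mul(decode(c, *fmt), decode(d, *fmt))
    addend = decode(z, *z_fmt)
    # Both products share a format, so LSB order matches exponent order:
    # the product with the larger exponent becomes the alignment reference.
    ref = max(p0.lsb, p1.lsb)
    total = align(p0, ref, guard) + align(p1, ref, guard) + align(addend, ref, guard)
    negative, mag = total < 0, abs(total)  # add or subtract according to signs
    q, scale = round_nearest_even(mag, ref - guard, p)
    value = q * Fraction(2) ** scale
    return -value if negative else value
```

With an 8-bit VLP configuration (e=4, m=3) for the multiplicands and a 16-bit addend, `fmma(0x3C, 0x40, 0x38, 0x38, 0x3C00)` evaluates 1.5 × 2 + 1 × 1 + 1 and returns `Fraction(5, 1)`. Because `align` simply drops bits below the guard window, the result can differ from an exactly rounded one when a smaller term is shifted far enough; a sticky bit is the usual remedy, and whether the patented unit uses one is not stated in the text above.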

Assignees

IBM

Classifications

  • G06F7/5443 (primary): Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00)

  • Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)}

Frequently asked questions

Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F7/5443. Mapped technology areas include Physics.
When was this patent published?
Published Sep 27, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).