Power saving floating point Multiplier-Accumulator with a high precision accumulation detection mode

US12079593B2 · US · B2

Patent metadata
Field · Value
Publication number · US-12079593-B2
Application number · US-202117352373-A
Country · US
Kind code · B2
Filing date · Jun 21, 2021
Priority date · Jun 21, 2021
Publication date · Sep 3, 2024
Grant date · Sep 3, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A floating point multiplier-accumulator (MAC) multiplies and accumulates N pairs of floating point values using N MAC processors operating simultaneously, each pair of values comprising an input value and a coefficient value to be multiplied and accumulated. The pairs of floating point values are simultaneously processed by the plurality of MAC processors, each of which output a signed integer form fraction with a first bitwidth and a second bitwidth, along with a maximum exponent. The first bitwidth signed integer form fractions are summed by an adder tree using the first bitwidth to form a first sum, and when an excess leading 0 condition is detected, a second adder tree operative on the second bitwidth integer form fractions forms a second sum. The first sum or second sum, along with the maximum exponent, is converted into floating point result.
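The data flow summarized in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: it assumes float32 operands that are normal and nonzero, and the names `F_NARROW`, `F_WIDE`, `THRESHOLD`, `decompose`, `multiply`, `align`, and `mac` are hypothetical, as are the specific bit widths and the 50% threshold chosen here.

```python
import math
import struct

BIAS = 127          # float32 exponent bias
F_NARROW = 16       # hypothetical fraction bits kept by the first (narrow) adder tree
F_WIDE = 46         # hypothetical fraction bits kept by the second (wide) adder tree
THRESHOLD = 0.5     # hypothetical "excess leading 0" fraction of the narrow width


def decompose(x):
    """Split a normal, nonzero float32 into (sign, biased exponent, 24-bit mantissa)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, (bits & 0x7FFFFF) | 0x800000


def multiply(a, c):
    """One MAC processor: sign XOR, integer mantissa product, exponent sum."""
    sa, ea, ma = decompose(a)
    sc, ec, mc = decompose(c)
    frac, exp_sum = ma * mc, ea + ec
    if frac >> 47:                 # product reached [2, 4): halve it and bump the exponent
        frac >>= 1
        exp_sum += 1
    return sa ^ sc, frac, exp_sum  # frac now has its leading 1 at bit 46


def align(sign, frac, diff, frac_bits):
    """Keep frac_bits fraction bits, apply the sign, right shift by the exponent difference."""
    kept = frac >> (46 - frac_bits)
    signed = -kept if sign else kept
    return signed >> diff          # Python's >> on negative ints is an arithmetic shift


def mac(inputs, coeffs):
    """Return (accumulated value, whether the wide adder tree was needed)."""
    products = [multiply(a, c) for a, c in zip(inputs, coeffs)]
    max_exp = max(e for _, _, e in products)

    def tree(frac_bits):
        return sum(align(s, f, max_exp - e, frac_bits) for s, f, e in products)

    # Narrow sum width: fraction bits + integer bit + room for carry growth.
    width = F_NARROW + 1 + (len(products) - 1).bit_length()
    total, frac_bits = tree(F_NARROW), F_NARROW
    leading_zeros = width - abs(total).bit_length()
    used_wide = leading_zeros > THRESHOLD * width
    if used_wide:                  # heavy cancellation: redo the sum at the wider bitwidth
        total, frac_bits = tree(F_WIDE), F_WIDE
    # Each operand exponent carries one bias, so the summed exponent carries two.
    return math.ldexp(total, max_exp - 2 * BIAS - frac_bits), used_wide
```

For example, `mac([1.5, 2.0], [2.0, 0.25])` stays on the narrow path, while `mac([1.0, 1.0], [1.0001, -1.0])` cancels almost completely and falls back to the wide sum. The model returns a Python float and does not round the result back to float32 or handle zeros, subnormals, infinities, or NaNs.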

First claim

Opening claim text (preview).

I claim:

1. A floating point multiplier-accumulator (MAC) multiplying and accumulating N operands, each operand comprising an input value and a coefficient value, the floating point MAC comprising: a plurality N of MAC processors, each MAC processor receiving a unique one of the N operands comprising an associated input value and an associated coefficient value, each MAC processor comprising: a sign processor configured to perform an exclusive OR operation on a sign bit of the associated input value and a sign bit of the associated coefficient value, the sign processor outputting a corresponding sign bit; a mantissa processor configured to perform an integer multiplication of a mantissa of the associated input value and a mantissa of the associated coefficient value and output a fraction, the mantissa processor asserting an exponent increment and dividing the fraction by two if an overflow condition occurs; an exponent processor determining an exponent sum of an exponent of the associated input value and an exponent of the associated coefficient value, the exponent processor receiving a maximum exponent sum value from a centralized find maximum exponent processor, the exponent processor incrementing the maximum exponent sum value if the exponent increment is asserted and an exponent difference is zero, the exponent processor also outputting the exponent difference between the maximum exponent sum value and the exponent sum; a Pad, Complement, Shift (PCS) Processor receiving the fraction from the mantissa processor, the corresponding sign bit from the sign processor, and the exponent difference, the PCS processor configured to take a 2s complement if the corresponding sign bit is negative, pad the fraction by pre-pending and appending 0s to the fraction to generate a first value, and right shifting by the exponent difference and outputting a result as a PCS first output value having a first bitwidth, and also outputting the result as a PCS second output value having a second bitwidth greater than the first bitwidth; the centralized find maximum exponent processor receiving an exponent sum from each exponent processor of the MAC processors, identifying a maximum exponent sum and outputting the maximum exponent sum; a pipeline register storing the N PCS second output values; a first adder tree having the first bitwidth and summing N PCS output first values and configured to output a first sum; a second adder tree having the second bitwidth and summing N PCS output second values if the first adder output has more than a threshold percentage of leading 0s and configured to output a second sum; a final stage configured to output a floating point value by normalizing the second sum to generate a sign bit, a mantissa, and a number M of left shift bit positions to remove the leading 0s, the final stage thereafter concatenating the sign bit, the mantissa, and an exponent derived from the maximum exponent.

2. The floating point MAC of claim 1 where the exponent derived from the maximum exponent is the maximum exponent minus an exponent correction caused by adding exponent sums as unsigned values.

3. The floating point MAC of claim 2 where the exponent derived from the maximum exponent is an 8 bit value and the exponent correction is 127 and performed in either each MAC processor exponent processor, or in the final stage.

4. The floating point MAC of claim 1 where normalizing the first sum or normalizing the second sum comprises: if the most significant bit (MSB) of the first sum or the second sum is set, replacing the respective first sum or second sum with a 2s complement of the respective first sum or second sum, thereafter replacing the respective first sum or second sum with a left shifted respective first sum or second sum shifted by the number M of left shift bit positions until no leading 0 bits remain and setting the sign bit.

5. The floating point MAC of claim 2 where the exponent has a precision of 8 bits and the exponent correction comprises subtracting 127.

6. The floating point MAC of claim 1 where a stall condition is asserted when the first sum has a number of leading 0 bits of a first final stage mantissa which exceeds a threshold, the stall condition causing the second adder tree to be enabled after the stall condition.

7. The floating point MAC of claim 1 where the second adder tree is enabled if the first sum has more than 50% or 75% leading 0s of a bitwidth of the first sum.

8. The floating point MAC of claim 1 where the PCS processor is operative with a bitwidth determined from a MAC processor exponent difference.

9. The floating point MAC of claim 1 where the exponent difference of each MAC processor is incremented if the mantissa processor does not overflow and the exponent difference is not 0.

10. The floating point MAC of claim 1 where the exponent difference of each MAC processor is decremented if the mantissa processor has an overflow and the exponent difference is not 0.

11. The floating point MAC of claim 1 where the maximum exponent is incremented if the exponent difference is 0 and the mantissa processor has an overflow.

12. The floating point MAC of claim 1 where the mantissa's precision is 4 bits when the exponent difference is greater than 24.

13. The floating point MAC of claim 1 where the mantissa's precision is 8 bits when the exponent difference is greater than 21.

14. The floating point MAC of claim 1 where the mantissa's precision is 12 bits when the exponent difference is larger than 12.

15. A floating point multiplier-accumulator (MAC) multiplying and accumulating N operands, each operand comprising an associated input value and an associated coefficient value, the floating point MAC comprising: a plurality N of MAC processors, each MAC processor receiving the associated input value and the associated coefficient value, each MAC processor comprising: a sign processor configured to perform an exclusive OR operation on a sign bit of the associated input value and a sign bit of the associated coefficient value resulting in a sign bit output; a mantissa processor configured to perform an integer multiplication of a hidden bit restored mantissa of the associated input value and a hidden bit restored mantissa of the associated coefficient value and outputting a resulting fraction, and upon an overflow condition of the resulting fraction, the mantissa processor dividing the resulting fraction by two and asserting an exponent increment; an exponent processor generating an exponent sum of an exponent of the associated input value and an exponent of the associated coefficient value, the exponent processor receiving a maximum exponent from a centralized find maximum exponent processor, the exponent processor modifying the maximum exponent and also outputting an exponent difference computed by subtracting the exponent sum from the maximum exponent, the exponent processor also using the exponent difference and the sign bit output to estimate a precision bitwidth; a Pad, Complement, Shift (PCS) Processor receiving the resulting fraction from the mantissa processor and also the sign bit output from the sign processor, the PCS processor configured to perform operations with the precision bitwidth and pad the resulting fraction by pre-pending and appending 0s to the resulting fraction to generate a first value, thereafter generating a second value by performing a two's complement of the first value if the sign bit output is negative and otherwise taking no action on the first value, the PCS processor configured to performing a shift operation on the second value by right shifting the second …
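The normalization step of claim 4 and the exponent correction of claims 2, 3, and 5 can be sketched in a few lines. This is an illustrative model only: the function names `normalize` and `correct_exponent`, the `width` parameter, and the handling of a zero sum are assumptions not taken from the claims, and the sketch does not show how the shift count M is folded back into the final exponent.

```python
def normalize(total, width):
    """Normalize a width-bit two's complement sum into (sign, mantissa, M)."""
    mask = (1 << width) - 1
    total &= mask
    sign = total >> (width - 1)            # MSB set means the sum is negative
    magnitude = (-total) & mask if sign else total   # 2s complement when negative
    if magnitude == 0:
        return 0, 0, 0                     # assumption: a zero sum needs no shifting
    m = width - magnitude.bit_length()     # number of leading 0 bits to remove
    return sign, (magnitude << m) & mask, m


def correct_exponent(max_exponent_sum, correction=127):
    """Remove the extra bias left by adding two biased 8-bit exponents."""
    return max_exponent_sum - correction
```

With an 8-bit sum, both `normalize(0b00000110, 8)` and its negative counterpart `normalize(0xFA, 8)` yield the mantissa `0b11000000` with M = 5, differing only in the sign bit. Adding the biased exponents of 1.5 (127) and 2.0 (128) gives 255, which `correct_exponent` maps back to 128, the biased exponent of their product 3.0.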

Assignees

Redpine Signals Inc, Ceremorphic Inc

Inventors

Classifications

  • Adding; Subtracting {(G06F7/4833, G06F7/4836 take precedence)} · CPC title

  • in floating-point computations · CPC title

  • Multiplying · CPC title

  • G06F7/5443 · Primary

    Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title

  • H03K19/21 · Primary

    EXCLUSIVE-OR circuits, i.e. giving output if input signal exists at only one input; COINCIDENCE circuits, i.e. giving output only if all input signals are identical · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Redpine Signals Inc, Ceremorphic Inc
What technology area does this patent fall under?
Primary CPC classification G06F7/5443. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 3, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).