Compressing like-magnitude partial products in multiply accumulation
US-2021182026-A1 · Jun 17, 2021 · US
US12079593B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12079593-B2 |
| Application number | US-202117352373-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 21, 2021 |
| Priority date | Jun 21, 2021 |
| Publication date | Sep 3, 2024 |
| Grant date | Sep 3, 2024 |
Official abstract text for this publication.
A floating point multiplier-accumulator (MAC) multiplies and accumulates N pairs of floating point values using N MAC processors operating simultaneously, each pair of values comprising an input value and a coefficient value to be multiplied and accumulated. The pairs of floating point values are simultaneously processed by the plurality of MAC processors, each of which outputs a signed integer form fraction with a first bitwidth and a second bitwidth, along with a maximum exponent. The first bitwidth signed integer form fractions are summed by an adder tree using the first bitwidth to form a first sum, and when an excess leading 0 condition is detected, a second adder tree operative on the second bitwidth integer form fractions forms a second sum. The first sum or second sum, along with the maximum exponent, is converted into a floating point result.
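The data flow in the abstract can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the patented circuit: the function names (`aligned_terms`, `leading_zero_fraction`, `mac`), the 16-bit and 48-bit widths, and the 50% threshold default are all chosen here for illustration. Python integers stand in for the hardware adder trees, and the per-product overflow handling and pipeline register are omitted.

```python
import math

MANT_BITS = 24  # assumed operand mantissa width (illustrative, float32-like)

def aligned_terms(inputs, coeffs, frac_bits):
    """Return (signed integer terms aligned to the maximum exponent, max exponent)."""
    products = []
    for x, c in zip(inputs, coeffs):
        mx, ex = math.frexp(x)            # x = mx * 2**ex with 0.5 <= |mx| < 1
        mc, ec = math.frexp(c)
        ix = int(mx * (1 << MANT_BITS))   # signed integer mantissa (truncated)
        ic = int(mc * (1 << MANT_BITS))
        products.append((ix * ic, ex + ec))
    max_exp = max(e for _, e in products)
    terms = []
    for p, e in products:
        # Keep frac_bits fraction bits, then right shift by the exponent difference.
        shift = 2 * MANT_BITS - frac_bits + (max_exp - e)
        terms.append(p >> shift if shift >= 0 else p << -shift)
    return terms, max_exp

def leading_zero_fraction(total, width):
    """Fraction of a width-bit magnitude field that is leading 0s."""
    used = abs(total).bit_length()
    return max(width - used, 0) / width

def mac(inputs, coeffs, narrow=16, wide=48, threshold=0.5):
    """Sum at the narrow width; redo the sum at the wide width on excess leading 0s."""
    terms, max_exp = aligned_terms(inputs, coeffs, narrow)
    total, frac_bits = sum(terms), narrow
    used_wide = leading_zero_fraction(total, narrow) > threshold
    if used_wide:
        terms, max_exp = aligned_terms(inputs, coeffs, wide)
        total, frac_bits = sum(terms), wide
    return math.ldexp(total, max_exp - frac_bits), used_wide
```

In this model, `mac([1.5, -1.5, 0.001], [2.0, 2.0, 1.0])` cancels the two large products, so the narrow sum is mostly leading 0s and the wide path is taken to recover the small remaining term. A well-conditioned input such as `mac([1.0, 2.0], [3.0, 4.0])` stays on the narrow path.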
Opening claim text (preview).
I claim:

1. A floating point multiplier-accumulator (MAC) multiplying and accumulating N operands, each operand comprising an input value and a coefficient value, the floating point MAC comprising: a plurality N of MAC processors, each MAC processor receiving a unique one of the N operands comprising an associated input value and an associated coefficient value, each MAC processor comprising: a sign processor configured to perform an exclusive OR operation on a sign bit of the associated input value and a sign bit of the associated coefficient value, the sign processor outputting a corresponding sign bit; a mantissa processor configured to perform an integer multiplication of a mantissa of the associated input value and a mantissa of the associated coefficient value and output a fraction, the mantissa processor asserting an exponent increment and dividing the fraction by two if an overflow condition occurs; an exponent processor determining an exponent sum of an exponent of the associated input value and an exponent of the associated coefficient value, the exponent processor receiving a maximum exponent sum value from a centralized find maximum exponent processor, the exponent processor incrementing the maximum exponent sum value if the exponent increment is asserted and an exponent difference is zero, the exponent processor also outputting the exponent difference between the maximum exponent sum value and the exponent sum; a Pad, Complement, Shift (PCS) Processor receiving the fraction from the mantissa processor, the corresponding sign bit from the sign processor, and the exponent difference, the PCS processor configured to take a 2s complement if the corresponding sign bit is negative, pad the fraction by pre-pending and appending 0s to the fraction to generate a first value, and right shifting by the exponent difference and outputting a result as a PCS first output value having a first bitwidth, and also outputting the result as a PCS second output value having a second bitwidth greater than the first bitwidth; the centralized find maximum exponent processor receiving an exponent sum from each exponent processor of the MAC processors, identifying a maximum exponent sum and outputting the maximum exponent sum; a pipeline register storing the N PCS second output values; a first adder tree having the first bitwidth and summing N PCS output first values and configured to output a first sum; a second adder tree having the second bitwidth and summing N PCS output second values if the first adder output has more than a threshold percentage of leading 0s and configured to output a second sum; a final stage configured to output a floating point value by normalizing the second sum to generate a sign bit, a mantissa, and a number M of left shift bit positions to remove the leading 0s, the final stage thereafter concatenating the sign bit, the mantissa, and an exponent derived from the maximum exponent.

2. The floating point MAC of claim 1 where the exponent derived from the maximum exponent is the maximum exponent minus an exponent correction caused by adding exponent sums as unsigned values.

3. The floating point MAC of claim 2 where the exponent derived from the maximum exponent is an 8 bit value and the exponent correction is 127 and performed in either each MAC processor exponent processor, or in the final stage.

4. The floating point MAC of claim 1 where normalizing the first sum or normalizing the second sum comprises: if the most significant bit (MSB) of the first sum or the second sum is set, replacing the respective first sum or second sum with a 2s complement of the respective first sum or second sum, thereafter replacing the respective first sum or second sum with a left shifted respective first sum or second sum shifted by the number M of left shift bit positions until no leading 0 bits remain and setting the sign bit.

5. The floating point MAC of claim 2 where the exponent has a precision of 8 bits and the exponent correction comprises subtracting 127.

6. The floating point MAC of claim 1 where a stall condition is asserted when the first sum has a number of leading 0 bits of a first final stage mantissa which exceeds a threshold, the stall condition causing the second adder tree to be enabled after the stall condition.

7. The floating point MAC of claim 1 where the second adder tree is enabled if the first sum has more than 50% or 75% leading 0s of a bitwidth of the first sum.

8. The floating point MAC of claim 1 where the PCS processor is operative with a bitwidth determined from a MAC processor exponent difference.

9. The floating point MAC of claim 1 where the exponent difference of each MAC processor is incremented if the mantissa processor does not overflow and the exponent difference is not 0.

10. The floating point MAC of claim 1 where the exponent difference of each MAC processor is decremented if the mantissa processor has an overflow and the exponent difference is not 0.

11. The floating point MAC of claim 1 where the maximum exponent is incremented if the exponent difference is 0 and the mantissa processor has an overflow.

12. The floating point MAC of claim 1 where the mantissa's precision is 4 bits when the exponent difference is greater than 24.

13. The floating point MAC of claim 1 where the mantissa's precision is 8 bits when the exponent difference is greater than 21.

14. The floating point MAC of claim 1 where the mantissa's precision is 12 bits when the exponent difference is larger than 12.

15. A floating point multiplier-accumulator (MAC) multiplying and accumulating N operands, each operand comprising an associated input value and an associated coefficient value, the floating point MAC comprising: a plurality N of MAC processors, each MAC processor receiving the associated input value and the associated coefficient value, each MAC processor comprising: a sign processor configured to perform an exclusive OR operation on a sign bit of the associated input value and a sign bit of the associated coefficient value resulting in a sign bit output; a mantissa processor configured to perform an integer multiplication of a hidden bit restored mantissa of the associated input value and a hidden bit restored mantissa of the associated coefficient value and outputting a resulting fraction, and upon an overflow condition of the resulting fraction, the mantissa processor dividing the resulting fraction by two and asserting an exponent increment; an exponent processor generating an exponent sum of an exponent of the associated input value and an exponent of the associated coefficient value, the exponent processor receiving a maximum exponent from a centralized find maximum exponent processor, the exponent processor modifying the maximum exponent and also outputting an exponent difference computed by subtracting the exponent sum from the maximum exponent, the exponent processor also using the exponent difference and the sign bit output to estimate a precision bitwidth; a Pad, Complement, Shift (PCS) Processor receiving the resulting fraction from the mantissa processor and also the sign bit output from the sign processor, the PCS processor configured to perform operations with the precision bitwidth and pad the resulting fraction by pre-pending and appending 0s to the resulting fraction to generate a first value, thereafter generating a second value by performing a two's complement of the first value if the sign bit output is negative and otherwise taking no action on the first value, the PCS processor configured to performing a shift operation on the second value by right shifting the second
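
The per-operand stage recited in the claims (sign XOR, hidden-bit-restored mantissa multiply, halving on overflow, and exponent sum) can be sketched in software as follows. This is an illustrative model only: it assumes normal IEEE 754 binary32 operands (8-bit exponent with a bias of 127, 23 stored mantissa bits), and the names `fields`, `mac_element`, and `element_value` are hypothetical helpers introduced here, not terms from the patent.

```python
import math
import struct

def fields(x):
    """Split a value into binary32 sign, biased exponent, and 23-bit stored mantissa."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def mac_element(input_value, coeff_value):
    """Model one MAC processor's sign, mantissa, and exponent-sum steps."""
    s_x, e_x, m_x = fields(input_value)
    s_c, e_c, m_c = fields(coeff_value)
    sign = s_x ^ s_c                              # sign processor: XOR of sign bits
    frac = ((1 << 23) | m_x) * ((1 << 23) | m_c)  # hidden bits restored, integer multiply
    overflow = bool(frac >> 47)                   # product of two [1, 2) values reached [2, 4)
    if overflow:
        frac >>= 1                                # divide the fraction by two
    exp_sum = e_x + e_c                           # unsigned sum, still carries the bias twice
    return sign, frac, exp_sum, overflow

def element_value(sign, frac, exp_sum, overflow):
    """Decode one product back to a real number, for checking the sketch only."""
    # Subtracting 127 once would leave a biased exponent; this helper removes
    # both biases to obtain an unbiased power of two.
    exponent = exp_sum - 2 * 127 + (1 if overflow else 0)
    return (-1) ** sign * math.ldexp(frac, exponent - 46)
```

For example, `mac_element(1.5, -1.5)` sets the sign bit and reports an overflow because the mantissa product 2.25 is at least 2, and `element_value(*mac_element(1.5, -1.5))` decodes back to -2.25. Alignment to the maximum exponent and the adder trees are outside the scope of this sketch.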
Adding; Subtracting {(G06F7/4833, G06F7/4836 take precedence)} · CPC title
in floating-point computations · CPC title
Multiplying · CPC title
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title
EXCLUSIVE-OR circuits, i.e. giving output if input signal exists at only one input; COINCIDENCE circuits, i.e. giving output only if all input signals are identical · CPC title