Apparatuses and methods to accelerate matrix multiplication
US-2021263993-A1 · Aug 26, 2021 · US
US11983237B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11983237-B2 |
| Application number | US-202117180831-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 21, 2021 |
| Priority date | Feb 21, 2021 |
| Publication date | May 14, 2024 |
| Grant date | May 14, 2024 |
Official abstract text for this publication.
A vector dot product multiplier receives a row vector and a column vector as floating point numbers in a format of sign plus exponent bits plus mantissa bits. The dot product multiplier generates a single dot product value by separately processing the sign bits, exponent bits, and mantissa bits in a few pipelined stages. A first pipeline stage generates a sign bit, a normalized mantissa formed by multiplying pairs of multiplicand elements, and exponent information. A second pipeline stage receives the multiplied pairs of normalized mantissas, performs an adjustment, performs a padding, complement, and shift, and sums the results in an adder stage. The resulting integer is normalized to generate a sign bit, exponent, and mantissa of the floating point result.
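As an illustration of the first pipeline stage described in the abstract, the following is a minimal software sketch, not the patented circuit. It assumes the 1 sign bit, 8 exponent bit, 7 mantissa bit layout mentioned in claim 4. The names `split_bf16`, `first_stage`, and `BIAS` are hypothetical. The sketch handles only normal numbers and keeps the full 16-bit mantissa product instead of rounding it to fewer bits.

```python
BIAS = 127  # exponent bias for an 8-bit exponent field (assumed format)


def split_bf16(bits):
    """Split a 16-bit pattern into (sign, biased exponent, 8-bit significand)."""
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 7) & 0xFF
    # Restore the implicit leading 1 of a normal number. Assumption: zeros,
    # subnormals, infinities, and NaNs are not handled in this sketch.
    significand = 0x80 | (bits & 0x7F)
    return sign, exponent, significand


def first_stage(input_bits, coeff_bits):
    """Return (sign, exp_sum, normalized_mantissa, exp_inc) for one product."""
    s_a, e_a, m_a = split_bf16(input_bits)
    s_b, e_b, m_b = split_bf16(coeff_bits)

    sign = s_a ^ s_b        # sign of the product is the XOR of the sign bits
    exp_sum = e_a + e_b     # sum of biased exponents (still carries 2 * BIAS)
    product = m_a * m_b     # 16-bit integer product, 0x4000 .. 0xFE01

    # If bit 15 is set, the significand product is >= 2.0, so the exponent
    # must be incremented; otherwise shift left so the leading 1 sits at bit 15.
    exp_inc = (product >> 15) & 0x1
    normalized = product if exp_inc else product << 1
    return sign, exp_sum, normalized, exp_inc
```

For example, `first_stage(0x3FC0, 0x3FC0)` models 1.5 × 1.5 and returns `(0, 254, 0x9000, 1)`: the mantissa 0x9000 / 2^15 = 1.125 with an effective unbiased exponent of 254 + 1 − 2 × 127 = 1 gives 2.25.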
Opening claim text (preview).
I claim:
1. A floating point multiplier-accumulator (MAC) configured to generate a sum of N products, each product comprising a floating point input value multiplied by a corresponding floating point coefficient value, the floating point MAC comprising: a plurality of N multipliers, each multiplier comprising: a sign processor, an exponent processor and a mantissa processor; the sign processor configured to output a sign bit computed from an exclusive OR (XOR) of a sign bit of a floating point input value with a sign bit of a corresponding floating point coefficient value; the exponent processor comprising an exponent summer configured to compute an exponent sum of an exponent of the floating point input value with an exponent of the corresponding floating point coefficient value, the exponent processor also determining a maximum exponent (MAX_EXP) value from all N exponent processor sums, the exponent processor also determining an exponent difference (EXP_DIFF) value between the MAX_EXP value and the exponent sum; the mantissa processor configured to generate a product by multiplying a mantissa from the floating point input value with a mantissa from the corresponding coefficient floating point value and normalizing the product, the mantissa processor asserting an exponent increment (EXP_INC) bit if an overflow results from the multiplication; an adjustment processor configured to determine if the EXP_DIFF value=0; when the EXP_DIFF value is 0, the adjustment processor incrementing the MAX_EXP value and asserting a maximum increment (MAX_INC) bit to other multipliers if the EXP_INC bit is asserted; when the EXP_DIFF value is not 0, the adjustment processor incrementing the EXP_DIFF value when the EXP_INC bit is not asserted and the MAX_INC bit is asserted; when the EXP_DIFF value is not 0, the adjustment processor decrementing the EXP_DIFF value when the EXP_INC bit is asserted and the MAX_INC bit is not asserted; a mantissa PCS stage configured to prepend and append 0 values to the normalized product to generate a padded value, the mantissa PCS stage modifying the padded value by taking a two's complement of the padded value if the sign processor output bit is 1, the mantissa PCS stage further comprising a shift register and thereafter right shifting the result by the EXP_DIFF value to generate an integer form fraction; a summer stage configured to receive a corresponding integer form fraction from each of the N mantissa processors and outputting an integer form fraction sum; a floating point normalizer configured to convert the integer form fraction sum to a resulting sign bit and a resulting normalized mantissa, and outputting a floating point output value comprising the resulting sign bit, an exponent derived from said MAX_EXP value, and the resulting normalized mantissa.
2. The floating point MAC of claim 1 where a first pipeline stage comprises the sign processors, the mantissa processors, and the exponent processors, the first pipeline stage outputting the sign bits, the normalized mantissas, the exponent adjustment bits, and the EXP_DIFF values, along with the MAX_EXP value to a second pipeline stage having a registered input.
3. The floating point MAC of claim 2 where the second pipeline stage generates N integer form fractions to the summer stage.
4. The floating point MAC of claim 1 where each floating point input value comprises, in sequence, 1 sign bit, 8 bits of exponent, and 7 bits of mantissa.
5. The floating point MAC of claim 1 where the summer stage comprises N/2 first adders operating in parallel in a first stage, the N/2 first adders having a bit width equal to a number of bits output by the mantissa PCS stage.
6. The floating point MAC of claim 1 where prepending 0 values comprises prepending at least log2 N 0s.
7. The floating point MAC of claim 1 where if more than one multiplier has the EXP_DIFF value=0, the MAX_INC bit is asserted and the MAX_EXP value is only incremented once.
8. The floating point MAC of claim 1 where the MAX_INC bit from a multiplier which has the EXP_DIFF value=0 and the EXP_INC bit asserted is provided to the other multipliers where the EXP_DIFF value is not 0.
9. The floating point MAC of claim 1 where the Mantissa PCS stage is configured to output a bit width in a range of 16 to 32 bits.
10. The floating point MAC of claim 1 where the floating point input values have a format conforming to IEEE standard 754.
11. A floating point multiplier-accumulator (MAC) configured to receive N pairs of floating point values, each pair comprising a floating point input and a floating point coefficient, each floating point input and floating point coefficient comprising a sign bit, a plurality of mantissa bits, and a plurality of exponent bits, the floating point MAC comprising: a max exponent finder configured to identify a maximum exponent sum (MAX_EXP) value among N sums of floating point input exponent and coefficient exponent bits; a plurality N of first pipeline stages, each first pipeline stage comprising: a sign bit processor configured to output, for each pair of floating point values, an exclusive OR bit value of a sign bit of a respective floating point input value and a sign bit of a corresponding floating point coefficient value; a mantissa processor forming a respective integer multiplication product of mantissa bits of a respective floating point input value and mantissa bits of a corresponding floating point coefficient value, the integer multiplication product rounded and normalized to a fewer number of bits, the mantissa processor asserting an exponent increment EXP_INC bit if the most significant bit of the integer multiplication product is 1; an exponent processor generating a respective EXP_DIFF value as a difference between the MAX_EXP value and the sum of an exponent of a respective floating point input value and an exponent of a corresponding floating point coefficient value; a plurality N of second pipeline stages coupled to corresponding first pipeline stages, each second pipeline stage comprising: an adjustment stage receiving the respective EXP_DIFF value and the respective EXP_INC bit; when the respective EXP_DIFF value is 0 and the respective EXP_INC bit is asserted, the adjustment stage incrementing the MAX_EXP value and asserting a respective maximum increment (MAX_INC) bit to other adjustment stages; when the respective EXP_DIFF value is not 0: the adjustment stage incrementing the respective EXP_DIFF value when the respective EXP_INC bit is not asserted and the respective MAX_INC bit is asserted; the adjustment stage decrementing the respective EXP_DIFF value when the respective EXP_INC bit is asserted and the respective MAX_INC bit is not asserted; a mantissa PCS stage performing: a first step of padding the normalized output of the respective mantissa processor to a greater number of bits than a number of bits in the normalized output to form a respective padded value, a second step of replacing the respective padded value with a 2s complement of the respective padded value if the respective sign bit processor output is 1, and a third step of shifting a result of the second step by the respective EXP_DIFF value of bit positions to generate a respective integer form fraction; an adder stage computing a sum of N respective integer form fraction values, one integer form fraction value from each respective mantissa PCS stage of the plurality N of second pipeline stages; an output stage generating a floating point result by determining a sign of the adder stage sum to generate a sign part, normalizing the sum to generate a mantissa part, and generating the floating point result from the sign part,
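The adjustment, pad/complement/shift (PCS), summation, and normalization steps recited in the claims can be modeled in software roughly as follows. This is a sketch under stated assumptions, not the claimed hardware. The names `accumulate`, `to_float`, `MANT_BITS`, and `APPEND_BITS` are hypothetical. Each input term is assumed to be a `(sign, exp_sum, normalized_mantissa, exp_inc)` tuple whose 16-bit mantissa has its leading 1 at bit 15. Python's unbounded signed integers stand in for the prepended guard bits and the two's complement representation. The case of a term with EXP_DIFF = 0, EXP_INC not asserted, and MAX_INC asserted by another term is not spelled out in the claim text above; shifting it right by one bit is this sketch's own assumption.

```python
import math

BIAS = 127        # bias of an 8-bit exponent field (assumed input format)
MANT_BITS = 16    # assumed width of each normalized product, leading 1 at bit 15
APPEND_BITS = 8   # assumed number of 0s appended before the right shift


def accumulate(terms):
    """Sum products given as (sign, exp_sum, normalized_mantissa, exp_inc).

    Returns (sign, unbiased_exponent, fraction) with fraction in [1, 2),
    or (0, 0, 0.0) when the sum is exactly zero.
    """
    max_exp = max(exp_sum for _, exp_sum, _, _ in terms)
    # MAX_INC: some term at the maximum exponent overflowed its mantissa product.
    max_inc = any(e == max_exp and inc for _, e, _, inc in terms)
    final_max = max_exp + (1 if max_inc else 0)  # incremented at most once

    total = 0
    for sign, exp_sum, mantissa, exp_inc in terms:
        exp_diff = max_exp - exp_sum
        if exp_diff != 0:
            if max_inc and not exp_inc:
                exp_diff += 1      # the maximum grew, this term did not
            elif exp_inc and not max_inc:
                exp_diff -= 1      # this term grew, the maximum did not
        elif max_inc and not exp_inc:
            exp_diff = 1           # assumption of this sketch (see lead-in)

        padded = mantissa << APPEND_BITS   # append 0s; guard bits are implicit
        if sign:
            padded = -padded               # two's complement for negative terms
        total += padded >> exp_diff        # arithmetic right shift aligns terms

    if total == 0:
        return 0, 0, 0.0
    sign = 1 if total < 0 else 0
    magnitude = abs(total)
    msb = magnitude.bit_length() - 1       # position of the leading 1
    exponent = final_max - 2 * BIAS + msb - (MANT_BITS - 1 + APPEND_BITS)
    fraction = magnitude / (1 << msb)      # not rounded to 7 bits in this sketch
    return sign, exponent, fraction


def to_float(sign, exponent, fraction):
    """Convert the (sign, exponent, fraction) triple to a Python float."""
    return (-1.0) ** sign * math.ldexp(fraction, exponent)
```

As a usage example, the terms `(0, 254, 0x9000, 1)`, `(0, 255, 0xC000, 0)`, and `(1, 256, 0xC000, 0)` model 2.25 + 3.0 − 6.0; `to_float(*accumulate(...))` on that list yields −0.75. A real implementation would also round the final mantissa to the output format and handle special values, which this sketch omits.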
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)} · CPC title
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title
Denomination or exception handling, e.g. rounding or overflow · CPC title