Techniques for fast dot-product computation
US-2023053261-A1 · Feb 16, 2023 · US
US11893360B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11893360-B2 |
| Application number | US-202117180856-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 21, 2021 |
| Priority date | Feb 21, 2021 |
| Publication date | Feb 6, 2024 |
| Grant date | Feb 6, 2024 |
Official abstract text for this publication.
A process for performing vector dot products receives a row vector and a column vector as floating point numbers in a format of sign plus exponent bits plus mantissa bits. The process generates a single dot product value by separately processing the sign bits, exponent bits, and mantissa bits to form a sign bit, a normalized mantissa formed by multiplying pairs of multiplicand elements, and exponent information including MAX_EXP and EXP_DIFF. A second pipeline stage receives the multiplied pairs of normalized mantissas, optionally performs an exponent adjustment, pads, complements and shifts the normalized mantissas, and the results are added in a series of stages until a single addition result remains, which is normalized using MAX_EXP to form the floating point output result.
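The flow described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented circuit: it assumes bfloat16-style operands (1 sign bit, 8 exponent bits, 7 mantissa bits), rejects zeros, subnormals, infinities, and NaNs, folds the exponent increment into each exponent sum before taking the maximum instead of modelling any signalling between lanes, uses Python's unbounded signed integers in place of fixed-width two's complement registers, and returns a Python float rather than repacking the result. The names `to_fields`, `dot_product_sketch`, `TOP_BIT`, and `GUARD_BITS` are hypothetical.

```python
import math
import struct

MAN_BITS, BIAS = 7, 127        # bfloat16-style layout (assumed for illustration)
TOP_BIT = 2 * MAN_BITS + 1     # bit 15: top bit of the 16-bit mantissa product
GUARD_BITS = 8                 # zeros appended so right shifts keep low bits (assumed)


def to_fields(x):
    """Truncate x to bfloat16; return (sign, biased exponent, mantissa with hidden 1)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0] >> 16
    sign, exp = bits >> 15, (bits >> MAN_BITS) & 0xFF
    if exp in (0, 255):
        raise ValueError("sketch handles normal, non-zero values only")
    return sign, exp, (bits & 0x7F) | 0x80


def dot_product_sketch(inputs, coeffs):
    # First stage: per-pair sign, exponent sum, and normalized mantissa product.
    lanes = []
    for a, b in zip(inputs, coeffs):
        (sa, ea, ma), (sb, eb, mb) = to_fields(a), to_fields(b)
        prod = ma * mb                          # 8-bit x 8-bit mantissa product
        exp_inc = prod >> TOP_BIT               # 1 when the product's top bit is set
        norm = prod if exp_inc else prod << 1   # leading 1 now sits at TOP_BIT
        lanes.append((sa ^ sb, ea + eb + exp_inc, norm))

    max_exp = max(exp for _, exp, _ in lanes)

    # Second stage: pad, apply the sign, shift right by EXP_DIFF, and accumulate.
    total = 0
    for sign, exp, norm in lanes:
        padded = norm << GUARD_BITS
        signed = -padded if sign else padded    # stands in for a two's complement
        total += signed >> (max_exp - exp)      # arithmetic right shift by EXP_DIFF

    # Each unit of total weighs 2**(max_exp - 2*BIAS - TOP_BIT - GUARD_BITS).
    return math.ldexp(total, max_exp - 2 * BIAS - TOP_BIT - GUARD_BITS)
```

For example, `dot_product_sketch([1.0, 2.0, -0.5], [3.0, 0.5, 4.0])` evaluates 3 + 1 - 2 and returns `2.0`. Because every product is aligned to the largest exponent and then summed as an integer, only one normalization is needed at the end, rather than one per addition.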
Opening claim text (preview).
I claim:
1. A process for a pipelined floating point multiplier-accumulator (MAC) computing a floating point result from N floating point pairs, each floating point pair comprising a floating point input and a floating point coefficient, the process comprising:
a maximum exponent controller determining a maximum exponent (MAX_EXP) value from among the N floating point pairs, the MAX_EXP value comprising a maximum sum value of an exponent of a floating point input and a corresponding exponent of a floating point coefficient;
a dot product controller for each of the associated N floating point pairs performing a process of:
generating a sign bit output from an exclusive OR (XOR) operation performed with an XOR gate on a sign bit of a corresponding floating point input and a sign bit of a corresponding floating point coefficient;
generating a normalized mantissa by the dot product controller performing a multiplication of a mantissa of a corresponding floating point input with a mantissa of a corresponding floating point coefficient, and asserting an exponent increment (EXP_INC) bit when the multiplication generates a result with a most significant bit of 1;
generating an exponent difference (EXP_DIFF) value by subtracting a sum of an exponent of the corresponding floating point input and an exponent of the corresponding floating point coefficient from MAX_EXP;
when the EXP_DIFF value is 0 and the EXP_INC bit is asserted, incrementing MAX_EXP and asserting a maximum increment (MAX_INC) bit to other dot product controllers generating a respective normalized mantissa;
when the EXP_DIFF value is not 0 and the EXP_INC bit is not asserted and the MAX_INC bit is asserted, incrementing the EXP_DIFF value;
when the EXP_DIFF value is not 0 and the EXP_INC bit is asserted and the MAX_INC bit is not asserted, decrementing the EXP_DIFF value;
performing a padding step of pre-pended the normalized mantissa with at least one 0, performing a complement step of replacing a result of the padding step with a 2's complement of the result of the padding step if a corresponding sign bit output is 1, performing a shift step using a shift register which shifts the result of the complement step to the right by a number EXP_DIFF of bit positions and outputting a shifted result as an integer form fraction;
accumulating N integer form fractions to generate an adder output;
generating a floating point output by converting the adder output to a resulting sign bit and a resulting normalized mantissa derived from the adder output, thereafter outputting the resulting sign bit, an exponent derived from MAX_EXP, and the resulting normalized mantissa.
2. The process of claim 1 where at least one of the floating point inputs, the floating point coefficients, and the floating point output conforms to at least one of the formats: bfloat, FP16, and FP32 of an IEEE standard 754.
3. The process of claim 1 where the process is operative on multiple pipeline stages.
4. The process of claim 1 where the process is operative on a first pipeline stage and a second pipeline stage which has registered inputs receiving values from the first pipeline stage.
5. The process of claim 4 where the first pipeline stage and second pipeline stage operate in parallel on the N pairs simultaneously.
6. The process of claim 4 where the first pipeline stage and second pipeline stage operate sequentially on each of the N pairs.
7. A process for a pipelined floating point multiplier-accumulator (MAC) implemented with registers separating stages and configured to generate a sum of N products from a corresponding pair of values, each pair of values comprising a floating point input value and a corresponding floating point coefficient value, the process comprising:
each of the N products provided by a dot product controller performing a respective dot product process of:
outputting a sign bit computed from an exclusive OR (XOR) of a sign bit of a floating point input value with a sign bit of a corresponding floating point coefficient value;
computing an exponent sum of an exponent of the floating point input value with an exponent of the corresponding floating point coefficient, determining a maximum exponent (MAX_EXP) value from all N exponent sums, determining an exponent difference (EXP_DIFF) value between the MAX_EXP value and the corresponding exponent sum;
generating a product by multiplying a mantissa from the floating point input value with a mantissa from the corresponding floating point coefficient value and normalizing the product, and asserting an exponent increment (EXP_INC) bit if an overflow results from the multiplication;
when the EXP_DIFF value is 0 and the EXP_INC bit is asserted, incrementing MAX_EXP and asserting a maximum increment (MAX_INC) bit;
when the EXP_DIFF value is not 0 and the EXP_INC bit is not asserted and the MAX_INC bit is asserted, incrementing the EXP_DIFF value;
when the EXP_DIFF value is not 0 and the EXP_INC bit is asserted and the MAX_INC bit is not asserted, decrementing the EXP_DIFF value;
prepending and appending 0 values to the normalized product to generate a padded value, modifying the padded value by taking a two's complement of the padded value if the sign bit is 1, thereafter right shifting the result by the EXP_DIFF value to generate an integer form fraction;
a summing process receiving each of the N integer form fractions and outputting an integer form fraction sum;
converting the integer form fraction sum to a resulting sign bit and resulting normalized mantissa, and outputting a floating point output value comprising the resulting sign bit, an exponent derived from said MAX_EXP value, and the resulting normalized mantissa.
8. The process of claim 7 where a first pipeline stage outputs a sign bit, a normalized mantissa, the EXP_INC bit, MAX_EXP, and the EXP_DIFF value to a second pipeline stage having a registered input.
9. The process of claim 8 where the second pipeline stage generates N integer form fractions.
10. The process of claim 7 where each floating point input value comprises, in sequence, a sign bit, 8 bits of exponent, and 7 bits of mantissa.
11. The process of claim 7 where outputting an integer form fraction comprises N/2 add operations in parallel in a first stage, the N/2 add operations having a bit width equal to a number of bits of the integer form fraction sum.
12. The process of claim 7 where prepending 0 values comprises prepending at least log2(N) 0s.
13. The process of claim 7, where if more than one said EXP_DIFF value equals 0, the MAX_INC bit is asserted and MAX_EXP is only incremented once.
14. The process of claim 7 where the MAX_INC bit from a process with the EXP_DIFF value equal to 0 is provided to processes where the EXP_DIFF value is not equal to 0.
15. The process of claim 7 where the integer form fraction has a bit width in a range of 16 to 32 bits.
16. The process of claim 7 where the floating point input values have a format conforming to IEEE standard 754.
17. A process for a pipelined floating point multiplier-accumulator (MAC) receiving N pairs of floating point values, each pair comprising a floating point input and a floating point coefficient, each floating point input and floating point coefficient comprising a sign bit, a plurality of mantissa bits, and a plurality of exponent bits, the process configured to operate on at least one pipeline stage register and comprising: identifying for each of the N pairs of floating point values a maximum exponent sum (MAX_EXP) value among N sums of input exponent and coefficient exp
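The EXP_DIFF adjustment rules recited in claims 1 and 7, and the staged addition and zero padding of claims 11 to 13, can be sketched in Python as follows. This is a hedged illustration only: the function names `adjust_exp_diffs`, `headroom_bits`, and `adder_tree` are hypothetical, lists stand in for hardware lanes, and Python integers stand in for fixed-width registers. The sketch applies only the three conditions quoted in the claims; the quoted text lists no rule for a lane with EXP_DIFF equal to 0 and EXP_INC not asserted while MAX_INC is asserted, so that lane is left unchanged here.

```python
def adjust_exp_diffs(exp_sums, exp_incs):
    """Apply the three quoted rules; return (MAX_EXP, MAX_INC, adjusted EXP_DIFF list)."""
    max_exp = max(exp_sums)
    exp_diffs = [max_exp - s for s in exp_sums]

    # Rule 1: a lane at the maximum whose mantissa product set EXP_INC raises
    # MAX_INC; MAX_EXP is incremented once even if several lanes qualify.
    max_inc = any(d == 0 and inc for d, inc in zip(exp_diffs, exp_incs))
    if max_inc:
        max_exp += 1

    adjusted = []
    for d, inc in zip(exp_diffs, exp_incs):
        if d != 0 and not inc and max_inc:
            d += 1      # Rule 2: the maximum moved up and this lane did not
        elif d != 0 and inc and not max_inc:
            d -= 1      # Rule 3: this lane moved up and the maximum did not
        adjusted.append(d)
    return max_exp, max_inc, adjusted


def headroom_bits(n):
    """Smallest whole number of leading zero bits that is at least log2(n)."""
    return (n - 1).bit_length()


def adder_tree(values):
    """Sum a non-empty list in pairwise stages, about half as many adds per stage."""
    level = list(values)
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)             # pad odd-sized stages with a zero operand
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]
```

With exponent sums `[255, 254, 250]` and EXP_INC flags `[True, False, True]`, `adjust_exp_diffs` returns `(256, True, [0, 2, 5])`: the first lane raises MAX_EXP, the second lane's difference grows by one, and the third lane is unchanged because both its EXP_INC bit and MAX_INC are asserted. For N = 8 lanes, `headroom_bits(8)` is 3, which is the number of prepended zeros that lets eight same-sign terms be added without overflowing the padded width.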
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title
Multiplying · CPC title
Normalisation mentioned as feature only · CPC title
controlled in tandem, e.g. multiplier-accumulator · CPC title
Denomination or exception handling, e.g. rounding or overflow · CPC title