Process for dual mode floating point multiplier-accumulator with high precision mode for near zero accumulation results

US12197889B2 · US · B2

Patent metadata
Publication number: US-12197889-B2
Application number: US-202117352374-A
Country: US
Kind code: B2
Filing date: Jun 21, 2021
Priority date: Jun 21, 2021
Publication date: Jan 14, 2025
Grant date: Jan 14, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A process for a floating point multiplier-accumulator (MAC) is operative on N pairs of floating point values using N MAC processes operating concurrently, each MAC process operating on a pair of values comprising an input value and a coefficient value. Each MAC process simultaneously generates: an integer form fraction at a first bitwidth and a second bitwidth greater than the first bitwidth, a sign bit, and an exponent difference computed by subtracting an exponent sum from a maximum exponent sum of all exponent sums. The integer form fractions of the first bitwidths are provided to an adder tree using the first bitwidth, and if the sum has an excess percentage of leading 0s, then the second bitwidth is used by an adder tree using the second bitwidth to form a greater precision integer form fraction. The sign, integer form fraction, and maximum exponent are provided to a normalizer which generates a floating point result.
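The flow described in the abstract can be illustrated with a minimal software sketch. It is not the patented hardware: it assumes IEEE 754 binary32 inputs, and the bitwidths (56 and 80), the 8 headroom bits, the 50% threshold, and all function names are hypothetical choices made for illustration. Python integers stand in for the adder trees, `math.ldexp` stands in for the normalizer, shifts truncate rather than round, and subnormals, infinities, and NaNs are not handled.

```python
import math
import struct

PRODUCT_BITS = 48   # a 24-bit by 24-bit mantissa product
HEADROOM = 8        # assumed prepended bits for sign and carries
FIRST_WIDTH = 56    # assumed first (narrow) bitwidth
SECOND_WIDTH = 80   # assumed second (wide) bitwidth
THRESHOLD = 0.5     # assumed leading-zero fraction that triggers the wide pass


def decompose(x):
    """Split a binary32 value into (sign, biased exponent, 24-bit mantissa)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign, exp, frac = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    mant = (frac | 0x800000) if exp else 0  # subnormals flushed to zero here
    return sign, exp, mant


def aligned_sum(products, max_es, width):
    """Pad, complement, shift, and sum the products at the given bitwidth."""
    pad = width - HEADROOM - PRODUCT_BITS   # number of appended 0 bits
    total = 0
    for sign, exp_sum, fraction in products:
        value = fraction << pad             # pad with trailing 0s
        if sign:
            value = -value                  # two's complement for negatives
        total += value >> (max_es - exp_sum)  # align to the maximum exponent
    return total, pad


def leading_zero_ratio(total, width):
    """Fraction of the word occupied by leading 0s of the sum's magnitude."""
    return (width - abs(total).bit_length()) / width


def dual_mode_mac(inputs, coeffs):
    """Return (result, bitwidth used) for sum(inputs[i] * coeffs[i])."""
    products = []
    for x, w in zip(inputs, coeffs):
        sx, ex, mx = decompose(x)
        sw, ew, mw = decompose(w)
        products.append((sx ^ sw, ex + ew, mx * mw))
    max_es = max(exp_sum for _, exp_sum, _ in products)

    total, pad = aligned_sum(products, max_es, FIRST_WIDTH)
    used = FIRST_WIDTH
    if leading_zero_ratio(total, FIRST_WIDTH) > THRESHOLD:
        # Near-zero result: repeat the sum at the wider bitwidth.
        total, pad = aligned_sum(products, max_es, SECOND_WIDTH)
        used = SECOND_WIDTH

    # Each exponent carries a bias of 127 and the product has 46 fraction bits.
    result = math.ldexp(total, max_es - 2 * 127 - 46 - pad)
    return result, used
```

For example, `dual_mode_mac([1.0, -1.0, 1.5 * 2.0**-46], [1.0, 1.0, 1.0])` cancels the two large terms, so the narrow sum collapses to a single set bit and the sketch falls back to the wide pass, which keeps the bits of the small term that the narrow alignment shift had discarded.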

First claim

Opening claim text (preview).

I claim:

1. A process for a floating point multiplier-accumulator (MAC) multiplying and accumulating N operands, each operand comprising an input value and a coefficient value, the process operative on a MAC controller comprising register-pipelined stages, the process comprising:

    a plurality N of MAC processes operating in parallel on a first register-pipeline stage, each MAC process receiving a unique one of the N operands comprising an associated input value and an associated coefficient value, each MAC process of the first register-pipeline stage comprising:

    a sign process operative on exclusive OR hardware and performing an exclusive OR operation on a sign bit of the associated input value and a sign bit of the associated coefficient value, the sign process outputting a corresponding sign bit;

    a mantissa process operative on a hardware multiplier and performing an integer multiplication of a mantissa of the associated input value and a mantissa of the associated coefficient value and outputting a fraction, the mantissa process asserting an exponent increment and dividing the fraction by two if an overflow condition occurs;

    an exponent process operative on summing and subtracting hardware and determining an exponent sum of an exponent of the associated input value and an exponent of the associated coefficient value, the exponent process receiving a maximum exponent sum value from a centralized find maximum exponent process, the exponent process incrementing the maximum exponent sum value when the exponent increment is asserted and an exponent difference is zero, the exponent process also outputting the exponent difference between the maximum exponent sum value and the exponent sum;

    a Pad, Complement, Shift (PCS) process operative in a second register-pipeline stage and receiving the fraction from the first pipeline stage mantissa process, the corresponding sign bit from the sign process, and the exponent difference, the PCS process returning a 2s complement when the sign bit is negative, padding the fraction by pre-pending and appending 0s to the fraction to generate a first value, and right shifting by the exponent difference and outputting a result as a PCS first output value having a first bitwidth, and also outputting the result as a PCS second output value having a second bitwidth greater than the first bitwidth;

    the centralized find maximum exponent process operative on comparator hardware and receiving an exponent sum from each exponent process of the MAC processes, identifying a maximum exponent sum and outputting the maximum exponent sum;

    storing the N PCS second output values in a pipeline register of the second register-pipeline stage;

    summing N PCS output first values using first summing hardware, and using the first bitwidth to output a first sum;

    summing N PCS output second values using second summing hardware, and using the second bitwidth when the first sum has more than a threshold percentage of leading 0s and returning a second sum;

    normalizing hardware outputting a floating point value by normalizing the second sum to generate a sign bit, a mantissa, and a number N of left shift bit positions to remove the leading 0s from the second sum to generate an output sum, a final stage process operative on the normalizing hardware thereafter forming a final output by concatenating the sign bit, the mantissa, and an exponent derived from the maximum exponent.

2. The process of claim 1 where the exponent of the floating point output value derived from the maximum exponent is derived by subtracting N from the maximum exponent and also subtracting an exponent correction.

3. The process of claim 2 where the exponent derived from the maximum exponent is an 8 bit value and the exponent correction is 127 and performed in either each MAC process exponent process, or the final stage.

4. The process of claim 1 where normalizing the second sum comprises: when the most significant bit (MSB) of the output sum is set, replacing the output sum with a 2s complement of the output sum, thereafter replacing the output sum with a left shifted output sum a number N of bit positions until no leading 0 bits remain and setting the sign bit.

5. The process of claim 2 where the exponent's precision is 8 bits and the exponent correction comprises subtracting 127.

6. The process of claim 1 where a stall condition is asserted when summing the N PCS output first values generates a sum with a number of leading 0 bits of a first final stage process mantissa which exceeds a threshold, the stall condition resulting in summing the N PCS output second values using the second bitwidth to be performed after the stall condition.

7. The process of claim 1 where the summing N PCS output second values using the second bitwidth is performed when the summing N PCS output first values using the first bitwidth has more than 50% or 75% leading 0s of the first bitwidth.

8. The process of claim 1 where the PCS process is operative with a bitwidth determined from a MAC process exponent difference.

9. The process of claim 1 where the exponent difference of each MAC process is incremented when the mantissa process does not overflow and an associated exponent difference is not 0.

10. The process of claim 1 where the exponent difference of each MAC process is decremented when the mantissa process has an overflow and an associated exponent difference is not 0.

11. The process of claim 1 where the maximum exponent is incremented when the exponent difference is 0 and an associated mantissa process has an overflow.

12. The process of claim 1 where the mantissa's precision is 4 bits when the exponent difference is greater than 24.

13. The process of claim 1 where the mantissa's precision is 8 bits when the exponent difference is greater than 21.

14. The process of claim 1 where the mantissa's precision is 12 bits when the exponent difference is larger than 12.

15. A process for a floating point multiplier-accumulator (MAC) comprising a first pipeline stage process operating on a first pipeline stage and a second pipeline stage process operating on a second pipeline stage, the process simultaneously multiplying and accumulating N operands in parallel operations, each operand comprising an input value and a coefficient value, the first pipeline stage process including a process operating on a MAC controller, the first pipeline stage process comprising:

    a plurality N of MAC processes, each MAC process receiving an associated input value and associated coefficient value, each MAC process operating simultaneously and in parallel with other MAC processes of the first pipeline stage, the first pipeline stage comprising:

    a sign process performing an exclusive OR operation with exclusive OR gates on a sign bit of the associated input value and a sign bit of the associated coefficient value resulting in a sign bit output;

    a mantissa process configured to use an integer multiplier to perform an integer multiplication of a hidden bit restored mantissa of the associated input value and a hidden bit restored mantissa of the associated coefficient value and outputting a resulting fraction, and upon an overflow condition of the resulting fraction, the mantissa process dividing the resulting fraction by two and asserting an exponent increment;

    an exponent process configured to use hardware adders and generating an exponent sum of an exponent of the associated input value and an exponent of the associated coefficient value, the exponent process receiving a maximum exponent from a centralized find maximum exponent sum process, the exponent process modifying the maximum e
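The per-operand first-stage steps of claim 1 (sign XOR, mantissa multiply with overflow handling, exponent sum, centralized maximum, exponent difference) and the normalization wording of claim 4 can be sketched in software as follows. This is an illustrative reading, not the claimed hardware: binary32 inputs are assumed, all function names are hypothetical, the overflow test (product of two hidden-bit-restored mantissas reaching 2.0) is one interpretation of "overflow condition", and the claim preview does not fix where the binary point sits, so `output_exponent` only mirrors the arithmetic stated in claims 2 and 3.

```python
import struct

BIAS = 127  # exponent correction named in claims 3 and 5


def fields(x):
    """Split a binary32 value into sign, biased exponent, 24-bit mantissa."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    exp = (bits >> 23) & 0xFF
    mant = ((bits & 0x7FFFFF) | 0x800000) if exp else 0  # restore hidden bit
    return bits >> 31, exp, mant


def mac_stage_one(x, w):
    """One MAC process: returns (sign, fraction, exponent sum, increment)."""
    sx, ex, mx = fields(x)
    sw, ew, mw = fields(w)
    sign = sx ^ sw                     # sign process: exclusive OR
    fraction = mx * mw                 # mantissa process: integer multiply
    increment = bool(fraction >> 47)   # overflow: product reached 2.0
    if increment:
        fraction >>= 1                 # divide the fraction by two
    return sign, fraction, ex + ew, increment


def exponent_differences(exp_sums):
    """Centralized find-maximum, then each process's exponent difference."""
    max_sum = max(exp_sums)
    return max_sum, [max_sum - s for s in exp_sums]


def normalize(total, width):
    """Claim 4 wording: total is a two's complement pattern of `width` bits."""
    mask = (1 << width) - 1
    sign = (total >> (width - 1)) & 1
    if sign:
        total = (-total) & mask        # replace with the 2s complement
    if total == 0:
        return 0, 0, 0
    shift = width - total.bit_length() # leading 0 bits to remove
    return sign, (total << shift) & mask, shift


def output_exponent(max_exponent, shift, correction=BIAS):
    """Claim 2 arithmetic: subtract the shift count and the correction."""
    return max_exponent - shift - correction
```

As a small usage example, `mac_stage_one(1.5, 1.5)` overflows because 1.5 times 1.5 is 2.25, so the fraction is halved and the increment flag is set, while `mac_stage_one(-2.0, 3.0)` yields a negative sign with no overflow.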

Assignees

Ceremorphic Inc

Inventors

Classifications

  • Multiplying · CPC title

  • Pipelining · CPC title

  • G06F7/5443 · Primary

    Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12197889B2 cover?
A process for a floating point multiplier-accumulator (MAC) is operative on N pairs of floating point values using N MAC processes operating concurrently, each MAC process operating on a pair of values comprising an input value and a coefficient value. Each MAC process simultaneously generates: an integer form fraction at a first bitwidth and a second bitwidth greater than the first bitwidth, a…
Who is the assignee on this patent?
Ceremorphic Inc
What technology area does this patent fall under?
Primary CPC classification G06F7/5443. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 14, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).