Non-atomic split-path fused multiply-accumulate

US9778907B2 · US · B2

Patent metadata
Publication number: US-9778907-B2
Application number: US-201514748817-A
Country: US
Kind code: B2
Filing date: Jun 24, 2015
Priority date: Jul 2, 2014
Publication date: Oct 3, 2017
Grant date: Oct 3, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A microprocessor performs a fused multiply-accumulate operation of a form ±A*B±C using first and second execution units. An input operand analyzer circuit determines whether values of A, B and/or C meet a sufficient condition to perform a joint accumulation of C with partial products of A and B. The first instruction execution unit multiplies A and B and jointly accumulates C to partial products of A and B when the values of A, B and/or C meet a sufficient condition to perform a joint accumulation of C with the partial products of A and B. The second instruction execution unit separately accumulates C to the products of A and B when the values of A, B and/or C do not meet a sufficient condition to perform a joint accumulation of C with the partial products of A and B.
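The split described in the abstract can be pictured with a small software model. The sketch below is only an illustration, not the patented datapath: the names `fma_split_path` and `can_join` are hypothetical, and exact rational arithmetic stands in for the unrounded intermediate values that the hardware would carry between units. It shows that either path rounds only once, which is what distinguishes a fused multiply-accumulate from a multiply followed by a separately rounded add.

```python
from fractions import Fraction


def fma_split_path(a, b, c, can_join):
    """Model a*b + c with a single final rounding.

    `can_join` is a hypothetical stand-in for the input operand analyzer:
    a predicate on (a, b, c) that returns True when C may be accumulated
    jointly with the partial products of A and B.
    Returns (path_name, result).
    """
    if can_join(a, b, c):
        # First unit: C is summed together with the partial products,
        # so the product and the accumulation happen in one step.
        exact = Fraction(a) * Fraction(b) + Fraction(c)
        return "joint", float(exact)  # the only rounding

    # Otherwise the first unit only multiplies and hands over an
    # unrounded product; the second unit accumulates C and rounds.
    unrounded_product = Fraction(a) * Fraction(b)
    return "separate", float(unrounded_product + Fraction(c))  # the only rounding


if __name__ == "__main__":
    always = lambda a, b, c: True
    never = lambda a, b, c: False
    print(fma_split_path(0.1, 10.0, -1.0, always))
    print(fma_split_path(0.1, 10.0, -1.0, never))
    # An unfused multiply then add rounds twice and loses the residue.
    print(0.1 * 10.0 + -1.0)
```

In this model both paths return the same value by construction; the choice of path only decides which unit performs the accumulation.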

First claim

Opening claim text (preview).

The invention claimed is:

1. A microprocessor operable to perform a fused multiply-accumulate operation of a form ±A*B±C, wherein A, B, and C are input operands, the microprocessor comprising: an input operand analyzer circuit that determines whether values of A, B and/or C meet a sufficient condition to perform a joint accumulation of C with partial products of A and B; a first instruction execution unit that multiplies A and B and jointly accumulates C to partial products of A and B when the values of A, B and/or C meet a sufficient condition to perform a joint accumulation of C with the partial products of A and B; and a second instruction execution unit that separately accumulates C to the products of A and B when the values of A, B and/or C do not meet a sufficient condition to perform a joint accumulation of C with the partial products of A and B; wherein each of the first and second instruction execution units is an atomic circuit comprising a decoder that independently decodes and operates on instructions.

2. The microprocessor of claim 1, wherein the first instruction execution unit is a multiplier operable to execute multiplication instructions and perform at least a first part of the fused multiply-accumulate operation.

3. The microprocessor of claim 1, wherein the second instruction execution unit is an adder operable to execute addition and subtraction instructions and perform at least a second part of the fused multiply-accumulate operation.

4. The microprocessor of claim 1, wherein a sufficient condition for joint accumulation with C is that an absolute magnitude of C is close enough to an absolute magnitude of the products of A and B to produce a potential for mass cancellation, wherein mass cancellation refers to a cancellation of one or more of the most significant bits of the product of A and B when summed with C.

5. The microprocessor of claim 1, wherein A, B and C are represented with exponents, and a sufficient condition for joint accumulation with C is that a sum of the exponents of A and B minus an exponent of C, further adjusted by any exponent bias value, is greater than or equal to negative one.

6. The microprocessor of claim 1, wherein: A, B and C are floating point operands, each comprising a sign indicator, mantissa and exponent; a first sufficient condition is that a sum of the exponents of A and B minus an exponent of C is greater than or equal to negative one; and a second sufficient condition is that an absolute magnitude of C is close enough to an absolute magnitude of the products of A and B to produce a potential for mass cancellation, wherein mass cancellation refers to a cancellation of one or more of the most significant bits of the product of A and B when summed with C; wherein each of the first and second instruction execution units are atomic units that independently decode and operate on instructions.

7. A microprocessor operable to perform a fused multiply-accumulate operation of a form ±A*B±C, wherein A, B, and C are input operands, the microprocessor comprising: an input operand analyzer circuit that determines whether values of A, B and/or C meet a sufficient condition to perform a joint accumulation of C with partial products of A and B; a first instruction execution unit that multiplies A and B and jointly accumulates C to partial products of A and B when the values of A, B and/or C meet a sufficient condition to perform a joint accumulation of C with the partial products of A and B; and a second instruction execution unit that separately accumulates C to the products of A and B when the values of A, B and/or C do not meet a sufficient condition to perform a joint accumulation of C with the partial products of A and B; wherein A, B and C are represented with mantissas, wherein the first instruction execution unit includes a summation data path less than 3 m bits with an additional m-bit sticky collector, m representing a number of bits used to represent the mantissas of A and B.

8. The microprocessor of claim 7, wherein a sufficient condition for the joint accumulation is that C have a magnitude, relative to a magnitude of the product of A and B, that enables C to be aligned in the summation tree without shifting the most significant bit of C to the left of most significant bit provided within the summation tree for the partial product summation of A and B.

9. A microprocessor operable to perform a fused multiply-accumulate operation of a form ±A*B±C, wherein A, B, and C are input operands, the microprocessor comprising: an input operand analyzer circuit that determines whether values of A, B and/or C meet a sufficient condition to perform a joint accumulation of C with partial products of A and B; a first instruction execution unit that multiplies A and B and jointly accumulates C to partial products of A and B when the values of A, B and/or C meet a sufficient condition to perform a joint accumulation of C with the partial products of A and B; and a second instruction execution unit that separately accumulates C to the products of A and B when the values of A, B and/or C do not meet a sufficient condition to perform a joint accumulation of C with the partial products of A and B; wherein: A, B and C are floating point operands, each comprising a sign indicator, mantissa and exponent; a first partial condition for a joint accumulation with C is that a sum of the exponents of A and B minus an exponent of C is greater than or equal to negative two; a second partial condition for a joint accumulation with C is that an accumulation of C to a product of A and B would result in an effective subtraction; and the first and second partial conditions together compose a sufficient condition for joint accumulation with C.

10. The microprocessor of claim 9, wherein an effective subtraction results if |R| is less than a greater of |A*B| or |C|.

11. A method in a microprocessor of performing a multiply-accumulate operation of a form ±A*B±C, wherein A, B, and C are input values, the method comprising: determining whether values of A, B and/or C satisfy a sufficient condition for joint accumulation of C with partial products of A and B; within a first instruction execution unit, multiplying A and B and selectively accumulating C to a partial products of A and B if the values of A, B and/or C satisfy a sufficient condition for joint accumulation; and within a second instruction execution unit, selectively accumulating C to the product of A and B if the values of A, B and/or C do not satisfy a sufficient condition for joint accumulation; wherein each of the first and second instruction execution units is an atomic circuit comprising a decoder that independently decodes and operates on instructions.

12. The method of claim 11, wherein the first instruction execution unit is a multiplier operable to execute multiplication instructions and perform at least a first part of the fused multiply-accumulate operation.

13. The method of claim 11, wherein the second instruction execution unit is an adder operable to execute addition and subtraction instructions and perform at least a second part of the fused multiply-accumulate operation.

14. A microprocessor operable to perform a multiply-accumulate operation of a form ±A*B±C, wherein A, B, and C are input operands, the apparatus comprising: a first instruction execution unit configured to perform a multiply operation to compute a product of A and B, and further configured to selectively perform an accumulation operation that accumulates C to the product of A and B; a second instruction execution unit configured to accumulate C to the p
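
The exponent-based conditions recited in claims 5, 9 and 10 can be written as simple predicates. The following is a rough sketch under stated assumptions, not the analyzer circuit itself: all function names are hypothetical, operands are assumed to be nonzero, finite, normal doubles, the operation is taken as +A*B+C with any operator signs already folded into the operands, and `math.frexp` is used to obtain unbiased exponents, so no bias adjustment is needed. Effective subtraction is tested by comparing signs, which matches the |R| criterion of claim 10 for nonzero operands in exact arithmetic.

```python
import math


def unbiased_exponent(x: float) -> int:
    """Exponent e with |x| = m * 2**e and 1 <= m < 2 (x nonzero, finite)."""
    # math.frexp returns (m, e) with 0.5 <= |m| < 1, hence the -1.
    return math.frexp(x)[1] - 1


def exp_delta(a: float, b: float, c: float) -> int:
    """Sum of the exponents of A and B minus the exponent of C."""
    return unbiased_exponent(a) + unbiased_exponent(b) - unbiased_exponent(c)


def effective_subtraction(a: float, b: float, c: float) -> bool:
    """True when the product A*B and C have opposite signs.

    Assumption: for nonzero operands this is when the exact result R
    satisfies |R| < max(|A*B|, |C|), the wording used in claim 10.
    """
    product_negative = (a < 0) != (b < 0)
    return product_negative != (c < 0)


def condition_claim5(a: float, b: float, c: float) -> bool:
    """Exponent difference greater than or equal to negative one."""
    return exp_delta(a, b, c) >= -1


def condition_claim9(a: float, b: float, c: float) -> bool:
    """Exponent difference >= -2 together with an effective subtraction."""
    return exp_delta(a, b, c) >= -2 and effective_subtraction(a, b, c)


if __name__ == "__main__":
    for c in (1.0, 4.0, -4.0, -8.0):
        print(c, exp_delta(1.0, 1.0, c),
              condition_claim5(1.0, 1.0, c),
              condition_claim9(1.0, 1.0, c))
```

The two predicates are kept separate because the claims recite them as distinct sufficient conditions; how a real analyzer combines them is not specified in the text shown here.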

Assignees

Inventors

Classifications

  • Multiplying · CPC title

  • Implementation of IEEE-754 Standard · CPC title

  • Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title

  • Arrangements for executing machine instructions, e.g. instruction decode (for executing microinstructions G06F9/22) · CPC title

  • controlled in tandem, e.g. multiplier-accumulator · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9778907B2 cover?
A microprocessor performs a fused multiply-accumulate operation of a form ±A*B±C using first and second execution units. An input operand analyzer circuit determines whether values of A, B and/or C meet a sufficient condition to perform a joint accumulation of C with partial products of A and B. The first instruction execution unit multiplies A and B and jointly accumulates C to partial product…
Who is the assignee on this patent?
Via Alliance Semiconductor Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F7/483. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 3, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).