Temporally split fused multiply-accumulate operation

US9778908B2 · US · B2

Patent metadata
Publication number: US-9778908-B2
Application number: US-201514748870-A
Country: US
Kind code: B2
Filing date: Jun 24, 2015
Priority date: Jul 2, 2014
Publication date: Oct 3, 2017
Grant date: Oct 3, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A microprocessor splits a fused multiply-accumulate operation of the form A*B+C into first and second multiply-accumulate sub-operations to be performed by a multiplier and an adder. The first sub-operation at least multiplies A and B, and conditionally also accumulates C to the partial products of A and B to generate an unrounded nonredundant sum. The unrounded nonredundant sum is stored in memory shared by the multiplier and adder for an indefinite time period, enabling the multiplier and adder to perform other operations unrelated to the multiply-accumulate operation. The second sub-operation conditionally accumulates C to the unrounded nonredundant sum if C is not already incorporated into the value, and then generates a final rounded result.
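The split described in the abstract can be modeled in software. The following is a minimal Python sketch. It assumes exact rational arithmetic (`fractions.Fraction`) as a stand-in for the wide unrounded nonredundant sum that real hardware would hold. The helper names `first_sub_op`, `second_sub_op` and `split_fma` and the `c_accumulated` flag are hypothetical and do not come from the patent. The sketch shows the property the split has to preserve: whichever sub-operation adds C, the result is rounded only once, so it matches a true fused operation and can differ from a separately rounded multiply followed by an add.

```python
from fractions import Fraction


def first_sub_op(a: float, b: float, c: float, accumulate_c: bool):
    """Hypothetical first sub-operation: exact A*B, plus C only if accumulate_c is set."""
    unrounded = Fraction(a) * Fraction(b)  # no rounding: Fraction arithmetic is exact
    if accumulate_c:
        unrounded += Fraction(c)
    # A single calculation control indicator: was C already accumulated?
    indicators = {"c_accumulated": accumulate_c}
    return unrounded, indicators


def second_sub_op(unrounded: Fraction, indicators: dict, c: float) -> float:
    """Hypothetical second sub-operation: add C if it was skipped, then round once."""
    if not indicators["c_accumulated"]:
        unrounded += Fraction(c)
    # float() on a Fraction performs one correctly rounded conversion
    # (round to nearest, ties to even) to a binary64 double.
    return float(unrounded)


def split_fma(a: float, b: float, c: float, accumulate_c: bool) -> float:
    stored = first_sub_op(a, b, c, accumulate_c)
    # In hardware, the multiplier and adder could run unrelated work here.
    return second_sub_op(*stored, c)


a = b = 1 + 2**-30
c = -(1 + 2**-29)
print(split_fma(a, b, c, accumulate_c=True) == 2**-60)   # True
print(split_fma(a, b, c, accumulate_c=False) == 2**-60)  # True
print(a * b + c)  # 0.0, because a*b was rounded before C was added
```

The exact product here is 1 + 2^-29 + 2^-60. The 2^-60 term lies below half an ulp of 1.0, so it disappears when a*b is rounded on its own. A single final rounding keeps it.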

First claim

Opening claim text (preview).

The invention claimed is:

1. A method in a microprocessor for performing a fused multiply-accumulate operation of a form ±A*B±C, wherein A, B and C are input operands, and wherein no rounding occurs before C is accumulated to a product of A and B, the method comprising: splitting the fused multiply-accumulate operation into first and second multiply-accumulate sub-operations to be performed by one or more instruction execution units; in the first multiply-accumulate sub-operation, selectively either accumulating partial products of A and B with C, or accumulating only the partial products of A and B, and to generate therefrom an unrounded nonredundant sum; between the first and second multiply-accumulate sub-operations, storing the unrounded nonredundant sum in memory, enabling the one or more instruction execution units to perform other operations unrelated to the multiply-accumulate operation; wherein the memory is external to the one or more instruction execution units and comprises a result store for storing the unrounded nonredundant sum and a calculation control indicator store, distinct from the result store, that stores a plurality of calculation control indicators that indicate how subsequent calculations in the second multiply-accumulate sub-operation should proceed; in the second multiply-accumulate sub-operation, accumulating C with the unrounded nonredundant sum if the first multiply-accumulate sub-operation produced the unrounded nonredundant sum without accumulating C; and in the second multiply-accumulate sub-operation, generating a final rounded result of the fused multiply-accumulate operation.

2. The method of claim 1, wherein the fused multiply-accumulate operation is performed by at least two instruction execution units.

3. The method of claim 1, wherein the result store is coupled to a result bus, the result bus being common to the one or more instruction execution units.

4. The method of claim 1, wherein the result store is a reorder buffer.

5. The method of claim 1, wherein the calculation control indicator store is a cache that is not coupled to the result bus and that is shared only by execution units configured to perform the first or second multiply-accumulate sub-operation.

6. The method of claim 1, wherein the one or more instruction execution units comprise a multiply-accumulate unit configured to perform the first multiply-accumulate sub-operation in response to a first multiply-accumulate instruction and to perform the second multiply-accumulate sub-operation in response to a second multiply-accumulate instruction.

7. A method in a microprocessor for performing a fused multiply-accumulate operation of a form ±A*B±C, wherein A, B and C are input operands, and wherein no rounding occurs before C is accumulated to a product of A and B, the method comprising: splitting the fused multiply-accumulate operation into first and second multiply-accumulate sub-operations to be performed, respectively, by first and second instruction execution units; in the first multiply-accumulate sub-operation, selectively either accumulating partial products of A and B with C, or accumulating only the partial products of A and B, and generating therefrom an unrounded nonredundant sum; forwarding a plurality of calculation control indicators from a first instruction execution unit to a second instruction execution unit, wherein the calculation control indicators indicate how subsequent calculations in the second multiply-accumulate sub-operation should proceed, including whether an accumulation with C occurred in the first multiply-accumulate sub-operation; in the second multiply-accumulate sub-operation, accumulating C with the unrounded nonredundant sum if the first multiply-accumulate sub-operation produced the unrounded nonredundant sum without accumulating C; and in the second multiply-accumulate sub-operation, generating a final rounded result of the fused multiply-accumulate operation.

8. The method of claim 7, wherein the calculation control indicators include indicators for generating an arithmetically correct rounded result from the unrounded nonredundant sum.

9. The method of claim 7, wherein each of the first and second instruction execution units is operable to perform an operation distinct from the first and second multiply-accumulate sub-operations, while the other of the first and second execution units is performing a first or second multiply-accumulate sub-operation.

10. The method of claim 7, wherein the first and second instruction execution units comprise, respectively, a multiplier configured to perform the first multiply-accumulate sub-operation and an adder configured to perform the second multiply-accumulate sub-operation.

11. A microprocessor operable to perform a fused multiply-accumulate operation of a form ±A*B±C, wherein A, B and C are input operands, and wherein no rounding occurs before C is accumulated to a product of A and B, the microprocessor comprising: one or more instruction execution units configured to perform first and second multiply-accumulate sub-operations of a fused multiply-accumulate operation; and memory external to the one or more instruction execution units for storing the unrounded nonredundant sum generated by the first multiply-accumulate sub-operation; wherein in the first multiply-accumulate sub-operation, a selective accumulation is made of either the partial products of A and B with C, or of the partial products of A and B alone, and in accordance with which selective accumulation the unrounded nonredundant sum is generated; wherein in the second multiply-accumulate sub-operation, C is conditionally accumulated with the unrounded nonredundant sum if the first multiply-accumulate sub-operation produced the unrounded nonredundant sum without accumulating C; and wherein in the second multiply-accumulate sub-operation, a final rounded result of the fused multiply-accumulate operation is generated from the unrounded nonredundant sum conditionally accumulated with C; wherein the memory is configured to store the unrounded nonredundant sum for an indefinite period of time until the second multiply-accumulate sub-operation is begun, thereby enabling the one or more instruction execution units to perform other operations unrelated to the multiply-accumulate operation between the first and second multiply-accumulate sub-operations.

12. The microprocessor of claim 11, wherein the one or more instruction execution units comprise at least first and second instruction execution units.

13. The microprocessor of claim 11, wherein the memory comprises a result store for storing the unrounded nonredundant sum and a calculation control indicator store, distinct from the result store, that stores a plurality of calculation control indicators that indicate how subsequent calculations in the second multiply-accumulate sub-operation should proceed.

14. The microprocessor of claim 11, wherein the calculation control indicators include an indication of whether an accumulation with C occurred in the first multiply-accumulate sub-operation.

15. The microprocessor of claim 11, wherein the calculation control indicators include indicators for generating an arithmetically correct rounded result from the unrounded nonredundant sum.

16. The microprocessor of claim 11, wherein the result store is coupled to a result bus coupled to a reorder buffer, the result bus being common to the one or more instruction execution units.

17. The microprocessor of claim 11, wherein the calculation control indicator store is a cache that
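The storage arrangement in claims 1, 4, 7 and 13 can be modeled the same way. The following is a minimal Python sketch under stated assumptions. `SplitFmaMemory`, `Indicators`, the `negate_c` field and the integer `tag` (imagined here as a reorder-buffer entry index, following claim 4) are all hypothetical names. The claims do not say which indicators are kept, beyond whether C was already accumulated and whatever is needed for correct rounding. The unrounded sum waits in a result store while its indicators sit in a separate store, so unrelated work can run between the two sub-operations.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class Indicators:
    """Hypothetical calculation control indicators, kept apart from the sum."""
    c_accumulated: bool  # did the first sub-operation already add C?
    negate_c: bool       # sign applied to C in the form ±A*B±C


@dataclass
class SplitFmaMemory:
    """Hypothetical memory external to the execution units, with two distinct stores."""
    result_store: dict = field(default_factory=dict)     # tag -> unrounded sum
    indicator_store: dict = field(default_factory=dict)  # tag -> Indicators

    def first_sub_op(self, tag, a, b, c, negate_product=False,
                     negate_c=False, accumulate_c=True):
        # Exact product of A and B; a Fraction models the unrounded sum.
        s = Fraction(a) * Fraction(b)
        if negate_product:
            s = -s
        if accumulate_c:
            signed_c = -Fraction(c) if negate_c else Fraction(c)
            s += signed_c
        self.result_store[tag] = s
        self.indicator_store[tag] = Indicators(accumulate_c, negate_c)

    def second_sub_op(self, tag, c):
        # Both entries may have waited for an arbitrary time; consume them now.
        s = self.result_store.pop(tag)
        ind = self.indicator_store.pop(tag)
        if not ind.c_accumulated:
            signed_c = -Fraction(c) if ind.negate_c else Fraction(c)
            s += signed_c
        return float(s)  # the only rounding step


mem = SplitFmaMemory()
x = 1 + 2**-30
mem.first_sub_op(tag=3, a=x, b=x, c=1 + 2**-29, negate_c=True, accumulate_c=False)
other_work = [v * 2.0 for v in (1.5, 2.5)]  # unrelated ops while the sum waits
print(mem.second_sub_op(tag=3, c=1 + 2**-29) == 2**-60)  # True
```

This sketch removes both entries when the second sub-operation reads them, which keeps the two stores in step. How a real design retires or frees these entries is not described here.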

Assignees

Via Alliance Semiconductor Co Ltd

Inventors

Classifications (CPC group titles)

  • according to one or more bits in the instruction, e.g. prefix, sub-opcode

  • Instruction analysis, e.g. decoding, instruction word fields

  • controlled in tandem, e.g. multiplier-accumulator

  • Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00)

  • Implementation of IEEE-754 Standard

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9778908B2 cover?
A microprocessor splits a fused multiply-accumulate operation of the form A*B+C into first and second multiply-accumulate sub-operations to be performed by a multiplier and an adder. The first sub-operation at least multiplies A and B, and conditionally also accumulates C to the partial products of A and B to generate an unrounded nonredundant sum. The unrounded nonredundant sum is stored in me…
Who is the assignee on this patent?
Via Alliance Semiconductor Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F7/483. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 3, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).