US9778908B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9778908-B2 |
| Application number | US-201514748870-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 24, 2015 |
| Priority date | Jul 2, 2014 |
| Publication date | Oct 3, 2017 |
| Grant date | Oct 3, 2017 |
Official abstract text for this publication.
A microprocessor splits a fused multiply-accumulate operation of the form A*B+C into first and second multiply-accumulate sub-operations to be performed by a multiplier and an adder. The first sub-operation at least multiplies A and B, and conditionally also accumulates C to the partial products of A and B to generate an unrounded nonredundant sum. The unrounded nonredundant sum is stored in memory shared by the multiplier and adder for an indefinite time period, enabling the multiplier and adder to perform other operations unrelated to the multiply-accumulate operation. The second sub-operation conditionally accumulates C to the unrounded nonredundant sum if C is not already incorporated into the value, and then generates a final rounded result.
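The two-step flow described in the abstract can be sketched in software. The following is only an illustrative model, not the patented hardware: it uses Python's exact `Fraction` arithmetic as a stand-in for the unrounded nonredundant sum, and the names `Intermediate`, `fma_first_subop`, `fma_second_subop`, and the `accumulate_c_early` flag are hypothetical. The abstract does not say here how the hardware decides whether C is accumulated in the first sub-operation, so the sketch leaves that choice to the caller.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Intermediate:
    """Hypothetical stand-in for what is kept between the two sub-operations."""
    unrounded_sum: Fraction  # exact value; models the unrounded nonredundant sum
    c_accumulated: bool      # models one calculation control indicator


def fma_first_subop(a: float, b: float, c: float,
                    accumulate_c_early: bool) -> Intermediate:
    """First sub-operation: multiply A and B, optionally accumulate C, no rounding."""
    exact = Fraction(a) * Fraction(b)
    if accumulate_c_early:
        exact += Fraction(c)
    return Intermediate(exact, accumulate_c_early)


def fma_second_subop(inter: Intermediate, c: float) -> float:
    """Second sub-operation: accumulate C if not done yet, then round once."""
    exact = inter.unrounded_sum
    if not inter.c_accumulated:
        exact += Fraction(c)
    # Converting an exact Fraction to float performs the single final rounding.
    return float(exact)


# Inputs chosen so that rounding the product first loses the answer:
# a * b is exactly 1 - 2**-60, which rounds to 1.0 as a double.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

early = fma_second_subop(fma_first_subop(a, b, c, True), c)
late = fma_second_subop(fma_first_subop(a, b, c, False), c)
separately_rounded = a * b + c  # product is rounded before C is added
```

With these inputs both `early` and `late` equal `-2.0 ** -60`, while `separately_rounded` is `0.0`. Either ordering gives the same fused result in this model because no rounding happens until the end of the second sub-operation, which is the property the abstract relies on when it parks the intermediate value in shared memory.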
Opening claim text (preview).
The invention claimed is:
1. A method in a microprocessor for performing a fused multiply-accumulate operation of a form ±A*B±C, wherein A, B and C are input operands, and wherein no rounding occurs before C is accumulated to a product of A and B, the method comprising: splitting the fused multiply-accumulate operation into first and second multiply-accumulate sub-operations to be performed by one or more instruction execution units; in the first multiply-accumulate sub-operation, selectively either accumulating partial products of A and B with C, or accumulating only the partial products of A and B, and to generate therefrom an unrounded nonredundant sum; between the first and second multiply-accumulate sub-operations, storing the unrounded nonredundant sum in memory, enabling the one or more instruction execution units to perform other operations unrelated to the multiply-accumulate operation; wherein the memory is external to the one or more instruction execution units and comprises a result store for storing the unrounded nonredundant sum and a calculation control indicator store, distinct from the result store, that stores a plurality of calculation control indicators that indicate how subsequent calculations in the second multiply-accumulate sub-operation should proceed; in the second multiply-accumulate sub-operation, accumulating C with the unrounded nonredundant sum if the first multiply-accumulate sub-operation produced the unrounded nonredundant sum without accumulating C; and in the second multiply-accumulate sub-operation, generating a final rounded result of the fused multiply-accumulate operation.
2. The method of claim 1, wherein the fused multiply-accumulate operation is performed by at least two instruction execution units.
3. The method of claim 1, wherein the result store is coupled to a result bus, the result bus being common to the one or more instruction execution units.
4. The method of claim 1, wherein the result store is a reorder buffer.
5. The method of claim 1, wherein the calculation control indicator store is a cache that is not coupled to the result bus and that is shared only by execution units configured to perform the first or second multiply-accumulate sub-operation.
6. The method of claim 1, wherein the one or more instruction execution units comprise a multiply-accumulate unit configured to perform the first multiply-accumulate sub-operation in response to a first multiply-accumulate instruction and to perform the second multiply-accumulate sub-operation in response to a second multiply-accumulate instruction.
7. A method in a microprocessor for performing a fused multiply-accumulate operation of a form ±A*B±C, wherein A, B and C are input operands, and wherein no rounding occurs before C is accumulated to a product of A and B, the method comprising: splitting the fused multiply-accumulate operation into first and second multiply-accumulate sub-operations to be performed, respectively, by first and second instruction execution units; in the first multiply-accumulate sub-operation, selectively either accumulating partial products of A and B with C, or accumulating only the partial products of A and B, and generating therefrom an unrounded nonredundant sum; forwarding a plurality of calculation control indicators from a first instruction execution unit to a second instruction execution unit, wherein the calculation control indicators indicate how subsequent calculations in the second multiply-accumulate sub-operation should proceed, including whether an accumulation with C occurred in the first multiply-accumulate sub-operation; in the second multiply-accumulate sub-operation, accumulating C with the unrounded nonredundant sum if the first multiply-accumulate sub-operation produced the unrounded nonredundant sum without accumulating C; and in the second multiply-accumulate sub-operation, generating a final rounded result of the fused multiply-accumulate operation.
8. The method of claim 7, wherein the calculation control indicators include indicators for generating an arithmetically correct rounded result from the unrounded nonredundant sum.
9. The method of claim 7, wherein each of the first and second instruction execution units is operable to perform an operation distinct from the first and second multiply-accumulate sub-operations, while the other of the first and second execution units is performing a first or second multiply-accumulate sub-operation.
10. The method of claim 7, wherein the first and second instruction execution units comprise, respectively, a multiplier configured to perform the first multiply-accumulate sub-operation and an adder configured to perform the second multiply-accumulate sub-operation.
11. A microprocessor operable to perform a fused multiply-accumulate operation of a form ±A*B±C, wherein A, B and C are input operands, and wherein no rounding occurs before C is accumulated to a product of A and B, the microprocessor comprising: one or more instruction execution units configured to perform first and second multiply-accumulate sub-operations of a fused multiply-accumulate operation; and memory external to the one or more instruction execution units for storing the unrounded nonredundant sum generated by the first multiply-accumulate sub-operation; wherein in the first multiply-accumulate sub-operation, a selective accumulation is made of either the partial products of A and B with C, or of the partial products of A and B alone, and in accordance with which selective accumulation the unrounded nonredundant sum is generated; wherein in the second multiply-accumulate sub-operation, C is conditionally accumulated with the unrounded nonredundant sum if the first multiply-accumulate sub-operation produced the unrounded nonredundant sum without accumulating C; and wherein in the second multiply-accumulate sub-operation, a final rounded result of the fused multiply-accumulate operation is generated from the unrounded nonredundant sum conditionally accumulated with C; wherein the memory is configured to store the unrounded nonredundant sum for an indefinite period of time until the second multiply-accumulate sub-operation is begun, thereby enabling the one or more instruction execution units to perform other operations unrelated to the multiply-accumulate operation between the first and second multiply-accumulate sub-operations.
12. The microprocessor of claim 11, wherein the one or more instruction execution units comprise at least first and second instruction execution units.
13. The microprocessor of claim 11, wherein the memory comprises a result store for storing the unrounded nonredundant sum and a calculation control indicator store, distinct from the result store, that stores a plurality of calculation control indicators that indicate how subsequent calculations in the second multiply-accumulate sub-operation should proceed.
14. The microprocessor of claim 11, wherein the calculation control indicators include an indication of whether an accumulation with C occurred in the first multiply-accumulate sub-operation.
15. The microprocessor of claim 11, wherein the calculation control indicators include indicators for generating an arithmetically correct rounded result from the unrounded nonredundant sum.
16. The microprocessor of claim 11, wherein the result store is coupled to a result bus coupled to a reorder buffer, the result bus being common to the one or more instruction execution units.
17. The microprocessor of claim 11, wherein the calculation control indicator store is a cache that
according to one or more bits in the instruction, e.g. prefix, sub-opcode · CPC title
Instruction analysis, e.g. decoding, instruction word fields · CPC title
controlled in tandem, e.g. multiplier-accumulator · CPC title
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title
Implementation of IEEE-754 Standard · CPC title