Standard format intermediate result

US9798519B2 · US · B2

Patent metadata
Publication number: US-9798519-B2
Application number: US-201514749002-A
Country: US
Kind code: B2
Filing date: Jun 24, 2015
Priority date: Jul 2, 2014
Publication date: Oct 24, 2017
Grant date: Oct 24, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A microprocessor comprises an instruction pipeline, a shared memory, and first and second arithmetic processing units in the instruction pipeline, each capable of reading or receiving operands from and writing or providing results to the shared memory. The first arithmetic processing unit performs a first portion of a mathematical operation to produce an intermediate result vector that is not a complete, final result of the mathematical operation. The first arithmetic processing unit generates a plurality of non-architectural calculation control indicators that indicate how subsequent calculations to generate a final result from the intermediate result vector should proceed. The second arithmetic processing unit performs a second portion of the mathematical operation, in accordance with the calculation control indicators, to produce a complete, final result of the mathematical operation.
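The division of labor described in the abstract can be sketched in Python. This is a minimal illustration under simplifying assumptions, not the patented design: it uses unsigned integer mantissas, models only a multiplication (no accumulation), and reduces the calculation control indicators to a round bit and a sticky bit. The names `Intermediate`, `first_stage`, and `second_stage` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Intermediate:
    """Unrounded intermediate result plus indicators (hypothetical layout)."""
    mantissa: int    # product truncated to `width` bits, not yet rounded
    shift: int       # number of low-order bits that were dropped
    round_bit: bool  # most significant dropped bit
    sticky: bool     # OR of all remaining dropped bits


def first_stage(a: int, b: int, width: int) -> Intermediate:
    """First unit: multiply, truncate, and summarize the discarded bits."""
    product = a * b
    drop = max(product.bit_length() - width, 0)
    if drop == 0:
        # The product already fits, so nothing is discarded.
        return Intermediate(product, 0, False, False)
    round_bit = bool((product >> (drop - 1)) & 1)
    sticky = (product & ((1 << (drop - 1)) - 1)) != 0
    return Intermediate(product >> drop, drop, round_bit, sticky)


def second_stage(ir: Intermediate) -> int:
    """Second unit: round to nearest, ties to even, using only the indicators."""
    m = ir.mantissa
    if ir.round_bit and (ir.sticky or m & 1):
        m += 1
    # Return the rounded value; renormalizing a carried-out mantissa is omitted.
    return m << ir.shift


ir = first_stage(0b1011, 0b1101, width=4)
print(ir, second_stage(ir))
```

For 0b1011 × 0b1101 = 143 with a 4-bit mantissa, the first stage keeps 0b1000 and sets both indicators, so the second stage rounds up to 144 without ever seeing the discarded bits themselves.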

First claim

Opening claim text (preview).

The invention claimed is:

1. A microprocessor comprising: an instruction pipeline; a shared memory; and first and second instruction execution units in the instruction pipeline, each configured to decode machine level instructions, read operands from the shared memory, and write results to the shared memory; the first instruction execution unit performing a first portion of a mathematical operation to produce an intermediate result vector that is not a complete, final result of the mathematical operation; the first instruction execution unit generating a plurality of non-architectural calculation control indicators that indicate how subsequent calculations to generate a final result from the intermediate result vector should proceed; and the second instruction execution unit performing a second portion of the mathematical operation, in accordance with the calculation control indicators, to produce a complete, final result of the mathematical operation.

2. The microprocessor of claim 1, wherein the intermediate result vector is an unrounded value and the complete, final result is a rounded value.

3. The microprocessor of claim 1, wherein the mathematical operation is a fused floating-point multiply-accumulate (FMA) operation of a form ±A*B±C, where A, B and C are floating point input operands, wherein no rounding occurs before C is accumulated to a product of A and B.

4. The microprocessor of claim 1, wherein the mathematical operation is a multiply-accumulate operation, and the microprocessor further comprises a translator or ROM that transforms an atomic, unified multiply-accumulate instruction into at least first and second micro-instructions, wherein execution of the first micro-instruction generates the intermediate result vector, and execution of the second micro-instruction uses the intermediate result vector to generate the complete, final result.

5. The microprocessor of claim 1, wherein the calculation control indicators comprise rounding indicators that provide sufficient information to enable the second instruction execution unit to produce a correctly rounded complete, final result after performing the second portion of the mathematical operation on the intermediate result vector.

6. The microprocessor of claim 1, wherein the first instruction execution unit stores the intermediate result vector in a register and the calculation control indicators in a calculation control indicator cache, and the second instruction execution unit loads the intermediate result vector from the register and the calculation control indicators from the calculation control indicator cache.

7. The microprocessor of claim 1, wherein the microprocessor forwards the intermediate result vector to the second instruction execution unit.

8. The microprocessor of claim 1, wherein the first portion of the mathematical operation comprises at least a multiplication of two input operands.

9. The microprocessor of claim 8, wherein whether values of the first two input operands and the third operand satisfy at least one of a set of one or more conditions dictates which of the first and second portions of the mathematical operation further comprises an accumulation with a third operand.

10. The microprocessor of claim 1, wherein the second portion of the mathematical operation comprises at least a rounding sub-operation.

11. The microprocessor of claim 1, wherein either the first or second portion of the mathematical operation further comprises an accumulation sub-operation.

12. A method in a microprocessor of performing a fused multiply-accumulate operation of a form ±A*B±C, wherein A, B, and C are input operands, the method comprising: calculating partial products of operands A and B; generating a selection indicator that indicates which of a first and second set of addends are selected to be accumulated in a first accumulation, the first set of addends being the partial products of operands A and B alone and the second set of addends being operand C and the partial products of operands A and B; performing the first accumulation in accordance with the selection indicator to generate a germinal result; selecting a plurality of most significant bits of the germinal result to incorporate into an unrounded intermediate result vector; reducing a plurality of least significant bits to one or more bits incorporated into a set of rounding indicators, wherein the selection indicator is one of the set of rounding indicators; if the first accumulation did not include operand C, then performing a second accumulation of operand C with the unrounded intermediate result vector; and generating a final rounded result of the multiply-accumulate operation using the set of rounding indicators.

13. The method of claim 12, further comprising storing the set of rounding indicators in a cache.

14. The method of claim 13, wherein the set of rounding indicators comprise guard (G), round (R), and/or sticky (S) bits.

15. The method of claim 12, further comprising storing the unrounded intermediate result vector in shared storage accessible by a plurality of instruction execution units.

16. The method of claim 12, further comprising generating an end-around-carry value (E) to indicate that an end around carry correction is pending, if the first accumulation included operand C and the unrounded intermediate result vector was positive.

17. The method of claim 12, wherein the unrounded intermediate result vector comprises an intermediate mantissa result and an intermediate result exponent value (IRExp), wherein IRExp is a normalized representation of the larger of an exponent of C and a function of a sum of the exponent values of operands A and B.

18. The method of claim 17, wherein the unrounded intermediate result vector also includes an intermediate sign indicator generated as a function of whether the first accumulation included operand C, the multiply-accumulate operation is an effective subtraction, and no end-around-carry is pending.

19. The method of claim 17, further comprising generating intermediate underflow (U) and intermediate overflow (O) indications to indicate whether IRExp is above or below a range of representable or desirable exponent values.

20. The method of claim 12, further comprising generating an intermediate sign indicator as a function of whether the first accumulation included operand C, the multiply-accumulate operation is an effective subtraction, and no end-around-carry is pending.

21. The method of claim 12, further comprising aligning a selectively complemented mantissa of operand C within a multiplier unit partial product summation tree.

22. The method of claim 12, wherein the unrounded intermediate result vector is represented in a standard IEEE data format for a floating point number, including a mantissa having a number of bits equal to a number of bits of a mantissa of a target result of the fused multiply-accumulate operation.

23. The method of claim 12, further comprising generating one or more additional rounding indicators to incorporate into the set of rounding indicators prior to performing any second accumulation for use in generating the final rounded result.

24. The method of claim 23, wherein one of the one or more additional rounding indicators is an end-around-carry value (E) to indicate that an end around carry correction is pending and using the end-around-carry value (E).

25. The method of claim
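Claim 3's requirement that no rounding occur before C is accumulated can be illustrated numerically. The sketch below is an assumption-laden toy, not taken from the patent: it uses exact rational arithmetic, a 4-bit significand, an unbounded exponent range, and round-to-nearest-even, and the helper `round_to_bits` is hypothetical.

```python
from fractions import Fraction


def round_to_bits(x: Fraction, p: int) -> Fraction:
    """Round x to p significant bits, ties to even, with unbounded exponent."""
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    scaled, e = abs(x), 0
    # Scale so that 2**(p-1) <= scaled < 2**p, tracking the exponent e.
    while scaled >= 2 ** p:
        scaled /= 2
        e += 1
    while scaled < 2 ** (p - 1):
        scaled *= 2
        e -= 1
    q = scaled.numerator // scaled.denominator  # integer part (scaled > 0)
    rem = scaled - q                            # discarded fraction
    half = Fraction(1, 2)
    if rem > half or (rem == half and q % 2 == 1):
        q += 1
    return sign * q * Fraction(2) ** e


P = 4                      # toy significand width (an assumption)
a = b = Fraction(9, 8)     # 1.001 in binary
c = Fraction(-5, 4)        # -1.010 in binary

fused = round_to_bits(a * b + c, P)                      # one rounding at the end
unfused = round_to_bits(round_to_bits(a * b, P) + c, P)  # product rounded first
print(fused, unfused)
```

With these operands the exact product is 81/64. Rounding it to four bits before adding C gives 5/4, so the unfused sum cancels to 0, while the fused form preserves the residual 1/64.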

Assignees

  • Via Alliance Semiconductor Co Ltd

Inventors

Classifications

  • Implementation of IEEE-754 Standard · CPC title

  • Adding; Subtracting {(G06F7/4833, G06F7/4836 take precedence)} · CPC title

  • Runtime instruction translation, e.g. macros · CPC title

  • Denomination or exception handling, e.g. rounding or overflow · CPC title

  • Instruction analysis, e.g. decoding, instruction word fields · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9798519B2 cover?
A microprocessor with an instruction pipeline, a shared memory, and first and second arithmetic processing units. The first unit performs a first portion of a mathematical operation, producing an intermediate result vector that is not the final result, along with non-architectural calculation control indicators. The second unit performs the remaining portion in accordance with those indicators to produce the complete, final result.
Who is the assignee on this patent?
Via Alliance Semiconductor Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F7/483. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 24, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).