Methods and apparatus for performing floating point operations
US-9904512-B1 · Feb 27, 2018 · US
US10983756B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10983756-B2 |
| Application number | US-201414516643-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 17, 2014 |
| Priority date | Oct 17, 2014 |
| Publication date | Apr 20, 2021 |
| Grant date | Apr 20, 2021 |
Official abstract text for this publication.
In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root and reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation, which can include a LookUp Table (LUT). The LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited-precision multiplier goes to a full-precision multiplier circuit that performs the remaining multiplications required for the iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the limited-precision multiplier and the initial approximation circuitry complete within the first number of clock cycles.
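The data flow described in the abstract can be modelled in software. The following is a minimal sketch under stated assumptions, not the patented circuit: the table size (`LUT_BITS`), the narrow datapath width (`small_bits`), the iteration count, and all function names are hypothetical choices made for illustration. It seeds a Newton-Raphson reciprocal from a lookup table, performs the first multiplication at reduced precision, and performs the remaining multiplications at full (here, double) precision.

```python
import math

LUT_BITS = 7  # assumption: table indexed by the top 7 fraction bits of the mantissa


def build_recip_lut(index_bits=LUT_BITS):
    """Reciprocal estimates for mantissas in [1, 2), one per table interval."""
    size = 1 << index_bits
    return [1.0 / (1.0 + (i + 0.5) / size) for i in range(size)]


def truncate(x, bits):
    """Keep `bits` fractional bits, modelling a narrow datapath."""
    scale = 1 << bits
    return math.floor(x * scale) / scale


def reciprocal(b, lut, index_bits=LUT_BITS, small_bits=16, iterations=3):
    """Approximate 1/b for b > 0 by Newton-Raphson: x <- x * (2 - m * x)."""
    if b <= 0.0:
        raise ValueError("this sketch handles positive inputs only")
    m, e = math.frexp(b)      # b = m * 2**e with m in [0.5, 1)
    m, e = m * 2.0, e - 1     # renormalise so m is in [1, 2)

    # Initial approximation from the lookup table (low precision).
    index = int((m - 1.0) * (1 << index_bits))
    x = truncate(lut[index], small_bits)

    # First multiplication: limited precision (truncated operands and product).
    t = truncate(x * truncate(m, small_bits), small_bits)

    # Remaining multiplications: full precision.
    x = x * (2.0 - t)
    for _ in range(iterations - 1):
        x = x * (2.0 - m * x)

    return math.ldexp(x, -e)  # 1/b = (1/m) * 2**(-e)


def divide(a, b, lut):
    """a / b computed as a times the refined reciprocal of b."""
    return a * reciprocal(b, lut)
```

Because each Newton-Raphson step roughly doubles the number of correct bits, the error introduced by the narrow first multiplication is corrected by the later full-precision steps. The abstract's timing condition (seed plus narrow multiply finishing within one full-precision multiply latency) is a hardware property and is not modelled here.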
Opening claim text (preview).
I claim:
1. An apparatus for executing an instruction requesting an arithmetic operation on an operand by an input value by performing an iterative arithmetic operation on said input value to obtain an output value that is used to represent a result of executing said instruction, the apparatus comprising: initial approximation circuitry configured in hardware to provide, from at least a portion of said input value, an initial approximation of said output value, the initial approximation of the output value having a second number of bits of precision that is less than a first number of bits of precision to which said output value is to be produced; limited precision multiplier circuitry configured in hardware to receive the initial approximation and multiply the initial approximation with another value to obtain a first multiplication result; full-precision multiplier circuitry coupled to receive the first multiplication result from the limited precision multiplier circuitry and configured in hardware to multiply the first multiplication result from the limited precision multiplier circuitry with another value to obtain a second multiplication result on which said output value is based, the second multiplication result having no fewer than the first number of bits of precision, wherein the full-precision multiplier circuitry requires a first number of clock cycles to finish its multiplication, and a combined number of clock cycles required by the initial approximation circuitry to provide the initial approximation and the limited precision multiplier circuitry to complete a multiplication is equal to or less than the first number of clock cycles; and an output configured in hardware to output said output value representing the result of executing said instruction.
2. The apparatus of claim 1, wherein a mantissa for the output value is obtained by conducting an iterative refinement of the initial approximation, wherein a first multiplication in the iterative refinement is conducted by the limited precision multiplier circuitry and subsequent multiplications in the iterative refinement are conducted by the full-precision multiplier circuitry.
3. The apparatus of claim 1, wherein the initial approximation circuitry comprises a LookUp Table (LUT) configured to receive at least a portion of bits of the input value and to output a set of bits from which the initial approximation can be constructed.
4. The apparatus of claim 1, wherein the full-precision multiplier circuitry is configured to perform a double-precision multiplication between two mantissas.
5. The apparatus of claim 1, wherein the apparatus is configured to provide a value for a division of a dividend a by a divisor b, and the initial approximation circuitry is configured to produce the initial approximation as an initial approximation of a reciprocal of the divisor b.
6. The apparatus of claim 1, wherein the apparatus is configured to provide a value for a square root of a value b and the initial approximation circuitry is configured to produce the initial approximation as an initial approximation of a reciprocal of the square root of b.
7. An apparatus for executing an instruction requesting an arithmetic operation on an operand by an input value by performing an iterative arithmetic operation on said input value to obtain an output value that is used to represent a result of executing said instruction, the apparatus comprising: initial approximation circuitry configured in hardware to provide, from at least a portion of said input value, an initial approximation of said output value, the initial approximation of the output value having a second number of bits of precision that is less than a first number of bits of precision to which said output value is to be produced; limited precision multiplier circuitry configured in hardware to receive the initial approximation and multiply the initial approximation with another value to obtain a first multiplication result; full-precision multiplier circuitry coupled to receive the first multiplication result from the limited precision multiplier circuitry and configured in hardware to multiply the first multiplication result from the limited precision multiplier circuitry with another value to obtain a second multiplication result on which said output value is based, the second multiplication result having no fewer than the first number of bits of precision, wherein the full-precision multiplier circuitry requires a first number of clock cycles to finish its multiplication, a combined number of clock cycles required by the initial approximation circuitry to provide the initial approximation and the limited precision multiplier circuitry to complete a multiplication is equal to or less than the first number of clock cycles; an output configured in hardware to output said output value representing the result of executing said instruction; and wherein the full-precision multiplier circuitry is further configured to feed the second multiplication result back as an input to the full-precision multiplier circuitry to iteratively refine said output value.
8. The apparatus of claim 7, wherein a mantissa for the output value is obtained by conducting an iterative refinement of the initial approximation, wherein a first multiplication in the iterative refinement is conducted by the limited precision multiplier circuitry and subsequent multiplications in the iterative refinement are conducted by the full-precision multiplier circuitry.
9. The apparatus of claim 7, wherein the initial approximation circuitry comprises a LookUp Table (LUT) configured to receive at least a portion of bits of the input value and to output a set of bits from which the initial approximation can be constructed.
10. The apparatus of claim 7, wherein the full-precision multiplier circuitry is configured to perform a double-precision multiplication between two mantissas.
11. The apparatus of claim 7, wherein the apparatus is configured to provide a value for a division of a dividend a by a divisor b, and the initial approximation circuitry is configured to produce the initial approximation as an initial approximation of a reciprocal of the divisor b.
12. The apparatus of claim 7, wherein the apparatus is configured to provide a value for a square root of a value b and the initial approximation circuitry is configured to produce the initial approximation as an initial approximation of a reciprocal of the square root of b.
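Claims 6 and 12 seed the square root computation with an approximation of the reciprocal of the square root of b. As a rough illustration of why that seed is convenient, the sketch below refines such a seed with the standard Newton-Raphson step for the reciprocal square root, which uses only multiplications and a subtraction, and then recovers the square root with one more multiplication. The table layout, `INDEX_BITS`, the iteration count, and the function name are assumptions made for this example; the claims do not specify them, and the limited-precision first multiplication is not modelled here.

```python
import math

INDEX_BITS = 7                # assumption: 128-entry seed table
SIZE = 1 << INDEX_BITS

# Seed table of 1/sqrt(m) sampled at interval midpoints for m in [0.5, 2).
RSQRT_LUT = [1.0 / math.sqrt(0.5 + 1.5 * (i + 0.5) / SIZE) for i in range(SIZE)]


def sqrt_via_rsqrt(b, iterations=4):
    """Approximate sqrt(b) for b > 0 by refining y ~ 1/sqrt(m), then m * y."""
    if b <= 0.0:
        raise ValueError("this sketch handles positive inputs only")
    m, e = math.frexp(b)      # b = m * 2**e with m in [0.5, 1)
    if e % 2:                 # force an even exponent so 2**(e/2) is exact
        m, e = m * 2.0, e - 1  # m is now in [1, 2)

    # Initial approximation of the reciprocal square root from the table.
    index = min(int((m - 0.5) / 1.5 * SIZE), SIZE - 1)
    y = RSQRT_LUT[index]

    # Newton-Raphson refinement: y <- y * (3 - m * y * y) / 2.
    for _ in range(iterations):
        y = y * (3.0 - m * y * y) / 2.0

    # sqrt(m) = m * (1/sqrt(m)); reapply half the exponent.
    return math.ldexp(m * y, e // 2)
```

Each step feeds the previous product back into the next multiplication, which corresponds loosely to the feedback path recited in claim 7.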
Reduction of the number of iteration steps or stages, e.g. using the Sweeny-Robertson-Tocher [SRT] algorithm · CPC title
Multiplicative non-restoring division, e.g. SRT, using multiplication in quotient selection · CPC title
Roots or inverse roots of single operands · CPC title
Dividing · CPC title
Multiplying only · CPC title