US9146901B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9146901-B2 |
| Application number | US-201113137576-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 26, 2011 |
| Priority date | Sep 24, 2010 |
| Publication date | Sep 29, 2015 |
| Grant date | Sep 29, 2015 |
Official abstract text for this publication.
A processing apparatus is provided with processing circuitry 6, 8 and decoder circuitry 10 responsive to a received argument reduction instruction FREDUCE4, FDOT3R to generate control signals 16 for controlling the processing circuitry 6, 8. The action of the argument reduction instruction is to subject each component of an input vector to a scaling which adds or subtracts an exponent shift value C to the exponent of the input vector component. The exponent shift value C is selected such that a sum of this exponent shift value C with the maximum exponent value B of any of the input vector components lies within a range between a first predetermined value and a second predetermined value. A consequence of execution of this argument reduction instruction is that the result vector when subject to a dot-product operation will be resistant to floating point underflows or overflows.
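The scaling described in the abstract can be sketched in software. This is a minimal illustration, not the patented circuit: it assumes finite single-precision inputs, takes the range bounds 64 and 190 from claims 5 and 3, and picks C so that B + C equals 127 (the `target` parameter is an assumption made here; the abstract only requires that B + C fall inside the range). The function names are hypothetical.

```python
import math
import struct

E_DOTMAX = 190  # upper bound, as stated in claim 3
E_DOTMIN = 64   # lower bound, as stated in claim 5


def biased_exponent(x: float) -> int:
    """Return the 8-bit biased exponent of x viewed as an IEEE 754 float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits >> 23) & 0xFF


def reduce_arguments(vec, target=127):
    """Scale every component by 2**C so the largest exponent B becomes `target`.

    Assumption: inputs are finite float32 values (no NaN or infinity).
    Zeros, denormals, and components that would underflow become 0.0.
    """
    b_max = max(biased_exponent(x) for x in vec)  # B in the abstract
    c = target - b_max                            # exponent shift value C
    assert E_DOTMIN < b_max + c < E_DOTMAX
    out = []
    for x in vec:
        e = biased_exponent(x)
        if e == 0 or e + c <= 0:
            out.append(0.0)                # flush underflow to zero
        else:
            out.append(math.ldexp(x, c))   # exact scaling by a power of two
    return out


reduced = reduce_arguments([2.0**100, 2.0**99, 2.0**-120])
dot = sum(v * v for v in reduced)  # 1.25; squaring 2**100 directly would overflow float32
```

Because every component is multiplied by the same power of two, the direction of the vector is unchanged, which is why the reduced vector can still be used for normalisation.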
Opening claim text (preview).
I claim:

1. Apparatus for processing data comprising: processing circuitry configured to perform processing operations upon data values; and decoder circuitry coupled to said processing circuitry and configured to decode program instructions to generate control signals for controlling said processing circuitry to perform processing operations specified by said program instructions; wherein said decoder circuitry is responsive to an argument reduction instruction to generate control signals to control said processing circuitry to perform a processing operation upon a vector floating point value having a plurality of components, each of said plurality of components including an integer exponent value and a mantissa value, said processing operation including generating a plurality of result components, the processing operation comprising: for each of said plurality of components, forming a high order exponent portion $E_{ho}$ being an uppermost $P$ bits of said integer exponent value, where $P$ is less than a total number of bits within said integer exponent value, and selecting a highest value $E_{homax}$ from among said high order exponent portions $E_{ho}$, wherein $E_{homax}$ identifies a highest integer exponent value $B$ of said plurality of components; selecting an exponent shift value $C$ such that $(B+C)$ is less than a first predetermined value $E_{dotmax}$ and $(B+C)$ is greater than a second predetermined value $E_{dotmin}$, where said exponent shift value $C$ is an integer value; and for each of said plurality of components, if said exponent shift value $C$ is non-zero, then adding a value of $(2^{(P-1)} - E_{homax})$ to said high order exponent portion $E_{ho}$ to generate one of said plurality of result components.

2. Apparatus as claimed in claim 1, wherein said first predetermined value $E_{dotmax}$ is a lowest integer value where a square of a floating point value with an integer exponent value of $E_{dotmax}$ and a mantissa $M$ produces a floating point overflow for at least one value of $M$.

3. Apparatus as claimed in claim 2, wherein each component has a sign value $S_c$, an integer exponent value $E_c$ and a mantissa value $M_c$ representing a floating point number $(-1)^{S_c} \times 2^{(E_c - 127)} \times (1 + (M_c / 2^{24}))$ and $E_{dotmax}$ is 190.

4. Apparatus as claimed in claim 1, wherein said second predetermined value $E_{dotmin}$ is a highest integer value where a square of a floating point value with an integer exponent value of $E_{dotmin}$ and a mantissa $M$ produces a floating point underflow for at least one value of $M$.

5. Apparatus as claimed in claim 4, wherein each component has a sign value $S_c$, an integer exponent value $E_c$ and a mantissa value $M_c$ representing a floating point number $(-1)^{S_c} \times 2^{(E_c - 127)} \times (1 + (M_c / 2^{24}))$ and $E_{dotmin}$ is 64.

6. Apparatus as claimed in claim 1, wherein for any one of said plurality of components, if when adding said exponent shift value $C$ to an integer exponent value of said component to generate one of said plurality of result components, said one of said plurality of result components is subject to a floating point underflow, then replacing said one of said plurality of result components with a value of zero.

7. Apparatus as claimed in claim 1, wherein for any one of said plurality of components, if when adding said value of $(2^{(P-1)} - E_{homax})$ to said high order exponent portion $E_{ho}$, said value of $(2^{(P-1)} - E_{homax})$ is negative and said adding underflows, then replacing a corresponding one of said plurality of result components with a value of zero.

8. Apparatus as claimed in claim 1, wherein a total number of bits within said integer exponent value is 8 and $P=3$.

9. Apparatus as claimed in claim 1, wherein if any of said plurality of components is a floating point not-a-number, then all of said plurality of result components are set to be floating point not-a-numbers.

10. Apparatus as claimed in claim 1, wherein if any of said plurality of components is a floating point infinity value, then each result component corresponding to a component with a floating point infinity value is set to a floating point value with magnitude one and a sign matching said floating point infinity value of said component and all remaining result components are set to have a floating point value with magnitude zero.

11. Apparatus as claimed in claim 1, wherein said argument reduction instruction also generates a result scalar product with a value the same as given by a scalar product of said plurality of result components.

12. Apparatus as claimed in claim 1, wherein said processing circuitry and said decoder circuitry are responsive to said argument reduction instruction followed by a sequence of one or more further instructions to generate a normalised vector floating point value with a plurality of normalised components the same as given by: generating a result scalar product with a value the same as given by a scalar product of said plurality of result components; generating a reciprocal square root of said result scalar product; and for each result component, generating a corresponding normalised component by multiplying said result component by said reciprocal square root.

13. Apparatus as claimed in claim 1, wherein said processing circuitry and said decoder circuitry are part of a graphics processing unit.

14. A virtual machine comprising a computer including a non-transitory computer readable storage medium storing a program which, when implemented by the computer, provides an apparatus for processing data as claimed in claim 1.

15. Apparatus for processing data comprising: processing means for performing processing operations upon data values; and decoder means for decoding program instructions to generate control signals for controlling said processing circuitry to perform processing operations specified by said program instructions; wherein said decoder means is responsive to an argument reduction instruction to generate control signals to control said processing means to perform a processing operation upon a vector floating point value having a plurality of components, each of said plurality of components including an integer exponent value and a mantissa value, said processing operation including generating a plurality of result components, the processing operation comprising: for each of said plurality of components, forming a high order exponent portion $E_{ho}$ being an uppermost $P$ bits of said integer exponent value, where $P$ is less than a total number of bits within said integer exponent value, and selecting a highest value $E_{homax}$ from among said high order exponent portions $E_{ho}$, wherein $E_{homax}$ identifies a highest integer exponent value $B$ of said plurality of components; selecting an exponent shift value $C$ such that $(B+C)$ is less than a first predetermined value $E_{dotmax}$ and $(B+C)$ is greater than a second predetermined value $E_{dotmin}$, where said exponent shift value $C$ is an integer value; and for each of said plurality of components, if said exponent shift value $C$ is non-zero, then adding a value of $(2^{(P-1)} - E_{homax})$ to said high order exponent portion $E_{ho}$ to generate one of said plurality of result components.

16. A method of processing data comprising the step of: in response to decoding an argument reduction instruction by decoding circuitry, performing, by processing circuitry, a processing operation upon a vector floating point value having a plurality of components, each of said plurality of components including an integer exponent value and a mantissa value, said processing operation including generating a plurality of result components, the pr
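The high order exponent adjustment recited in claim 1, with the 8-bit exponent and P = 3 of claim 8, might look like the following when modelled on float32 bit patterns. This is a hedged sketch under several assumptions: the helper names are hypothetical, inputs are finite (the not-a-number and infinity handling of claims 9 and 10 is omitted), zeros and denormals are passed through as zero, and the adjustment is always applied rather than only when C is non-zero. Only the top P exponent bits are compared and modified, so the effective shift is C = (2^(P−1) − E_homax) × 32.

```python
import struct

EXP_BITS = 8                  # total exponent bits (claim 8)
P = 3                         # high order exponent bits (claim 8)
SHIFT = 23 + (EXP_BITS - P)   # bit position of the high order exponent field
HO_MASK = (1 << P) - 1


def f32_to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]


def reduce_high_order(vec):
    """Add (2**(P-1) - E_homax) to the top P exponent bits of each component."""
    bits = [f32_to_bits(x) for x in vec]
    e_ho = [(b >> SHIFT) & HO_MASK for b in bits]  # E_ho per component
    e_homax = max(e_ho)                            # E_homax
    delta = (1 << (P - 1)) - e_homax               # 2**(P-1) - E_homax
    out = []
    for b, e in zip(bits, e_ho):
        exponent = (b >> 23) & 0xFF
        new_e = e + delta
        if exponent == 0 or new_e < 0:
            # Zero or denormal input (assumption), or underflow as in claim 7.
            out.append(0.0)
        else:
            cleared = b & ~(HO_MASK << SHIFT)      # keep sign, low exponent bits, mantissa
            out.append(bits_to_f32(cleared | (new_e << SHIFT)))
    return out
```

With these parameters the largest component always ends up with a biased exponent between 128 and 159, which lies inside the range bounded by 64 and 190 given in claims 5 and 3. Working on only three bits keeps the comparison and addition narrow, at the cost of a coarser shift granularity.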
Roots or inverse roots of single operands · CPC title
Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)} · CPC title
Complex mathematical operations {(function generation by table look-up G06F1/03; evaluation of elementary functions by calculation G06F7/544)} · CPC title
Inverse root of a number or a function, e.g. the reciprocal of a Pythagorean sum · CPC title
in floating-point computations · CPC title