Processor-based apparatus and method for processing bit streams using bit-oriented instructions through byte-oriented storage
US-9740484-B2 · Aug 22, 2017 · US
US10248417B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10248417-B2 |
| Application number | US-201715685312-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 24, 2017 |
| Priority date | Jun 27, 2017 |
| Publication date | Apr 2, 2019 |
| Grant date | Apr 2, 2019 |
Official abstract text for this publication.
A method for calculating FP (Full Precision) and PP (Partial Precision) values, performed by an ID (Instruction Decode) unit, contains at least the following steps: decoding an instruction request from a compiler; executing a loop m times to generate m microinstructions for calculating first-type data, or n times to generate n microinstructions for calculating second-type data according to the instruction mode of the instruction request, thereby enabling ALGs (Arithmetic Logic Groups) to execute lanes of a thread. m is less than n and the precision of the first-type data is lower than the precision of the second-type data.
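The decode loop in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation. The values m = 2 and n = 8 and the use of the most significant bit as the mode flag come from dependent claims 2 and 3; the 32-bit request width, the bit polarity (1 selects FP), and the names `REQUEST_BITS`, `MicroInstruction`, and `decode_request` are assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import List

PP_LOOPS = 2        # m: microinstructions for first-type (PP) data, per claim 3
FP_LOOPS = 8        # n: microinstructions for second-type (FP) data, per claim 3
REQUEST_BITS = 32   # ASSUMPTION: the source does not state the request width


@dataclass(frozen=True)
class MicroInstruction:
    """Hypothetical microinstruction record used only in this sketch."""
    mode: str    # "PP" or "FP"
    index: int   # position within the generated sequence


def decode_request(request: int) -> List[MicroInstruction]:
    """Read the mode from the MSB and emit m or n microinstructions."""
    msb = (request >> (REQUEST_BITS - 1)) & 1
    # ASSUMPTION: MSB == 1 selects FP mode; the source does not give the polarity.
    mode = "FP" if msb else "PP"
    loops = FP_LOOPS if mode == "FP" else PP_LOOPS
    return [MicroInstruction(mode, i) for i in range(loops)]


if __name__ == "__main__":
    print(len(decode_request(0x00000000)))  # PP request
    print(len(decode_request(0x80000000)))  # FP request
```

The point of the sketch is only the asymmetry: the lower-precision mode needs fewer microinstructions (m < n) to cover the lanes of a thread.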
Opening claim text (preview).
What is claimed is:

1. A method for calculating FP (Full Precision) and PP (Partial Precision) values, performed by an ID (Instruction Decode) unit, the method comprising: decoding an instruction request from a compiler; and executing a loop m times to generate m microinstructions for calculating first-type data, or n times to generate n microinstructions for calculating second-type data according to an instruction mode of the instruction request, thereby enabling a plurality of ALGs (Arithmetic Logic Groups) to execute a plurality of lanes of a thread; wherein m is less than n and a precision of the first-type data is lower than a precision of the second-type data; wherein each ALG comprises: a first-type computation lane; and a plurality of second-type computation lanes, wherein when the instruction mode is a first mode, each of the first-type computation lane and the second-type computation lanes completes calculations for a set of the first-type data independently; and, when the instruction mode is a second mode, each of the second-type computation lanes calculates a portion of a set of the second-type data to generate a partial result and the first-type computation lane combines the partial results by the second-type computation lanes, outputs a combined result and uses the combined result to complete calculations for the set of the second-type data.

2. The method of claim 1, wherein the instruction mode is stored in a MSB (Most Significant Bit) of the instruction request.

3. The method of claim 1, wherein the first-type data is PP data in 24 bits, the second-type data is FP data in 32 bits, m is 2 and n is 8.

4. The method of claim 1, wherein each ALG comprises: a group controller for instructing each of the first-type computation lane and the second-type computation lanes to operate in the first mode or the second mode according to a microinstruction type.

5. The method of claim 1, wherein each of the first-type computation lane and the second-type computation lanes when operating in the first mode completes a calculation independently: dest = Src0 × Src1 + Src2, Src0, Src1 and Src2 represent the first-type data of three source memories, and dest represents the first-type data to be stored in a destination memory or output to a post-processing unit.

6. The method of claim 1, wherein the first-type computation lane in coordination with the second-type computation lanes when operating in the second mode completes a calculation: dest = Src0 × Src1 + Src2, Src0, Src1 and Src2 represent the second-type data of three source memories, and dest represents the second-type data to be stored in a destination memory or output to a post-processing unit.

7. The method of claim 6, wherein each of Src0, Src1 and Src2 comprises a 24-bit mantissa of a floating-point value and the second-type computation lanes comprises a first computation unit, a second computation unit and a third computation unit, wherein the first computation unit multiplies the 8 highest bits of a mantissa of Src0 by the 16 lowest bits of a mantissa of Src1 to generate a first result, the second computation unit multiplies the 16 lowest bits of the mantissa of Src0 by the 8 highest bits of the mantissa of Src1 to generate a second result, the third computation unit multiplies the 8 highest bits of the mantissa of Src0 by the 8 highest bits of the mantissa of Src1 to generate a third result, the first-type computation lane multiplies the 16 lowest bits of the mantissa of Src0 by the 16 lowest bits of the mantissa of Src1 to generate a fourth result, wherein the first-type computation lane left-shifts the third result by 16 bits, right-shifts the fourth result by 16 bits, calculates a sum of the first result, the second result, the shifted third result and the shifted fourth result to generate a mantissa of Src0 × Src1, wherein the first-type computation lane calculates an exponent of Src0 × Src1, adds the mantissa of Src0 × Src1 to a mantissa of Src2 to generate a mantissa of dest, and selects the greater of the exponent of Src0 × Src1 and an exponent of Src2.

8. An apparatus for calculating FP (Full Precision) and PP (Partial Precision) values, comprising: a first-type computation lane; and a plurality of second-type computation lanes, coupled to the first-type computation lane, wherein each of the first-type computation lane and the second-type computation lanes when operating in a first mode completes calculations for a set of the first-type data independently; each of the second-type computation lanes when operating in a second mode calculates a portion of a set of the second-type data to generate a partial result; and the first-type computation lane when operating in the second mode combines the partial results by the second-type computation lanes and outputs a combined result and uses the combined result to complete calculations for the set of the second-type data.

9. The apparatus of claim 8, wherein the first-type data is PP data in 24 bits and the second-type data is FP data in 32 bits.

10. The apparatus of claim 8, comprising: a group controller, coupled to the first-type computation lane and the second-type computation lanes, for instructing each of the first-type computation lane and the second-type computation lanes to operate in the first mode or the second mode according to a microinstruction type.

11. The apparatus of claim 8, wherein each of the first-type computation lane and the second-type computation lanes when operating in the first mode completes a calculation independently: dest = Src0 × Src1 + Src2, Src0, Src1 and Src2 represent the first-type data of three source memories, and dest represents the first-type data to be stored in a destination memory or output to a post-processing unit.

12. The apparatus of claim 8, wherein the first-type computation lane in coordination with the second-type computation lanes when operating in the second mode completes a calculation: dest = Src0 × Src1 + Src2, Src0, Src1 and Src2 represent the second-type data of three source memories, and dest represents the second-type data to be stored in a destination memory or output to a post-processing unit.

13. The apparatus of claim 12, wherein each of Src0, Src1 and Src2 comprises a 24-bit mantissa of a floating-point value and the second-type computation lanes comprises a first computation unit, a second computation unit and a third computation unit, wherein the first computation unit multiplies the 8 highest bits of a mantissa of Src0 by the 16 lowest bits of a mantissa of Src1 to generate a first result, the second computation unit multiplies the 16 lowest bits of the mantissa of Src0 by the 8 highest bits of the mantissa of Src1 to generate a second result, the third computation unit multiplies the 8 highest bits of the mantissa of Src0 by the 8 highest bits of the mantissa of Src1 to generate a third result, the first-type computation lane multiplies the 16 lowest bits of the mantissa of Src0 by the 16 lowest bits of the mantissa of Src1 to generate a fourth result, wherein the first-type computation lane left-shifts the third result by 16 bits, right-shifts the fourth result by 16 bits, calculates a sum of the first result, the second result, the shifted third result and the shifted fourth result to generate a mantissa of Src0 × Src1, wherein the first-type computation lane calculates an exponent of Src0 × Src1, adds the mantissa of Src0 × Src1 to a mantissa of Src2 to generate a mantissa of dest, and selects the greater of the exponent of Src0 × Src1 a
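The split-mantissa multiplication recited in claims 7 and 13 can be checked with a short sketch. Each 24-bit mantissa is split into its 8 highest and 16 lowest bits, four partial products are formed, and the sum described in the claims is taken. The function and constant names below are hypothetical, and the sketch covers only the mantissa product: exponent handling, the addition of the third operand, rounding, and normalization are left out because the claim text gives no detail on them. Under these assumptions the claimed sum equals the full 48-bit product with its 16 lowest bits discarded.

```python
MANTISSA_BITS = 24  # mantissa width stated in claims 7 and 13
LOW_BITS = 16       # the "16 lowest bits"; the remaining 8 are the "8 highest bits"
LOW_MASK = (1 << LOW_BITS) - 1


def split_mantissa(mantissa: int) -> tuple:
    """Return (high 8 bits, low 16 bits) of a 24-bit mantissa."""
    if not 0 <= mantissa < (1 << MANTISSA_BITS):
        raise ValueError("mantissa must fit in 24 bits")
    return mantissa >> LOW_BITS, mantissa & LOW_MASK


def mantissa_product(mant0: int, mant1: int) -> int:
    """Combine four partial products as described in claims 7 and 13."""
    high0, low0 = split_mantissa(mant0)
    high1, low1 = split_mantissa(mant1)

    first = high0 * low1    # first computation unit:  8 high x 16 low
    second = low0 * high1   # second computation unit: 16 low x 8 high
    third = high0 * high1   # third computation unit:  8 high x 8 high
    fourth = low0 * low1    # first-type lane:         16 low x 16 low

    # Left-shift the third result by 16, right-shift the fourth by 16, then sum.
    return first + second + (third << LOW_BITS) + (fourth >> LOW_BITS)


if __name__ == "__main__":
    a, b = 0xC0FFEE, 0x800001
    assert mantissa_product(a, b) == (a * b) >> LOW_BITS
    print(hex(mantissa_product(a, b)))
```

The right shift of the fourth result is what drops the 16 lowest bits of the product; the other three terms are already aligned to that scale, so no information from them is lost in the sum.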
organised in groups of units sharing resources, e.g. clusters · CPC title
with variable precision · CPC title
Significance control · CPC title
Processor architectures; Processor configuration, e.g. pipelining · CPC title
according to execution mode, e.g. mode flag · CPC title