Methods and apparatuses for calculating FP (full precision) and PP (partial precision) values

US10248417B2 · US · B2

Patent metadata
Publication number: US-10248417-B2
Application number: US-201715685312-A
Country: US
Kind code: B2
Filing date: Aug 24, 2017
Priority date: Jun 27, 2017
Publication date: Apr 2, 2019
Grant date: Apr 2, 2019

Abstract

A method for calculating FP (Full Precision) and PP (Partial Precision) values, performed by an ID (Instruction Decode) unit, contains at least the following steps: decoding an instruction request from a compiler; executing a loop m times to generate m microinstructions for calculating first-type data, or n times to generate n microinstructions for calculating second-type data according to the instruction mode of the instruction request, thereby enabling ALGs (Arithmetic Logic Groups) to execute lanes of a thread. m is less than n and the precision of the first-type data is lower than the precision of the second-type data.
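The decode step in the abstract can be sketched in Python using the values given in claims 2 and 3: the mode sits in the MSB of the instruction request, m = 2 for 24-bit PP data, and n = 8 for 32-bit FP data. Several details are assumptions made only for illustration, because the patent text shown here does not specify them: the 32-bit instruction width, the bit polarity (1 = FP, 0 = PP), the opcode field, and the hypothetical `MicroInstruction` record.

```python
from __future__ import annotations

from dataclasses import dataclass

# Loop counts taken from claim 3: m = 2 (24-bit PP data), n = 8 (32-bit FP data).
M_PP_LOOPS = 2
N_FP_LOOPS = 8

# Assumption: a 32-bit instruction word whose MSB (bit 31) holds the mode (claim 2).
MODE_BIT = 31


@dataclass(frozen=True)
class MicroInstruction:
    """Hypothetical micro-op record; the real format is not disclosed here."""
    opcode: int           # assumed: every bit below the mode bit
    full_precision: bool  # True -> second mode (FP), False -> first mode (PP)
    pass_index: int       # iteration of the decode loop that emitted this micro-op


def decode(instruction_word: int) -> list[MicroInstruction]:
    """Emit m or n micro-ops depending on the mode bit of the instruction."""
    # Assumed polarity: MSB = 1 selects FP (second mode), MSB = 0 selects PP.
    full_precision = bool((instruction_word >> MODE_BIT) & 1)
    opcode = instruction_word & ((1 << MODE_BIT) - 1)
    loops = N_FP_LOOPS if full_precision else M_PP_LOOPS
    # The ID unit's loop: one micro-op per iteration.
    return [MicroInstruction(opcode, full_precision, i) for i in range(loops)]
```

With these assumptions, `decode(0x00000010)` yields two PP micro-ops and `decode(0x80000010)` yields eight FP micro-ops, both carrying opcode `0x10`.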

First claim

What is claimed is:

1. A method for calculating FP (Full Precision) and PP (Partial Precision) values, performed by an ID (Instruction Decode) unit, the method comprising: decoding an instruction request from a compiler; and executing a loop m times to generate m microinstructions for calculating first-type data, or n times to generate n microinstructions for calculating second-type data according to an instruction mode of the instruction request, thereby enabling a plurality of ALGs (Arithmetic Logic Groups) to execute a plurality of lanes of a thread; wherein m is less than n and a precision of the first-type data is lower than a precision of the second-type data; wherein each ALG comprises: a first-type computation lane; and a plurality of second-type computation lanes, wherein when the instruction mode is a first mode, each of the first-type computation lane and the second-type computation lanes completes calculations for a set of the first-type data independently; and, when the instruction mode is a second mode, each of the second-type computation lanes calculates a portion of a set of the second-type data to generate a partial result and the first-type computation lane combines the partial results by the second-type computation lanes, outputs a combined result and uses the combined result to complete calculations for the set of the second-type data.

2. The method of claim 1, wherein the instruction mode is stored in a MSB (Most Significant Bit) of the instruction request.

3. The method of claim 1, wherein the first-type data is PP data in 24 bits, the second-type data is FP data in 32 bits, m is 2 and n is 8.

4. The method of claim 1, wherein each ALG comprises: a group controller for instructing each of the first-type computation lane and the second-type computation lanes to operate in the first mode or the second mode according to a microinstruction type.

5. The method of claim 1, wherein each of the first-type computation lane and the second-type computation lanes when operating in the first mode completes a calculation independently: dest = Src0 × Src1 + Src2, Src0, Src1 and Src2 represent the first-type data of three source memories, and dest represents the first-type data to be stored in a destination memory or output to a post-processing unit.

6. The method of claim 1, wherein the first-type computation lane in coordination with the second-type computation lanes when operating in the second mode completes a calculation: dest = Src0 × Src1 + Src2, Src0, Src1 and Src2 represent the second-type data of three source memories, and dest represents the second-type data to be stored in a destination memory or output to a post-processing unit.

7. The method of claim 6, wherein each of Src0, Src1 and Src2 comprises a 24-bit mantissa of a floating-point value and the second-type computation lanes comprises a first computation unit, a second computation unit and a third computation unit, wherein the first computation unit multiplies the 8 highest bits of a mantissa of Src0 by the 16 lowest bits of a mantissa of Src1 to generate a first result, the second computation unit multiplies the 16 lowest bits of the mantissa of Src0 by the 8 highest bits of the mantissa of Src1 to generate a second result, the third computation unit multiplies the 8 highest bits of the mantissa of Src0 by the 8 highest bits of the mantissa of Src1 to generate a third result, the first-type computation lane multiplies the 16 lowest bits of the mantissa of Src0 by the 16 lowest bits of the mantissa of Src1 to generate a fourth result, wherein the first-type computation lane left-shifts the third result by 16 bits, right-shifts the fourth result by 16 bits, calculates a sum of the first result, the second result, the shifted third result and the shifted fourth result to generate a mantissa of Src0 × Src1, wherein the first-type computation lane calculates an exponent of Src0 × Src1, adds the mantissa of Src0 × Src1 to a mantissa of Src2 to generate a mantissa of dest, and selects the greater of the exponent of Src0 × Src1 and an exponent of Src2.

8. An apparatus for calculating FP (Full Precision) and PP (Partial Precision) values, comprising: a first-type computation lane; and a plurality of second-type computation lanes, coupled to the first-type computation lane, wherein each of the first-type computation lane and the second-type computation lanes when operating in a first mode completes calculations for a set of the first-type data independently; each of the second-type computation lanes when operating in a second mode calculates a portion of a set of the second-type data to generate a partial result; and the first-type computation lane when operating in the second mode combines the partial results by the second-type computation lanes and outputs a combined result and uses the combined result to complete calculations for the set of the second-type data.

9. The apparatus of claim 8, wherein the first-type data is PP data in 24 bits and the second-type data is FP data in 32 bits.

10. The apparatus of claim 8, comprising: a group controller, coupled to the first-type computation lane and the second-type computation lanes, for instructing each of the first-type computation lane and the second-type computation lanes to operate in the first mode or the second mode according to a microinstruction type.

11. The apparatus of claim 8, wherein each of the first-type computation lane and the second-type computation lanes when operating in the first mode completes a calculation independently: dest = Src0 × Src1 + Src2, Src0, Src1 and Src2 represent the first-type data of three source memories, and dest represents the first-type data to be stored in a destination memory or output to a post-processing unit.

12. The apparatus of claim 8, wherein the first-type computation lane in coordination with the second-type computation lanes when operating in the second mode completes a calculation: dest = Src0 × Src1 + Src2, Src0, Src1 and Src2 represent the second-type data of three source memories, and dest represents the second-type data to be stored in a destination memory or output to a post-processing unit.

13. The apparatus of claim 12, wherein each of Src0, Src1 and Src2 comprises a 24-bit mantissa of a floating-point value and the second-type computation lanes comprises a first computation unit, a second computation unit and a third computation unit, wherein the first computation unit multiplies the 8 highest bits of a mantissa of Src0 by the 16 lowest bits of a mantissa of Src1 to generate a first result, the second computation unit multiplies the 16 lowest bits of the mantissa of Src0 by the 8 highest bits of the mantissa of Src1 to generate a second result, the third computation unit multiplies the 8 highest bits of the mantissa of Src0 by the 8 highest bits of the mantissa of Src1 to generate a third result, the first-type computation lane multiplies the 16 lowest bits of the mantissa of Src0 by the 16 lowest bits of the mantissa of Src1 to generate a fourth result, wherein the first-type computation lane left-shifts the third result by 16 bits, right-shifts the fourth result by 16 bits, calculates a sum of the first result, the second result, the shifted third result and the shifted fourth result to generate a mantissa of Src0 × Src1, wherein the first-type computation lane calculates an exponent of Src0 × Src1, adds the mantissa of Src0 × Src1 to a mantissa of Src2 to generate a mantissa of dest, and selects the greater of the exponent of Src0 × Src1 a…
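Claims 7 and 13 split each 24-bit mantissa into its 8 highest bits (H) and 16 lowest bits (L) and spread the four partial products across the lanes. The Python sketch below models only that integer mantissa product. The function names are hypothetical, and the sketch deliberately leaves out exponent handling, alignment with Src2, normalization and rounding, none of which the claim text shown here details. By our reading, the recited shifts make the sum equal to the full 48-bit product with its lowest 16 bits dropped, because (H0·2^16 + L0)(H1·2^16 + L1) / 2^16 = H0·H1·2^16 + H0·L1 + L0·H1 + L0·L1 / 2^16. In other words, the result is (m0 × m1) >> 16.

```python
MASK16 = (1 << 16) - 1
MASK24 = (1 << 24) - 1


def split24(mantissa: int) -> tuple[int, int]:
    """Split a 24-bit mantissa into (8 highest bits, 16 lowest bits)."""
    mantissa &= MASK24
    return mantissa >> 16, mantissa & MASK16


def fp_mantissa_product(m0: int, m1: int) -> int:
    """Mantissa of Src0 x Src1 built from four partial products (claim 7)."""
    h0, l0 = split24(m0)
    h1, l1 = split24(m1)
    first = h0 * l1   # first computation unit (second-type lanes): 8 x 16 bits
    second = l0 * h1  # second computation unit (second-type lanes): 16 x 8 bits
    third = h0 * h1   # third computation unit (second-type lanes): 8 x 8 bits
    fourth = l0 * l1  # first-type computation lane: 16 x 16 bits
    # First-type lane combines: third << 16, fourth >> 16, then a single sum.
    return first + second + (third << 16) + (fourth >> 16)
```

As an example, assume the hidden bit sits at bit 23, so 1.5 has the mantissa `0xC00000`. Under that assumption, `fp_mantissa_product(0xC00000, 0xC00000)` returns `0x90000000`, which reads as 2.25 once the binary point is placed at bit 30.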

Assignees

Via Alliance Semiconductor Co Ltd

Inventors

Classifications

  • organised in groups of units sharing resources, e.g. clusters

  • with variable precision

  • Significance control

  • Processor architectures; Processor configuration, e.g. pipelining

  • according to execution mode, e.g. mode flag

Frequently asked questions

What does patent US10248417B2 cover?
It covers a method for calculating FP (Full Precision) and PP (Partial Precision) values, performed by an ID (Instruction Decode) unit. The unit decodes an instruction request from a compiler. Then, according to the request's instruction mode, it executes a loop either m times to generate m microinstructions for lower-precision first-type data, or n times (n greater than m) to generate n microinstructions for higher-precision second-type data. This enables ALGs (Arithmetic Logic Groups) to execute the lanes of a thread.
Who is the assignee on this patent?
Via Alliance Semiconductor Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F9/30014. Mapped technology areas include Physics.
When was this patent published?
Published April 2, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).