Arithmetic processing apparatus, control method, and recording medium
US-2019377548-A1 · Dec 12, 2019 · US
US2016188295A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016188295-A1 |
| Application number | US-201414584948-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 29, 2014 |
| Priority date | Dec 29, 2014 |
| Publication date | Jun 30, 2016 |
| Grant date | — |
Official abstract text for this publication.
Embodiments disclosed pertain to apparatuses, systems, and methods for performing multi-precision single instruction multiple data (SIMD) operations on integer, fixed point and floating point operands. Disclosed embodiments pertain to a circuit that is capable of performing concurrent multiply, fused multiply-add, rounding, saturation, and dot products on the above operand types. In addition, the circuit may facilitate 64-bit multiplication when Newton-Raphson, divide and square root operations are performed.
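As a reading aid, the lane-wise behavior the abstract describes can be modeled in a few lines of Python. This is only a behavioral sketch of what "multi-precision SIMD multiply" and "dot product" mean for unsigned integer lanes packed in a 64-bit word; it says nothing about how the disclosed circuit computes them. The function names `split_lanes`, `simd_mul`, and `simd_dot` are hypothetical and do not appear in the publication.

```python
def split_lanes(word, lane_bits, total_bits=64):
    """Split an unsigned word into unsigned lanes, least significant lane first."""
    mask = (1 << lane_bits) - 1
    return [(word >> shift) & mask for shift in range(0, total_bits, lane_bits)]


def simd_mul(a, b, lane_bits, total_bits=64):
    """Lane-wise unsigned multiply; each product needs up to 2 * lane_bits bits."""
    lanes_a = split_lanes(a, lane_bits, total_bits)
    lanes_b = split_lanes(b, lane_bits, total_bits)
    return [x * y for x, y in zip(lanes_a, lanes_b)]


def simd_dot(a, b, lane_bits, total_bits=64):
    """Dot product: the sum of the lane-wise products."""
    return sum(simd_mul(a, b, lane_bits, total_bits))


if __name__ == "__main__":
    a = 0x0102030405060708
    b = 0x0202020202020202
    print(simd_mul(a, b, 8))    # eight 8-bit lanes
    print(simd_mul(a, b, 16))   # four 16-bit lanes
    print(simd_dot(a, b, 8))
```

The same two 64-bit inputs yield eight, four, two, or one product depending on `lane_bits`, which is the sense in which a single multiply unit serves several precisions.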
Opening claim text (preview).
What is claimed is:

1. A multi-precision Single Instruction Multiple Data (SIMD) multiply unit comprising: a carry save adder (CSA) configured to obtain based, in part, on a plurality of partial products of a first multiplier operand and a second modified booth encoded multiplicand operand, a first partial result and a second partial result; and an addition module coupled to the CSA, the addition module comprising: a full adder to obtain an intermediate sum result and an intermediate carry result by adding the first partial result and second partial result to a third operand, and a first carry lookahead adder (CLA) to operate on integer and fixed point operands and coupled to the full adder, the first CLA to add the intermediate sum result and the intermediate carry result, wherein the first CLA comprises, in addition to columns for bits in the intermediate sum result and intermediate carry result, one or more additional columns, wherein each additional column comprises bit values that: prevent carry propagation across the additional column, or propagate carries across the additional column, wherein, a determination to propagate carries, or to prevent carry propagation across each of the one or more additional columns is based, in part, on a current instruction being executed by the arithmetic unit, a number of concurrent operations specified in the current instruction, and a precision of the current instruction.

2. The multiply unit of claim 1, wherein the multiply unit comprises: a modified booth encoder coupled to the CSA, the modified booth encoder configured to obtain, from the first multiplier operand and the second modified booth encoded multiplicand operand, rows corresponding to each of the plurality of partial products, wherein each partial product row is offset from an immediately preceding partial product row by two bits; and wherein the CSA obtains the first partial result and the second partial result based, in part, on the plurality of partial product rows.

3. The multiply unit of claim 2, wherein the addition module further comprises: a second multi-precision CLA to operate on floating point operands, the second CLA coupled to the CSA and configured to add the first partial result and second partial result.

4. The multiply unit of claim 3, wherein: the first CLA is 135-bits wide and the one or more additional columns comprise 7 additional columns, the second CLA is 128-bits wide, and based on the current instruction, and, in part, by varying bit values in each of the one or more additional columns, the arithmetic unit is configurable to execute one or more of: up to 8 concurrent multiplications or fused multiply-adds of 8-bit signed or unsigned integers; or up to 4 concurrent multiplications or fused multiply-adds of 16-bit signed or unsigned integers; or up to 2 concurrent multiplications or fused multiply adds of 32-bit signed or unsigned integers; or a 64 bit signed or unsigned multiplication or fused multiply add of integers; or a 64 bit multiplication of floating point mantissas for iterative computations of one or more of: divide, or square root; or up to 4 concurrent multiplications or fused multiply-adds of 16-bit fixed point numbers; or up to 2 concurrent multiplications or fused multiply-adds of 32-bit fixed point numbers; or up to 2 concurrent multiplications of 24-bit floating point mantissas; or a multiplication of 53-bit floating point mantissas.

5. The multiply unit of claim 2, wherein the CSA comprises five levels of compressors.

6. The multiply unit of claim 5, wherein the five levels of compressors comprise: a first level comprises a plurality of 5:3 compressors, wherein each 5:3 compressor receives five inputs from five corresponding partial product rows, wherein three of the five inputs arrive from a first column of bits in three consecutive rows of the five partial product rows and two of the five inputs arrive from a second column of bits in two rows of the five partial product rows, wherein the first and second columns are columns of bits; a second level comprises a plurality of 3:2 compressors, each 3:2 compressor coupled to a plurality of 5:3 compressors at the first level; and three levels of 4:2 compressors.

7. The multiply unit of claim 5, further comprising: a plurality of multiplexers wherein, based, in part, on the precision of the current instruction, the multiplexers are coupled to select the output of one or more of the five levels of compressors, and configured to select as the first partial result and the second partial result, the outputs of: a second level of the five levels of compressors, or a third level of the five levels compressors, or a fourth level of the five levels compressors, or a fifth level of the five levels compressors, and wherein the outputs selected as the first partial result and second partial result are routed to the full adder.

8. A processor comprising an Arithmetic Logic Unit (ALU), wherein the ALU further comprises a multi-precision Single Instruction Multiple Data (SIMD) multiply unit comprising: a carry save adder (CSA) configured to obtain based, in part, on a plurality of partial products of a first multiplier operand and a second modified booth encoded multiplicand operand, a first partial result and a second partial result; and an addition module coupled to the CSA, the addition module comprising: a full adder to obtain an intermediate sum result and an intermediate carry result by adding the first partial result and second partial result to a third operand, and a first carry lookahead adder (CLA) to operate on integer and fixed point operands and coupled to the full adder, the first CLA to add the intermediate sum result and the intermediate carry result, wherein the first CLA comprises, in addition to columns for bits in the intermediate sum result and intermediate carry result, one or more additional columns, wherein each additional column comprises bit values that: prevent carry propagation across the additional column, or propagate carries across the additional column, wherein, a determination to propagate carries, or to prevent carry propagation across each of the one or more additional columns is based, in part, on a current instruction being executed by the arithmetic unit, a number of concurrent operations specified in the current instruction, and a precision of the current instruction.

9. The processor of claim 8, wherein the multiply unit comprises: a modified booth encoder coupled to the CSA, the modified booth encoder configured to obtain, from the first multiplier operand and the second modified booth encoded multiplicand operand, rows corresponding to each of the plurality of partial products, wherein each partial product row is offset from an immediately preceding partial product row by two bits; and wherein the CSA obtains the first partial result and the second partial result based, in part, on the plurality of partial product rows.

10. The processor of claim 9, wherein the addition module further comprises: a second multi-precision CLA to operate on floating point operands, the second CLA coupled to the CSA and configured to add the first partial result and second partial result.

11. The processor of claim 10, wherein: the first CLA is 135-bits wide and the one or more additional columns comprise 7 additional columns, the second CLA is 128-bits wide, and based on the current instruction, and, in part, by varying bit values in each of the one or more additional columns, the arithmetic unit is configurable to execute one or more of: up to 8 concurrent multiplications or fused multiply-ad
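The "additional columns" recited in claims 1 and 4 can be illustrated with a small software model. The sketch below assumes one common way to realize such columns: a guard bit is inserted between adjacent 16-bit lanes, and setting the guard bits of the two addends to (1, 0) lets a carry pass through while (0, 0) absorbs it. The claims state only that the column bit values either propagate or prevent carries, so this specific encoding, and the names `insert_guards`, `strip_guards`, and `partitioned_add`, are assumptions made for illustration. The widths follow claim 4: 8 lanes of 16 bits plus 7 guard columns give 135 bits.

```python
def insert_guards(word, guard_bits, lane_bits=16, lanes=8):
    """Spread `word` into lanes separated by one guard column each."""
    mask = (1 << lane_bits) - 1
    wide = 0
    for i in range(lanes):
        lane = (word >> (i * lane_bits)) & mask
        wide |= lane << (i * (lane_bits + 1))
        if i < lanes - 1:
            # Guard column sits directly above lane i.
            wide |= guard_bits[i] << (i * (lane_bits + 1) + lane_bits)
    return wide


def strip_guards(wide, lane_bits=16, lanes=8):
    """Drop the guard columns and repack the lanes contiguously."""
    mask = (1 << lane_bits) - 1
    word = 0
    for i in range(lanes):
        lane = (wide >> (i * (lane_bits + 1))) & mask
        word |= lane << (i * lane_bits)
    return word


def partitioned_add(a, b, propagate, lane_bits=16, lanes=8):
    """Add two packed words; propagate[i] decides whether a carry may
    cross the boundary above lane i (assumed encoding, see lead-in)."""
    a_guards = [1 if p else 0 for p in propagate]  # 1 + 0 + carry passes the carry on
    b_guards = [0] * (lanes - 1)                   # 0 + 0 + carry absorbs the carry
    wide = (insert_guards(a, a_guards, lane_bits, lanes)
            + insert_guards(b, b_guards, lane_bits, lanes))
    return strip_guards(wide, lane_bits, lanes)


if __name__ == "__main__":
    print(hex(partitioned_add(0xFFFF, 1, [True] * 7)))   # one 128-bit add
    print(hex(partitioned_add(0xFFFF, 1, [False] * 7)))  # eight 16-bit adds
```

Choosing the `propagate` pattern per instruction is what lets one wide adder behave as eight 16-bit adders, four 32-bit adders, and so on, without any change to the adder itself.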
Multiplying; Dividing {(G06F7/4833, G06F7/4836 take precedence)} · CPC title
Multiplying · CPC title