Unified multiply unit

US2016188295A1 · US · A1

Patent metadata
Publication number: US-2016188295-A1
Application number: US-201414584948-A
Country: US
Kind code: A1
Filing date: Dec 29, 2014
Priority date: Dec 29, 2014
Publication date: Jun 30, 2016
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments disclosed pertain to apparatuses, systems, and methods for performing multi-precision single instruction multiple data (SIMD) operations on integer, fixed point and floating point operands. Disclosed embodiments pertain to a circuit that is capable of performing concurrent multiply, fused multiply-add, rounding, saturation, and dot products on the above operand types. In addition, the circuit may facilitate 64-bit multiplication when Newton-Raphson, divide and square root operations are performed.
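The abstract lists operations that the unit performs concurrently across SIMD lanes. As a rough behavioural reference, the Python sketch below models lane-wise fused multiply-add with saturation, a fixed-point multiply with rounding and saturation, and a dot product. It assumes lanes are packed least-significant first into a single integer. The helper names (`unpack`, `pack`, `saturate`, `simd_fma_sat`, `q15_mul_round_sat`, `simd_dot`), the Q15 fixed-point format and the round-half-up rule are illustrative assumptions, not taken from the patent, and the code models results only, not the circuit.

```python
# Behavioural reference model (an assumption-based sketch, not the patented
# circuit): lane-wise SIMD arithmetic on a packed integer word.
# Hypothetical choices: lane 0 in the LSBs, Q15 fixed point, round half up.

def unpack(word: int, lanes: int, bits: int, signed: bool) -> list[int]:
    """Split a packed word into `lanes` fields of `bits` bits (lane 0 = LSBs)."""
    mask = (1 << bits) - 1
    out = []
    for i in range(lanes):
        v = (word >> (i * bits)) & mask
        if signed and v >> (bits - 1):
            v -= 1 << bits  # two's-complement sign extension
        out.append(v)
    return out


def pack(values: list[int], bits: int) -> int:
    """Pack lane values into one word, truncating each to `bits` bits."""
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * bits)
    return word


def saturate(v: int, bits: int, signed: bool) -> int:
    """Clamp v to the representable range of a `bits`-bit lane."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return max(lo, min(hi, v))


def simd_fma_sat(a: int, b: int, c: int, lanes: int, bits: int, signed: bool = True) -> int:
    """Concurrent a*b + c in every lane, computed exactly, then saturated once."""
    xs, ys, zs = (unpack(w, lanes, bits, signed) for w in (a, b, c))
    return pack([saturate(x * y + z, bits, signed) for x, y, z in zip(xs, ys, zs)], bits)


def q15_mul_round_sat(a: int, b: int, lanes: int = 4) -> int:
    """Concurrent Q15 multiplies: add half an LSB, shift by 15, then saturate."""
    xs, ys = unpack(a, lanes, 16, True), unpack(b, lanes, 16, True)
    return pack([saturate((x * y + (1 << 14)) >> 15, 16, True) for x, y in zip(xs, ys)], 16)


def simd_dot(a: int, b: int, lanes: int, bits: int, signed: bool = True) -> int:
    """Dot product: sum of lane-wise products, kept at full precision."""
    xs, ys = unpack(a, lanes, bits, signed), unpack(b, lanes, bits, signed)
    return sum(x * y for x, y in zip(xs, ys))
```

Because the product and addend are combined at full precision before a single saturation step, an 8-bit lane computing 100 × 2 + 0 clamps to 127 instead of wrapping to -56, and the Q15 product of -1.0 by -1.0 clamps to 32767.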

First claim

Opening claim text (preview).

What is claimed is:

1. A multi-precision Single Instruction Multiple Data (SIMD) multiply unit comprising: a carry save adder (CSA) configured to obtain based, in part, on a plurality of partial products of a first multiplier operand and a second modified booth encoded multiplicand operand, a first partial result and a second partial result; and an addition module coupled to the CSA, the addition module comprising: a full adder to obtain an intermediate sum result and an intermediate carry result by adding the first partial result and second partial result to a third operand, and a first carry lookahead adder (CLA) to operate on integer and fixed point operands and coupled to the full adder, the first CLA to add the intermediate sum result and the intermediate carry result, wherein the first CLA comprises, in addition to columns for bits in the intermediate sum result and intermediate carry result, one or more additional columns, wherein each additional column comprises bit values that: prevent carry propagation across the additional column, or propagate carries across the additional column, wherein, a determination to propagate carries, or to prevent carry propagation across each of the one or more additional columns is based, in part, on a current instruction being executed by the arithmetic unit, a number of concurrent operations specified in the current instruction, and a precision of the current instruction.

2. The multiply unit of claim 1, wherein the multiply unit comprises: a modified booth encoder coupled to the CSA, the modified booth encoder configured to obtain, from the first multiplier operand and the second modified booth encoded multiplicand operand, rows corresponding to each of the plurality of partial products, wherein each partial product row is offset from an immediately preceding partial product row by two bits; and wherein the CSA obtains the first partial result and the second partial result based, in part, on the plurality of partial product rows.

3. The multiply unit of claim 2, wherein the addition module further comprises: a second multi-precision CLA to operate on floating point operands, the second CLA coupled to the CSA and configured to add the first partial result and second partial result.

4. The multiply unit of claim 3, wherein: the first CLA is 135-bits wide and the one or more additional columns comprise 7 additional columns, the second CLA is 128-bits wide, and based on the current instruction, and, in part, by varying bit values in each of the one or more additional columns, the arithmetic unit is configurable to execute one or more of: up to 8 concurrent multiplications or fused multiply-adds of 8-bit signed or unsigned integers; or up to 4 concurrent multiplications or fused multiply-adds of 16-bit signed or unsigned integers; or up to 2 concurrent multiplications or fused multiply adds of 32-bit signed or unsigned integers; or a 64 bit signed or unsigned multiplication or fused multiply add of integers; or a 64 bit multiplication of floating point mantissas for iterative computations of one or more of: divide, or square root; or up to 4 concurrent multiplications or fused multiply-adds of 16-bit fixed point numbers; or up to 2 concurrent multiplications or fused multiply-adds of 32-bit fixed point numbers; or up to 2 concurrent multiplications of 24-bit floating point mantissas; or a multiplication of 53-bit floating point mantissas.

5. The multiply unit of claim 2, wherein the CSA comprises five levels of compressors.

6. The multiply unit of claim 5, wherein the five levels of compressors comprise: a first level comprises a plurality of 5:3 compressors, wherein each 5:3 compressor receives five inputs from five corresponding partial product rows, wherein three of the five inputs arrive from a first column of bits in three consecutive rows of the five partial product rows and two of the five inputs arrive from a second column of bits in two rows of the five partial product rows, wherein the first and second columns are columns of bits; a second level comprises a plurality of 3:2 compressors, each 3:2 compressor coupled to a plurality of 5:3 compressors at the first level; and three levels of 4:2 compressors.

7. The multiply unit of claim 5, further comprising: a plurality of multiplexers wherein, based, in part, on the precision of the current instruction, the multiplexers are coupled to select the output of one or more of the five levels of compressors, and configured to select as the first partial result and the second partial result, the outputs of: a second level of the five levels of compressors, or a third level of the five levels compressors, or a fourth level of the five levels compressors, or a fifth level of the five levels compressors, and wherein the outputs selected as the first partial result and second partial result are routed to the full adder.

8. A processor comprising an Arithmetic Logic Unit (ALU), wherein the ALU further comprises a multi-precision Single Instruction Multiple Data (SIMD) multiply unit comprising: a carry save adder (CSA) configured to obtain based, in part, on a plurality of partial products of a first multiplier operand and a second modified booth encoded multiplicand operand, a first partial result and a second partial result; and an addition module coupled to the CSA, the addition module comprising: a full adder to obtain an intermediate sum result and an intermediate carry result by adding the first partial result and second partial result to a third operand, and a first carry lookahead adder (CLA) to operate on integer and fixed point operands and coupled to the full adder, the first CLA to add the intermediate sum result and the intermediate carry result, wherein the first CLA comprises, in addition to columns for bits in the intermediate sum result and intermediate carry result, one or more additional columns, wherein each additional column comprises bit values that: prevent carry propagation across the additional column, or propagate carries across the additional column, wherein, a determination to propagate carries, or to prevent carry propagation across each of the one or more additional columns is based, in part, on a current instruction being executed by the arithmetic unit, a number of concurrent operations specified in the current instruction, and a precision of the current instruction.

9. The processor of claim 8, wherein the multiply unit comprises: a modified booth encoder coupled to the CSA, the modified booth encoder configured to obtain, from the first multiplier operand and the second modified booth encoded multiplicand operand, rows corresponding to each of the plurality of partial products, wherein each partial product row is offset from an immediately preceding partial product row by two bits; and wherein the CSA obtains the first partial result and the second partial result based, in part, on the plurality of partial product rows.

10. The processor of claim 9, wherein the addition module further comprises: a second multi-precision CLA to operate on floating point operands, the second CLA coupled to the CSA and configured to add the first partial result and second partial result.

11. The processor of claim 10, wherein: the first CLA is 135-bits wide and the one or more additional columns comprise 7 additional columns, the second CLA is 128-bits wide, and based on the current instruction, and, in part, by varying bit values in each of the one or more additional columns, the arithmetic unit is configurable to execute one or more of: up to 8 concurrent multiplications or fused multiply-ad…

Assignees

Imagination Tech Ltd

Classifications

  • G06F7/487 (primary)

    Multiplying; Dividing {(G06F7/4833, G06F7/4836 take precedence)} · CPC title

  • G06F7/4876 (primary)

    Multiplying · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016188295A1 cover?
Embodiments disclosed pertain to apparatuses, systems, and methods for performing multi-precision single instruction multiple data (SIMD) operations on integer, fixed point and floating point operands. Disclosed embodiments pertain to a circuit that is capable of performing concurrent multiply, fused multiply-add, rounding, saturation, and dot products on the above operand types. In addition, t…
Who is the assignee on this patent?
Imagination Tech Ltd
What technology area does this patent fall under?
Primary CPC classification G06F7/487. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 30, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).