Variable precision floating point multiply-add circuit
US-9104474-B2 · Aug 11, 2015 · US
US11836464B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11836464-B2 |
| Application number | US-202217839905-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 14, 2022 |
| Priority date | Oct 15, 2018 |
| Publication date | Dec 5, 2023 |
| Grant date | Dec 5, 2023 |
Official abstract text for this publication.
An apparatus and method for efficiently performing a multiply add or multiply accumulate operation. For example, one embodiment of a processor comprises: a decoder to decode an instruction specifying an operation, the instruction comprising a first operand identifying a multiplier and a second operand identifying a multiplicand; and fused multiply-add (FMA) execution circuitry comprising first multiplication circuitry to perform a multiplication using the multiplicand and multiplier to generate a result for multipliers and multiplicands falling within a first precision range, and second multiplication circuitry to be used instead of the first multiplication circuitry for multipliers and multiplicands falling within a second precision range.
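The two-path arrangement in the abstract can be illustrated with a small software model. This is a minimal sketch, not the patented circuit: the names `THRESHOLD_BITS`, `full_multiply`, `select_multiply`, and `multiply_accumulate` are hypothetical, the threshold value is an assumption (the source gives none), and the low-precision path is modeled for ternary operands using the inversion, zeroing, or identity behavior that the claims attribute to the second multiplication circuitry.

```python
THRESHOLD_BITS = 4  # hypothetical; the source does not state a threshold value


def full_multiply(a: int, b: int) -> int:
    """Stand-in for the first multiplication circuitry (a general multiplier)."""
    return a * b


def select_multiply(a: int, b: int) -> int:
    """Stand-in for the second multiplication circuitry.

    Assumes b is ternary (-1, 0, +1), so the product is simply the
    inversion, zeroing, or identity of a and no multiplier is needed.
    """
    if b not in (-1, 0, 1):
        raise ValueError("low-precision path expects a ternary operand")
    if b == 0:
        return 0   # zeroing
    if b == 1:
        return a   # identity
    return -a      # inversion


def multiply_accumulate(acc: int, a: int, b: int, precision_bits: int) -> int:
    """Pick a multiplication path by operand precision, then accumulate."""
    if precision_bits >= THRESHOLD_BITS:
        product = full_multiply(a, b)    # precision at or above the threshold
    else:
        product = select_multiply(a, b)  # precision below the threshold
    return acc + product
```

For example, `multiply_accumulate(10, 7, -3, precision_bits=8)` takes the general multiplier path, while `multiply_accumulate(10, 1, -1, precision_bits=2)` is resolved by the selection path alone.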
Opening claim text (preview).
What is claimed is:
1. A processor comprising: a decoder to decode an instruction specifying an operation, the instruction comprising a first operand and a second operand; control circuitry, responsive to a precision of the first and second operands being at or above a threshold, to cause a first multiplication circuitry to process a first value and a second value indicated by the first operand and the second operand, respectively, to generate a result, the processing of the first and second values to generate the result including multiplication, and responsive to the precision of the first and second operands being below the threshold, to cause a second multiplication circuitry to process the first and second values indicated by the first operand and the second operand, respectively, to generate the result; and adder circuitry to add the result to an accumulated value to generate a new accumulated value.
2. The processor of claim 1, further comprising fused circuitry including the first multiplication circuitry and the second multiplication circuitry to process the first and second values indicated by the first operand and the second operand.
3. The processor of claim 2, the control circuitry comprising a second output selector to pass the second value to the first multiplication circuitry or second multiplication circuitry based on the precision of the first and second values relative to the threshold.
4. The processor of claim 1, the control circuitry comprising a first output selector to pass the first value to the first multiplication circuitry or second multiplication circuitry based on the precision of the first and second values relative to the threshold.
5. The processor of claim 1, the first multiplication circuitry comprising a booth and booth selectors to process the first and second values indicated by the first operand and the second operand, respectively, to generate the result.
6. The processor of claim 1, the second multiplication circuitry to perform an inversion, zeroing, or identity operation to process the first and second values to generate the result.
7. The processor of claim 1, the adder circuitry comprising a carry save adder (CSA) and multiple-bit final adder to generate the new accumulated value.
8. The processor of claim 1, upon the first and second operands indicating binary or ternary values, the precision of the first and second operands being determined to be below the threshold.
9. The processor of claim 1, further comprising a first register to store the first value indicated by the first operand and a second register to store the second value indicated by the second operand.
10. A method comprising: decoding, by a decoder, an instruction specifying an operation, the instruction comprising a first operand and a second operand; responsive to a precision of the first and second operands being at or above a threshold, causing, by control circuitry, a first multiplication circuitry to process a first value and a second value indicated by the first operand and the second operand, respectively, to generate a result, the processing of the first and second values to generate the result including multiplication, and responsive to the precision of the first and second operands being below the threshold, causing, by the control circuitry, a second multiplication circuitry to process the first and second values indicated by the first operand and the second operand, respectively, to generate the result; and adding, by adder circuitry, the result to an accumulated value to generate a new accumulated value.
11. The method of claim 10, wherein the control circuitry is to perform: passing the first value to the first multiplication circuitry or second multiplication circuitry based on the precision of the first and second values relative to the threshold.
12. The method of claim 10, wherein the control circuitry is to perform: passing the second value to the first multiplication circuitry or second multiplication circuitry based on the precision of the first and second values relative to the threshold.
13. The method of claim 10, the second multiplication circuitry to process the first and second values indicated by the first operand and second operands comprises performing an inversion, zeroing, or identity operation to process the first and second values to generate the result.
14. A non-transitory machine-readable medium having program code stored thereon which, when executed by a machine, causes the machine to perform: decoding, by a decoder, an instruction specifying an operation, the instruction comprising a first operand and a second operand; responsive to a precision of the first and second operands being at or above a threshold, causing, by control circuitry, a first multiplication circuitry to process a first value and a second value indicated by the first operand and the second operand, respectively, to generate a result, the processing of the first and second values to generate the result including multiplication, and responsive to the precision of the first and second operands being below the threshold, causing, by the control circuitry, a second multiplication circuitry to process the first and second values indicated by the first operand and the second operand, respectively, to generate the result; and adding, by adder circuitry, the result to an accumulated value to generate a new accumulated value.
15. The non-transitory machine-readable medium of claim 14, the second multiplication circuitry to process the first and second values indicated by the first operand and the second operand comprises performing an inversion, zeroing, or identity operation to process the first and second values to generate the result.
16. The non-transitory machine-readable medium of claim 14, the adder circuitry comprising a carry save adder (CSA) and multiple-bit final adder to generate the new accumulated value.
17. The non-transitory machine-readable medium of claim 14, upon the first and second operands indicating binary or ternary values, the precision of the first and second operands being determined to be below the threshold.
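Claims 7 and 16 recite adder circuitry built from a carry save adder (CSA) followed by a multiple-bit final adder. The sketch below shows the general technique in software, assuming a fixed word width and two's-complement wraparound. The function names and the 32-bit default are illustrative assumptions and are not taken from the patent.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compressor: reduce three addends to a sum word and a carry word.

    No carry propagates between bit positions here; each bit is handled
    independently, which is what makes the step fast in hardware.
    """
    sum_bits = x ^ y ^ z                             # per-bit sum without carries
    carry_bits = ((x & y) | (x & z) | (y & z)) << 1  # per-bit majority, shifted left
    return sum_bits, carry_bits


def accumulate(acc: int, products: list[int], width: int = 32) -> int:
    """Fold products into acc in carry-save form, then resolve once at the end."""
    mask = (1 << width) - 1
    s, c = acc & mask, 0
    for p in products:
        s, c = carry_save_add(s, c, p & mask)
        s &= mask  # keep both words inside the assumed register width
        c &= mask
    # Multiple-bit final adder: the only step with full carry propagation.
    return (s + c) & mask
```

With the default width, `accumulate(5, [3, 4, 10])` yields 22, and a negative product such as `accumulate(10, [-3])` yields 7 because values are held modulo 2^32.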
Convolutional networks [CNN, ConvNet] · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title
Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)} · CPC title
Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even · CPC title