Method and apparatus for efficient binary and ternary support in fused multiply-add (FMA) circuits

US11836464B2 · US · B2

Patent metadata
Publication number: US-11836464-B2
Application number: US-202217839905-A
Country: US
Kind code: B2
Filing date: Jun 14, 2022
Priority date: Oct 15, 2018
Publication date: Dec 5, 2023
Grant date: Dec 5, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus and method for efficiently performing a multiply add or multiply accumulate operation. For example, one embodiment of a processor comprises: a decoder to decode an instruction specifying an operation, the instruction comprising a first operand identifying a multiplier and a second operand identifying a multiplicand; and fused multiply-add (FMA) execution circuitry comprising first multiplication circuitry to perform a multiplication using the multiplicand and multiplier to generate a result for multipliers and multiplicands falling within a first precision range, and second multiplication circuitry to be used instead of the first multiplication circuitry for multipliers and multiplicands falling within a second precision range.
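The two-path selection described in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the patented circuit: precision is modeled as an operand bit-width, the threshold of 3 bits is a hypothetical value chosen so that binary (1-bit) and ternary (2-bit) operands fall below it, and the function names are invented for illustration.

```python
# Behavioral sketch only; names and the threshold value are hypothetical.
THRESHOLD_BITS = 3  # assumption: binary/ternary operands need fewer than 3 bits


def low_precision_product(multiplier: int, multiplicand: int) -> int:
    """Second path: a ternary multiplier needs no real multiplication."""
    if multiplier == 0:
        return 0               # zeroing
    if multiplier == 1:
        return multiplicand    # identity
    if multiplier == -1:
        return -multiplicand   # inversion (sign flip)
    raise ValueError("low-precision path expects a multiplier in {-1, 0, 1}")


def multiply_accumulate(multiplier: int, multiplicand: int, acc: int,
                        precision_bits: int) -> int:
    """Pick a multiplication path by precision, then add to the accumulator."""
    if precision_bits >= THRESHOLD_BITS:
        product = multiplier * multiplicand  # first path: full multiplier
    else:
        product = low_precision_product(multiplier, multiplicand)
    return acc + product


if __name__ == "__main__":
    print(multiply_accumulate(6, 7, 0, precision_bits=8))    # full-precision path
    print(multiply_accumulate(-1, 1, 10, precision_bits=2))  # ternary path
```

Both paths return the same arithmetic result for ternary inputs; the point of the split is that the second path avoids exercising the full multiplier when the operands do not need it.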

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: a decoder to decode an instruction specifying an operation, the instruction comprising a first operand and a second operand; control circuitry, responsive to a precision of the first and second operands being at or above a threshold, to cause a first multiplication circuitry to process a first value and a second value indicated by the first operand and the second operand, respectively, to generate a result, the processing of the first and second values to generate the result including multiplication, and responsive to the precision of the first and second operands being below the threshold, to cause a second multiplication circuitry to process the first and second values indicated by the first operand and the second operand, respectively, to generate the result; and adder circuitry to add the result to an accumulated value to generate a new accumulated value.

2. The processor of claim 1, further comprising fused circuitry including the first multiplication circuitry and the second multiplication circuitry to process the first and second values indicated by the first operand and the second operand.

3. The processor of claim 2, the control circuitry comprising a second output selector to pass the second value to the first multiplication circuitry or second multiplication circuitry based on the precision of the first and second values relative to the threshold.

4. The processor of claim 1, the control circuitry comprising a first output selector to pass the first value to the first multiplication circuitry or second multiplication circuitry based on the precision of the first and second values relative to the threshold.

5. The processor of claim 1, the first multiplication circuitry comprising a booth and booth selectors to process the first and second values indicated by the first operand and the second operand, respectively, to generate the result.

6. The processor of claim 1, the second multiplication circuitry to perform an inversion, zeroing, or identity operation to process the first and second values to generate the result.

7. The processor of claim 1, the adder circuitry comprising a carry save adder (CSA) and multiple-bit final adder to generate the new accumulated value.

8. The processor of claim 1, upon the first and second operands indicating binary or ternary values, the precision of the first and second operands being determined to be below the threshold.

9. The processor of claim 1, further comprising a first register to store the first value indicated by the first operand and a second register to store the second value indicated by the second operand.

10. A method comprising: decoding, by a decoder, an instruction specifying an operation, the instruction comprising a first operand and a second operand; responsive to a precision of the first and second operands being at or above a threshold, causing, by control circuitry, a first multiplication circuitry to process a first value and a second value indicated by the first operand and the second operand, respectively, to generate a result, the processing of the first and second values to generate the result including multiplication, and responsive to the precision of the first and second operands being below the threshold, causing, by the control circuitry, a second multiplication circuitry to process the first and second values indicated by the first operand and the second operand, respectively, to generate the result; and adding, by adder circuitry, the result to an accumulated value to generate a new accumulated value.

11. The method of claim 10, wherein the control circuitry is to perform: passing the first value to the first multiplication circuitry or second multiplication circuitry based on the precision of the first and second values relative to the threshold.

12. The method of claim 10, wherein the control circuitry is to perform: passing the second value to the first multiplication circuitry or second multiplication circuitry based on the precision of the first and second values relative to the threshold.

13. The method of claim 10, the second multiplication circuitry to process the first and second values indicated by the first operand and second operands comprises performing an inversion, zeroing, or identity operation to process the first and second values to generate the result.

14. A non-transitory machine-readable medium having program code stored thereon which, when executed by a machine, causes the machine to perform: decoding, by a decoder, an instruction specifying an operation, the instruction comprising a first operand and a second operand; responsive to a precision of the first and second operands being at or above a threshold, causing, by control circuitry, a first multiplication circuitry to process a first value and a second value indicated by the first operand and the second operand, respectively, to generate a result, the processing of the first and second values to generate the result including multiplication, and responsive to the precision of the first and second operands being below the threshold, causing, by the control circuitry, a second multiplication circuitry to process the first and second values indicated by the first operand and the second operand, respectively, to generate the result; and adding, by adder circuitry, the result to an accumulated value to generate a new accumulated value.

15. The non-transitory machine-readable medium of claim 14, the second multiplication circuitry to process the first and second values indicated by the first operand and the second operand comprises performing an inversion, zeroing, or identity operation to process the first and second values to generate the result.

16. The non-transitory machine-readable medium of claim 14, the adder circuitry comprising a carry save adder (CSA) and multiple-bit final adder to generate the new accumulated value.

17. The non-transitory machine-readable medium of claim 14, upon the first and second operands indicating binary or ternary values, the precision of the first and second operands being determined to be below the threshold.
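Claims 7 and 16 recite a carry save adder (CSA) followed by a multiple-bit final adder. As a rough illustration of that general technique (not the circuit disclosed in the patent), the sketch below models a 3:2 carry-save compressor and a final add in software. The 32-bit datapath width and all function names are assumptions made for this example.

```python
# Illustrative model of carry-save addition; width and names are hypothetical.
WIDTH = 32
MASK = (1 << WIDTH) - 1


def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: reduce three addends to a sum word and a carry word.

    No carry propagates between bit positions here, which is why this
    stage is fast in hardware.
    """
    partial_sum = (a ^ b ^ c) & MASK
    carry = (((a & b) | (a & c) | (b & c)) << 1) & MASK
    return partial_sum, carry


def accumulate(acc: int, product: int, addend: int = 0) -> int:
    """Add a product (and an optional extra addend) to the accumulator."""
    partial_sum, carry = carry_save_add(acc & MASK, product & MASK, addend & MASK)
    # Multiple-bit final adder: the only step with full carry propagation.
    return (partial_sum + carry) & MASK


if __name__ == "__main__":
    print(accumulate(10, 5))  # unsigned example
    print(accumulate(7, -3))  # -3 is wrapped to two's complement first
```

Results are unsigned values modulo 2**32, so negative totals appear in two's-complement form.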

Assignees

Intel Corp

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • G06F7/5443 (Primary)

    Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title

  • Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)} · CPC title

  • Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11836464B2 cover?
An apparatus and method for efficiently performing a multiply add or multiply accumulate operation. For example, one embodiment of a processor comprises: a decoder to decode an instruction specifying an operation, the instruction comprising a first operand identifying a multiplier and a second operand identifying a multiplicand; and fused multiply-add (FMA) execution circuitry comprising first …
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F7/5443. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 5, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).