Multiply-and-accumulate-products instructions

US10409592B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10409592-B2
Application number  US-201715494946-A
Country             US
Kind code           B2
Filing date         Apr 24, 2017
Priority date       Apr 24, 2017
Publication date    Sep 10, 2019
Grant date          Sep 10, 2019

Abstract

Official abstract text for this publication.

An apparatus has processing circuitry comprising an L×M multiplier array. An instruction decoder associated with the processing circuitry supports a multiply-and-accumulate-product (MAP) instruction for generating at least one result element corresponding to a sum of respective E×F products of E-bit and F-bit portions of J-bit and K-bit operands respectively, where 1<E<J≤L and 1<F<K≤M. In response to the MAP instruction, the instruction decoder controls the processing circuitry to rearrange F-bit portions of the second K-bit operand to form a transformed K-bit operand, and to control the L×M multiplier array in dependence on the first J-bit operand and the transformed K-bit operand to add the respective E×F products using a subset of the adders used for accumulating partial products for a conventional multiplication.
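The arithmetic the abstract describes can be sketched as a small reference model. This is only an illustration under stated assumptions, not the patented circuit: operands are treated as unsigned, portion i of the first operand is paired with portion i of the second, both operands hold the same number of portions, and a single result element is produced. The function names `split_portions` and `map_reference` are hypothetical.

```python
def split_portions(value, total_bits, portion_bits):
    """Split an unsigned total_bits-wide value into portion_bits-wide
    pieces, least significant piece first."""
    if total_bits % portion_bits != 0:
        raise ValueError("operand width must be a multiple of the portion width")
    mask = (1 << portion_bits) - 1
    return [(value >> shift) & mask
            for shift in range(0, total_bits, portion_bits)]


def map_reference(a, b, j_bits, k_bits, e_bits, f_bits):
    """Sum of E x F products of corresponding portions (assumed pairing)."""
    a_parts = split_portions(a, j_bits, e_bits)   # E-bit portions of the J-bit operand
    b_parts = split_portions(b, k_bits, f_bits)   # F-bit portions of the K-bit operand
    if len(a_parts) != len(b_parts):
        raise ValueError("this sketch assumes equal portion counts")
    return sum(x * y for x, y in zip(a_parts, b_parts))
```

With J = K = 16 and E = F = 4, `map_reference(0x1234, 0x5678, 16, 16, 4, 4)` pairs the nibbles as 4×8, 3×7, 2×6 and 1×5 and returns 70.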

First claim

Opening claim text (preview).

We claim:

1. An apparatus comprising: processing circuitry to perform data processing, the processing circuitry comprising an L×M multiplier array, where L and M are integers; and an instruction decoder responsive to a multiply instruction specifying an L-bit operand and an M-bit operand to control the multiplier array to multiply the L-bit operand and the M-bit operand using a plurality of adders for accumulating partial products of the L-bit operand and the M-bit operand; wherein in response to a multiply-and-accumulate-products (MAP) instruction specifying a first J-bit operand and a second K-bit operand, where J≤L and K≤M, the instruction decoder is configured to control the processing circuitry to generate a result value comprising at least one result element, each result element corresponding to a sum of respective E×F products of an E-bit portion of the first J-bit operand and an F-bit portion of the second K-bit operand, where 1<E<J and 1<F<K; and in response to the MAP instruction, the instruction decoder is configured to control the processing circuitry to rearrange F-bit portions of the second K-bit operand to form a transformed K-bit operand, and to control the L×M multiplier array in dependence on the first J-bit operand and the transformed K-bit operand to add said respective E×F products using a subset of said plurality of adders, wherein said subset of said plurality of adders used for adding said respective E×F products in response to the MAP instruction comprise the same adders provided in hardware that are also used for accumulating partial products of the L-bit operand and the M-bit operand in response to the multiply instruction.

2. The apparatus according to claim 1, comprising operand rearrangement circuitry to rearrange said F-bit portions of the second K-bit operand to form the transformed K-bit operand.

3. The apparatus according to claim 2, wherein for at least one segment of the second K-bit operand comprising at least two of the F-bit portions, the operand rearrangement circuitry is configured to reverse an order of the F-bit portions within that segment to form a corresponding segment of the transformed K-bit operand.

4. The apparatus according to claim 2, wherein the operand rearrangement circuitry is configured to rearrange the second K-bit operand according to one of plurality of different rearrangement patterns selected in dependence on a parameter of the MAP instruction.

5. The apparatus according to claim 1, wherein in response to the MAP instruction, the instruction decoder is configured to control the processing circuitry to rearrange E-bit portions of the first J-bit operand to form a transformed J-bit operand, and to control the L×M multiplier array in dependence on the transformed J-bit operand and the transformed K-bit operand to add said respective E×F products using the subset of said plurality of adders.

6. The apparatus according to claim 1, comprising partial product forming circuitry to generate the partial products to be accumulated by the plurality of adders of the L×M multiplier array.

7. The apparatus according to claim 6, wherein in response to the MAP instruction, the instruction decoder is configured to control the partial product forming circuitry to generate the partial products in dependence on the first J-bit operand and the transformed K-bit operand.

8. The apparatus according to claim 6, wherein in response to the MAP instruction, the instruction decoder is configured to control the partial product forming circuitry to set a subset of partial product bits of the partial products to zero irrespective of values of said first J-bit operand and said second K-bit operand.

9. The apparatus according to claim 8, wherein the instruction decoder is configured to control the partial product forming circuitry to select which partial product bits are said subset of partial product bits in dependence on a parameter of the MAP instruction.

10. The apparatus according to claim 1, wherein in response to at least one form of the MAP instruction, the instruction decoder is configured to control the processing circuitry to generate the result value comprising a plurality of result elements, each result element specifying a sum of the respective E×F products of the E-bit portions within an X-bit segment of the first J-bit operand with the F-bit portions within a Y-bit segment of the second K-bit operand, where E<X<J and F<Y<K.

11. The apparatus according to claim 10, wherein in response to said at least one form of the MAP instruction, the instruction decoder is configured to control the L×M multiplier array to add the respective E×F products for a first X-bit segment of the first J-bit operand and a first Y-bit segment of the second K-bit operand using a first subset of said plurality of adders, and to add the respective E×F products for a second X-bit segment of the first J-bit operand and a second Y-bit segment of the second K-bit operand using a second subset of said plurality of adders.

12. The apparatus according to claim 1, wherein the L×M multiplier array comprises a Wallace tree multiplier.

13. The apparatus according to claim 1, comprising Booth encoding circuitry to encode one of said first J-bit operand and said second K-bit operand using Booth encoding.

14. The apparatus according to claim 13, wherein in response to the MAP instruction, the Booth encoding circuitry is configured to encode the first J-bit operand using Booth encoding in parallel with operand rearrangement circuitry rearranging said F-bit portions of the second K-bit operand to form the transformed K-bit operand.

15. The apparatus according to claim 13, wherein said plurality of adders comprises a number of adders sufficient to add at least N Z-bit partial products, where N is one of L and M and Z is the other of L and M; and the processing circuitry comprises additional partial product adding circuitry to add an additional P_max partial products, where P is the number of respective E×F products to be added to form one result element of the result value, and P_max is a maximum value for P supported by the processing circuitry; wherein in response to said MAP instruction, the instruction decoder is configured to control the processing circuitry to generate a result corresponding to a sum of said N Z-bit partial products and at least one of said additional P_max partial products.

16. The apparatus according to claim 15, wherein said additional partial product adding circuitry comprises further adders included in said L×M multiplier array, such that said plurality of adders comprises a number of adders sufficient to add at least (N+P_max) Z-bit partial products.

17. The apparatus according to claim 15, wherein said additional partial product adding circuitry comprises circuitry separate from said L×M multiplier array to add said additional P_max partial products to form a single value.

18. The apparatus according to claim 15, wherein said additional partial product adding circuitry comprises: adding circuitry separate from said L×M multiplier array to reduce said additional P_max partial products to R additional partial products, where 2≤R<P_max, and further adders included in said L×M multiplier array, such that said plurality of adders comprises a number of adders sufficient to add at least (N+R) Z-bit partial products.

19. An apparatus comprising: means for performing data processing, comprising means for performing L×M multiplication; and an instruction decoder, responsive to a multiply instruction

Assignees

Advanced Risc Mach Ltd

Classifications

  • Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE

  • Methods or arrangements for processing data by operating upon the order or content of the data handled (logic circuits H03K19/00)

  • G06F7/496 (Primary): Multiplying; Dividing

  • G06F9/3001 (Primary): Arithmetic instructions

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Advanced Risc Mach Ltd
What technology area does this patent fall under?
Primary CPC classification G06F7/496. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 10, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).