Method and apparatus to process 4-operand SIMD integer multiply-accumulate instruction

US9292297B2 · US · B2

Patent metadata
Publication number: US-9292297-B2
Application number: US-201213617021-A
Country: US
Kind code: B2
Filing date: Sep 14, 2012
Priority date: Sep 14, 2012
Publication date: Mar 22, 2016
Grant date: Mar 22, 2016

Abstract

Official abstract text for this publication.

According to one embodiment, a processor includes an instruction decoder to receive an instruction to process a multiply-accumulate operation, the instruction having a first operand, a second operand, a third operand, and a fourth operand. The first operand is to specify a first storage location to store an accumulated value; the second operand is to specify a second storage location to store a first value and a second value; and the third operand is to specify a third storage location to store a third value. The processor further includes an execution unit coupled to the instruction decoder to perform the multiply-accumulate operation to multiply the first value with the second value to generate a multiply result and to accumulate the multiply result and at least a portion of a third value to an accumulated value based on the fourth operand.

First claim

Opening claim text (preview).

What is claimed is:
1. A processor, comprising: an instruction decoder to receive an instruction to process a multiply-accumulate operation, the instruction having a first operand, a second operand, a third operand, and a fourth operand, the first operand to specify a first storage location to store an accumulated value, the second operand to specify a second storage location to store a first value and a second value, the third operand to specify a third storage location to store a third value; and an execution unit coupled to the instruction decoder to perform the multiply-accumulate operation to multiply the first value with the second value to generate a multiply result and to accumulate the multiply result and at least a portion of the third value to the accumulated value based on the fourth operand, wherein the fourth operand to store a value indicating the at least a portion of the third value to be added to the accumulated value.
2. The processor of claim 1, wherein a result of the multiply-accumulate operation is stored in the first storage location indicated by the first operand.
3. The processor of claim 1, wherein a higher portion of the third value is accumulated when the value of the fourth storage location contains a first value, and wherein a lower portion of the third value is accumulated when the value of the fourth storage location contains a second value.
4. The processor of claim 1, wherein the first, second, and third operands have at least 512 bits, and wherein the execution unit is to perform at least four iterations of the multiply-accumulate operation, each iteration occupying at least 128 bits.
5. The processor of claim 1, wherein for a current iteration of multiply-accumulate operations, a multiplication is performed between the (i+63:i) bits of the second operand and the (i+127:i+64) bits of the second operand, a first addition is performed between the multiplication and the (i+63:i) bits of the first operand, and a second addition is performed between the first addition and the (i+63:i) bits of the third operand as specified by the fourth operand.
6. The processor of claim 5, wherein a third addition is performed between a first set of carry bits resulting from the first addition and a second set of carry bits resulting from the second addition.
7. The processor of claim 5, wherein the second addition is performed between the first addition and the (i+127:i+64) bits of the third operand as specified by the fourth operand.
8. A method, comprising: receiving, by an instruction decoder of a processor, an instruction having a first operand, a second operand, a third operand, and fourth operand, the first operand to specify a first storage location to store an accumulated value, the second operand to specify a second storage location to store a first value and a second value, the third operand to specify a third storage location to store a third value; and performing, by an execution unit of the processor, a multiply-accumulate operation to multiply the first value with the second value to generate a multiply result and to accumulate the multiply result and at least a portion of the third value to an accumulated value based on the fourth operand, wherein the fourth operand to store a value indicating the at least a portion of the third value to be added to the accumulated value.
9. The method of claim 8, wherein a result of the multiply-accumulate operation is stored in the first storage location indicated by the first operand.
10. The method of claim 8 wherein a higher portion of the third value is accumulated when the value of the fourth storage location contains a first value, and wherein a lower portion of the third value is accumulated when the value of the fourth storage location contains a second value.
11. The method of claim 8, wherein the first, second, and third operands have at least 512 bits, and wherein the execution unit is to perform at least four iterations of the multiply-accumulate operation, each iteration occupying at least 128 bits.
12. The method of claim 8, wherein for a current iteration of multiply-accumulate operations, a multiplication is performed between the (i+63:i) bits of the second operand and the (i+127:i+64) bits of the second operand, a first addition is performed between the multiplication and the (i+63:i) bits of the first operand, and a second addition is performed between the first addition and the (i+63:i) bits of the third operand as specified by the fourth operand.
13. The method of claim 12, wherein a third addition is performed between a first set of carry bits resulting from the first addition and a second set of carry bits resulting from the second addition.
14. The method of claim 12, wherein the second addition is performed between the first addition and the (i+127:i+64) bits of the third operand as specified by the fourth operand.
15. A data processing system, comprising: an interconnect; a processor coupled to the interconnect to receive an instruction having a first operand, a second operand, a third operand, and fourth operand, the first operand to specify a first storage location to store an accumulated value, the second operand to specify a second storage location to store a first value and a second value, the third operand to specify a third storage location to store a third value, and the processor to perform a multiply-accumulate operation to multiply the first value with the second value to generate a multiply result and to accumulate the multiply result and at least a portion of the third value to an accumulated value based on the fourth operand, wherein the fourth operand to store a value indicating the at least a portion of the third value to be added to the accumulated value; and a dynamic random access (DRAM) coupled to the interconnect.
16. The system of claim 15, wherein a result of the multiply-accumulate operation is stored in the first storage location indicated by the first operand.
17. The system of claim 15, wherein a higher portion of the third value is accumulated when the value of the fourth storage location contains a first value, and wherein a lower portion of the third value is accumulated when the value of the fourth storage location contains a second value.
18. The system of claim 15, wherein the first, second, and third operands have at least 512 bits, and wherein the execution unit is to perform at least four iterations of the multiply-accumulate operation, each iteration occupying at least 128 bits.
19. The system of claim 15, wherein for a current iteration of multiply-accumulate operations, a multiplication is performed between the (i+63:i) bits of the second operand and the (i+127:i+64) bits of the second operand, a first addition is performed between the multiplication and the (i+63:i) bits of the first operand, and a second addition is performed between the first addition and the (i+63:i) bits of the third operand as specified by the fourth operand.
20. The system of claim 19, wherein a third addition is performed between a first set of carry bits resulting from the first addition and a second set of carry bits resulting from the second addition.
21. The system of claim 19, wherein the second addition is performed between the first addition and the (i+127:i+64) bits of the third operand as specified by the fourth operand.
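
The per-iteration arithmetic in claims 4, 5, and 7 can be sketched in software. The Python model below is only an illustration under stated assumptions, not the patented hardware: the function names (`mac_lane`, `simd_mac`, `lane`, `pack_lanes`) are hypothetical, the fourth operand is modeled as a boolean `select_high` because the claims do not give its numeric encoding, and the separate carry addition of claim 6 is absorbed by Python's arbitrary-precision integers.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def lane(lo, hi):
    """Hypothetical helper: build one 128-bit lane from two 64-bit halves."""
    return (lo & MASK64) | ((hi & MASK64) << 64)


def pack_lanes(lanes):
    """Hypothetical helper: pack 128-bit lanes into one wide integer, lane 0 lowest."""
    value = 0
    for k, item in enumerate(lanes):
        value |= (item & MASK128) << (128 * k)
    return value


def mac_lane(acc_lane, src2_lane, src3_lane, select_high):
    """Model one 128-bit iteration as read from claims 5 and 7."""
    a = src2_lane & MASK64            # bits (i+63:i) of the second operand
    b = (src2_lane >> 64) & MASK64    # bits (i+127:i+64) of the second operand
    product = a * b                   # 64x64 multiply, up to 128 bits
    first = product + (acc_lane & MASK64)  # first addition: bits (i+63:i) of operand 1
    if select_high:                   # assumed encoding of the fourth operand
        addend = (src3_lane >> 64) & MASK64   # claim 7: bits (i+127:i+64) of operand 3
    else:
        addend = src3_lane & MASK64           # claim 5: bits (i+63:i) of operand 3
    second = first + addend           # second addition
    # (2**64 - 1)**2 + 2*(2**64 - 1) == 2**128 - 1, so the sum always fits in 128 bits.
    return second & MASK128


def simd_mac(acc, src2, src3, select_high, width=512):
    """Apply mac_lane to each 128-bit slice; width=512 gives the four iterations of claim 4."""
    result = 0
    for i in range(0, width, 128):
        out = mac_lane((acc >> i) & MASK128,
                       (src2 >> i) & MASK128,
                       (src3 >> i) & MASK128,
                       select_high)
        result |= out << i
    return result
```

For instance, with an accumulator lane of 1, a second-operand lane holding 2 and 3, and a third-operand lane holding 10 (low half) and 100 (high half), this model yields 2*3 + 1 + 10 = 17 when the low half is selected and 2*3 + 1 + 100 = 107 when the high half is selected.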

Classifications

  • G06F9/3001 (Primary)

    Arithmetic instructions · CPC title

  • G06F9/3887 (Primary)

    controlled by a single instruction for multiple data lanes [SIMD] · CPC title

  • Instruction analysis, e.g. decoding, instruction word fields · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • using a mask · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9292297B2 cover?
According to one embodiment, a processor includes an instruction decoder to receive an instruction to process a multiply-accumulate operation, the instruction having a first operand, a second operand, a third operand, and a fourth operand. The first operand is to specify a first storage location to store an accumulated value; the second operand is to specify a second storage location to store a…
Who is the assignee on this patent?
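No assignee is shown on this page. The names listed (Gopal Vinodh, Ozturk Erdinc, Guilford James D, and 2 more) are individuals and appear to be the inventors rather than the assignee.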
What technology area does this patent fall under?
Primary CPC classification G06F9/3001. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 22, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).