Apparatus and method for fused add-add instructions

US2016188341A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2016188341-A1
Application number: US-201414583050-A
Country: US
Kind code: A1
Filing date: Dec 24, 2014
Priority date: Dec 24, 2014
Publication date: Jun 30, 2016
Grant date:

Abstract

Official abstract text for this publication.

In one embodiment of the invention, a processor including a storage location configured to store a set of source packed-data operands, each of the operands having a plurality of packed-data elements that are positive or negative according to an immediate bit value within one of the operands. The processor also including: a decoder to decode an instruction requiring an input of a plurality of source operands, and an execution unit to receive the decoded instructions and to generate a result that is a sum of the source operands. In one embodiment, the result is stored back into one of the source operands or the result is stored into an operand that is independent of the source operands.
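The operation the abstract describes can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `fused_add_add`, the use of plain lists for packed-data operands, and the encoding (bits 0, 1, and 2 of the immediate select the sign of the first, second, and third operand, with a set bit meaning negative) are all assumptions made for illustration, since the text shown here does not fix the bit positions or polarity.

```python
def fused_add_add(src1, src2, src3, imm8):
    """Element-wise (+/-src1) + ((+/-src2) + (+/-src3)) over packed elements.

    Hypothetical encoding (an assumption, not taken from the patent):
    bit 0 -> sign of src1, bit 1 -> sign of src2, bit 2 -> sign of src3;
    a set bit means the operand's elements are treated as negative.
    """
    if not (len(src1) == len(src2) == len(src3)):
        raise ValueError("operands must have the same number of elements")

    s1 = -1 if imm8 & 0b001 else 1
    s2 = -1 if imm8 & 0b010 else 1
    s3 = -1 if imm8 & 0b100 else 1

    result = []
    for a, b, c in zip(src1, src2, src3):
        first = s2 * b + s3 * c          # "first result data element"
        result.append(s1 * a + first)    # "second result data element"
    return result


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    b = [10, 20, 30, 40]
    c = [100, 200, 300, 400]
    print(fused_add_add(a, b, c, 0b000))  # all operands positive
    print(fused_add_add(a, b, c, 0b100))  # third operand negated
```

Under this assumed encoding, a single immediate value covers all eight sign combinations of the three operands, so one instruction form can express a + b + c, a + b - c, a - b - c, and so on.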

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: a first source register to store a first operand comprising a first plurality of packed data elements; a second source register to store a second operand comprising a second plurality of packed data elements; a third source register to store a third operand comprising a third plurality of packed data elements; and fused add-add circuitry to interpret the plurality of packed data elements as positive or negative in accordance with a corresponding value in a bit position within an immediate value, the fused add-add circuitry to add a corresponding data element of the first plurality to a first result data element comprising a sum of corresponding data elements of the second plurality and the third plurality, to generate a second result data element, the fused add-add circuitry to store the second result data element in a destination.

2. The processor of claim 1, wherein the fused add-add circuitry comprises a decode unit to decode a fused add-add instruction and an execution unit to execute the fused add-add instruction.

3. The processor as in claim 2, wherein the decode unit is to decode a single fused add-add instruction into a plurality of microoperations to be executed by the execution unit.

4. The processor as in claim 3, wherein the execution unit, having a plurality of sub-circuits, is to use the microoperations to interpret the plurality of packed data elements as positive or negative in accordance with a corresponding value in a bit position within an immediate value, add a corresponding data element of the first plurality to a first result data element comprising a sum of corresponding data elements of the second plurality and the third plurality, generating a second result data element, and to store the second result data element in a destination.

5. The processor of claim 1, wherein the first operand and the destination are a single register where the second result data element is stored.

6. The processor of claim 1, wherein the second result data element is written to the destination based on a value of a write-mask register of the processor.

7. The processor of claim 1, wherein to interpret the plurality of packed data elements as positive or negative, the fused add-add circuitry is to read a bit value in a first bit position of the immediate value corresponding to the first plurality of packed data elements to determine whether the first plurality of packed data elements are positive or negative, to read a bit value in a second bit position of the immediate value corresponding to the second plurality of packed data elements to determine whether the second plurality of packed data elements are positive or negative, and to read a bit value in a third bit position of the immediate value corresponding to the third plurality of packed data elements to determine whether the third plurality of packed data elements are positive or negative.

8. The processor as in claim 7, wherein the fused add-add circuitry is to further read a set of one or more bits other than the bits in the first, second, and third bit positions to determine a register or memory location of at least one of the operands.

9. A method comprising: storing a first operand comprising a first plurality of packed data elements in a first source register; storing a second operand comprising a second plurality of packed data elements in a second source register; storing a third operand comprising a third plurality of packed data elements in a third source register; interpreting the plurality of packed data elements as positive or negative in accordance with a corresponding value in a bit position within an immediate value of an instruction; and adding a corresponding data element of the first plurality to a first result data element comprising a sum of corresponding data elements of the second plurality and the third plurality, generating a second result data element, and storing the second result data element in a destination.

10. The method of claim 9, further comprising: decoding by a decoder in a processor, the instruction specifying the first source register, the second source register, and the third source register; and executing by an execution unit in the processor the instruction by interpreting the plurality of packed data elements as positive or negative in accordance with the corresponding value in bit positions within the immediate value.

11. The method as in claim 10, wherein the decoder is to decode a single instruction into a plurality of microoperations to be executed by the execution unit.

12. The method as in claim 11, further comprising: interpreting using the microoperations by the execution unit, having a plurality of sub-circuits, the plurality of packed data elements as positive or negative in accordance with a corresponding value in a bit position within an immediate value, adding a corresponding data element of the first plurality to a first result data element comprising a sum of corresponding data elements of the second plurality and the third plurality, generating a second result data element, and storing the second result data element in a destination.

13. The method of claim 9, wherein the first operand and the destination are a single register where the second result data element is stored.

14. The method of claim 9, wherein the second result data element is written to the destination based on a value of a write-mask register of the processor.

15. The method of claim 9, further comprising: interpreting the plurality of packed data elements as positive or negative, by the fused add-add circuitry reading a bit value in a first bit position of the immediate value corresponding to the first plurality of packed data elements, to determine whether the first plurality of packed data elements are positive or negative, reading a bit value in a second bit position of the immediate value corresponding to the second plurality of packed data elements to determine whether the second plurality of packed data elements are positive or negative, and reading a bit value in a third bit position of the immediate value corresponding to the third plurality of packed data elements to determine whether the third plurality of packed data elements are positive or negative.

16. The method as in claim 15, further comprising: reading by the fused add-add circuitry a set of one or more bits other than the bits in the first, second, and third bit positions to determine a register or memory location of at least one of the operands.

17. A system comprising: a memory unit coupled to a first storage location configured to store a first plurality of packed data elements; and a processor coupled to the memory unit, the processor comprising: a register file unit configured to store a plurality of packed data operands, including a first source register to store a first operand comprising a first plurality of packed data elements, a second source register to store a second operand comprising a second plurality of packed data elements, and a third source register to store a third operand comprising a third plurality of packed data elements; fused add-add circuitry to interpret the plurality of packed data elements as positive or negative in accordance with a corresponding value in a bit position within an immediate value, the fused add-add circuitry to add a corresponding data element of the first plurality to a first result data element comprising a sum of corresponding data elements of the second plu
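Claims 6 and 14 say the result is written to the destination based on a write-mask register, and claims 5 and 13 allow the destination to be the same register as the first operand. The sketch below shows one way that write-back step could behave. It is illustrative only: the name `masked_writeback` is hypothetical, and it assumes merging behaviour (an element whose mask bit is 0 keeps the destination's previous value), which the claim text shown here does not specify.

```python
def masked_writeback(dest, result, write_mask):
    """Return the destination contents after a masked write.

    Assumption (not stated in the claims): merging semantics, where bit i of
    write_mask set means element i is overwritten with result[i], and a clear
    bit leaves dest[i] unchanged.
    """
    if len(dest) != len(result):
        raise ValueError("dest and result must have the same number of elements")
    return [
        r if (write_mask >> i) & 1 else d
        for i, (d, r) in enumerate(zip(dest, result))
    ]


if __name__ == "__main__":
    # Destination doubles as the first source operand (claims 5 and 13).
    dest = [1, 2, 3, 4]
    summed = [111, 222, 333, 444]   # e.g. output of a fused add-add step
    print(masked_writeback(dest, summed, 0b0101))
```

A zeroing variant, where masked-off elements are cleared instead of preserved, would be an equally plausible reading; only the list comprehension's fallback value would change.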

Assignees

Inventors

Classifications

CPC titles:

  • using a mask

  • using decoder, e.g. decoder per instruction set, adaptable or programmable decoders

  • of immediate specifier, e.g. constants

  • with variable precision

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016188341A1 cover?
In one embodiment of the invention, a processor including a storage location configured to store a set of source packed-data operands, each of the operands having a plurality of packed-data elements that are positive or negative according to an immediate bit value within one of the operands. The processor also including: a decoder to decode an instruction requiring an input of a plurality of so…
Who is the assignee on this patent?
Ould-Ahmed-Vall Elmoustapha, Valentine Robert, Corbal Jesus, and 5 more
What technology area does this patent fall under?
Primary CPC classification G06F9/30196. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 30, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).