Systems, apparatuses, and methods for performing a butterfly horizontal and cross add or subtract in response to a single instruction

US9459865B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9459865-B2
Application number: US-201113992236-A
Country: US
Kind code: B2
Filing date: Dec 23, 2011
Priority date: Dec 23, 2011
Publication date: Oct 4, 2016
Grant date: Oct 4, 2016

Abstract

Official abstract text for this publication.

Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed butterfly horizontal cross add or subtract of packed data elements in response to a single vector packed butterfly horizontal cross add or subtract instruction that includes a destination vector register operand, a source vector register operand, an immediate, and an opcode are described.

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: executing a single instruction to, for each data lane of a source vector register, calculate a horizontal and cross addition or subtraction between packed data elements of the source vector register where the decision of whether to add or subtract is dependent upon a bit in an immediate value of the instruction, wherein the instruction includes a destination vector register operand, the source vector register operand, the immediate, and an opcode; and storing a result of each a horizontal and cross addition or subtraction between packed data elements into the destination register, wherein for the least significant data element position of the destination register, the result stored is the least significant data element of the source register being added to or subtracted from the second most least significant data element of the source register wherein the decision of addition or subtraction is based on the least significant bit of the immediate; wherein for the second most least significant data element position of the destination register, the result stored is the result of the third most least significant data element of the source register being added to or subtracted from the fourth most least significant data element of the source register wherein the decision of addition or subtraction is based on the third most least significant bit of the immediate; wherein for the third most least significant data element position of the destination register, the result stored is the result of the second most least significant data element of the source register being added to or subtracted from the least significant data element of the source register wherein the decision of addition or subtraction is based on the second most least significant bit of the immediate; and wherein for the fourth most significant data element position of the destination register, the result stored is the result of the fourth most least significant data element of the source register being added to or subtracted from the third most least significant data element of the source register wherein the decision of addition or subtraction is based on the fourth most least significant bit of the immediate.

2. The method of claim 1, wherein there are a plurality of data lanes.

3. The method of claim 1, wherein a number of data lanes to be processed is dependent upon the size of the destination vector register.

4. The method of claim 1, wherein the source and destination vector registers are 128-bit, 256-bit, or 512-bit in size.

5. The method of claim 1, wherein the packed data elements of the source and destination registers are 8-bit, 16-bit, 32-bit, or 64-bit in size.

6. The method of claim 5, wherein the size of the packed data elements of the source is defined by the opcode.

7. The method of claim 1, wherein the immediate is an 8-bit value.

8. An article of manufacture comprising: a non-transitory machine-readable storage medium having stored thereon an occurrence of a single instruction, wherein the instruction's format specifies as its source operands a vector register and an immediate and specifies as its destination a single destination vector register, and wherein the instruction format includes an opcode which instructs a machine, responsive to the single occurrence of the single instruction, to cause for each data lane of the source vector register, a calculation of a horizontal and cross addition or subtraction between packed data elements of the source vector register where the decision of whether to add or subtract is dependent upon a bit in the immediate value of the instruction, and storage of a result of each a horizontal and cross addition or subtraction between packed data elements results into the destination register, wherein for the least significant data element position of the destination register, the result stored is the least significant data element of the source register being added to or subtracted from the second most least significant data element of the source register wherein the decision of addition or subtraction is based on the least significant bit of the immediate; wherein for the second most least significant data element position of the destination register, the result stored is the result of the third most least significant data element of the source register being added to or subtracted from the fourth most least significant data element of the source register wherein the decision of addition or subtraction is based on the third most least significant bit of the immediate; wherein for the third most least significant data element position of the destination register, the result stored is the result of the second most least significant data element of the source register being added to or subtracted from the least significant data element of the source register wherein the decision of addition or subtraction is based on the second most least significant bit of the immediate; and wherein for the fourth most significant data element position of the destination register, the result stored is the result of the fourth most least significant data element of the source register being added to or subtracted from the third most least significant data element of the source register wherein the decision of addition or subtraction is based on the fourth most least significant bit of the immediate.

9. The article of manufacture of claim 8, wherein there are a plurality of data lanes.

10. The article of manufacture of claim 8, wherein a number of data lanes to be processed is dependent upon the size of the destination vector register.

11. The article of manufacture of claim 8, wherein the source and destination vector registers are 128-bit, 256-bit, or 512-bit in size.

12. The article of manufacture of claim 8, wherein the packed data elements of the source and destination registers are 8-bit, 16-bit, 32-bit, or 64-bit in size.

13. The article of manufacture of claim 8, wherein the size of the packed data elements of the source is defined by the opcode.

14. The article of manufacture of claim 8, wherein the immediate is an 8-bit value.

15. An apparatus comprising; a hardware decoder to decode a single instruction that includes a destination vector register operand, a source vector register operand, an immediate, and an opcode; execution circuitry to, for each data lane of the source vector register, calculate a horizontal and cross addition or subtraction between packed data elements of the source vector register where the decision of whether to add or subtract is dependent upon a bit in the immediate value of the instruction, and store a result for each a horizontal and cross addition or subtraction between packed data elements results into the destination register, wherein for the least significant data element position of the destination register, the result stored is the least significant data element of the source register being added to or subtracted from the second most least significant data element of the source register wherein the decision of addition or subtraction is based on the least significant bit of the immediate; wherein for the second most least significant data element position of the destination register, the result stored is the result of the third most least significant data element of the source register being added to or subtracted from the fourth most least significant data element of the source register wherein the decision of addition or subtraction is based on the third most least significant bit of the immediate; wherein for the third most least
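The element pairing and immediate-bit selection recited in claim 1 can be modeled in software roughly as follows. This is a minimal sketch and not the patented implementation: the function name, the `MAPPING` table, and the list-based register model (index 0 is the least significant element) are hypothetical. Several points are assumptions the claim text does not settle: that a set immediate bit selects subtraction, that "X being added to or subtracted from Y" means Y + X or Y - X, that each lane holds four elements, that the same four immediate bits apply to every lane, and that results wrap to the element width.

```python
# Illustrative software model of the per-lane operation in claim 1.
# All names here are hypothetical; they do not come from the patent.

ELEMS_PER_LANE = 4  # assumption: claim 1 enumerates four element positions per lane

# One entry per destination position, least significant first:
# (index of the element operated on, index of the element added or
#  subtracted, immediate bit that selects the operation)
MAPPING = [
    (1, 0, 0),  # dest[0]: src[0] added to / subtracted from src[1], imm bit 0
    (3, 2, 2),  # dest[1]: src[2] added to / subtracted from src[3], imm bit 2
    (0, 1, 1),  # dest[2]: src[1] added to / subtracted from src[0], imm bit 1
    (2, 3, 3),  # dest[3]: src[3] added to / subtracted from src[2], imm bit 3
]


def butterfly_cross_add_sub(src, imm, elem_bits=32):
    """Return the destination elements for a list of source elements.

    src       -- packed elements, index 0 = least significant
    imm       -- immediate; bits 0..3 select add (0) or subtract (1),
                 which is an assumed polarity
    elem_bits -- element width; results wrap modulo 2**elem_bits (assumed)
    """
    if len(src) % ELEMS_PER_LANE != 0:
        raise ValueError("source length must be a multiple of the lane size")
    mask = (1 << elem_bits) - 1
    dest = []
    # Process each lane independently with the same immediate bits (assumed).
    for start in range(0, len(src), ELEMS_PER_LANE):
        lane = src[start:start + ELEMS_PER_LANE]
        for base, operand, bit in MAPPING:
            if (imm >> bit) & 1:
                value = lane[base] - lane[operand]
            else:
                value = lane[base] + lane[operand]
            dest.append(value & mask)
    return dest
```

For example, under these assumptions `butterfly_cross_add_sub([1, 2, 3, 4], 0b0000)` adds every pair, while setting immediate bit 2 switches only the second destination position to a subtraction. A source with eight elements models two data lanes, as in claims 2 through 4.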

Classifications

  • Instruction analysis, e.g. decoding, instruction word fields

  • Bit or string instructions

  • with variable precision

  • Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Ould-Ahmed-Vall Elmoustapha, Hagog Mostafa, Valentine Robert, and 4 more
What technology area does this patent fall under?
Primary CPC classification G06F9/3001. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 4, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).