Systems, apparatuses, and methods for performing a horizontal add or subtract in response to a single instruction

US9619226B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9619226-B2
Application number  US-201113992230-A
Country             US
Kind code           B2
Filing date         Dec 23, 2011
Priority date       Dec 23, 2011
Publication date    Apr 11, 2017
Grant date          Apr 11, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed horizontal add or subtract of packed data elements in response to a single vector packed horizontal add or subtract instruction that includes a destination vector register operand, a source vector register operand, and an opcode are described.

First claim

Opening claim text (preview).

What is claimed is:

1. A method of performing in a computer processor vector packed horizontal add or subtract of packed data elements in response to a single vector packed horizontal add or subtract instruction that includes a destination vector register operand, a source vector register operand, an immediate, and an opcode, wherein the source vector register comprises a plurality of packed data elements divided into a plurality of data lanes, each data lane corresponds to a destination data element in the destination vector register, the immediate comprises at least a same number of active bits as there are packed data elements in each data lane, and each active bit of the immediate corresponds to one of the plurality of packed data elements in each data lane, the method comprising: executing the single vector packed horizontal add or subtract instruction to, for each data lane of the source vector register, read a value of each active bit position of the immediate to determine whether to negate a value of corresponding data element position of the data lane, responsively negate the values determined to be negated, and sum all negated and unchanged packed data elements in each data lane to create a data lane result; and storing each data lane result in a corresponding destination data element position of the destination register.

2. The method of claim 1, wherein each data lane of the source vector register has four packed data elements.

3. The method of claim 1, wherein a number of data lanes to be processed is dependent upon size of the destination vector register.

4. The method of claim 1, wherein the source and destination vector registers are 128-bit, 256-bit, or 512-bit in size.

5. The method of claim 1, wherein the packed data elements of the source vector register and the destination data elements of the destination vector register are 8-bit, 16-bit, 32-bit, or 64-bit in size.

6. The method of claim 5, wherein the size of the packed data elements of the source vector register is defined by the opcode.

7. The method of claim 1, wherein the immediate is an 8-bit value.

8. An article of manufacture comprising: a non-transitory tangible machine-readable storage medium having stored thereon an occurrence of an instruction, wherein the instruction's format specifies as source operands a vector register and an immediate and specifies as destination a single destination vector register, the source vector register comprises a plurality of packed data elements divided into a plurality of data lanes, each data lane corresponds to a destination data element in the destination vector register, the immediate comprises at least a same number of active bits as there are packed data elements in each data lane, each active bit of the immediate corresponds to one of the plurality of packed data elements in each data lane, and the instruction format includes an opcode which instructs a machine, responsive to a single occurrence of the instruction, to cause for each data lane of the source vector register, a reading of a value of each active bit position of the immediate to determine whether to negate a value of corresponding data element position of the data lane, responsively negate the values determined to be negated, and sum all negated and unchanged packed data elements in each data lane to create a data lane result, and store each data lane result in a corresponding destination data element position of the destination register.

9. The article of manufacture of claim 8, wherein each data lane of the source vector register has four packed data elements.

10. The article of manufacture of claim 8, wherein a number of data lanes to be processed is dependent upon size of the data elements of the destination vector register.

11. The article of manufacture of claim 8, wherein the source and destination vector registers are 128-bit, 256-bit, or 512-bit in size.

12. The article of manufacture of claim 8, wherein the packed data elements of the source vector register and destination data elements of the destination vector registers are 8-bit, 16-bit, 32-bit, or 64-bit in size.

13. The article of manufacture of claim 12, wherein the size of the packed data elements of the source vector registers is defined by the opcode.

14. The article of manufacture of claim 8, wherein the immediate is an 8-bit value.

15. An apparatus comprising; a hardware decoder to decode a single instruction that includes a destination vector register operand, a source vector register operand, an immediate, and an opcode, wherein the source vector register comprises a plurality of packed data elements divided into a plurality of data lanes, each data lane corresponds to a destination data element in the destination vector register, the immediate comprises at least a same number of active bits as there are packed data elements in each data lane, and each active bit of the immediate corresponds to one of the plurality of packed data elements in each data lane; and execution circuitry to execute the decoded instruction to, for each data lane of the source vector register, read a value of each active bit position of the immediate to determine whether to negate a value of corresponding data element position of the data lane, responsively negate the values determined to be negated, and sum all negated and unchanged packed data elements in each data lane to create a data lane result, and store each data lane result in a corresponding destination data element position of the destination register.

16. The apparatus of claim 15, wherein each data lane of the source vector register has four packed data elements.

17. The apparatus of claim 15, wherein a number of data lanes to be processed is dependent upon size of the destination vector register.

18. The apparatus of claim 15, wherein the source and destination vector registers are 128-bit, 256-bit, or 512-bit in size.

19. The apparatus of claim 15, wherein the packed data elements of the source vector register and destination data elements of the destination vector register are 8-bit, 16-bit, 32-bit, or 64-bit in size.

20. The apparatus of claim 15, wherein the immediate is an 8-bit value.
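
Illustrative sketch (not part of the patent text). The per-lane behavior recited in the independent claims can be modeled in a few lines of Python. This is a software approximation under stated assumptions, not the patented hardware or any real instruction: the function name `horizontal_add_sub` and its parameters are hypothetical, a set immediate bit is assumed to mean "negate" (the claims do not state the polarity), and results are assumed to wrap to the element width (the claims do not specify overflow handling). Lanes of four elements and an 8-bit immediate follow the dependent claims.

```python
def horizontal_add_sub(src, imm8, lane_size=4, elem_bits=32):
    """Model one horizontal add/subtract over every lane of `src`.

    src       -- list of unsigned element values, length a multiple of lane_size
    imm8      -- immediate; bit i is ASSUMED to mean "negate element i of each lane"
    lane_size -- packed elements per lane (four, per the dependent claims)
    elem_bits -- element width in bits; results are ASSUMED to wrap to this width
    Returns one result per lane, in lane order.
    """
    if lane_size > 8:
        raise ValueError("an 8-bit immediate can control at most 8 elements")
    if len(src) % lane_size != 0:
        raise ValueError("source length must be a multiple of lane_size")

    mask = (1 << elem_bits) - 1
    results = []
    for base in range(0, len(src), lane_size):
        total = 0
        for i in range(lane_size):
            value = src[base + i]
            # The same immediate bit i applies to element i of every lane.
            if (imm8 >> i) & 1:
                value = -value
            total += value
        # Wrap to the element width (two's-complement style), an assumption.
        results.append(total & mask)
    return results


# Two lanes of four 32-bit elements; negate elements 0 and 2 of each lane.
print(horizontal_add_sub([1, 2, 3, 4, 10, 20, 30, 40], 0b0101))  # [2, 20]
```

With an immediate of 0 the sketch reduces to a plain horizontal add of each lane; with all four low bits set it returns the negated lane sum, wrapped to the element width.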

Assignees

Inventors

Classifications

  • Arithmetic instructions · CPC title

  • Vector processors · CPC title

  • with variable precision · CPC title

  • Logical and Boolean instructions, e.g. XOR, NOT · CPC title

  • according to one or more bits in the instruction, e.g. prefix, sub-opcode · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9619226B2 cover?
Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed horizontal add or subtract of packed data elements in response to a single vector packed horizontal add or subtract instruction that includes a destination vector register operand, a source vector register operand, and an opcode are described.
Who is the assignee on this patent?
Hagog Mostafa, Ould-Ahmed-Vall Elmoustapha, Valentine Robert, and 4 more
What technology area does this patent fall under?
Primary CPC classification G06F9/30036. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 11, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).