Method and apparatus for performing a vector permute with an index and an immediate

US10445092B2 · US · B2

Patent metadata
Publication number: US-10445092-B2
Application number: US-201414583644-A
Country: US
Kind code: B2
Filing date: Dec 27, 2014
Priority date: Dec 27, 2014
Publication date: Oct 15, 2019
Grant date: Oct 15, 2019

Abstract

Official abstract text for this publication.

A processor for performing a vector permute comprises: a source vector register to store a plurality of source data elements; a destination vector register to store a plurality of destination data elements; a control vector register to store a plurality of control data elements, each control data element corresponding to one of the destination data elements and including an N bit value indicating whether a source data element is to be copied to the corresponding destination data element; vector permute logic to compare the N bit value of each control data element to an N bit portion of an immediate to determine whether to copy a source data element to the corresponding destination data element, wherein if the N bit values match, then the vector permute logic is to identify a source data element using an index value included in the control data element.
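The compare-then-copy behavior in the abstract can be sketched in a few lines of Python. This is only an illustrative model, not the patented hardware logic: the function name `permute_with_immediate` and its parameters are hypothetical, the elements are assumed to be bytes with N = 2, and the bit positions (tag in the most significant N bits of each control element, compared against the least significant N bits of the immediate) follow the wording of claim 1.

```python
def permute_with_immediate(dest, src, ctrl, imm, n_bits=2, elem_bits=8):
    """Model one permute step (hypothetical helper, assumes byte elements).

    Each control element holds an N-bit tag in its top bits and an index
    in its remaining low bits. A destination element is overwritten only
    when its control tag equals the low N bits of the immediate.
    """
    index_bits = elem_bits - n_bits          # 6 index bits when N = 2
    index_mask = (1 << index_bits) - 1
    tag_mask = (1 << n_bits) - 1
    imm_tag = imm & tag_mask                 # low N bits of the immediate

    out = list(dest)                         # unmatched elements keep their value
    for i, c in enumerate(ctrl):
        tag = (c >> index_bits) & tag_mask   # top N bits of the control element
        if tag == imm_tag:
            out[i] = src[c & index_mask]     # index selects the source element
    return out


# 64 source bytes, as the 6-bit index can address 64 elements.
src = list(range(100, 164))
dest = [0xAA] * 4
ctrl = [0b00_000011, 0b01_000101, 0b00_111111, 0b10_000000]

tag0 = permute_with_immediate(dest, src, ctrl, imm=0)
tag1 = permute_with_immediate(dest, src, ctrl, imm=1)
print(tag0)  # [103, 170, 163, 170]
print(tag1)  # [170, 105, 170, 170]
```

With `imm=0`, only the control elements tagged `00` trigger a copy; the other destination elements retain `0xAA`.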

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: a plurality of source vector registers to each store a plurality of source data elements; a destination vector register to store a plurality of destination data elements from the plurality of source vector registers; a control vector register to store a plurality of control data elements, each control data element corresponding to one of the destination data elements and including an N bit value indicating whether a source data element is to be copied to the corresponding destination data element; vector permute logic to execute a plurality of vector permute instructions, each corresponding to one of the plurality of source vector registers, each of the plurality of vector permute instructions having a different immediate, each of the plurality of vector permute instructions to compare the N bit value of each control data element to a value stored in an N bit portion of that vector permute instruction's immediate to determine whether to copy a source data element from the corresponding source vector register to the corresponding destination data element, wherein the N bit portion of the immediate comprises a least significant N bit portion of the immediate and wherein the N bit value of each control data element is stored in a most significant N bit portion of each control data element, wherein each vector permute instruction swaps a position of groups of bits stored in one of the plurality of source vector registers and stores the result in the destination vector register, wherein the immediate specifies a size of the groups of bits to be swapped, the size of the groups of bits limited to being a power of two such that all groups of bits have a pair with which to swap, wherein if the value stored in the N bit portion of that vector permute instruction's immediate and the N bit value stored in the most significant N bit portion of a control data element match, then after determining the value stored in the N bit portion and N bit value stored in the most significant N bit portion of the control data element match the vector permute logic is to identify a source data element in the source vector register corresponding to that vector permute instruction using an index value included in the control data element and to responsively copy the source data element to the corresponding destination data element in the destination vector register, wherein for a destination data element which corresponds to a control data element having an N bit value which is not equal to the value stored in the N bit portion of the immediate, the vector permute logic is to leave a current value in the destination data element unmodified.

2. The processor as in claim 1 wherein N=2.

3. The processor as in claim 1 wherein each of the source data elements, destination data elements, and control data elements comprise bytes.

4. The processor as in claim 3 wherein 6 bits of each control byte are used for the index value to index one of 64 source bytes in the source vector register.

5. The processor as in claim 4 wherein 2 bits of each control byte are to be compared with a 2 bit portion of the immediate to determine whether to copy a source data element to the corresponding destination data element.

6. The processor as in claim 1 wherein each of the source data elements, destination data elements, and control data elements comprise words.

7. The processor as in claim 1 wherein the vector permute logic is to execute one or more vector permute instructions to perform its operations.

8. The processor as in claim 7 wherein the vector permute logic is to execute a plurality of vector permute instructions, each having a different immediate value and a different source vector register to permute values from the different source vector registers into the destination vector register.

9. The processor as in claim 1 further comprising: a mask register to store a mask value, wherein the vector permute logic is to perform write masking on data elements copied to the destination vector register using the mask value.

10. A method comprising: storing a plurality of source data elements in a plurality of source vector registers; storing a plurality of destination data elements in a destination vector register, the plurality of destination data elements selected from the plurality of source vector registers; storing a plurality of control data elements in a control vector register, each control data element corresponding to one of the destination data elements and including an N bit value indicating whether a source data element is to be copied to the corresponding destination data element; executing a plurality of vector permute instructions, each corresponding to one of the plurality of source vector registers, each of the plurality of vector permute instructions having a different immediate, each of the plurality of vector permute instructions to compare the N bit value of each control data element to a value stored in an N bit portion of that vector permute instruction's immediate to determine whether to copy a source data element from the corresponding source vector register to the corresponding destination data element, wherein the N bit portion of the immediate comprises a least significant N bit portion of the immediate and wherein the N bit value of each control data element is stored in a most significant N bit portion of each control data element, wherein each vector permute instruction swaps a position of groups of bits stored in one of the plurality of source vector registers and stores the result in the destination vector register, wherein the immediate specifies a size of the groups of bits to be swapped, the size of the groups of bits limited to being a power of two such that all groups of bits have a pair with which to swap, wherein if the value stored in the N bit portion of that vector permute instruction's immediate and the N bit value stored in the most significant N bit portion of a control data element match, then after determining the value stored in the N bit portion and N bit value stored in the most significant N bit portion of the control data element match the vector permute logic is to identify a source data element in the source vector register corresponding to that vector permute instruction using an index value included in the control data element and to responsively copy the source data element to the corresponding destination data element in the destination vector register, wherein for a destination data element which corresponds to a control data element having an N bit value which is not equal to the value stored in the N bit portion of the immediate, the vector permute logic is to leave a current value in the destination data element unmodified.

11. The method as in claim 10 wherein N=2.

12. The method as in claim 10 wherein each of the source data elements, destination data elements, and control data elements comprise bytes.

13. The method as in claim 12 wherein 6 bits of each control byte are used for the index value to index one of 64 source bytes in the source vector register.

14. The method as in claim 13 wherein 2 bits of each control byte are to be compared with a 2 bit portion of the immediate to determine whether to copy a source data element to the corresponding destination data element.

15. The method as in claim 10 wherein each of the source data elements, destination data elements, and control data elements comprise words.

16. The method as in claim 10 wherein one or more vector permute instructions are executed to perform the recited ope
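Claims 4, 5, 8 and 9 together suggest one natural use: four permute instructions, each with its own immediate and its own 64-byte source register, fill a single destination so that each control byte behaves like an 8-bit index into a 256-entry table. The following Python sketch models that reading under stated assumptions. The names `masked_permute_step` and `table_lookup_256` are hypothetical, the model assumes N = 2 with byte elements, and the write-mask convention (bit i of the mask enables writes to element i, masked-off elements keep their old value) is an assumption, since the claim text does not spell it out. The bit-group swapping recited in claims 1 and 10 is not modeled.

```python
def masked_permute_step(dest, src, ctrl, imm, write_mask=None):
    """One permute instruction: copy src[index] where the tag matches imm.

    Assumes byte elements: top 2 bits of each control byte are the tag,
    low 6 bits are the index into a 64-byte source register.
    """
    imm_tag = imm & 0b11
    out = list(dest)
    for i, c in enumerate(ctrl):
        # Assumed mask convention: bit i cleared means element i is not written.
        if write_mask is not None and not (write_mask >> i) & 1:
            continue
        if (c >> 6) == imm_tag:
            out[i] = src[c & 0x3F]
        # Tag mismatch: the current destination value is left unmodified.
    return out


def table_lookup_256(table, ctrl, dest, write_mask=None):
    """Issue four permute steps, one per 64-byte slice of a 256-byte table."""
    sources = [table[k * 64:(k + 1) * 64] for k in range(4)]
    for imm, src in enumerate(sources):      # a different immediate per source
        dest = masked_permute_step(dest, src, ctrl, imm, write_mask)
    return dest


table = [(7 * i + 3) % 256 for i in range(256)]
ctrl = [0, 63, 64, 130, 255]

result = table_lookup_256(table, ctrl, [0] * len(ctrl))
masked = table_lookup_256(table, ctrl, [0] * len(ctrl), write_mask=0b01011)
```

In this model, `result` equals `[table[c] for c in ctrl]`, because a control byte `c` with tag `c >> 6` and index `c & 0x3F` addresses entry `64 * tag + index`, which is `c` itself. In `masked`, elements 2 and 4 keep their initial value of 0.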

Assignees

Intel Corp

Inventors

Classifications

  • Special arrangements thereof, e.g. mask or switch

  • Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE

  • Organisation of register space, e.g. banked or distributed register file

  • Vector processors

  • using directory or table look-up (use of a directory or look-up table in file systems G06F16/13)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10445092B2 cover?
A processor for performing a vector permute comprises: a source vector register to store a plurality of source data elements; a destination vector register to store a plurality of destination data elements; a control vector register to store a plurality of control data elements, each control data element corresponding to one of the destination data elements and including an N bit value indicati…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F15/8084. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 15, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).