System and method of loop vectorization by compressing indexes and data elements from iterations based on a control mask
US-9740493-B2 · Aug 22, 2017 · US
US2016188530A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016188530-A1 |
| Application number | US-201414583644-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 27, 2014 |
| Priority date | Dec 27, 2014 |
| Publication date | Jun 30, 2016 |
| Grant date | — |
Official abstract text for this publication.
An apparatus and method for performing a vector permute. For example, one embodiment of a processor comprises: a source vector register to store a plurality of source data elements; a destination vector register to store a plurality of destination data elements; a control vector register to store a plurality of control data elements, each control data element corresponding to one of the destination data elements and including an N bit value indicating whether a source data element is to be copied to the corresponding destination data element; vector permute logic to compare the N bit value of each control data element to an N bit portion of an immediate to determine whether to copy a source data element to the corresponding destination data element, wherein if the N bit values match, then the vector permute logic is to identify a source data element using an index value included in the control data element and to responsively copy the source data element to the corresponding destination data element in the destination vector register.
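The selection rule in the abstract can be sketched in a few lines of Python. This is a minimal software model and not the patented hardware. It assumes byte-sized elements with N=2, so the top 2 bits of each control byte are compared with the low 2 bits of the immediate, and the remaining 6 bits index one of 64 source bytes (the layout given in claims 2 to 6). The function name `vperm_select` and the choice to leave non-matching destination elements unchanged are assumptions made for illustration.

```python
def vperm_select(dest, src, ctrl, imm, n_bits=2):
    """Software model of one byte-granular permute step (hypothetical helper).

    dest, src, ctrl are lists of byte values (0..255). For each destination
    position i, the top n_bits of ctrl[i] are compared with the low n_bits of
    imm. On a match, the low (8 - n_bits) bits of ctrl[i] select a source byte.
    Assumption: a non-matching position keeps its previous destination value.
    """
    if len(dest) != len(ctrl):
        raise ValueError("dest and ctrl must have the same number of elements")
    index_bits = 8 - n_bits
    if len(src) < (1 << index_bits):
        raise ValueError("src is too short for the index width")

    selector_mask = (1 << n_bits) - 1
    index_mask = (1 << index_bits) - 1
    wanted = imm & selector_mask          # least significant N bits of the immediate

    out = list(dest)
    for i, c in enumerate(ctrl):
        selector = (c >> index_bits) & selector_mask   # most significant N bits
        if selector == wanted:
            out[i] = src[c & index_mask]               # index into the source
    return out


src = list(range(100, 164))               # 64 source bytes: 100, 101, ..., 163
dest = [0, 0, 0, 0]
ctrl = [0b00_000011, 0b01_000101, 0b00_111111, 0b10_000000]
result = vperm_select(dest, src, ctrl, imm=0b00)
print(result)  # [103, 0, 163, 0]
```

Only positions 0 and 2 carry the selector `00`, so only they receive source bytes (indexes 3 and 63); the other two positions keep their earlier contents.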
Opening claim text (preview).
What is claimed is:

1. A processor comprising: a source vector register to store a plurality of source data elements; a destination vector register to store a plurality of destination data elements; a control vector register to store a plurality of control data elements, each control data element corresponding to one of the destination data elements and including an N bit value indicating whether a source data element is to be copied to the corresponding destination data element; vector permute logic to compare the N bit value of each control data element to an N bit portion of an immediate to determine whether to copy a source data element to the corresponding destination data element, wherein if the N bit values match, then the vector permute logic is to identify a source data element using an index value included in the control data element and to responsively copy the source data element to the corresponding destination data element in the destination vector register.
2. The processor as in claim 1 wherein the N bit portion of the immediate comprises the least significant N bit portion of the immediate and wherein the N bit portion of the control data element comprises the most significant N bit portion of the control data element.
3. The processor as in claim 2 wherein N=2.
4. The processor as in claim 1 wherein each of the source data elements, destination data elements, and control data elements comprise bytes.
5. The processor as in claim 4 wherein 6 bits of each control byte are used for the index value to index one of 64 source bytes in the source vector register.
6. The processor as in claim 5 wherein 2 bits of each control byte are to be compared with a 2 bit portion of the immediate to determine whether to copy a source data element to the corresponding destination data element.
7. The processor as in claim 1 wherein each of the source data elements, destination data elements, and control data elements comprise words.
8. The processor as in claim 1 wherein the vector permute logic is to execute one or more vector permute instructions to perform its operations.
9. The processor as in claim 8 wherein the vector permute logic is to execute a plurality of vector permute instructions, each having a different immediate value and a different source vector register to permute values from the different source vector registers into the destination vector register.
10. The processor as in claim 1 further comprising: a mask register to store a mask value wherein the vector permute logic is to perform write masking on data elements copied to the destination vector register using the mask value.
11. A method comprising: storing a plurality of source data elements in a source vector register; storing a plurality of destination data elements in a destination vector register; storing a plurality of control data elements in a control vector register, each control data element corresponding to one of the destination data elements and including an N bit value indicating whether a source data element is to be copied to the corresponding destination data element; comparing the N bit value of each control data element to an N bit portion of an immediate to determine whether to copy a source data element to the corresponding destination data element, wherein if the N bit values match, then identifying a source data element using an index value included in the control data element responsively copying the source data element to the corresponding destination data element in the destination vector register.
12. The method as in claim 11 wherein the N bit portion of the immediate comprises the least significant N bit portion of the immediate and wherein the N bit portion of the control data element comprises the most significant N bit portion of the control data element.
13. The method as in claim 12 wherein N=2.
14. The method as in claim 11 wherein each of the source data elements, destination data elements, and control data elements comprise bytes.
15. The method as in claim 14 wherein 6 bits of each control byte are used for the index value to index one of 64 source bytes in the source vector register.
16. The method as in claim 15 wherein 2 bits of each control byte are to be compared with a 2 bit portion of the immediate to determine whether to copy a source data element to the corresponding destination data element.
17. The method as in claim 11 wherein each of the source data elements, destination data elements, and control data elements comprise words.
18. The method as in claim 11 wherein one or more vector permute instructions are executed to perform the recited operations.
19. The method as in claim 18 wherein a plurality of vector permute instructions are executed, each having a different immediate value and a different source vector register to permute values from the different source vector registers into the destination vector register.
20. The method as in claim 1 further comprising: storing a mask value in a mask register and performing write masking on data elements copied to the destination vector register using the mask value.
21. A system comprising: a memory to store data and instructions, including a vector permute instruction and data; a plurality of cores to execute the instructions and process the data; a graphics processor to perform graphics operations in response to certain instructions; a network interface for receiving and transmitting data over a network; a user input interface for receiving user input from a mouse or cursor control device; and a source vector register in one of more of the cores to store a plurality of source data elements; a destination vector register in one or more of the cores to store a plurality of destination data elements; a control vector register in one or more of the cores to store a plurality of control data elements, each control data element corresponding to one of the destination data elements and including an N bit value indicating whether a source data element is to be copied to the corresponding destination data element; vector permute logic in one or more of the cores to compare the N bit value of each control data element to an N bit portion of an immediate to determine whether to copy a source data element to the corresponding destination data element, wherein if the N bit values match, then the vector permute logic is to identify a source data element using an index value included in the control data element and to responsively copy the source data element to the corresponding destination data element in the destination vector register.
22. The system as in claim 21 wherein the N bit portion of the immediate comprises the least significant N bit portion of the immediate and wherein the N bit portion of the control data element comprises the most significant N bit portion of the control data element.
23. The system as in claim 22 wherein N=2.
24. The system as in claim 21 wherein each of the source data elements, destination data elements, and control data elements comprise bytes.
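Claims 9 and 10 describe issuing several permute instructions with different immediates and different source registers into one destination, optionally under a write mask. The sketch below illustrates one way to read this: four 64-byte sources, selected by immediates 0 to 3, together behave like a 256-entry byte lookup table, because the 2 selector bits plus the 6 index bits of a control byte form a full 8-bit index. The helper `vperm_select`, the mask convention (bit i set means destination element i may be written), and the sample table are assumptions made for illustration and do not come from the patent text.

```python
def vperm_select(dest, src, ctrl, imm, write_mask=None):
    """Software model of one byte permute step with N=2 (hypothetical helper).

    Assumptions: non-matching positions keep their old destination value, and
    bit i of write_mask (when given) must be 1 for element i to be written.
    """
    wanted = imm & 0b11                        # low 2 bits of the immediate
    out = list(dest)
    for i, c in enumerate(ctrl):
        if write_mask is not None and not (write_mask >> i) & 1:
            continue                           # masked off: leave element as is
        if (c >> 6) == wanted:                 # top 2 bits of the control byte
            out[i] = src[c & 0x3F]             # low 6 bits index 64 source bytes
    return out


# A sample 256-entry table, split across four 64-byte "source registers".
table = [(7 * v + 1) % 256 for v in range(256)]
sources = [table[64 * k:64 * (k + 1)] for k in range(4)]

ctrl = [0, 63, 64, 130, 200, 255]              # full 8-bit lookup indexes

# One permute per source register, each with a different immediate.
dest = [0] * len(ctrl)
for imm, src in enumerate(sources):
    dest = vperm_select(dest, src, ctrl, imm)

# Same sequence, but only elements 0 and 2 are enabled by the write mask.
masked = [0xEE] * len(ctrl)
for imm, src in enumerate(sources):
    masked = vperm_select(masked, src, ctrl, imm, write_mask=0b000101)

print(dest == [table[c] for c in ctrl])        # True
print(masked)                                  # [1, 238, 193, 238, 238, 238]
```

Each control byte matches exactly one of the four immediates, so after the four passes every unmasked destination element has been written exactly once.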
Special arrangements thereof, e.g. mask or switch · CPC title
Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title
Vector processors · CPC title
using directory or table look-up (use of a directory or look-up table in file systems G06F16/13) · CPC title
Lookup · CPC title