Method and apparatus for shuffling data
US-9477472-B2 · Oct 25, 2016 · US
US10713044B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10713044-B2 |
| Application number | US-201515508284-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 4, 2015 |
| Priority date | Sep 25, 2014 |
| Publication date | Jul 14, 2020 |
| Grant date | Jul 14, 2020 |
Official abstract text for this publication.
A processor includes packed data registers and a decode unit to decode an instruction. The instruction is to indicate a first source operand having at least one lane of bits, and a second source packed data operand having a number of sub-lane sized bit selection elements. An execution unit is coupled with the packed data registers and the decode unit. The execution unit, in response to the instruction, stores a result operand in a destination storage location. The result operand includes, a different corresponding bit for each of the number of sub-lane sized bit selection elements. A value of each bit of the result operand corresponding to a sub-lane sized bit selection element is that of a bit of a corresponding lane of bits, of the at least one lane of bits of the first source operand, which is indicated by the corresponding sub-lane sized bit selection element.
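The selection rule in the abstract can be sketched in a few lines of Python. This is an illustrative model only, not the patented hardware. The 64-bit lane width and the one-selector-per-byte layout are borrowed from claims 13 and 14. The choice of eight selectors per lane, the decision to ignore selector bits above the low six, and the names `bit_shuffle`, `LANE_BITS`, `SELECTORS_PER_LANE`, and `INDEX_MASK` are assumptions made for this example.

```python
LANE_BITS = 64              # assumed lane width (claim 13 mentions 64-bit lanes)
SELECTORS_PER_LANE = 8      # assumed: eight selector bytes map to each lane
INDEX_MASK = LANE_BITS - 1  # assumed: only the low 6 bits of a selector are used


def bit_shuffle(src_lanes, selectors):
    """Return an integer whose bit i is the source bit picked by selectors[i].

    src_lanes: list of 64-bit lane values (the first source operand).
    selectors: list of selector bytes (the second source operand),
               eight per lane in this sketch.
    """
    if len(selectors) != len(src_lanes) * SELECTORS_PER_LANE:
        raise ValueError("expected 8 selector bytes per 64-bit lane")
    result = 0
    for i, sel in enumerate(selectors):
        # Selector i reads only from its own corresponding lane.
        lane = src_lanes[i // SELECTORS_PER_LANE]
        bit = (lane >> (sel & INDEX_MASK)) & 1
        # The selected bit lands in the same relative position as its selector.
        result |= bit << i
    return result


# One lane holding 0b1010; selectors pick bits 0, 1, 2, 3, 3, 1, 1, 0.
mask = bit_shuffle([0b1010], [0, 1, 2, 3, 3, 1, 1, 0])
print(bin(mask))  # 0b1111010
```

Returning a plain integer mirrors the case where the result is written to a mask register or general-purpose register, with one result bit per selector and the bits for each lane stored in adjacent positions.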
Opening claim text (preview).
What is claimed is:
1. A processor comprising: a plurality of packed data registers; a decode unit to decode an instruction, the instruction to indicate a first source operand that is to have at least one lane of bits, and the instruction to indicate a packed data register that is to store a second source packed data operand that is to have a number of sub-lane sized bit selection elements; and an execution unit coupled with the packed data registers and the decode unit, the execution unit, in response to the instruction, to store a result operand in a destination storage location that is to be indicated by the instruction, the result operand to include, a different corresponding single bit for each of the number of sub-lane sized bit selection elements, a value of each single bit of the result operand corresponding to a sub-lane sized bit selection element to be that of a single bit of a corresponding lane of bits, of the at least one lane of bits of the first source operand, which is indicated by the corresponding sub-lane sized bit selection element, wherein the result operand is to include a plurality of the single bits for each of the at least one lane of bits, and wherein the plurality of the single bits for each of the at least one lane of bits are to be stored in adjacent bit positions.
2. The processor of claim 1, wherein the number of sub-lane sized bit selection elements include a plurality of subsets that each correspond to a different one of a plurality of lanes of bits, and wherein the execution unit, in response to the instruction, is to use each subset of the sub-lane sized bit selection elements to select bits from within only a corresponding lane of bits.
3. The processor of claim 2, wherein the execution unit, in response to the instruction, is to store the result operand in a packed data register having the plurality of lanes of bits.
4. The processor of claim 3, wherein the execution unit, in response to the instruction, is to store the bits selected by each subset of the sub-lane sized bit selection elements in a corresponding lane of bits of the packed data register.
5. The processor of claim 4, wherein the execution unit, in response to the instruction, is to store at least one replica of the bits selected by each subset of the sub-lane sized bit selection elements in the corresponding lane of bits of the packed data register.
6. The processor of claim 5, wherein the decode unit is to decode the instruction that is to indicate a source predicate mask operand, and wherein the execution unit, in response to the instruction, is to use the source predicate mask operand to predicate storage of the bits selected by each subset of the sub-lane sized bit selection elements and replicas thereof in the corresponding lane of bits of the packed data register.
7. The processor of claim 1, wherein each sub-lane sized bit selection element corresponds to a single bit of the result operand in a same relative position, and wherein the second source packed data operand has at least sixteen sub-lane sized bit selection elements.
8. The processor of claim 1, wherein the execution unit, in response to the instruction, is to store the result operand in the destination storage location which is a packed data operation mask register.
9. The processor of claim 1, wherein the execution unit, in response to the instruction, is to store the result operand in the destination storage location which is a general-purpose register.
10. The processor of claim 1, wherein the decode unit is to decode the instruction that is to indicate the first source operand that is to have a single lane of bits, wherein all of the number of sub-lane sized bit selection elements are to correspond to the single lane of bits, and wherein the execution unit, in response to the instruction, is to store a single bit of the single lane of bits to the result operand for each of the number of sub-lane sized bit selection elements.
11. The processor of claim 1, wherein the decode unit is to decode the instruction that is to indicate the first source operand is to have a plurality of lanes of bits.
12. The processor of claim 1, wherein the decode unit is to decode the instruction that is to indicate the first source operand that is to have a single lane of bits, and wherein the processor, in response to the instruction, is to replicate the single lane of bits of the first source operand a plurality of times to create a plurality of lanes of bits.
13. The processor of claim 1, wherein the decode unit is to decode the instruction that is to indicate the first source operand that is to have at least one 64-bit lane of bits, and is to indicate the second source packed data operand that is to have the number of at least 6-bit sized bit selection elements.
14. The processor of claim 13, wherein each at least 6-bit bit selection element is in a different corresponding 8-bit byte of the second source packed data operand, and wherein the second source packed data operand has at least sixteen bit selection elements.
15. A processor comprising: a plurality of packed data registers; a decode unit to decode an instruction, the instruction to indicate a first source operand that is to have at least one lane of bits, and the instruction to indicate a packed data register that is to store a second source packed data operand that is to have a same number of sub-lane sized bit selection elements as a number of bits in each of the at least one lane of bits of the first source operand; and an execution unit coupled with the packed data registers and the decode unit, the execution unit, in response to the instruction, to store a result operand in a destination storage location that is to be indicated by the instruction, the result operand to include, a different corresponding single bit for each of the number of sub-lane sized bit selection elements, a value of each single bit of the result operand corresponding to a sub-lane sized bit selection element to be that of a single bit of a corresponding lane of bits, of the at least one lane of bits of the first source operand, which is indicated by the corresponding sub-lane sized bit selection element.
16. A method in a processor comprising: receiving an instruction, the instruction indicating a first source operand having at least one lane of bits, and the instruction having a field specifying a packed data register storing a second source packed data operand having a number of sub-lane sized bit selection elements; and storing a result operand in a destination storage location indicated by the instruction in response to the instruction, the result operand including a different corresponding single bit for each of the number of sub-lane sized bit selection elements, a value of each single bit of the result operand that corresponds to a sub-lane sized bit selection element being that of a single bit of a corresponding lane of bits, of the at least one lane of bits of the first source operand, indicated by the corresponding sub-lane sized bit selection element, wherein the result operand includes a plurality of the single bits for each of the at least one lane of bits, and wherein the plurality of the single bits for each of the at least one lane of bits are stored in adjacent bit positions.
17. The method of claim 16, wherein storing comprises storing the result operand in the destination storage location which is a predicate mask register, and wherein each single bit of the result operand corresponds to a sub-lane sized bit selection element in a same relative position.
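Two of the claimed variants can be modeled in the same hedged way: writing each lane's selected bits, plus replicas, back into the corresponding lane of a packed register (claims 3 to 5), and supplying as many selectors as there are bits in a lane (claim 15), which amounts to an arbitrary per-lane bit permutation. The claims do not say where replicas sit inside a lane, so the back-to-back placement below is an assumption, as are the 64-bit lane width and the names `select_bits`, `shuffle_to_lanes`, `permute_lane`, and `copies`. Predication (claim 6) is left out because the claims do not give the mask granularity.

```python
LANE_BITS = 64  # assumed lane width, taken from claim 13


def select_bits(lane, selectors):
    """Gather one bit of `lane` per selector into adjacent result bits."""
    out = 0
    for i, sel in enumerate(selectors):
        out |= ((lane >> (sel % LANE_BITS)) & 1) << i
    return out


def shuffle_to_lanes(src_lanes, selectors_per_lane, copies=1):
    """Model of claims 3-5: each lane's selected bits are stored back into
    the corresponding result lane, `copies` times in total (original plus
    replicas), packed back to back from bit 0 (an assumed layout)."""
    result = []
    for lane, sels in zip(src_lanes, selectors_per_lane):
        width = len(sels)
        if width * copies > LANE_BITS:
            raise ValueError("copies do not fit in one lane")
        bits = select_bits(lane, sels)
        packed = 0
        for r in range(copies):
            packed |= bits << (r * width)
        result.append(packed)
    return result


def permute_lane(lane, selectors):
    """Model of claim 15: one selector per lane bit, so the result lane is a
    selector-driven rearrangement (with possible repeats) of the source."""
    if len(selectors) != LANE_BITS:
        raise ValueError("expected one selector per bit of the lane")
    return select_bits(lane, selectors)


# Reverse the bit order of a lane: result bit i takes source bit 63 - i.
reversed_lane = permute_lane(1, list(range(63, -1, -1)))
print(hex(reversed_lane))  # 0x8000000000000000
```

Because selectors may repeat, `permute_lane` also covers broadcasts of a single source bit, not only strict permutations.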
of compound instructions · CPC title
LOAD or STORE instructions; Clear instruction · CPC title
using a mask · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
Bit or string instructions · CPC title