Replicate elements instruction
US-2019303155-A1 · Oct 3, 2019 · US
US12079628B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12079628-B2 |
| Application number | US-202117493667-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 4, 2021 |
| Priority date | Dec 29, 2017 |
| Publication date | Sep 3, 2024 |
| Grant date | Sep 3, 2024 |
Official abstract text for this publication.
An apparatus and method for loop flattening and reduction in a SIMD pipeline including broadcast, move, and reduction instructions. For example, one embodiment of a processor comprises: a decoder to decode a broadcast instruction to generate a decoded broadcast instruction identifying a plurality of operations, the broadcast instruction including an opcode, first and second source operands, and at least one destination operand, the broadcast instruction having a split value associated therewith; a first source register associated with the first source operand to store a first plurality of packed data elements; a second source register associated with the second source operand to store a second plurality of packed data elements; execution circuitry to execute the operations of the decoded broadcast instruction, the execution circuitry to copy a first number of contiguous data elements from the first source register to a first set of contiguous data element locations in a destination register specified by the destination operand, the execution circuitry to further copy a second number of contiguous data elements from the second source register to a second set of contiguous data element locations in the destination register, wherein the execution circuitry is to determine the first number and the second number in accordance with the split value associated with the broadcast instruction.
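The copy behavior described in the abstract can be sketched in a few lines of Python. This is a minimal model, not the patented implementation. The function name `split_copy` is hypothetical, and so are three assumptions the abstract does not confirm: the first number equals the split value, the second number equals the remaining lane count, and each source contributes its lowest-indexed elements.

```python
from typing import List, Sequence


def split_copy(src1: Sequence[int], src2: Sequence[int], split: int) -> List[int]:
    """Illustrative model of a two-source copy governed by a split value."""
    if len(src1) != len(src2):
        raise ValueError("source registers must hold the same number of elements")
    lanes = len(src1)
    if not 0 <= split <= lanes:
        raise ValueError("split must lie between 0 and the lane count")
    # Assumption: the first `split` destination lanes take the low lanes of src1.
    first = list(src1[:split])
    # Assumption: the remaining destination lanes take the low lanes of src2.
    second = list(src2[:lanes - split])
    return first + second


if __name__ == "__main__":
    a = [10, 11, 12, 13, 14, 15, 16, 17]
    b = [20, 21, 22, 23, 24, 25, 26, 27]
    print(split_copy(a, b, 3))  # prints [10, 11, 12, 20, 21, 22, 23, 24]
```

Under these assumptions a split of 3 on an 8-lane register yields three elements from the first source followed by five from the second. A real encoding could place the sets differently.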
Opening claim text (preview).
What is claimed is:
1. A processor comprising: a decoder configured to decode a move instruction to generate a decoded move instruction identifying a plurality of operations, the move instruction including an opcode, and first and second source operands, the move instruction having a split value associated therewith; a first source register associated with the first source operand to store a first plurality of packed data elements; a second source register associated with the second source operand to store a second plurality of packed data elements; and execution circuitry configured to execute the operations of the decoded move instruction, the execution circuitry configured to select a first set of contiguous data elements from the first source register to generate a first result and configured to select a second set of contiguous data elements from the second source register to generate a second result and to store the first and second results in first and second locations of a destination vector register, wherein the execution circuitry is configured to determine the first set of contiguous data elements and the second set of contiguous data elements in accordance with the split value associated with the move instruction.
2. The processor of claim 1 wherein the split value is to be included in an immediate of the move instruction.
3. The processor of claim 1 wherein the split value is to be stored in a third source register.
4. The processor of claim 1 wherein the split value is restricted to a power of 2.
5. The processor of claim 1 wherein the move instruction is to indicate a mask value, the execution circuitry to use the mask value to determine whether to write a zero value to a data element location in the destination vector register instead of one of the first and second sets of contiguous data elements.
6. The processor of claim 1 wherein the move instruction is to indicate a mask value, the execution circuitry to use the mask value to determine whether to write a value from a third source vector register to a data element location in the destination register instead of one of the first and second sets of contiguous data elements.
7. The processor of claim 6 wherein the mask value is to be included in an immediate of the move instruction or stored in a fourth source register.
8. A method comprising: decoding a move instruction to generate a decoded move instruction identifying a plurality of operations, the move instruction including an opcode, and first and second source operands, the move instruction having a split value associated therewith; storing a first plurality of packed data elements in a first source register associated with the first source operand; storing a second plurality of packed data elements in a second source register associated with the second source operand; executing the plurality of operations of the decoded move instruction including: selecting a first set of contiguous data elements from the first source register to generate a first result, selecting a second set of contiguous data elements from the second source register to generate a second result, and storing the first and second results in a destination register, wherein the first set of contiguous data elements and the second set of contiguous data elements are determined in accordance with the split value associated with the move instruction.
9. The method of claim 8 wherein the split value is to be included in an immediate of the move instruction.
10. The method of claim 8 wherein the split value is to be stored in a third source register.
11. The method of claim 8 wherein the split value is to be restricted to a power of 2.
12. The method of claim 11 wherein the move instruction is to indicate a mask value, the execution circuitry to use the mask value to determine whether to write a zero value to a data element location in the destination register instead of one of the first and second sets of contiguous data elements.
13. The method of claim 8 wherein the move instruction is to indicate a mask value, the execution circuitry to use the mask value to determine whether to write a value from a third source vector register to a data element location in the destination register instead of one of the first and second sets of contiguous data elements.
14. The method of claim 13 wherein the mask value is to be included in an immediate of the move instruction or stored in a fourth source register.
15. A non-transitory machine-readable medium having program code stored thereon which, when executed by a machine, causes the machine to perform the operations of: decoding a move instruction to generate a decoded move instruction identifying a plurality of operations, the move instruction including an opcode, and first and second source operands, the move instruction having a split value associated therewith; storing a first plurality of packed data elements in a first source register associated with the first source operand; storing a second plurality of packed data elements in a second source register associated with the second source operand; executing the plurality of operations of the decoded move instruction including: selecting a first set of contiguous data elements from the first source register to generate a first result, selecting a second set of contiguous data elements from the second source register to generate a second result, and storing the first and second results in a destination register, wherein the first set of contiguous data elements and the second set of contiguous data elements are determined in accordance with the split value associated with the move instruction.
16. The non-transitory machine-readable medium of claim 15 wherein the split value is to be included in an immediate of the move instruction.
17. The non-transitory machine-readable medium of claim 15 wherein the split value is to be stored in a third source register.
18. The non-transitory machine-readable medium of claim 15 wherein the split value is to be restricted to a power of 2.
19. The non-transitory machine-readable medium of claim 18 wherein the move instruction is to indicate a mask value, the execution circuitry to use the mask value to determine whether to write a zero value to a data element location in the destination register instead of one of the first and second sets of contiguous data elements.
20. The non-transitory machine-readable medium of claim 15 wherein the move instruction is to indicate a mask value, the execution circuitry to use the mask value to determine whether to write a value from a third source vector register to a data element location in the destination register instead of one of the first and second sets of contiguous data elements.
21. The non-transitory machine-readable medium of claim 20 wherein the mask value is to be included in an immediate of the move instruction or stored in a fourth source register.
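The masked variants in the dependent claims (a zero written in place of a selected element, or a value taken from a third source vector register) and the power-of-2 restriction on the split value can be illustrated with a small Python model. This is a sketch under stated assumptions, not the claimed circuitry. The names `masked_split_move` and `merge_src` are hypothetical. The mask polarity (a set bit keeps the selected element) and the choice of low-indexed elements from each source are assumptions that the claims do not specify.

```python
from typing import List, Optional, Sequence


def is_power_of_two(value: int) -> bool:
    """Return True when value is a positive power of two."""
    return value > 0 and (value & (value - 1)) == 0


def masked_split_move(
    src1: Sequence[int],
    src2: Sequence[int],
    split: int,
    mask: int,
    merge_src: Optional[Sequence[int]] = None,
) -> List[int]:
    """Illustrative model of a split-governed move with per-lane masking."""
    lanes = len(src1)
    if len(src2) != lanes:
        raise ValueError("source registers must hold the same number of elements")
    if not is_power_of_two(split) or split > lanes:
        raise ValueError("split must be a power of two no larger than the lane count")
    if merge_src is not None and len(merge_src) != lanes:
        raise ValueError("merge source must match the lane count")
    # Assumption: low lanes of src1 fill the first `split` destination lanes,
    # and low lanes of src2 fill the rest.
    selected = list(src1[:split]) + list(src2[:lanes - split])
    dest = []
    for lane in range(lanes):
        if (mask >> lane) & 1:
            dest.append(selected[lane])   # assumed polarity: set bit keeps the element
        elif merge_src is None:
            dest.append(0)                # zeroing variant
        else:
            dest.append(merge_src[lane])  # variant using a third source vector register
    return dest


if __name__ == "__main__":
    a, b = [1, 2, 3, 4], [5, 6, 7, 8]
    print(masked_split_move(a, b, 2, 0b1011))                # prints [1, 2, 0, 6]
    print(masked_split_move(a, b, 2, 0b1011, [9, 9, 9, 9]))  # prints [1, 2, 9, 6]
```

With mask `0b1011`, lane 2 is the only cleared bit, so that lane receives either zero or the corresponding lane of the hypothetical `merge_src`, while the other lanes keep the elements chosen by the split.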
Iterative single instructions for multiple data lanes [SIMD] · CPC title
using a mask · CPC title
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
Bit or string instructions · CPC title