Packed data operation mask comparison processors, methods, systems, and instructions
US-2016154652-A1 · Jun 2, 2016 · US
US9606961B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9606961-B2 |
| Application number | US-201213664401-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 30, 2012 |
| Priority date | Oct 30, 2012 |
| Publication date | Mar 28, 2017 |
| Grant date | Mar 28, 2017 |
Abstract
Instructions and logic provide vector compress and rotate functionality. Some embodiments, responsive to an instruction specifying: a vector source, a mask, a vector destination and destination offset, read the mask, and copy corresponding unmasked vector elements from the vector source to adjacent sequential locations in the vector destination, starting at the vector destination offset location. In some embodiments, the unmasked vector elements from the vector source are copied to adjacent sequential element locations modulo the total number of element locations in the vector destination. In some alternative embodiments, copying stops whenever the vector destination is full, and upon copying an unmasked vector element from the vector source to an adjacent sequential element location in the vector destination, the value of a corresponding field in the mask is changed to a masked value. Alternative embodiments zero elements of the vector destination, in which no element from the vector source is copied.
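The abstract describes three related behaviors: compress-and-rotate (unmasked elements wrap modulo the destination length), compress-and-fill (copying stops when the destination is full, and each copied element's mask field is changed to a masked value), and an optional zeroing of destination elements that receive nothing. The Python sketch below models these on plain lists. It assumes that a mask value of 1 marks an unmasked (to-be-copied) element and 0 a masked one, and that the source is scanned from element 0 upward. The function names and the `zeroing` flag are hypothetical and do not reflect any real instruction encoding.

```python
from typing import List, Tuple

# Assumption: mask value 1 = unmasked (copy this element), 0 = masked.


def compress_rotate(src: List[int], mask: List[int], dst: List[int],
                    offset: int, zeroing: bool = False) -> List[int]:
    """Compress-and-rotate: copy each unmasked src element to the next
    sequential dst slot, starting at `offset` and wrapping modulo len(dst)."""
    n = len(dst)
    # Zeroing variant: slots that receive no source element end up as 0.
    out = [0] * n if zeroing else list(dst)
    pos = offset
    for elem, m in zip(src, mask):
        if m == 1:
            out[pos % n] = elem  # wrap around the destination (rotate)
            pos += 1
    return out


def compress_fill(src: List[int], mask: List[int], dst: List[int],
                  offset: int, zeroing: bool = False
                  ) -> Tuple[List[int], List[int], int]:
    """Compress-and-fill: copy unmasked src elements starting at `offset`
    and stop once the most significant dst slot is filled. Each copied
    element's mask field is changed to the masked value (0)."""
    n = len(dst)
    out = [0] * n if zeroing else list(dst)
    new_mask = list(mask)
    pos = offset
    for i, (elem, m) in enumerate(zip(src, mask)):
        if pos >= n:
            break  # destination full: stop copying
        if m == 1:
            out[pos] = elem
            new_mask[i] = 0  # record that this element has been consumed
            pos += 1
    return out, new_mask, pos


src = [10, 11, 12, 13, 14, 15, 16, 17]
mask = [1, 0, 1, 1, 0, 0, 1, 1]
print(compress_rotate(src, mask, [90, 91, 92, 93], offset=2))
# [13, 16, 17, 12]
print(compress_fill(src, mask, [90, 91, 92, 93], offset=2))
# ([90, 91, 10, 12], [0, 0, 0, 1, 0, 0, 1, 1], 4)
```

In the rotate variant, five unmasked elements wrap into a four-element destination, so 17 overwrites 10. The fill variant instead stops after two copies and leaves the mask fields for 13, 16, and 17 unmasked for a later pass.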
What is claimed is:
1. A processor comprising: a mask register comprising a first plurality of data fields, wherein each of the first plurality of data fields in the mask register corresponds to an element location in a vector; a decode stage to decode a first instruction specifying a vector source operand, the mask register, a vector destination operand and a vector destination offset location; and one or more execution units, responsive to the decoded first instruction, to: read a plurality of values in the first plurality of data fields of the mask register, the plurality of values being unmasked values; for a first value in the first plurality of data fields in the mask register, copy a corresponding first vector element from the vector source operand to a first adjacent sequential element location in the vector destination, the first vector element being at the vector destination offset location; wherein upon copying the corresponding first vector element from the vector source operand to an adjacent sequential element location in the vector destination, change the first value in the mask register from a first unmasked value to a first masked value; for a second value in the first plurality of data fields in the mask register, copy a corresponding second vector element from the vector source operand to a second adjacent sequential element location in the vector destination; and upon copying the corresponding second vector element from the vector source operand to an adjacent sequential element location in the vector destination: change the second value in the mask register from a second unmasked value to a second masked value, the first masked value and the second masked value being used to track progress of a completion of the decoded first instruction; determine that the vector destination is full and store the vector destination to a memory; set the vector destination offset location to zero; and re-execute the first instruction using the first masked value, the second masked value, and the vector destination offset location to compress a third vector element.
2. The processor of claim 1, wherein the corresponding first and second vector elements from the vector source operand are copied to adjacent sequential element locations modulo a total number of element locations in the vector destination.
3. The processor of claim 2, wherein the first instruction is a vector compress and rotate instruction.
4. The processor of claim 1, wherein the corresponding first and second vector elements are copied from the vector source operand to adjacent sequential element locations starting at the vector destination offset location until a most significant vector destination element location is filled.
5. The processor of claim 4, wherein the first instruction is a vector compress, fill and rotate instruction.
6. The processor of claim 1, wherein the first unmasked value is one.
7. The processor of claim 5, wherein the second unmasked value is zero.
8. The processor of claim 1, wherein the first vector element and the second vector element copied into the vector destination operand are 32-bit data elements.
9. The processor of claim 1, wherein the first vector element and the second vector element copied into the vector destination operand are 64-bit data elements.
10. The processor of claim 1, wherein the vector destination operand is a 128-bit vector register.
11. The processor of claim 1, wherein the vector destination operand is a 256-bit vector register.
12. The processor of claim 1, wherein the vector destination operand is a 512-bit vector register.
13. A non-transitory machine-readable medium to record functional descriptive material including a first executable instruction, which when executed by a machine causes the machine to: read a plurality of values in a first plurality of data fields in a mask register, the plurality of values being unmasked values; for a value in the first plurality of data fields in the mask register, copy a corresponding first vector element from a vector source operand to an adjacent sequential element location in a vector destination, the first vector element being at a vector destination offset location; for the corresponding first vector element copied from the vector source operand to the adjacent sequential element location in the vector destination, change the value of a corresponding data field in the mask register from an unmasked value to a masked value, the masked value being used to track progress of a completion of the first executable instruction; determine that the vector destination is full and store the vector destination to a memory; set the vector destination offset location to zero; and re-execute the first instruction using the masked value and the vector destination offset location to compress a second vector element.
14. The non-transitory machine-readable medium of claim 13, wherein the corresponding first vector element from the vector source operand is copied to an adjacent sequential element location modulo a total number of element locations in the vector destination.
15. The non-transitory machine-readable medium of claim 13, wherein the copying the corresponding first vector element is copied from the vector source operand to the adjacent sequential element location, the adjacent sequential element location starting at the vector destination offset location.
16. The non-transitory machine-readable medium of claim 13, wherein the first vector element stored into the vector destination is a 32-bit data element.
17. The non-transitory machine-readable medium of claim 13, wherein the first vector element stored into the vector destination is a 64-bit data element.
18. The non-transitory machine-readable medium of claim 13, wherein the vector destination is a 128-bit vector register.
19. The non-transitory machine-readable medium of claim 13, wherein the vector destination is a 256-bit vector register.
20. The non-transitory machine-readable medium of claim 13, wherein the vector destination is a 512-bit vector register.
21. A processor comprising: a decode stage to decode a first single-instruction-multiple-data (SIMD) instruction specifying: a vector source operand, a mask register, a vector destination operand and a vector destination offset location; and one or more execution units, responsive to the decoded first SIMD instruction, to: read a plurality of values of a first plurality of data fields of the mask register; for a value in the first plurality of data fields in the mask register, copy a corresponding first vector element from the vector source to an adjacent sequential element location in the vector destination modulo a total number of element locations in the vector destination, starting at the vector destination offset location; change the value in the mask register from an unmasked value to a masked value, the masked value being used to track progress of a completion of the decoded first SIMD instruction; determine that the vector destination is full and store the vector destination to a memory; set the vector destination offset location to zero; and re-execute the first instruction using the masked value and the vector destination offset location to compress a second vector element.
22. The processor of claim 21, wherein the vector destination is a 128-bit vector register.
23. The processor of claim 21, wherein the vector destination is
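Independent claims 1, 13, and 21 wrap the copy step in a restartable loop. When the destination is full it is stored to memory and the offset is reset to zero. The instruction is then re-executed, and the already-masked fields keep it from copying the same elements twice. The Python sketch below models that loop under stated assumptions. A mask value of 1 means unmasked, 0 means masked, and "memory" is a plain list. Flushing a partially filled final vector is an assumption; the claims do not recite it. The names `lanes`, `compress_fill`, and `compress_to_memory` are hypothetical. `lanes` only illustrates how the 32/64-bit element sizes and 128/256/512-bit registers of the dependent claims fix the number of element locations.

```python
from typing import List, Tuple

# Assumption: mask value 1 = unmasked (still to copy), 0 = masked (done).


def lanes(register_bits: int, element_bits: int) -> int:
    """Element locations in a vector register, e.g. 512-bit / 32-bit = 16."""
    return register_bits // element_bits


def compress_fill(src: List[int], mask: List[int], dst: List[int],
                  offset: int) -> Tuple[List[int], List[int], int]:
    """One execution: copy unmasked elements into dst from `offset` until
    dst is full, changing the mask field of every copied element to 0."""
    dst, mask = list(dst), list(mask)
    pos = offset
    for i, elem in enumerate(src):
        if pos >= len(dst):
            break  # destination full
        if mask[i] == 1:
            dst[pos] = elem
            mask[i] = 0  # masked value tracks completion progress
            pos += 1
    return dst, mask, pos


def compress_to_memory(src: List[int], mask: List[int], vl: int) -> List[int]:
    """Re-execute the fill step: whenever dst is full, store it to
    'memory', reset the offset to zero, and run again with the updated mask."""
    if vl <= 0:
        raise ValueError("vector length must be positive")
    memory: List[int] = []
    dst, offset = [0] * vl, 0
    while True:
        dst, mask, offset = compress_fill(src, mask, dst, offset)
        if offset < vl:
            break  # remaining unmasked elements did not fill a full vector
        memory.extend(dst)  # store the full vector destination to memory
        dst, offset = [0] * vl, 0  # reset offset to zero and re-execute
    # Assumption: flush a partially filled final vector (not in the claims).
    return memory + dst[:offset]


vl = lanes(128, 32)  # 4 x 32-bit elements in a 128-bit register
src = list(range(10))
mask = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
print(compress_to_memory(src, mask, vl))
# [0, 1, 3, 4, 5, 7, 8]
```

The cleared mask fields record which elements already reached memory. Each re-execution therefore skips them, and the stored sequence is just the unmasked source elements in their original order.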
using a secondary processor, e.g. coprocessor (peripheral processor G06F13/12) · CPC title
Decoding the operand specifier, e.g. specifier format · CPC title
using a plurality of independent parallel functional units · CPC title
of variable length instructions · CPC title
according to data content, e.g. floating-point registers, address registers · CPC title