Instruction and logic to provide vector compress and rotate functionality

US9606961B2 · US · B2

Patent metadata
Publication number: US-9606961-B2
Application number: US-201213664401-A
Country: US
Kind code: B2
Filing date: Oct 30, 2012
Priority date: Oct 30, 2012
Publication date: Mar 28, 2017
Grant date: Mar 28, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Instructions and logic provide vector compress and rotate functionality. Some embodiments, responsive to an instruction specifying: a vector source, a mask, a vector destination and destination offset, read the mask, and copy corresponding unmasked vector elements from the vector source to adjacent sequential locations in the vector destination, starting at the vector destination offset location. In some embodiments, the unmasked vector elements from the vector source are copied to adjacent sequential element locations modulo the total number of element locations in the vector destination. In some alternative embodiments, copying stops whenever the vector destination is full, and upon copying an unmasked vector element from the vector source to an adjacent sequential element location in the vector destination, the value of a corresponding field in the mask is changed to a masked value. Alternative embodiments zero elements of the vector destination, in which no element from the vector source is copied.
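The modulo ("rotate") behavior described in the abstract can be sketched in software. This is only an illustrative model, not the patented hardware or any real instruction: the function name `compress_rotate`, the list-based vectors, and the convention that a mask value of 1 means "unmasked" are all assumptions made for this example.

```python
def compress_rotate(src, mask, dest, offset):
    """Software model of a compress-and-rotate step (hypothetical helper).

    Copies each unmasked element of `src` into consecutive slots of a copy
    of `dest`, starting at `offset` and wrapping modulo len(dest).
    Assumption: a mask value of 1 marks an element as unmasked.
    """
    out = list(dest)          # leave the caller's destination untouched
    n = len(out)
    pos = offset
    for elem, bit in zip(src, mask):
        if bit:               # unmasked element: copy it
            out[pos % n] = elem
            pos += 1          # next adjacent sequential location
    return out


src = [10, 11, 12, 13, 14, 15, 16, 17]
mask = [0, 1, 0, 1, 1, 0, 0, 1]
result = compress_rotate(src, mask, [0] * 8, offset=6)
print(result)  # [14, 17, 0, 0, 0, 0, 11, 13]
```

The four unmasked elements (11, 13, 14, 17) land in slots 6, 7, 0 and 1, so the last two wrap around to the start of the destination.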

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: a mask register comprising a first plurality of data fields, wherein each of the first plurality of data fields in the mask register corresponds to an element location in a vector; a decode stage to decode a first instruction specifying a vector source operand, the mask register, a vector destination operand and a vector destination offset location; and one or more execution units, responsive to the decoded first instruction, to: read a plurality of values in the first plurality of data fields of the mask register, the plurality of values being unmasked values; for a first value in the first plurality of data fields in the mask register, copy a corresponding first vector element from the vector source operand to a first adjacent sequential element location in the vector destination, the first vector element being at the vector destination offset location; wherein upon copying the corresponding first vector element from the vector source operand to an adjacent sequential element location in the vector destination, change the first value in the mask register from a first unmasked value to a first masked value; for a second value in the first plurality of data fields in the mask register, copy a corresponding second vector element from the vector source operand to a second adjacent sequential element location in the vector destination; and upon copying the corresponding second vector element from the vector source operand to an adjacent sequential element location in the vector destination: change the second value in the mask register from a second unmasked value to a second masked value, the first masked value and the second masked value being used to track progress of a completion of the decoded first instruction; determine that the vector destination is full and store the vector destination to a memory; set the vector destination offset location to zero; and re-execute the first instruction using the first masked value, the second masked value, and the vector destination offset location to compress a third vector element.

2. The processor of claim 1, wherein the corresponding first and second vector elements from the vector source operand are copied to adjacent sequential element locations modulo a total number of element locations in the vector destination.

3. The processor of claim 2, wherein the first instruction is a vector compress and rotate instruction.

4. The processor of claim 1, wherein the corresponding first and second vector elements are copied from the vector source operand to adjacent sequential element locations starting at the vector destination offset location until a most significant vector destination element location is filled.

5. The processor of claim 4, wherein the first instruction is a vector compress, fill and rotate instruction.

6. The processor of claim 1, wherein the first unmasked value is one.

7. The processor of claim 5, wherein the second unmasked value is zero.

8. The processor of claim 1, wherein the first vector element and the second vector element copied into the vector destination operand are 32-bit data elements.

9. The processor of claim 1, wherein the first vector element and the second vector element copied into the vector destination operand are 64-bit data elements.

10. The processor of claim 1, wherein the vector destination operand is a 128-bit vector register.

11. The processor of claim 1, wherein the vector destination operand is a 256-bit vector register.

12. The processor of claim 1, wherein the vector destination operand is a 512-bit vector register.

13. A non-transitory machine-readable medium to record functional descriptive material including a first executable instruction, which when executed by a machine causes the machine to: read a plurality of values in a first plurality of data fields in a mask register, the plurality of values being unmasked values; for a value in the first plurality of data fields in the mask register, copy a corresponding first vector element from a vector source operand to an adjacent sequential element location in a vector destination, the first vector element being at a vector destination offset location; for the corresponding first vector element copied from the vector source operand to the adjacent sequential element location in the vector destination, change the value of a corresponding data field in the mask register from an unmasked value to a masked value, the masked value being used to track progress of a completion of the first executable instruction; determine that the vector destination is full and store the vector destination to a memory; set the vector destination offset location to zero; and re-execute the first instruction using the masked value and the vector destination offset location to compress a second vector element.

14. The non-transitory machine-readable medium of claim 13, wherein the corresponding first vector element from the vector source operand is copied to an adjacent sequential element location modulo a total number of element locations in the vector destination.

15. The non-transitory machine-readable medium of claim 13, wherein the copying the corresponding first vector element is copied from the vector source operand to the adjacent sequential element location, the adjacent sequential element location starting at the vector destination offset location.

16. The non-transitory machine-readable medium of claim 13, wherein the first vector element stored into the vector destination is a 32-bit data element.

17. The non-transitory machine-readable medium of claim 13, wherein the first vector element stored into the vector destination is a 64-bit data element.

18. The non-transitory machine-readable medium of claim 13, wherein the vector destination is a 128-bit vector register.

19. The non-transitory machine-readable medium of claim 13, wherein the vector destination is a 256-bit vector register.

20. The non-transitory machine-readable medium of claim 13, wherein the vector destination is a 512-bit vector register.

21. A processor comprising: a decode stage to decode a first single-instruction-multiple-data (SIMD) instruction specifying: a vector source operand, a mask register, a vector destination operand and a vector destination offset location; and one or more execution units, responsive to the decoded first SIMD instruction, to: read a plurality of values of a first plurality of data fields of the mask register; for a value in the first plurality of data fields in the mask register, copy a corresponding first vector element from the vector source to an adjacent sequential element location in the vector destination modulo a total number of element locations in the vector destination, starting at the vector destination offset location; change the value in the mask register from an unmasked value to a masked value, the masked value being used to track progress of a completion of the decoded first SIMD instructions; determine that the vector destination is full and store the vector destination to a memory; set the vector destination offset location to zero; and re-execute the first instruction using the masked value and the vector destination offset location to compress a second vector element.

22. The processor of claim 21, wherein the vector destination is a 128-bit vector register.

23. The processor of claim 21, wherein the vector destination is
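
The fill-and-re-execute sequence recited in the independent claims (copy until the destination is full, flip each copied element's mask field to track progress, store the full destination, reset the offset to zero, then run again) can be modeled roughly as follows. This is a sketch under stated assumptions, not the claimed implementation: `compress_fill` and `compress_until_done` are hypothetical names, vectors are Python lists, `memory` is a plain list standing in for a store to memory, and 1 is assumed to be the unmasked value.

```python
def compress_fill(src, mask, dest, offset):
    """One modeled execution (hypothetical helper).

    Copies unmasked elements of `src` into `dest` starting at `offset`
    and stops as soon as `dest` is full. Each copied element's mask field
    is set to 0 (masked) so a later re-execution skips it.
    Mutates `dest` and `mask`; returns the next free destination position.
    """
    pos = offset
    for i, bit in enumerate(mask):
        if pos == len(dest):      # destination full: stop copying
            break
        if bit:                   # assumption: 1 = unmasked
            dest[pos] = src[i]
            mask[i] = 0           # record progress in the mask
            pos += 1
    return pos


def compress_until_done(src, mask, dest, offset):
    """Re-run compress_fill until no unmasked elements remain.

    Assumes a non-empty destination. Returns (stored vectors, final
    destination, final offset).
    """
    mask, dest = list(mask), list(dest)
    memory = []                   # stand-in for stores to memory
    while any(mask):
        offset = compress_fill(src, mask, dest, offset)
        if offset == len(dest):   # full: store and restart at slot 0
            memory.append(list(dest))
            offset = 0
    return memory, dest, offset


src = [10, 11, 12, 13, 14, 15, 16, 17]
mask = [0, 1, 0, 1, 1, 0, 0, 1]
dest = [1, 2, 3, 4, 5, 6, 0, 0]   # six slots already filled earlier
memory, dest_out, next_offset = compress_until_done(src, mask, dest, offset=6)
print(memory)                     # [[1, 2, 3, 4, 5, 6, 11, 13]]
print(dest_out[:next_offset])     # [14, 17]
```

In this run the first pass copies 11 and 13 into the last two slots, the full destination is stored, and the second pass picks up 14 and 17 because their mask fields are still unmasked. Only `dest_out[:next_offset]` holds new data after the final pass; the remaining slots keep stale values from the stored vector.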

Assignees

Intel Corp

Inventors

Classifications

  • using a secondary processor, e.g. coprocessor (peripheral processor G06F13/12)

  • Decoding the operand specifier, e.g. specifier format

  • using a plurality of independent parallel functional units

  • of variable length instructions

  • according to data content, e.g. floating-point registers, address registers

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9606961B2 cover?
Instructions and logic provide vector compress and rotate functionality. Some embodiments, responsive to an instruction specifying: a vector source, a mask, a vector destination and destination offset, read the mask, and copy corresponding unmasked vector elements from the vector source to adjacent sequential locations in the vector destination, starting at the vector destination offset locatio…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F15/8076. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 28, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).