Gather-op instruction to duplicate a mask and perform an operation on vector elements gathered via tracked offset-based gathering
US-9747101-B2 · Aug 29, 2017 · US
US10146737B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10146737-B2 |
| Application number | US-201514616323-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 6, 2015 |
| Priority date | Jun 2, 2012 |
| Publication date | Dec 4, 2018 |
| Grant date | Dec 4, 2018 |
Official abstract text for this publication.
Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiments of the apparatus may comprise decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element had the first value. The data element is written at an in-register position in a destination vector register according to the respective in-register position of the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.
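The gather flow described in the abstract can be modeled in software. The following is a minimal sketch, not the patented hardware: it assumes the "first value" of a mask element is 1 and the "second value" is 0, that an address is formed as `base + index * scale`, and that memory is a plain Python list. The function name `masked_gather` and the `max_loads` parameter (used here to imitate an operation that stops partway and is resumed later) are hypothetical, and a simple loop stands in for the finite state machine.

```python
def masked_gather(memory, base, scale, indices, mask, dest, max_loads=None):
    """Software model of a masked gather; returns (new_dest, new_mask).

    Assumptions (not from the patent text): mask value 1 means "load still
    pending", 0 means "done or not requested"; address = base + index * scale.
    """
    # Copy the indices and mask elements into an "index array".
    index_array = [{"index": i, "mask": m} for i, m in zip(indices, mask)]
    dest = list(dest)
    loads_done = 0

    for pos, entry in enumerate(index_array):
        if entry["mask"] != 1:
            continue  # element not selected, or already loaded
        if max_loads is not None and loads_done >= max_loads:
            break  # hypothetical interruption; remaining mask bits stay set
        address = base + entry["index"] * scale
        # Write the loaded element at the same position the index occupied.
        dest[pos] = memory[address]
        # Mark the load as complete by flipping the mask element.
        entry["mask"] = 0
        loads_done += 1

    return dest, [e["mask"] for e in index_array]


memory = list(range(100, 200))
partial, pending = masked_gather(memory, 10, 2, [0, 3, 1, 5], [1, 0, 1, 1],
                                 [-1, -1, -1, -1], max_loads=1)
final, done = masked_gather(memory, 10, 2, [0, 3, 1, 5], pending, partial)
```

Because each completed load clears its mask element, the second call picks up only the elements that the first call left pending, which is the property that makes the mask usable as a progress record.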
Opening claim text (preview).
What is claimed is:
1. A computer implemented method comprising: decoding a single instruction; executing the decoded single instruction by: copying, from one or more registers, a set of indices and a corresponding set of mask elements to an index array, generating a set of addresses from the set of indices in the index array for at least each corresponding mask element having a first value; accessing an address from the set of addresses to store a corresponding data element if a corresponding mask element has said first value, and changing the values of corresponding mask elements from the first value to a second value responsive to completion of their respective stores.
2. The computer implemented method of claim 1, wherein the single instruction is a single instruction multiple data (SIMD) instruction.
3. The computer implemented method of claim 2, wherein said copying the set of indices and the corresponding set of mask elements to said index array being performed responsive to a first micro-operation generated by decoding said SIMD instruction.
4. The computer implemented method of claim 1, wherein the executing further comprises: allocating buffer storage for addresses corresponding to the set of indices, and copying vector data elements to the allocated buffer storage.
5. The computer implemented method of claim 1, wherein the mask elements are stored in a register.
6. The computer implemented method of claim 5, wherein the register is architecturally visible.
7. A non-transitory machine readable medium storing a single instruction, when processing by a processor causing the processor to perform a method comprising: decoding the single instruction; executing the decoded single instruction by: copying, from one or more registers, a set of indices and a corresponding set of mask elements to an index array, generating a set of addresses from the set of indices in the index array for at least each corresponding mask element having a first value; accessing an address from the set of addresses to store a corresponding data element if a corresponding mask element has said first value, and changing the values of corresponding mask elements from the first value to a second value responsive to completion of their respective stores.
8. The non-transitory machine readable medium of claim 7, wherein the single instruction is single instruction multiple data (SIMD) instruction.
9. The non-transitory machine readable medium of claim 8, wherein said copying the set of indices and the corresponding set of mask elements to said index array being performed responsive to a first micro-operation generated by decoding said SIMD instruction.
10. The non-transitory machine readable medium of claim 7, wherein the executing further comprises: allocating buffer storage for addresses corresponding to the set of indices, and copying vector data elements to the allocated buffer storage.
11. The non-transitory machine readable medium of claim 7, wherein the mask elements are stored in a register.
12. The non-transitory machine readable medium of claim 11, wherein the register is architecturally visible.
13. The non-transitory machine readable medium of claim 7, wherein the to execute further comprises to: allocate buffer storage for addresses corresponding to the set of indices, and copy vector data elements to the allocated buffer storage.
14. An apparatus comprising: decode circuitry to decode a single instruction; execution circuitry to execute the decoded single instruction to: copy, from one or more registers, a set of indices and a corresponding set of mask elements to an index array, generate a set of addresses from the set of indices in the index array for at least each corresponding mask element having a first value; access an address from the set of addresses to store a corresponding data element if a corresponding mask element has said first value, and change the values of corresponding mask elements from the first value to a second value responsive to completion of their respective stores.
15. The apparatus of claim 14, wherein the single instruction is single instruction multiple data (SIMD) instruction.
16. The apparatus of claim 15, wherein said to copy the set of indices and the corresponding set of mask elements to said index array is performed responsive to a first micro-operation generated by decoding said SIMD instruction.
17. The apparatus of claim 14, wherein the mask elements are to be stored in a register.
18. The apparatus of claim 17, wherein the register is architecturally visible.
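The independent claims recite the store (scatter) direction of the operation. As an illustration only, the steps of claim 1 can be sketched in software as follows. The sketch assumes a "first value" of 1 and a "second value" of 0 for mask elements, an address computed as `base + index * scale`, and memory modeled as a mutable Python list. The name `masked_scatter` and its parameters are hypothetical and do not come from the patent.

```python
def masked_scatter(memory, base, scale, indices, mask, source):
    """Software model of the claimed store flow; returns the updated mask.

    Assumptions (not from the claim text): mask value 1 selects an element
    for storing, 0 means done or not selected; address = base + index * scale.
    """
    # Step 1: copy the indices and mask elements into an index array.
    index_array = list(zip(indices, mask))

    # Step 2: generate addresses for each element whose mask has the first value.
    addresses = {
        pos: base + idx * scale
        for pos, (idx, m) in enumerate(index_array)
        if m == 1
    }

    # Steps 3 and 4: store each selected element, then flip its mask element
    # once the store has completed.
    new_mask = list(mask)
    for pos, address in addresses.items():
        memory[address] = source[pos]
        new_mask[pos] = 0

    return new_mask


memory = [0] * 32
remaining = masked_scatter(memory, 4, 2, [0, 1, 2, 3], [1, 1, 0, 1], [7, 8, 9, 10])
```

Elements whose mask element does not hold the first value are never written, so the corresponding memory locations keep their previous contents.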
single instruction multiple data [SIMD] multiprocessors · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
LOAD or STORE instructions; Clear instruction · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
Instruction analysis, e.g. decoding, instruction word fields · CPC title