Gather using index array and finite state machine

US10146737B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10146737-B2
Application number  US-201514616323-A
Country             US
Kind code           B2
Filing date         Feb 6, 2015
Priority date       Jun 2, 2012
Publication date    Dec 4, 2018
Grant date          Dec 4, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiments of the apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element has the first value. The data element is written at an in-register position in a destination vector register according to the respective in-register position of the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.
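The gather flow in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the names `IndexArrayEntry`, `gather`, `PENDING`, and `DONE` are hypothetical, the mask encoding (1 for the first value, 0 for the second) is assumed, and the address formula `base + index * scale` is an assumed addressing mode that the abstract does not spell out. A plain loop stands in for the finite state machine.

```python
from dataclasses import dataclass
from typing import List

# Assumed encoding of the "first value" and "second value" of a mask element.
PENDING, DONE = 1, 0


@dataclass
class IndexArrayEntry:
    index: int  # index copied from the index vector register
    mask: int   # PENDING until the load for this element completes


def gather(memory: List[int], base: int, scale: int,
           indices: List[int], mask: List[int], dest: List[int]) -> List[int]:
    """Model a masked gather; updates `mask` in place and returns the new destination."""
    # Copy the indices and their mask elements into the index array.
    index_array = [IndexArrayEntry(i, m) for i, m in zip(indices, mask)]
    result = list(dest)
    # The loop stands in for the finite state machine stepping through entries.
    for position, entry in enumerate(index_array):
        if entry.mask != PENDING:
            continue  # masked-off element: keep the old destination value
        address = base + entry.index * scale  # assumed address generation
        # Write the loaded element at the same in-register position as its index.
        result[position] = memory[address]
        entry.mask = DONE  # load completed, so flip the mask element
    mask[:] = [entry.mask for entry in index_array]
    return result


memory = list(range(100, 116))
mask = [PENDING, DONE, PENDING, PENDING]
gathered = gather(memory, base=4, scale=2, indices=[0, 3, 1, 2],
                  mask=mask, dest=[-1, -1, -1, -1])
print(gathered, mask)  # [104, -1, 106, 108] [0, 0, 0, 0]
```

Because each mask element changes only after its own load completes, the mask records which elements are still outstanding at any point in the operation.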

First claim

Opening claim text (preview).

What is claimed is:
1. A computer implemented method comprising: decoding a single instruction; executing the decoded single instruction by: copying, from one or more registers, a set of indices and a corresponding set of mask elements to an index array, generating a set of addresses from the set of indices in the index array for at least each corresponding mask element having a first value; accessing an address from the set of addresses to store a corresponding data element if a corresponding mask element has said first value, and changing the values of corresponding mask elements from the first value to a second value responsive to completion of their respective stores.
2. The computer implemented method of claim 1, wherein the single instruction is a single instruction multiple data (SIMD) instruction.
3. The computer implemented method of claim 2, wherein said copying the set of indices and the corresponding set of mask elements to said index array being performed responsive to a first micro-operation generated by decoding said SIMD instruction.
4. The computer implemented method of claim 1, wherein the executing further comprises: allocating buffer storage for addresses corresponding to the set of indices, and copying vector data elements to the allocated buffer storage.
5. The computer implemented method of claim 1, wherein the mask elements are stored in a register.
6. The computer implemented method of claim 5, wherein the register is architecturally visible.
7. A non-transitory machine readable medium storing a single instruction, when processing by a processor causing the processor to perform a method comprising: decoding the single instruction; executing the decoded single instruction by: copying, from one or more registers, a set of indices and a corresponding set of mask elements to an index array, generating a set of addresses from the set of indices in the index array for at least each corresponding mask element having a first value; accessing an address from the set of addresses to store a corresponding data element if a corresponding mask element has said first value, and changing the values of corresponding mask elements from the first value to a second value responsive to completion of their respective stores.
8. The non-transitory machine readable medium of claim 7, wherein the single instruction is single instruction multiple data (SIMD) instruction.
9. The non-transitory machine readable medium of claim 8, wherein said copying the set of indices and the corresponding set of mask elements to said index array being performed responsive to a first micro-operation generated by decoding said SIMD instruction.
10. The non-transitory machine readable medium of claim 7, wherein the executing further comprises: allocating buffer storage for addresses corresponding to the set of indices, and copying vector data elements to the allocated buffer storage.
11. The non-transitory machine readable medium of claim 7, wherein the mask elements are stored in a register.
12. The non-transitory machine readable medium of claim 11, wherein the register is architecturally visible.
13. The non-transitory machine readable medium of claim 7, wherein the to execute further comprises to: allocate buffer storage for addresses corresponding to the set of indices, and copy vector data elements to the allocated buffer storage.
14. An apparatus comprising: decode circuitry to decode a single instruction; execution circuitry to execute the decoded single instruction to: copy, from one or more registers, a set of indices and a corresponding set of mask elements to an index array, generate a set of addresses from the set of indices in the index array for at least each corresponding mask element having a first value; access an address from the set of addresses to store a corresponding data element if a corresponding mask element has said first value, and change the values of corresponding mask elements from the first value to a second value responsive to completion of their respective stores.
15. The apparatus of claim 14, wherein the single instruction is single instruction multiple data (SIMD) instruction.
16. The apparatus of claim 15, wherein said to copy the set of indices and the corresponding set of mask elements to said index array is performed responsive to a first micro-operation generated by decoding said SIMD instruction.
17. The apparatus of claim 14, wherein the mask elements are to be stored in a register.
18. The apparatus of claim 17, wherein the register is architecturally visible.
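The independent claims recite the store (scatter) direction, whereas the abstract walks through the load (gather) direction. The steps of claim 1, together with the buffer allocation of claim 4, might be modeled roughly as follows. This is a hedged illustration only: `scatter`, `store_buffer`, `PENDING`, and `DONE` are hypothetical names, the mask encoding is assumed, and `base + index * scale` is an assumed way of turning an index into an address.

```python
from typing import List

# Assumed encoding of the "first value" and "second value" of a mask element.
PENDING, DONE = 1, 0


def scatter(memory: List[int], base: int, scale: int,
            indices: List[int], mask: List[int], source: List[int]) -> List[int]:
    """Model a masked scatter; writes into `memory` and returns the updated mask."""
    # Copy the indices and mask elements from the registers to the index array.
    index_array = [[index, m] for index, m in zip(indices, mask)]
    # Claim 4: allocate buffer storage for the addresses and copy the
    # vector data elements into it.
    store_buffer = [{"address": None, "data": value} for value in source]
    # Generate an address for each element whose mask has the first value.
    for position, (index, m) in enumerate(index_array):
        if m == PENDING:
            store_buffer[position]["address"] = base + index * scale
    # Perform each pending store, then flip its mask element on completion.
    for position, slot in enumerate(store_buffer):
        if index_array[position][1] != PENDING:
            continue
        memory[slot["address"]] = slot["data"]
        index_array[position][1] = DONE
    return [m for _, m in index_array]


memory = [0] * 8
new_mask = scatter(memory, base=1, scale=2, indices=[2, 0, 1],
                   mask=[PENDING, PENDING, DONE], source=[7, 8, 9])
print(memory, new_mask)  # [0, 8, 0, 0, 0, 7, 0, 0] [0, 0, 0]
```

The third element is never stored because its mask element already holds the second value when the operation starts.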

Assignees

Intel Corp

Inventors

Classifications

  • single instruction multiple data [SIMD] multiprocessors · CPC title

  • controlled by a single instruction for multiple data lanes [SIMD] · CPC title

  • LOAD or STORE instructions; Clear instruction · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • Instruction analysis, e.g. decoding, instruction word fields · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10146737B2 cover?
Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiments of the apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F15/8007. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 4, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).