Gather using index array and finite state machine

US9753889B2 · US · B2

Patent metadata
Publication number: US-9753889-B2
Application number: US-201514881111-A
Country: US
Kind code: B2
Filing date: Oct 12, 2015
Priority date: Jun 2, 2012
Publication date: Sep 5, 2017
Grant date: Sep 5, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiments of an apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element has the first value. The data element is written at an in-register position in a destination vector register according to a respective in-register position of the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.
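The gather behaviour summarized in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the function name `masked_gather`, the `base + index * scale` address form, the use of 1 and 0 for the "first" and "second" mask values, and the Python list standing in for memory are all illustrative choices that do not appear in the abstract.

```python
FIRST_VALUE = 1   # assumed encoding: element still needs to be loaded
SECOND_VALUE = 0  # assumed encoding: element skipped or already loaded


def masked_gather(memory, base, scale, indices, mask, dest):
    """Software model of a masked gather.

    For every position whose mask element has the first value, an address
    is generated from the index, the data element is loaded, written to the
    same position in the destination, and the mask element is changed to
    the second value.
    """
    dest = list(dest)   # work on copies so the caller's lists are untouched
    mask = list(mask)
    for pos, idx in enumerate(indices):
        if mask[pos] != FIRST_VALUE:
            continue                      # masked off: keep old dest value
        addr = base + idx * scale         # assumed address-generation rule
        dest[pos] = memory[addr]          # load and place at same position
        mask[pos] = SECOND_VALUE          # record completion of this load
    return dest, mask
```

With `memory = list(range(100, 200))`, calling `masked_gather(memory, 10, 2, [0, 3, 1, 5], [1, 0, 1, 1], [-1, -1, -1, -1])` loads positions 0, 2, and 3 and leaves position 1 untouched, because its mask element did not have the first value.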

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method comprising: responsive to a single decoded instruction: copying from one or more registers a set of indices and a corresponding set of mask elements to an index array responsive to a first micro-operation generated by decoding a SIMD gather instruction; generating a set of addresses from the set of indices in the index array for at least each corresponding mask element having a first value; accessing an address from the set of addresses to load a corresponding data element if a corresponding mask element has said first value; and initializing a finite state machine to expand a set of micro-operations to load and gather data concurrently with execution of other instructions, responsive to said SIMD gather instruction.

2. The computer implemented method of claim 1, wherein said initializing a finite state machine to expand said set of micro-operations to load and gather data concurrently with execution of other instructions, being performed responsive to the second micro-operation generated by decoding said SIMD gather instruction.

3. The computer implemented method of claim 2 further comprising: writing the corresponding data element at an in-register position in a destination vector register according to a respective in-register position of an index, from the set of indices, corresponding to the accessed address from the set of addresses.

4. The computer implemented method of claim 3 further comprising: merging data elements at respective in-register positions in a temporary vector register according to respective in-register positions of indices, of the set of indices, corresponding to the respectively accessed addresses from the set of addresses.

5. The computer implemented method of claim 4 said merging being performed responsive to a third micro-operation of the set of micro-operations generated by decoding said SIMD gather instruction.

6. The computer implemented method of claim 2 further comprising: changing the values of corresponding mask elements from the first value to a second value responsive to completion of their respective loads.

7. The computer implemented method of claim 6 said changing the values of corresponding mask elements from the first value to the second value responsive to completion of their respective loads being performed responsive to a third micro-operation of the set of micro-operations generated by decoding said SIMD gather instruction.

8. An apparatus comprising: an index array; a finite state machine (FSM) operatively coupled with the index array to store a set of indices from a first single instruction multiple data (SIMD) register and a corresponding set of mask elements to facilitate a SIMD gather operation; an address generation logic to generate an address from an index of the set of indices in the index array for at least each corresponding mask element having a first value to access a first memory location corresponding to a first address generated to load a first data element; a merge data logic, operatively coupled with a second SIMD register, to write the first data element at a first in-register position in the second SIMD register according to a respective in-register position in the first SIMD register of an index corresponding to said first address generated.

9. The apparatus of claim 8, wherein said merge data logic, is further to merge the plurality of data elements into the second SIMD register.

10. The apparatus of claim 8, further comprising: decode logic to decode said SIMD gather instruction and to generate a set of micro-operations responsive to decoding said SIMD gather instruction.

11. The apparatus of claim 10 said FSM to expand the set of micro-operations to load and gather data concurrently with execution of other instructions, responsive to said SIMD gather instruction.

12. The apparatus of claim 11 said FSM to change a value of a corresponding mask element from the first value to a second value upon completion of loading the first data element.

13. A processor comprising: a first register comprising a plurality of mask elements, wherein the plurality of mask elements in the first register corresponds to a plurality of data elements accessible using a plurality of corresponding indices in a second register; a decoder to decode a first instruction to generate a set of micro-operations; and one or more execution units including: an index array to store a copy of the plurality of indices from the second register and the corresponding plurality of mask elements, and a finite state machine (FSM) operatively coupled with the index array to facilitate a gather operation using the plurality of indices and the corresponding mask elements, a merge data logic, operatively coupled with the memory access unit and with a third register, to write the first data element at a first in-register position in the third register according to a respective in-register position in the second register of an index corresponding to said first address generated.

14. The processor of claim 13, wherein for each mask element in the first register, a first value indicates the corresponding data element has not been accessed and a second value indicates that the corresponding data element does not need to be, or has already been accessed using a corresponding index from the second register.

15. The processor of claim 13, further comprising: an address generation logic, responsive to the FSM, to generate an address from an index of the plurality of indices in the index array for at least each corresponding mask element having a first value.

16. The processor of claim 15, further comprising: a memory access unit, operatively coupled with the address generation logic, to access a first memory location corresponding to a first address generated to load a first data element.

17. The processor of claim 16, wherein the FSM is to change a value of a corresponding mask element from the first value to a second value upon completion of loading the first data element.
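The interplay the claims describe between the index array, the finite state machine, and the mask elements can be sketched in software. The model below is a hypothetical illustration, not the claimed circuit: the class `GatherFSM`, its method names, the three states, the one-load-per-step policy, and the `base + index * scale` address form are all assumptions made for the example. Each call to `step()` stands in for one FSM-issued load, so the caller can interleave other work between steps, loosely mirroring "concurrently with execution of other instructions".

```python
from enum import Enum, auto

FIRST_VALUE = 1   # assumed encoding: load still pending
SECOND_VALUE = 0  # assumed encoding: not needed or already loaded


class State(Enum):
    IDLE = auto()
    LOADING = auto()
    DONE = auto()


class GatherFSM:
    """Toy model: an index array driven by a small state machine."""

    def __init__(self, memory, base, scale):
        self.memory, self.base, self.scale = memory, base, scale
        self.index_array = []   # entries are [index, mask_element]
        self.dest = []
        self.cursor = 0
        self.state = State.IDLE

    def copy_to_index_array(self, indices, mask):
        # Models the first micro-operation: copy indices and mask elements.
        self.index_array = [[i, m] for i, m in zip(indices, mask)]

    def start(self, dest):
        # Models the second micro-operation: initialize the FSM.
        self.dest = list(dest)
        self.cursor = 0
        self.state = State.LOADING

    def step(self):
        """Issue at most one load; return True while loads remain."""
        if self.state is not State.LOADING:
            return False
        while self.cursor < len(self.index_array):
            pos = self.cursor
            self.cursor += 1
            entry = self.index_array[pos]
            if entry[1] == FIRST_VALUE:
                addr = self.base + entry[0] * self.scale  # address generation
                self.dest[pos] = self.memory[addr]        # merge at same position
                entry[1] = SECOND_VALUE                   # mark load complete
                break
        if self.cursor >= len(self.index_array):
            self.state = State.DONE
        return self.state is State.LOADING
```

A caller would invoke `copy_to_index_array`, then `start`, then loop on `step()` while doing unrelated work in the loop body. Because completed loads flip their mask element to the second value, the mask elements left in the index array always show which positions are still outstanding.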

Assignees

Intel Corp

Inventors

Classifications

  • of multiple operands or results {(addressing multiple banks G06F12/06)}

  • LOAD or STORE instructions; Clear instruction

  • Instruction analysis, e.g. decoding, instruction word fields

  • single instruction multiple data [SIMD] multiprocessors

  • Bit or string instructions

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F15/8007. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 5, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).