Who is the assignee on this patent?

Sperber Zeev, Valentine Robert, Raikin Shlomo, and 6 more

What technology area does this patent fall under?

Primary CPC classification G06F15/7839. Mapped technology areas include Physics.

When was this patent published?

Publication date: Apr 18, 2017 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Scatter using index array and finite state machine

US9626333B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9626333-B2
Application number	US-201213977727-A
Country	US
Kind code	B2
Filing date	Jun 2, 2012
Priority date	Jun 2, 2012
Publication date	Apr 18, 2017
Grant date	Apr 18, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatus are disclosed using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode scatter/gather instructions and generate micro-operations. An index array holds a set of indices and a corresponding set of mask elements. A finite state machine facilitates the scatter operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. Storage is allocated in a buffer for each of the set of addresses being generated. Data elements corresponding to the set of addresses being generated are copied to the buffer. Addresses from the set are accessed to store data elements if a corresponding mask element has said first value and the mask element is changed to a second value responsive to completion of their respective stores.
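To make the abstract's data flow concrete, here is a minimal software sketch of a masked scatter. It is an illustration only, not the patented hardware: the function name `masked_scatter`, the `base + index * scale` address formula, the encodings `PENDING = 1` and `DONE = 0` for the "first" and "second" mask values, and the `max_stores` parameter (a toy way to stop early) are all assumptions introduced for this example.

```python
PENDING, DONE = 1, 0  # assumed encodings of the "first" and "second" mask values


def masked_scatter(memory, base, scale, indices, mask, data, max_stores=None):
    """Toy model of a masked scatter; returns the updated mask.

    memory     -- dict standing in for addressable memory
    max_stores -- hypothetical knob: stop after this many stores
    """
    # Index array: holds the indices together with their mask elements.
    index_array = [[idx, m] for idx, m in zip(indices, mask)]

    # Generate an address and buffer the data element for every lane
    # whose mask element still has the first value.
    store_buffer = []
    for lane, (idx, m) in enumerate(index_array):
        if m == PENDING:
            store_buffer.append((lane, base + idx * scale, data[lane]))

    # Perform the stores; flip each lane's mask element to the second
    # value once its store has completed.
    for count, (lane, addr, value) in enumerate(store_buffer):
        if max_stores is not None and count >= max_stores:
            break
        memory[addr] = value
        index_array[lane][1] = DONE

    return [m for _, m in index_array]
```

In this toy model, a call that stops early returns a mask whose remaining `PENDING` entries identify exactly the lanes still to be stored, so passing that mask back in completes the scatter without repeating finished stores.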

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method comprising: copying, from one or more registers, a set of indices and a corresponding set of mask elements to an index array; generating a set of addresses from the set of indices in the index array for at least each corresponding mask element having a first value; allocating storage in a buffer for each of the set of addresses being generated; copying a set of data elements corresponding to the set of addresses being generated to the buffer; and accessing an address from the set of addresses to store a corresponding data element if a corresponding mask element has said first value; and changing the values of corresponding mask elements from the first value to a second value responsive to completion of their respective stores.

2. The computer implemented method of claim 1 being performed responsive to a single instruction multiple data (SIMD) scatter instruction.

3. The computer implemented method of claim 2 said copying, the set of indices and the corresponding set of mask elements to said index array being performed responsive to a first micro-operation generated by decoding said SIMD scatter instruction.

4. The computer implemented method of claim 3 said copying the set of data elements to the buffer being performed responsive to a second micro-operation of a set of micro-operations generated by decoding said SIMD scatter instruction.

5. The computer implemented method of claim 4 further comprising: initializing a finite state machine to expand said set of micro-operations to scatter-store data concurrently with execution of other instructions, responsive to said SIMD scatter instruction.

6. The computer implemented method of claim 5 said initializing a finite state machine to expand said set of micro-operations to scatter-store data concurrently with execution of other instructions being performed responsive to a third micro-operation of a set of micro-operations generated by decoding said SIMD scatter instruction.

7. The computer implemented method of claim 5 said generating a set of addresses from the set of indices in the index array being performed responsive to a third micro-operation of a set of micro-operations generated by decoding said SIMD scatter instruction.

8. The computer implemented method of claim 7 said changing the values of corresponding mask elements from the first value to the second value responsive to completion of their respective loads being performed responsive to a fourth micro-operation of the set of micro-operations generated by decoding said SIMD gather instruction.

9. An apparatus comprising: an index array to store a set of indices from a first single instruction multiple data (SIMD) register and a corresponding set of mask elements; a finite state machine (FSM) operatively coupled with the index array to facilitate a scatter operation using the set of indices and the corresponding mask elements; an address generation logic, responsive to the FSM, to generate an address from an index of the set of indices in the index array for at least each corresponding mask element having a first value; a buffer to store a set of data elements corresponding to the addresses being generated; a memory access unit, operatively coupled with the address generation logic, to access a first memory location corresponding to a first address generated to store, from the buffer, a first data element corresponding to the first address generated; and the FSM to change a value of a corresponding mask element from the first value to a second value.

10. The apparatus of claim 9 being responsive to a SIMD scatter instruction.

11. The apparatus of claim 10 further comprising: decode logic to decode said SIMD scatter instruction and to generate a set of micro-operations responsive to decoding said SIMD scatter instruction.

12. The apparatus of claim 11 said FSM to expand the set of micro-operations to scatter-store data concurrently with execution of other instructions, responsive to said SIMD scatter instruction.

13. The apparatus of claim 12, wherein initializing of said index array to store the set of indices and the corresponding set of mask elements is performed responsive to a first micro-operation of the set of micro-operations generated by decoding said SIMD scatter instruction.

14. The apparatus of claim 13, wherein copying to the buffer, the set of data elements corresponding to the addresses being generated, is performed responsive to a second micro-operation of the set of micro-operations generated by decoding said SIMD scatter instruction.

15. The apparatus of claim 13, wherein initializing of said FSM to expand said set of micro-operations to scatter-store data concurrently with execution of other instructions, is performed responsive to a third micro-operation of the set of micro-operations generated by decoding said SIMD scatter instruction.

16. A processor comprising: a first register comprising a plurality of mask elements, wherein the plurality of mask elements in the first register corresponds to a plurality of data elements in a second register to be conditionally stored using a plurality of corresponding indices in a third register, wherein for each mask element in the first register, a first value indicates the corresponding data element has not been stored and a second value indicates that the corresponding data element does not need to be, or has already been stored using a corresponding index from the third register; a decoder stage to decode a first instruction to generate a set of micro-operations; and one or more execution units, responsive to the set of micro-operations, including: an index array to store the plurality of indices from the third register and the corresponding plurality of mask elements; a finite state machine (FSM) operatively coupled with the index array to facilitate a scatter operation using the plurality of indices and the corresponding mask elements.

17. The processor of claim 16, further comprising: an address generation logic, responsive to the FSM, to generate an address from an index of the plurality of indices in the index array for at least each corresponding mask element having a first value.

18. The processor of claim 17, further comprising: a buffer to store a set of data elements of the plurality of data elements in said second register corresponding to addresses being generated by the address generation logic.

19. The processor of claim 18, further comprising: a memory access unit, operatively coupled with the address generation logic, to access a first memory location corresponding to a first address generated to store a first data element, of the set of data elements, corresponding to the first address.

20. The processor of claim 19, wherein the FSM is to change a value of a corresponding mask element from the first value to a second value.

21. A system comprising: a memory to store a first instruction specifying a single instruction multiple data (SIMD) index register, a second SIMD register and a mask register, and a processor comprising an index array to store a set of indices from the SIMD index register and a corresponding set of mask elements from the mask register; a finite state machine (FSM) operatively coupled with the index array to facilitate a scatter operation using the set of indices and the corresponding mask elements; an address generation logic, responsive to the FSM, to generate an address from an index of the se
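The claims describe a finite state machine that drives address generation and stores while other instructions continue to execute. The sketch below is one hypothetical way to model that in software, assuming a two-phase cycle per lane (generate address, then store). The class name `ScatterFSM`, the state names, the `base + index` address formula, and the `1`/`0` mask encodings are assumptions for illustration and are not taken from the patent text.

```python
from enum import Enum, auto

PENDING, DONE = 1, 0  # assumed encodings of the first and second mask values


class State(Enum):  # hypothetical state names
    GEN_ADDR = auto()
    STORE = auto()
    FINISHED = auto()


class ScatterFSM:
    """Toy state machine that performs one scatter in small steps."""

    def __init__(self, memory, base, indices, mask, data):
        self.memory = memory
        self.base = base
        self.indices = list(indices)  # index array: indices ...
        self.mask = list(mask)        # ... and their mask elements
        self.buffer = list(data)      # buffered data elements
        self.lane = 0
        self.addr = None
        self.state = State.GEN_ADDR

    def step(self):
        """Advance one transition; return True while work remains."""
        if self.state is State.GEN_ADDR:
            # Skip lanes whose mask element is not the first value.
            while self.lane < len(self.indices) and self.mask[self.lane] != PENDING:
                self.lane += 1
            if self.lane == len(self.indices):
                self.state = State.FINISHED
            else:
                self.addr = self.base + self.indices[self.lane]
                self.state = State.STORE
        elif self.state is State.STORE:
            self.memory[self.addr] = self.buffer[self.lane]
            self.mask[self.lane] = DONE  # store completed for this lane
            self.state = State.GEN_ADDR
        return self.state is not State.FINISHED


# Interleave FSM steps with unrelated work, mimicking concurrent execution.
memory = {}
fsm = ScatterFSM(memory, base=1000, indices=[3, 1, 4], mask=[1, 1, 0], data=["a", "b", "c"])
other_work = []
while fsm.step():
    other_work.append("other instruction")
```

Splitting the work into `step()` calls is a design choice made here so the caller can interleave unrelated work between transitions; real hardware would overlap these activities rather than alternate them.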

Classifications

G06F15/7839 (Primary): with memory
G06F9/30145 (Primary): Instruction analysis, e.g. decoding, instruction word fields
G06F9/383: Operand prefetching (cache prefetching G06F12/0862)
G06F9/30043: LOAD or STORE instructions; Clear instruction
G06F9/345: of multiple operands or results {(addressing multiple banks G06F12/06)}

Patent family

Related publications grouped by family.

View patent family 49673783
