Coalescing adjacent gather/scatter operations

US9348601B2 · US · B2

Patent metadata
Publication number: US-9348601-B2
Application number: US-201213997784-A
Country: US
Kind code: B2
Filing date: Dec 26, 2012
Priority date: Dec 26, 2012
Publication date: May 24, 2016
Grant date: May 24, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
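The behavior described in the abstract can be sketched in software. The Python model below is illustrative only and is not the patented hardware. It assumes memory is a flat list addressed in element units, and that each lane's address is the base address plus a per-lane index (the abstract does not spell out the indexing scheme). The names `coalesced_gather`, `dest0`, and `dest1` are hypothetical.

```python
from typing import List, Sequence, Tuple


def coalesced_gather(
    memory: Sequence[int], base: int, indices: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Model of a coalesced gather (assumed semantics, see lead-in).

    For each lane i, two contiguous elements are read starting at
    base + indices[i]. The first goes to entry i of dest0 and the
    second goes to the same entry i of a different destination, dest1.
    """
    dest0: List[int] = []
    dest1: List[int] = []
    for idx in indices:
        addr = base + idx
        # One contiguous read covers both adjacent elements.
        first, second = memory[addr], memory[addr + 1]
        dest0.append(first)   # same entry position in the first location
        dest1.append(second)  # same entry position in the second location
    return dest0, dest1


# Usage: memory holds (x, y) pairs laid out back to back.
memory = [10, 11, 20, 21, 30, 31, 40, 41]
xs, ys = coalesced_gather(memory, base=0, indices=[6, 0, 4])
# xs == [40, 10, 30] and ys == [41, 11, 31]
```

In this sketch a single pass fills two destinations, where two separate gathers would each have visited the same neighboring memory locations.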

First claim

Opening claim text (preview).

What is claimed is:

1. A processor, comprising: an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction to have a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements; an execution unit coupled to the instruction decoder, in response to a decoded first instruction, to read a first and a second of the data elements that are contiguous from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry position of the first storage location and a second data element in a first entry position of a second storage location, wherein the entry positions are the same in the first and second storage locations and the first and second storage locations are different.

2. The processor of claim 1, wherein the first instruction further comprises a third operand specifying the second storage location.

3. The processor of claim 1, wherein the instruction decoder further to decode a second instruction having a third operand specifying the second storage location, and a fourth operand specifying a second memory address, the second memory address being offset from the first memory address by the size of a single data element.

4. The processor of claim 3, wherein the first instruction further comprises a prefix to indicate to the instruction decoder and execution unit that the second instruction follows the first instruction.

5. The processor of claim 3, wherein the execution unit to predict that the second instruction to follow the first instruction.

6. The processor of claim 1, wherein the first entry of the first storage location is not contiguous to the second entry of the second storage location, and wherein the second storage location is specified by the first operand.

7. The processor of claim 1, wherein the first data element is stored in a third entry of a third storage location prior to being stored in the first entry of the first storage location, and the second data element is stored in a fourth entry of a fourth storage location prior to being stored in the second entry of the second storage location.

8. A method, comprising: decoding a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements; reading, in response to the decoded first instruction, a first and a second of the data elements that are contiguous from a memory location based on the first memory address indicated by the second operand; and storing the first data element in a first entry position of the first storage location and a second data element in a first entry position of a second storage location, wherein the entry positions are the same in the first and second storage locations and the first and second storage locations are different.

9. The method of claim 8, wherein the first instruction further comprises a third operand specifying the second storage location.

10. The method of claim 8, wherein the instruction decoder further decodes a second instruction having a third operand specifying the second storage location, and a fourth operand specifying a second memory address, the second memory address being offset from the first memory address by the size of a single data element.

11. The method of claim 10, wherein the first instruction further comprises a prefix indicating to the instruction decoder and execution unit that the second instruction follows the first instruction.

12. The method of claim 10, wherein the execution unit predicts that the second instruction follows the first instruction.

13. The method of claim 8, wherein the first entry of the first storage location is not contiguous to the second entry of the second storage location, and wherein the second storage location is specified by the first operand.

14. The method of claim 8, wherein the first data element is stored in a third entry of a third storage location prior to being stored in the first entry of the first storage location, and the second data element is stored in a fourth entry of a fourth storage location prior to being stored in the second entry of the second storage location.

15. A data processing system, comprising: an interconnect; a dynamic random access memory (DRAM) coupled to the interconnect; and a processor coupled the interconnect, including an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction to have a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements; an execution unit coupled to the instruction decoder, in response to a decoded first instruction, to read a first and a second of the data elements that are contiguous from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry position of the first storage location and a second data element in a first entry position of a second storage location, wherein the entry positions are the same in the first and second storage locations and the first and second storage locations are different.

16. The data processing system of claim 15, wherein the first instruction further comprises a third operand specifying the second storage location.

17. The data processing system of claim 15, wherein the instruction decoder further to decode a second instruction having a third operand specifying the second storage location, and a fourth operand specifying a second memory address, the second memory address being offset from the first memory address by the size of a single data element.

18. The data processing system of claim 17, wherein the first instruction further comprises a prefix to indicate to the instruction decoder and execution unit that the second instruction follows the first instruction.

19. The data processing system of claim 17, wherein the execution unit to predict that the second instruction follows the first instruction.

20. The data processing system of claim 15, wherein the first entry of the first storage location is not contiguous to the second entry of the second storage location, and wherein the second storage location is specified by the first operand.

21. The data processing system of claim 15, wherein the first data element is stored in a third entry of a third storage location prior to being stored in the first entry of the first storage location, and the second data element is stored in a fourth entry of a fourth storage location prior to being stored in the second entry of the second storage location.
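Claims 3, 10, and 17 describe a second gather instruction whose memory address is offset from the first by the size of a single data element and whose destination is the second storage location. A minimal sketch of that adjacency condition follows. It is an assumption-laden software model, not the claimed decoder logic: `GatherInstr`, `can_coalesce`, the register names, and the choice to treat the offset as positive are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatherInstr:
    """Hypothetical model of a decoded gather instruction."""
    dest: str  # name of the destination storage location
    base: int  # base memory address, in bytes


def can_coalesce(first: GatherInstr, second: GatherInstr, element_size: int) -> bool:
    """Return True if two gathers look like an adjacent pair.

    Conditions modeled (assumed reading of the claims):
      * the destinations are different storage locations, and
      * the second address is exactly one element past the first.
    """
    different_dest = first.dest != second.dest
    adjacent = second.base - first.base == element_size
    return different_dest and adjacent


# Usage: two gathers of 4-byte elements from neighboring addresses.
g0 = GatherInstr(dest="v1", base=0x1000)
g1 = GatherInstr(dest="v2", base=0x1004)
pair_ok = can_coalesce(g0, g1, element_size=4)  # True
```

Claims 4 and 5 add two ways for the hardware to learn that the second instruction is coming (a prefix on the first instruction, or prediction by the execution unit); this sketch leaves both out and only checks the address and destination relationship.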

Assignees

Inventors

Classifications

  • Instruction operation extension or modification · CPC title

  • LOAD or STORE instructions; Clear instruction · CPC title

  • Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title

  • with dedicated cache, e.g. instruction or stack · CPC title

  • using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9348601B2 cover?
According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/3853. Mapped technology areas include Physics.
When was this patent published?
Publication date May 24, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).