US9348601B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9348601-B2 |
| Application number | US-201213997784-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 26, 2012 |
| Priority date | Dec 26, 2012 |
| Publication date | May 24, 2016 |
| Grant date | May 24, 2016 |
Official abstract text for this publication.
According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
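The behavior described in the abstract can be illustrated with a small software model. This is a minimal sketch, not the patented hardware: the function name `gather_pair`, the use of Python lists for memory and registers, and the per-lane index list are all assumptions introduced for illustration (the abstract itself only mentions a storage-location operand and a memory-address operand). For each lane, the model reads two contiguous elements and places the first in lane `i` of one destination and the second in the same lane `i` of a different destination.

```python
def gather_pair(memory, base, indices):
    """Toy model of a gather that returns two destinations.

    memory  -- list standing in for element-addressed memory (assumption)
    base    -- element index standing in for the "first memory address"
    indices -- hypothetical per-lane offsets, one per destination entry
    """
    dst1 = []  # models the first storage location
    dst2 = []  # models the second, different storage location
    for idx in indices:
        addr = base + idx
        # Two contiguous elements are read from the same location...
        dst1.append(memory[addr])      # ...the first goes to lane i of dst1
        dst2.append(memory[addr + 1])  # ...the second goes to lane i of dst2
    return dst1, dst2


# Memory holding interleaved pairs: (10, 11), (20, 21), (30, 31), (40, 41).
memory = [10, 11, 20, 21, 30, 31, 40, 41]
first, second = gather_pair(memory, 0, [0, 4, 2, 6])
print(first)   # [10, 30, 20, 40]
print(second)  # [11, 31, 21, 41]
```

Under these assumptions, interleaved pairs in memory end up split across two destinations with matching lane positions, which is the "corresponding entry" relationship the abstract describes.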
Opening claim text (preview).
What is claimed is:
1. A processor, comprising: an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction to have a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements; an execution unit coupled to the instruction decoder, in response to a decoded first instruction, to read a first and a second of the data elements that are contiguous from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry position of the first storage location and a second data element in a first entry position of a second storage location, wherein the entry positions are the same in the first and second storage locations and the first and second storage locations are different.
2. The processor of claim 1, wherein the first instruction further comprises a third operand specifying the second storage location.
3. The processor of claim 1, wherein the instruction decoder further to decode a second instruction having a third operand specifying the second storage location, and a fourth operand specifying a second memory address, the second memory address being offset from the first memory address by the size of a single data element.
4. The processor of claim 3, wherein the first instruction further comprises a prefix to indicate to the instruction decoder and execution unit that the second instruction follows the first instruction.
5. The processor of claim 3, wherein the execution unit to predict that the second instruction to follow the first instruction.
6. The processor of claim 1, wherein the first entry of the first storage location is not contiguous to the second entry of the second storage location, and wherein the second storage location is specified by the first operand.
7. The processor of claim 1, wherein the first data element is stored in a third entry of a third storage location prior to being stored in the first entry of the first storage location, and the second data element is stored in a fourth entry of a fourth storage location prior to being stored in the second entry of the second storage location.
8. A method, comprising: decoding a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements; reading, in response to the decoded first instruction, a first and a second of the data elements that are contiguous from a memory location based on the first memory address indicated by the second operand; and storing the first data element in a first entry position of the first storage location and a second data element in a first entry position of a second storage location, wherein the entry positions are the same in the first and second storage locations and the first and second storage locations are different.
9. The method of claim 8, wherein the first instruction further comprises a third operand specifying the second storage location.
10. The method of claim 8, wherein the instruction decoder further decodes a second instruction having a third operand specifying the second storage location, and a fourth operand specifying a second memory address, the second memory address being offset from the first memory address by the size of a single data element.
11. The method of claim 10, wherein the first instruction further comprises a prefix indicating to the instruction decoder and execution unit that the second instruction follows the first instruction.
12. The method of claim 10, wherein the execution unit predicts that the second instruction follows the first instruction.
13. The method of claim 8, wherein the first entry of the first storage location is not contiguous to the second entry of the second storage location, and wherein the second storage location is specified by the first operand.
14. The method of claim 8, wherein the first data element is stored in a third entry of a third storage location prior to being stored in the first entry of the first storage location, and the second data element is stored in a fourth entry of a fourth storage location prior to being stored in the second entry of the second storage location.
15. A data processing system, comprising: an interconnect; a dynamic random access memory (DRAM) coupled to the interconnect; and a processor coupled the interconnect, including an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction to have a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements; an execution unit coupled to the instruction decoder, in response to a decoded first instruction, to read a first and a second of the data elements that are contiguous from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry position of the first storage location and a second data element in a first entry position of a second storage location, wherein the entry positions are the same in the first and second storage locations and the first and second storage locations are different.
16. The data processing system of claim 15, wherein the first instruction further comprises a third operand specifying the second storage location.
17. The data processing system of claim 15, wherein the instruction decoder further to decode a second instruction having a third operand specifying the second storage location, and a fourth operand specifying a second memory address, the second memory address being offset from the first memory address by the size of a single data element.
18. The data processing system of claim 17, wherein the first instruction further comprises a prefix to indicate to the instruction decoder and execution unit that the second instruction follows the first instruction.
19. The data processing system of claim 17, wherein the execution unit to predict that the second instruction follows the first instruction.
20. The data processing system of claim 15, wherein the first entry of the first storage location is not contiguous to the second entry of the second storage location, and wherein the second storage location is specified by the first operand.
21. The data processing system of claim 15, wherein the first data element is stored in a third entry of a third storage location prior to being stored in the first entry of the first storage location, and the second data element is stored in a fourth entry of a fourth storage location prior to being stored in the second entry of the second storage location.
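Claims 3, 10, and 17 describe a second gather instruction whose memory address is offset from the first by the size of a single data element. The sketch below illustrates why such a pair can be serviced by one contiguous read per lane. It is an illustrative model only: the 4-byte little-endian element size, the helper names `gather` and `fused_gather_pair`, and the per-lane byte offsets are assumptions, not details taken from the claims.

```python
import struct

ELEM_SIZE = 4  # assumed element width in bytes (32-bit unsigned integers)


def gather(buf, base, byte_offsets):
    """Model of one ordinary gather: one element per lane."""
    return [struct.unpack_from("<I", buf, base + off)[0] for off in byte_offsets]


def fused_gather_pair(buf, base, byte_offsets):
    """Model of the coalesced form: one two-element read per lane."""
    dst1, dst2 = [], []
    for off in byte_offsets:
        a, b = struct.unpack_from("<II", buf, base + off)  # contiguous pair
        dst1.append(a)
        dst2.append(b)
    return dst1, dst2


# Byte-addressed memory holding the pairs (10, 11), (20, 21), (30, 31), (40, 41).
buf = struct.pack("<8I", 10, 11, 20, 21, 30, 31, 40, 41)
offsets = [0, 16, 8]  # hypothetical per-lane byte offsets

# Two separate instructions: the second address is base + one element.
dst1 = gather(buf, 0, offsets)
dst2 = gather(buf, 0 + ELEM_SIZE, offsets)

# The fused model yields the same two destinations from half as many reads.
fused1, fused2 = fused_gather_pair(buf, 0, offsets)
print(dst1, dst2)
```

In this model the two-instruction sequence and the fused read produce identical destinations, which is the equivalence that a prefix (claims 4, 11, 18) or a prediction (claims 5, 12, 19) would let hardware exploit.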
Instruction operation extension or modification · CPC title
LOAD or STORE instructions; Clear instruction · CPC title
Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title
with dedicated cache, e.g. instruction or stack · CPC title
using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] · CPC title