Streaming engine with separately selectable element and group duplication
US-11860790-B2 · Jan 2, 2024 · US
US9348592B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9348592-B2 |
| Application number | US-201113995793-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 22, 2011 |
| Priority date | Dec 22, 2011 |
| Publication date | May 24, 2016 |
| Grant date | May 24, 2016 |
Abstract
An apparatus and method are described for fetching and storing a plurality of portions of a data stream into a plurality of registers. For example, a method according to one embodiment includes the following operations: determining a set of N vector registers into which to read N designated portions of a data stream stored in system memory; determining the system memory addresses for each of the N designated portions of the data stream; fetching the N designated portions of the data stream from the system memory at the system memory addresses; and storing the N designated portions of the data stream into the N vector registers.
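The four operations listed in the abstract can be illustrated with a small software model. This is a sketch under stated assumptions, not the patented hardware: the function name `stream_load`, the flat `bytes` memory, integer register numbers and the `vlen` register width are all hypothetical. The example passes addresses that start 4 bytes apart so that neighboring portions overlap.

```python
# Hypothetical software model of the four operations named in the abstract.
# Assumptions (not from the patent): system memory is a flat bytes object,
# registers are identified by integers, and each register holds `vlen` bytes.


def stream_load(memory: bytes, regs: list[int], addresses: list[int],
                vlen: int) -> dict[int, bytes]:
    """Fetch len(regs) portions of `memory` and store one per register."""
    # Steps 1 and 2: the destination registers and their addresses are given;
    # there must be exactly one address per register.
    if len(regs) != len(addresses):
        raise ValueError("need exactly one address per destination register")
    register_file: dict[int, bytes] = {}
    for reg, addr in zip(regs, addresses):
        if addr < 0 or addr + vlen > len(memory):
            raise IndexError(f"portion at address {addr} exceeds memory bounds")
        # Step 3: fetch the designated portion from "system memory".
        portion = memory[addr:addr + vlen]
        # Step 4: store the portion into its destination register.
        register_file[reg] = portion
    return register_file


memory = bytes(range(32))
# Three 8-byte portions that start 4 bytes apart, so neighbors overlap.
rf = stream_load(memory, regs=[0, 1, 2], addresses=[0, 4, 8], vlen=8)
for reg, data in rf.items():
    print(reg, list(data))
```

Each register receives its own copy of the shared bytes, so the upper half of register 0 equals the lower half of register 1.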
Claims
I claim:

1. A processor to execute an instruction to perform the operations of: determining a set of N vector registers into which to read N designated portions of a data stream stored in system memory; determining system memory addresses for each of the N designated portions of the data stream; fetching the N designated portions of the data stream from the system memory at the system memory addresses, wherein the N designated portions of the data stream include overlapping portions within the data stream; and storing the N designated portions of the data stream into the N vector registers, wherein the instruction is a single instruction.

2. The processor as in claim 1 wherein determining the system memory addresses comprises directly determining a first system memory address from the instruction and calculating the remaining N−1 addresses by adding multiples of a slide value to the first system memory address.

3. The processor as in claim 2 wherein the slide value is set to be equal to a size of a data element of the data stream.

4. The processor as in claim 1 wherein the portions of the data stream comprise data elements of the data stream.

5. The processor as in claim 1 wherein the instruction is specified in the form INSTRUCTION REG1, COUNT, MEMLOCATION, where REG1 comprises a first vector register to store a first portion of a data stream, COUNT comprises the number of portions of the data stream to be fetched from the system memory, and MEMLOCATION comprises the memory location for the first portion of the data stream.

6. The processor as in claim 5 wherein COUNT is set to a value of 16 for 16 portions of the data stream.

7. The processor as in claim 1 wherein each of the N portions of the data stream comprise floating point values and wherein each of the N vector registers comprise floating point registers.

8. The processor as in claim 7 wherein each of the floating point values comprise scalar floating point values.

9. The processor as in claim 7 wherein each of the floating point values comprise double floating point values.

10. The processor as in claim 1 wherein each of the N portions of the data stream comprise integer values.

11. The processor as in claim 10 wherein each of the integer values comprise packed doubleword values.

12. The processor as in claim 10 wherein each of the integer values comprise packed quadword values.

13. A method comprising: determining a set of N vector registers into which to read N designated portions of a data stream stored in system memory; determining system memory addresses for each of the N designated portions of the data stream; fetching the N designated portions of the data stream from the system memory at the system memory addresses, wherein the N designated portions of the data stream include overlapping portions within the data stream; and storing the N designated portions of the data stream into the N vector registers, wherein the method is performed through executing a single instruction by a processor.

14. The method as in claim 13 wherein determining the system memory addresses comprises directly determining a first system memory address from the single instruction and calculating the remaining N−1 addresses by adding multiples of a slide value to the first system memory address.

15. The method as in claim 14 wherein the slide value is set to be equal to a size of a data element of the data stream.

16. The method as in claim 13 wherein the portions of the data stream comprise data elements of the data stream.

17. The method as in claim 13 wherein the single instruction is specified in the form INSTRUCTION REG1, COUNT, MEMLOCATION, where REG1 comprises a first vector register to store a first portion of a data stream, COUNT comprises the number of portions of the data stream to be fetched from the system memory, and MEMLOCATION comprises the memory location for the first portion of the data stream.

18. The method as in claim 17 wherein COUNT is set to a value of 16 for 16 portions of the data stream.

19. The method as in claim 13 wherein each of the N portions of the data stream comprise floating point values and wherein each of the N vector registers comprise floating point registers.

20. The method as in claim 19 wherein each of the floating point values comprise scalar floating point values.

21. The method as in claim 19 wherein each of the floating point values comprise double floating point values.

22. The method as in claim 13 wherein each of the N portions of the data stream comprise integer values.

23. The method as in claim 22 wherein each of the integer values comprise packed doubleword values.

24. The method as in claim 22 wherein each of the integer values comprise packed quadword values.

25. A computer system comprising: a memory for storing program instructions and data; a processor to execute a single program instruction to perform the operations of: determining a set of N vector registers into which to read N designated portions of a data stream stored in system memory; determining system memory addresses for each of the N designated portions of the data stream; fetching the N designated portions of the data stream from the system memory at the system memory addresses, wherein the N designated portions of the data stream include overlapping portions within the data stream; and storing the N designated portions of the data stream into the N vector registers.

26. The system as in claim 25 further comprising: a display adapter to render graphics images in response to execution of the single program instruction by the processor.

27. The system as in claim 26 further comprising: a user input interface to receive control signals from a user input device, the processor executing the single program instruction in response to the control signals.

28. A processor to execute an instruction comprising: means for determining a set of N vector registers into which to read N designated portions of a data stream stored in system memory; means for determining system memory addresses for each of the N designated portions of the data stream; means for fetching the N designated portions of the data stream from the system memory at the system memory addresses, wherein the N designated portions of the data stream include overlapping portions within the data stream; and means for storing the N designated portions of the data stream into the N vector registers, wherein the instruction is a single instruction.

29. The processor as in claim 28 wherein means for determining the system memory addresses comprises directly determining a first system memory address from the instruction and calculating the remaining N−1 addresses by adding multiples of a slide value to the first system memory address.

30. The processor as in claim 29 wherein the slide value is set to be equal to a size of a data element of the data stream.
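Claims 2, 3, 5 and 6 specify how the addresses are formed: the first comes directly from MEMLOCATION, and each of the remaining COUNT−1 addresses adds a multiple of a slide value equal to the element size. The sketch below models that single-instruction form for the integer and double-precision element types of claims 7 to 12. The function name `sliding_window_load`, the four-lane register width, the little-endian layout and the use of Python `struct` codes to stand in for element types are assumptions for illustration only, not details taken from the patent.

```python
import struct

# Hypothetical model of the claimed form "INSTRUCTION REG1, COUNT, MEMLOCATION".
# Assumptions (not from the patent): registers are numbered integers, each
# holds `lanes` elements, and the element type is a struct code:
#   "i" = 32-bit integer (packed doubleword lanes)
#   "q" = 64-bit integer (packed quadword lanes)
#   "d" = 64-bit double-precision float


def sliding_window_load(memory: bytes, reg1: int, count: int, memlocation: int,
                        code: str = "i", lanes: int = 4) -> dict[int, tuple]:
    elem_size = struct.calcsize("<" + code)
    slide = elem_size              # slide value equals the element size (claim 3)
    vlen = lanes * elem_size       # bytes held by one vector register
    lane_fmt = f"<{lanes}{code}"   # little-endian lane layout, an assumption
    register_file = {}
    for k in range(count):
        # The first address comes straight from the instruction; the remaining
        # COUNT-1 addresses add multiples of the slide value to it (claim 2).
        addr = memlocation + k * slide
        chunk = memory[addr:addr + vlen]
        if addr < 0 or len(chunk) != vlen:
            raise IndexError(f"window {k} at address {addr} runs past the stream")
        # Decode the lanes and place them in register REG1 + k.
        register_file[reg1 + k] = struct.unpack(lane_fmt, chunk)
    return register_file


# 19 doublewords: COUNT=16 windows of 4 lanes need 16 + 3 elements.
stream = struct.pack("<19i", *range(19))
rf = sliding_window_load(stream, reg1=0, count=16, memlocation=0)
print(rf[0], rf[1], rf[15])   # (0, 1, 2, 3) (1, 2, 3, 4) (15, 16, 17, 18)

# The same form with double-precision elements (8-byte slide).
doubles = struct.pack("<6d", *(0.5 * i for i in range(6)))
rf_d = sliding_window_load(doubles, reg1=0, count=3, memlocation=0, code="d")
print(rf_d[2])                # (1.0, 1.5, 2.0, 2.5)
```

Because the slide equals one element, each register holds the previous register's lanes shifted by one position, which is the overlap recited in claim 1.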
CPC classification titles:
- with variable precision
- Adapting program code to run in a different environment; Porting
- according to one or more bits in the instruction, e.g. prefix, sub-opcode
- Bit or string instructions
- Operand prefetching (cache prefetching G06F12/0862)