US10338927B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10338927-B2 |
| Application number | US-201715477374-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 3, 2017 |
| Priority date | Mar 28, 2014 |
| Publication date | Jul 2, 2019 |
| Grant date | Jul 2, 2019 |
Official abstract text for this publication.
A hardware/software co-design for an optimized dynamic out-of-order Very Long Instruction Word (VLIW) pipeline. For example, one embodiment of an apparatus comprises: an instruction fetch unit to fetch Very Long Instruction Words (VLIWs) in their program order from memory, each of the VLIWs comprising a plurality of reduced instruction set computing (RISC) instruction syllables grouped into the VLIWs in an order which removes data-flow dependencies and false output dependencies between the syllables; a decode unit to decode the VLIWs in their program order and output the syllables of each decoded VLIW in parallel; and an out-of-order execution engine to execute the syllables preferably in parallel with other syllables, wherein at least some of the syllables are to be executed in a different order than the order in which they are received from the decode unit, the out-of-order execution engine having one or more processing stages which do not check for data-flow dependencies and false output dependencies between the syllables when performing operations.
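The grouping rule in the abstract can be made concrete with a small checker. The following is a minimal sketch, not the patented translator: it assumes each syllable has exactly one destination register and a tuple of source registers, and the names `Syllable`, `classify_dependencies`, and `is_well_formed` are hypothetical. It labels every register dependency inside one VLIW as read-after-write (RAW), write-after-write (WAW), or write-after-read (WAR), and accepts the VLIW only when nothing other than WAR remains, which mirrors the claimed property that data-flow and false output dependencies are removed while false anti-dependencies may stay.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Syllable:
    """Hypothetical model of one RISC syllable: one destination, some sources."""
    name: str
    dst: str
    srcs: Tuple[str, ...]


def classify_dependencies(vliw: List[Syllable]) -> List[Tuple[str, str, str]]:
    """Return (earlier, later, kind) for each register dependency in one VLIW."""
    found = []
    for i, earlier in enumerate(vliw):
        for later in vliw[i + 1:]:
            if earlier.dst in later.srcs:
                found.append((earlier.name, later.name, "RAW"))  # data-flow
            if earlier.dst == later.dst:
                found.append((earlier.name, later.name, "WAW"))  # false output
            if later.dst in earlier.srcs:
                found.append((earlier.name, later.name, "WAR"))  # false anti
    return found


def is_well_formed(vliw: List[Syllable]) -> bool:
    """Accept a VLIW only if its internal dependencies are all WAR."""
    return all(kind == "WAR" for _, _, kind in classify_dependencies(vliw))


# "mul" overwrites r2 after "add" has read it: a WAR, which is allowed.
ok_vliw = [Syllable("add", "r1", ("r2", "r3")),
           Syllable("mul", "r2", ("r4", "r5"))]

# "sub" reads r1, which "add" produces in the same VLIW: a RAW, not allowed.
bad_vliw = [Syllable("add", "r1", ("r2", "r3")),
            Syllable("sub", "r6", ("r1", "r4"))]

print(classify_dependencies(ok_vliw))   # [('add', 'mul', 'WAR')]
print(is_well_formed(ok_vliw))          # True
print(is_well_formed(bad_vliw))         # False
```

A translator built along these lines would presumably split `bad_vliw` across two VLIWs; how the real translator schedules syllables is not described in the abstract.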
Opening claim text (preview).
What is claimed is:

1. An apparatus comprising: an instruction fetch unit to fetch Very Long Instruction Words (VLIWs) in program order from memory, each of the VLIWs comprising a plurality of reduced instruction set computing (RISC) instruction syllables grouped into the VLIWs in an order which removes data-flow dependencies and false output dependencies between the syllables, and wherein the plurality of RISC instruction syllables in the VLIWs include one or more false anti-dependencies; a decode unit to decode the VLIWs in program order and output the syllables of each decoded VLIW in parallel; and an out-of-order execution engine to execute at least some of the syllables in parallel with other syllables, wherein at least some of the syllables are to be executed in a different order than the order in which they are received from the decode unit.
2. The apparatus of claim 1, wherein the out-of-order execution engine includes register renaming logic to map the VLIWs onto physical registers.
3. The apparatus of claim 2, wherein the register renaming logic is to implement a write phase later to a read phase for reading logical register operands to remove the one or more false anti-dependencies.
4. The apparatus of claim 2, wherein the out-of-order execution engine further comprises scheduler setup logic to evaluate dependencies between syllables prior to scheduling of the syllables for execution by functional units, the schedule setup logic to perform in parallel with a read phase of the register renaming logic.
5. The apparatus of claim 4, wherein the scheduler setup logic implements a logic write phase later to a logic read phase to remove the one or more false anti-dependencies.
6. The apparatus as in claim 5, wherein the scheduler setup logic is to further operate on each syllable in parallel with cancellation setup logic usable by the out-of-order execution engine to cancel effects of certain dispatched syllables.
7. The apparatus as in claim 1, further comprising: a translator to translate program code from a high-level programming language or a public instruction set architecture (ISA) format to a private ISA format comprising the VLIWs and syllables.
8. The apparatus as in claim 7, wherein the translator comprises an optimizing compiler or binary translator including a dynamic binary translator.
9. The apparatus as in claim 7, wherein the translator resolves data-flow dependencies and false output dependencies when translating to the private ISA format such that the syllables contained within each of VLIWs fetched in-order from memory do not have data-flow dependencies and false output dependencies.
10. The apparatus as in claim 9, wherein the data-flow dependencies comprise read-after-write (“R-A-W”) dependencies and the false output dependencies comprise write-after-write (“W-A-W”) dependencies.
11. The apparatus as in claim 7, wherein the public ISA comprise the Intel Architecture (IA).
12. The apparatus as in claim 1, wherein the false anti-dependencies comprise write-after-read (“W-A-R”) dependencies.
13. The apparatus as in claim 1, wherein the syllables are of multiple types including any combination of one or more control syllables, one or more floating-point vector syllables, one or more memory syllables, and/or one or more integer ALU syllables, where each syllable may be represented by a RISC instruction of a correspondent type.
14. The apparatus as in claim 13, wherein the syllable type is defined the allowed relative position of a syllable in a VLIW.
15. The apparatus as in claim 1, wherein the out-of-order execution engine includes dispatch logic to perform non-speculative early dispatch of syllables.
16. The apparatus as in claim 1, wherein the out-of-order execution engine is fully partitioned, including a register rename/allocation unit having N partitions and a scheduler unit having N partitions.
17. The apparatus as in claim 16, wherein the partitions are physically arranged to handle certain types of instructions.
18. The apparatus as in claim 17, wherein a first partition in the scheduler unit is associated with a first type of execution unit and a second partition in the scheduler unit is associated with a second type of execution unit.
19. The apparatus as in claim 16, wherein the partitioning of the rename/allocation unit and the scheduler unit reduces the number of write ports in the out-of-order execution engine and/or memory ordering buffer, including its load and store buffers.
20. A method comprising: fetching Very Long Instruction Words (VLIWs) in program order from memory, each of the VLIWs comprising a plurality of reduced instruction set computing (RISC) instruction syllables grouped into the VLIWs in an order which removes data-flow dependencies and false output dependencies between the syllables, and wherein the plurality of RISC instruction syllables in the VLIWs include one or more false anti-dependencies; decoding the VLIWs in program order and output the syllables of each decoded VLIW in parallel; and executing at least some of the syllables in parallel with other syllables, wherein at least some of the syllables are to be executed in a different order than the order in which they are received from the decoding.
21. The method as in claim 20, further comprises: translating program code from a high-level programming language or a public instruction set architecture (ISA) format to a private ISA format comprising the VLIWs and syllables.
22. The method as in claim 21, wherein the public ISA comprise the Intel Architecture (IA).
23. The method as in claim 21, wherein the translating comprises resolving data-flow dependencies and false output dependencies when translating to the private ISA format such that the syllables contained within each of VLIWs fetched in-order from memory do not have data-flow dependencies and false output dependencies.
24. The method as in claim 23, wherein the data-flow dependencies comprise read-after-write (“R-A-W”) dependencies and the false output dependencies comprise write-after-write (“W-A-W”) dependencies.
25. The method as in claim 20, wherein the false anti-dependencies comprise write-after-read (“W-A-R”) dependencies.
26. The method as in claim 20, wherein the syllables are of multiple types including any combination of one or more control syllables, one or more floating-point vector syllables, one or more memory syllables, and/or one or more integer ALU syllables, where each syllable may be represented by a RISC instruction of a correspondent type.
27. The method as in claim 27 is not recited; the final claim reads: The method as in claim 26, wherein the syllable type is defined the allowed relative position of a syllable in a VLIW.
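Claims 2 and 3 describe register renaming in which the write phase follows the read phase, so that false anti-dependencies (W-A-R) inside a VLIW disappear. The sketch below illustrates that ordering in software under stated assumptions: the `Renamer` class, its free-list policy, and the `(dst, srcs)` syllable encoding are all hypothetical, physical registers are never returned to the free list, and no claim is made that the hardware is organized this way. Every syllable's sources are looked up against the mapping as it stood before the VLIW, and only afterwards are new physical registers assigned to the destinations.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: (destination logical register, source logical registers)
Syllable = Tuple[str, Tuple[str, ...]]


class Renamer:
    """Toy rename table mapping logical registers to physical register numbers."""

    def __init__(self, logical_regs: List[str], num_physical: int) -> None:
        self.table: Dict[str, int] = {r: i for i, r in enumerate(logical_regs)}
        self.free: List[int] = list(range(len(logical_regs), num_physical))

    def rename_vliw(self, vliw: List[Syllable]) -> List[Tuple[int, Tuple[int, ...]]]:
        # Read phase: all syllables read their sources from the mapping that
        # existed before this VLIW. No lookup depends on another syllable
        # because the VLIW is assumed free of RAW and WAW dependencies.
        read = [tuple(self.table[s] for s in srcs) for _, srcs in vliw]

        # Write phase: each destination gets a fresh physical register, so a
        # later writer can never clobber a value an earlier syllable reads.
        renamed = []
        for (dst, _), phys_srcs in zip(vliw, read):
            new_phys = self.free.pop(0)  # raises IndexError if none are free
            self.table[dst] = new_phys
            renamed.append((new_phys, phys_srcs))
        return renamed


renamer = Renamer(["r0", "r1", "r2", "r3"], num_physical=8)

# The first syllable reads r2; the second writes r2 (a WAR inside the VLIW).
vliw = [("r1", ("r2", "r3")), ("r2", ("r0", "r3"))]
result = renamer.rename_vliw(vliw)
print(result)               # [(4, (2, 3)), (5, (0, 3))]
print(renamer.table["r2"])  # 5
```

After renaming, the first syllable reads physical register 2 while the second writes physical register 5, so the two can execute in either order without the write disturbing the read.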
Instruction analysis, e.g. decoding, instruction word fields · CPC title
Physics · mapped topic
Multiprogramming arrangements · CPC title
organised in groups of units sharing resources, e.g. clusters · CPC title
of compound instructions · CPC title