Method and apparatus for implementing a dynamic out-of-order processor pipeline

US10338927B2 · US · B2

Patent metadata
Publication number: US-10338927-B2
Application number: US-201715477374-A
Country: US
Kind code: B2
Filing date: Apr 3, 2017
Priority date: Mar 28, 2014
Publication date: Jul 2, 2019
Grant date: Jul 2, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A hardware/software co-design for an optimized dynamic out-of-order Very Long Instruction Word (VLIW) pipeline. For example, one embodiment of an apparatus comprises: an instruction fetch unit to fetch Very Long Instruction Words (VLIWs) in their program order from memory, each of the VLIWs comprising a plurality of reduced instruction set computing (RISC) instruction syllables grouped into the VLIWs in an order which removes data-flow dependencies and false output dependencies between the syllables; a decode unit to decode the VLIWs in their program order and output the syllables of each decoded VLIW in parallel; and an out-of-order execution engine to execute the syllables preferably in parallel with other syllables, wherein at least some of the syllables are to be executed in a different order than the order in which they are received from the decode unit, the out-of-order execution engine having one or more processing stages which do not check for data-flow dependencies and false output dependencies between the syllables when performing operations.
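The grouping rule in the abstract can be illustrated with a small checker. This is only a minimal sketch, not the patent's translator: the `Syllable` model (one destination register, a tuple of source registers) and the function names are assumptions made for illustration. It classifies register dependencies between syllables of one VLIW and accepts the word only if no read-after-write or write-after-write dependency is present. Write-after-read dependencies are tolerated, as the first claim permits.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Syllable:
    """Hypothetical model of one RISC syllable (not the patent's encoding)."""
    op: str
    dest: str
    srcs: Tuple[str, ...] = ()


def classify_dependencies(vliw: List[Syllable]) -> List[Tuple[str, int, int]]:
    """Return (kind, earlier_index, later_index) for each register
    dependency between two syllables of the same VLIW."""
    found = []
    for j, later in enumerate(vliw):
        for i in range(j):
            earlier = vliw[i]
            if earlier.dest in later.srcs:
                found.append(("RAW", i, j))  # data-flow dependency
            if earlier.dest == later.dest:
                found.append(("WAW", i, j))  # false output dependency
            if later.dest in earlier.srcs:
                found.append(("WAR", i, j))  # false anti-dependency
    return found


def is_valid_vliw(vliw: List[Syllable]) -> bool:
    """A VLIW is acceptable here if its only intra-word dependencies are WAR."""
    return all(kind == "WAR" for kind, _, _ in classify_dependencies(vliw))


# Syllable 1 overwrites r2 after syllable 0 reads it: WAR only, so allowed.
vliw_ok = [Syllable("add", "r1", ("r2", "r3")),
           Syllable("sub", "r2", ("r4", "r5"))]

# Syllable 1 reads r1, which syllable 0 produces: RAW, so not allowed.
vliw_bad = [Syllable("add", "r1", ("r2", "r3")),
            Syllable("mul", "r4", ("r1", "r5"))]

print(is_valid_vliw(vliw_ok))   # True
print(is_valid_vliw(vliw_bad))  # False
```

In this sketch, a word such as `vliw_bad` would have to be split by the translator so that the producer and consumer land in different VLIWs.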

First claim

Opening claim text (preview).

What is claimed is:
1. An apparatus comprising: an instruction fetch unit to fetch Very Long Instruction Words (VLIWs) in program order from memory, each of the VLIWs comprising a plurality of reduced instruction set computing (RISC) instruction syllables grouped into the VLIWs in an order which removes data-flow dependencies and false output dependencies between the syllables, and wherein the plurality of RISC instruction syllables in the VLIWs include one or more false anti-dependencies; a decode unit to decode the VLIWs in program order and output the syllables of each decoded VLIW in parallel; and an out-of-order execution engine to execute at least some of the syllables in parallel with other syllables, wherein at least some of the syllables are to be executed in a different order than the order in which they are received from the decode unit.
2. The apparatus of claim 1, wherein the out-of-order execution engine includes register renaming logic to map the VLIWs onto physical registers.
3. The apparatus of claim 2, wherein the register renaming logic is to implement a write phase later to a read phase for reading logical register operands to remove the one or more false anti-dependencies.
4. The apparatus of claim 2, wherein the out-of-order execution engine further comprises scheduler setup logic to evaluate dependencies between syllables prior to scheduling of the syllables for execution by functional units, the schedule setup logic to perform in parallel with a read phase of the register renaming logic.
5. The apparatus of claim 4, wherein the scheduler setup logic implements a logic write phase later to a logic read phase to remove the one or more false anti-dependencies.
6. The apparatus as in claim 5, wherein the scheduler setup logic is to further operate on each syllable in parallel with cancellation setup logic usable by the out-of-order execution engine to cancel effects of certain dispatched syllables.
7. The apparatus as in claim 1, further comprising: a translator to translate program code from a high-level programming language or a public instruction set architecture (ISA) format to a private ISA format comprising the VLIWs and syllables.
8. The apparatus as in claim 7, wherein the translator comprises an optimizing compiler or binary translator including a dynamic binary translator.
9. The apparatus as in claim 7, wherein the translator resolves data-flow dependencies and false output dependencies when translating to the private ISA format such that the syllables contained within each of VLIWs fetched in-order from memory do not have data-flow dependencies and false output dependencies.
10. The apparatus as in claim 9, wherein the data-flow dependencies comprise read-after-write (“R-A-W”) dependencies and the false output dependencies comprise write-after-write (“W-A-W”) dependencies.
11. The apparatus as in claim 7, wherein the public ISA comprise the Intel Architecture (IA).
12. The apparatus as in claim 1, wherein the false anti-dependencies comprise write-after-read (“W-A-R”) dependencies.
13. The apparatus as in claim 1, wherein the syllables are of multiple types including any combination of one or more control syllables, one or more floating-point vector syllables, one or more memory syllables, and/or one or more integer ALU syllables, where each syllable may be represented by a RISC instruction of a correspondent type.
14. The apparatus as in claim 13, wherein the syllable type is defined the allowed relative position of a syllable in a VLIW.
15. The apparatus as in claim 1, wherein the out-of-order execution engine includes dispatch logic to perform non-speculative early dispatch of syllables.
16. The apparatus as in claim 1, wherein the out-of-order execution engine is fully partitioned, including a register rename/allocation unit having N partitions and a scheduler unit having N partitions.
17. The apparatus as in claim 16, wherein the partitions are physically arranged to handle certain types of instructions.
18. The apparatus as in claim 17, wherein a first partition in the scheduler unit is associated with a first type of execution unit and a second partition in the scheduler unit is associated with a second type of execution unit.
19. The apparatus as in claim 16, wherein the partitioning of the rename/allocation unit and the scheduler unit reduces the number of write ports in the out-of-order execution engine and/or memory ordering buffer, including its load and store buffers.
20. A method comprising: fetching Very Long Instruction Words (VLIWs) in program order from memory, each of the VLIWs comprising a plurality of reduced instruction set computing (RISC) instruction syllables grouped into the VLIWs in an order which removes data-flow dependencies and false output dependencies between the syllables, and wherein the plurality of RISC instruction syllables in the VLIWs include one or more false anti-dependencies; decoding the VLIWs in program order and output the syllables of each decoded VLIW in parallel; and executing at least some of the syllables in parallel with other syllables, wherein at least some of the syllables are to be executed in a different order than the order in which they are received from the decoding.
21. The method as in claim 20, further comprises: translating program code from a high-level programming language or a public instruction set architecture (ISA) format to a private ISA format comprising the VLIWs and syllables.
22. The method as in claim 21, wherein the public ISA comprise the Intel Architecture (IA).
23. The method as in claim 21, wherein the translating comprises resolving data-flow dependencies and false output dependencies when translating to the private ISA format such that the syllables contained within each of VLIWs fetched in-order from memory do not have data-flow dependencies and false output dependencies.
24. The method as in claim 23, wherein the data-flow dependencies comprise read-after-write (“R-A-W”) dependencies and the false output dependencies comprise write-after-write (“W-A-W”) dependencies.
25. The method as in claim 20, wherein the false anti-dependencies comprise write-after-read (“W-A-R”) dependencies.
26. The method as in claim 20, wherein the syllables are of multiple types including any combination of one or more control syllables, one or more floating-point vector syllables, one or more memory syllables, and/or one or more integer ALU syllables, where each syllable may be represented by a RISC instruction of a correspondent type.
27. The method as in claim 26, wherein the syllable type is defined the allowed relative position of a syllable in a VLIW.
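Claims 2 and 3 describe register renaming in which the write phase comes after the read phase. The following is a rough sketch of that idea under simplifying assumptions, not the claimed hardware: the rename table is a plain dictionary, the free list is a Python list, and each syllable is a hypothetical `(dest, srcs)` pair of logical register names. Because every source is looked up before any destination mapping is updated, a syllable that reads a register another syllable in the same VLIW overwrites still sees the older mapping, so the write-after-read hazard disappears without any comparison between syllables.

```python
from typing import Dict, List, Tuple

LogicalSyllable = Tuple[str, Tuple[str, ...]]   # (dest, srcs), hypothetical
RenamedSyllable = Tuple[str, Tuple[str, ...]]   # (phys_dest, phys_srcs)


def rename_vliw(vliw: List[LogicalSyllable],
                rename_table: Dict[str, str],
                free_list: List[str]) -> List[RenamedSyllable]:
    """Rename one VLIW in two phases; mutates rename_table and free_list."""
    # Read phase: all syllables read the table as it stood before this VLIW.
    # No syllable depends on another's lookup, so this could run in parallel.
    phys_srcs = [tuple(rename_table[s] for s in srcs) for _, srcs in vliw]

    # Write phase: allocate a fresh physical register per destination and
    # publish the new mapping only after every read has completed.
    renamed = []
    for (dest, _), srcs in zip(vliw, phys_srcs):
        phys_dest = free_list.pop(0)
        rename_table[dest] = phys_dest
        renamed.append((phys_dest, srcs))
    return renamed


table = {"r1": "p1", "r2": "p2", "r3": "p3", "r4": "p4", "r5": "p5"}
free = ["p6", "p7"]

# Syllable 0 reads r2; syllable 1 writes r2 (a WAR pair inside one VLIW).
vliw = [("r1", ("r2", "r3")), ("r2", ("r4", "r5"))]

print(rename_vliw(vliw, table, free))
# [('p6', ('p2', 'p3')), ('p7', ('p4', 'p5'))]
print(table["r2"])  # p7
```

The sketch relies on the VLIW already being free of read-after-write and write-after-write dependencies, which the claims assign to the translator; with such dependencies present, a single read phase followed by a single write phase would not be sufficient.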

Assignees

Intel Corp

Inventors

Classifications

  • Instruction analysis, e.g. decoding, instruction word fields · CPC title

  • Physics · mapped topic

  • Multiprogramming arrangements · CPC title

  • organised in groups of units sharing resources, e.g. clusters · CPC title

  • G06F9/3853 · Primary

    of compound instructions · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10338927B2 cover?
A hardware/software co-design for an optimized dynamic out-of-order Very Long Instruction Word (VLIW) pipeline. For example, one embodiment of an apparatus comprises: an instruction fetch unit to fetch Very Long Instruction Words (VLIWs) in their program order from memory, each of the VLIWs comprising a plurality of reduced instruction set computing (RISC) instruction syllables grouped into the…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/3853. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 2, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).