Out-of-order block-based processors and instruction schedulers using ready state data indexed by instruction position identifiers

US11687345B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11687345-B2
Application number  US-201615224469-A
Country             US
Kind code           B2
Filing date         Jul 29, 2016
Priority date       Apr 28, 2016
Publication date    Jun 27, 2023
Grant date          Jun 27, 2023

Abstract

Official abstract text for this publication.

Apparatus and methods are disclosed for implementing block-based processors including field programmable gate-array implementations. In one example of the disclosed technology, a block-based processor includes an instruction decoder configured to generate decoded ready dependencies for a transactional block of instructions, where each of the instructions is associated with a different instruction identifier encoded in the transactional block. The processor further includes an instruction scheduler configured to issue an instruction from a set of instructions of the transactional block of instructions. The instruction is issued based on determining that decoded ready state dependencies for an instruction are satisfied. The determining includes accessing storage with the decoded ready dependencies indexed with a respective instruction identifier that is encoded in the transactional block of instructions.
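The core mechanism in the abstract works as follows. Ready dependencies are decoded once and kept in storage indexed by each instruction's identifier. An executed instruction carries the identifier of its target, and that identifier is used directly as the index for updating the target's ready state. The Python sketch below is a simplified software model under stated assumptions, not the patented circuit. The names `Entry`, `ReadyTable`, `run`, `needs` and `have` are hypothetical, and operands are reduced to named slots such as `"left"`.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Decoded ready dependencies (hypothetical model): operand slots that
    # must be filled before the instruction may issue. Produced once, at decode.
    needs: frozenset
    # Active ready state: slots filled so far by producer instructions.
    have: set = field(default_factory=set)


class ReadyTable:
    """Ready-state storage indexed directly by instruction identifier (IID)."""

    def __init__(self, decoded):
        # decoded[i] is the set of operand slots instruction i waits on;
        # the list position is the IID, i.e. the position within the block.
        self.entries = [Entry(frozenset(slots)) for slots in decoded]

    def ready(self, iid):
        entry = self.entries[iid]
        return entry.needs <= entry.have

    def deliver(self, targets):
        # targets holds (target_iid, slot) pairs encoded in the executed
        # instruction; the target IID itself is the storage index.
        for iid, slot in targets:
            self.entries[iid].have.add(slot)

    def issuable(self, issued):
        # Every not-yet-issued instruction whose dependencies are satisfied,
        # independent of program order.
        return [i for i in range(len(self.entries))
                if i not in issued and self.ready(i)]


def run(table, targets):
    """Issue all ready instructions each step; return the issue order."""
    issued, order = set(), []
    while len(issued) < len(table.entries):
        ready_now = table.issuable(issued)
        if not ready_now:
            raise RuntimeError("no instruction is ready: dependency deadlock")
        issued.update(ready_now)
        order.extend(ready_now)
        # Broadcast each issued instruction's result to its encoded targets.
        for iid in ready_now:
            table.deliver(targets[iid])
    return order


if __name__ == "__main__":
    # I0 and I1 each wait on a left operand; I2 feeds I0, and I0 feeds I1.
    decoded = [{"left"}, {"left"}, set(), set()]
    targets = {0: [(1, "left")], 1: [], 2: [(0, "left")], 3: []}
    print(run(ReadyTable(decoded), targets))  # [2, 3, 0, 1]
```

In this example, instructions 2 and 3 issue first because nothing is pending for them, even though they come later in program order. Instruction 0 issues once instruction 2's result arrives, and instruction 1 issues last. No associative search is needed: each wake-up is a single indexed write.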

First claim

Opening claim text (preview).

We claim:

1. An apparatus comprising: an instruction decoder configured to generate decoded ready dependencies for at least a portion of a group of instructions fetched from a memory, each instruction of the group of instructions being associated with a different respective instruction identifier indicating that instruction's relative position within the group of instructions in the memory; and an instruction scheduler configured to issue a first instruction from the group of instructions out of program order, wherein: the first instruction is issued based on determining that the decoded ready dependencies for the first instruction are satisfied, and the determining comprises accessing storage storing the decoded ready dependencies using the first instruction's respective instruction identifier encoded within one instruction of the group of instructions, wherein the determining comprises using an instruction identifier encoded within and signaled by an executed instruction to generate an index used to access the storage, wherein the index is generated in response to execution of the executed instruction.

2. The apparatus of claim 1, wherein the apparatus further comprises an instruction fetch unit, the instruction fetch unit being configured to fetch at least a portion of a header for the group of instructions and to fetch at least a portion of an instruction of the group of instructions concurrently.

3. The apparatus of claim 2, wherein the instruction fetch unit comprises a first block memory that stores at least the fetched portion of the header and a second block memory that stores at least the fetched portion of the instruction.

4. The apparatus of claim 1, wherein the apparatus is a soft core processor implemented with a configurable logic device.

5. The apparatus of claim 1, wherein the apparatus is configured to select a next instruction of the group of instructions to execute with a priority encoder and based on the instruction identifier encoded for the next instruction.

6. The apparatus of claim 1, wherein the instruction scheduler is coupled to a data operand buffer, the data operand buffer storing data generated for execution by the instructions in a subsequent clock cycle.

7. The apparatus of claim 6, further comprising a bypass logic circuit that allows data operands to be forwarded for execution by an instruction in the immediately subsequent clock cycle, the bypass logic circuit allowing the data operands to be forwarded without storing the data operands in the data operand buffer.

8. The apparatus of claim 6, wherein the data operand buffer is configured to store operand data for not more than one instruction per clock cycle, the apparatus further comprising: a bypass logic circuit that allows a data operand for a different instruction to be forwarded to an execution unit in the same clock cycle as a different data operand is stored in the data operand buffer.

9. A field programmable gate array (FPGA) comprising: a processor, comprising: a first memory implemented using a plurality of multi-input lookup-tables (LUTs) in the FPGA; an instruction cache configured to receive instructions fetched from a second memory, a given instruction of the received instructions comprising first and second instruction identifiers encoded in the given instruction, the first instruction identifier indicating the given instruction's relative position within a group of instructions in the second memory, and the second instruction identifier designating a target instruction other than the given instruction, to receive a result generated by executing the given instruction; and an instruction scheduler configured to store ready state data in the first memory indexed by an instruction identifier of a corresponding instruction, the stored ready state data indicating state of the corresponding instruction's predicate operands and/or data operands, the instruction scheduler being further configured to issue the corresponding instruction when the stored ready state data indicates that all operand dependencies for the corresponding instruction are satisfied.

10. The FPGA of claim 9, wherein the instruction cache is implemented with block random access memory (RAM) resources of the FPGA.

11. The FPGA of claim 9, wherein the instruction scheduler is implemented with random access memory (RAM) formed using a portion of the plurality of LUTs.

12. The FPGA of claim 11, wherein the LUTs are formed from static random access memory (RAM) cells coupled to one or more multiplexers.

13. The FPGA of claim 9, wherein the instruction scheduler is coupled to: a decoded instruction word memory configured to store decoded instruction control data for at least a portion of the received instructions; and a plurality of operand buffers configured to store operand data for executing the received instructions.

14. The FPGA of claim 9, wherein the FPGA is further configured to execute a subsequent instance of the given instruction by refreshing and re-executing the given instruction, and wherein the ready state data comprises decoded ready state information, which is not cleared upon the refreshing, and active ready state data that is cleared upon the refreshing.

15. The FPGA of claim 9, wherein the instruction scheduler is configured to reuse at least a portion of the stored ready state data for a second instruction distinct from the given instruction for a subsequent instance of executing the received instructions, and wherein the FPGA is configured to not re-fetch and to not re-decode the second instruction for executing the subsequent instance.

16. The FPGA of claim 9, wherein the instruction scheduler is configured to determine that all of an instruction's dependencies are satisfied by comparing the stored ready state data to one or more signals generated by executing another instruction.

17. The FPGA of claim 9, wherein at least one of the received instructions is encoded with an instruction identifier that indicates a target instruction that receives a result generated by executing the at least one of the received instructions.

18. A method of forming a processor with configurable logic devices, the method comprising: producing a configuration bitstream comprising configuration information for implementing a circuit for the processor with the configurable logic devices, the circuit for the processor comprising: an out-of-order instruction scheduler configured to issue a target instruction based on operand ready state data stored in a memory indexed by an instruction identifier designating the target instruction that receives a result consumed by the target instruction when a source instruction that produces at least one operand indicated by the operand ready state data is executed, the instruction identifier being encoded in the source instruction, the out-of-order instruction scheduler being further configured to use an instruction identifier from an executed instruction to generate an index for accessing ready dependencies of another instruction, wherein the index is generated in response to pipelined execution of the source instruction.

19. The method of claim 18, further comprising mapping a description of at least one or more of the following to block random access memory (RAM) hardware implemented with the configurable logic devices: an instruction cache or a data cache.

20. The method of claim 18, further comprising mapping a hardware description language specification to a netlist, the netlist comprising a descri
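Claims 5, 14 and 15 add two scheduler details. Claim 5 selects the next instruction with a priority encoder over instruction identifiers. Claims 14 and 15 split ready state into a decoded part that survives a refresh and an active part that the refresh clears, so a block can run again without being re-fetched or re-decoded. The Python sketch below models both with bitmasks, under assumptions that the claims do not state. The class name `BlockScheduler` and the ready-bit constants `LEFT`, `RIGHT` and `PRED` are hypothetical, and the "lowest identifier wins" priority order is an illustrative choice, not one the claims specify.

```python
class BlockScheduler:
    """Hypothetical model of priority-encoder issue (claim 5) and of
    refresh keeping decoded ready state but clearing active state (claim 14)."""

    LEFT, RIGHT, PRED = 1, 2, 4  # hypothetical per-instruction ready bits

    def __init__(self, decoded_masks):
        # Decoded ready state: written once by the decoder, kept across refreshes.
        self.decoded = list(decoded_masks)
        # Active ready state: set by producer instructions during one instance.
        self.active = [0] * len(self.decoded)
        self.issued = 0  # bit i is set once instruction i has issued

    def signal(self, target_iid, bit):
        # The target IID encoded in the executed instruction is the index.
        self.active[target_iid] |= bit

    def ready_vector(self):
        # Bit i is set when instruction i has every decoded dependency
        # satisfied and has not yet issued in this instance.
        vec = 0
        for iid, need in enumerate(self.decoded):
            if (self.active[iid] & need) == need and not (self.issued >> iid) & 1:
                vec |= 1 << iid
        return vec

    @staticmethod
    def priority_encode(vec):
        # Index of the lowest set bit (assumed priority), or None if empty.
        return (vec & -vec).bit_length() - 1 if vec else None

    def issue_next(self):
        iid = self.priority_encode(self.ready_vector())
        if iid is not None:
            self.issued |= 1 << iid
        return iid

    def refresh(self):
        # Re-run the block without re-fetching or re-decoding: clear active
        # state and issue flags; self.decoded is left untouched.
        self.active = [0] * len(self.decoded)
        self.issued = 0


if __name__ == "__main__":
    B = BlockScheduler
    s = B([0, B.LEFT, B.LEFT | B.RIGHT])
    print(s.issue_next())              # 0: the only instruction with no pending operands
    s.signal(1, B.LEFT)
    s.signal(2, B.LEFT)
    s.signal(2, B.RIGHT)
    print(s.issue_next(), s.issue_next(), s.issue_next())  # 1 2 None
    s.refresh()
    print(s.issue_next())              # 0 again; decoded masks were reused
```

When instructions 1 and 2 become ready together, the encoder picks instruction 1 because it has the lower identifier. After `refresh()` the same decoded masks drive the next instance, which mirrors the reuse described in claim 15.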

Assignees

Microsoft Technology Licensing Llc

Classifications (CPC titles)

  • to perform operations for flow control

  • Result writeback, i.e. updating the architectural state or memory

  • for indirect branch instructions

  • Instruction completion, e.g. retiring, committing or graduating

  • using a plurality of independent parallel functional units

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11687345B2 cover?
It covers block-based processors, including field programmable gate array (FPGA) implementations. An instruction decoder generates decoded ready dependencies for a block of instructions, and an instruction scheduler issues an instruction once its dependencies are satisfied. The scheduler looks up those dependencies in storage indexed by the instruction identifier encoded in the block.
Who is the assignee on this patent?
Microsoft Technology Licensing Llc
What technology area does this patent fall under?
Primary CPC classification G06F9/3836. Mapped technology areas include Physics.
When was this patent published?
It was published on Jun 27, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).