Apparatuses and methods for a processor architecture
US-2018165199-A1 · Jun 14, 2018 · US
US10896044B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10896044-B2 |
| Application number | US-201816014715-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 21, 2018 |
| Priority date | Jun 21, 2018 |
| Publication date | Jan 19, 2021 |
| Grant date | Jan 19, 2021 |
Official abstract text for this publication.
The techniques described herein provide an instruction fetch and decode unit having an operation cache with low latency in switching between fetching decoded operations from the operation cache and fetching and decoding instructions using a decode unit. This low latency is accomplished through a synchronization mechanism that allows work to flow through both the operation cache path and the instruction cache path until that work is stopped due to needing to wait on output from the opposite path. The existence of decoupling buffers in the operation cache path and the instruction cache path allows work to be held until that work is cleared to proceed. Other improvements, such as a specially configured operation cache tag array that allows for detection of multiple hits in a single cycle, also improve latency by, for example, improving the speed at which entries are consumed from a prediction queue that stores predicted address blocks.
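The synchronization idea in the abstract can be illustrated with a small software model. This is a minimal sketch, not the patented hardware: the names `Entry`, `dispatch`, and `drain`, the `seq` bookkeeping field, and the rule "set the wait flag whenever the previous piece of work went down the opposite path" are assumptions made for illustration. Each path has its own queue (standing in for a decoupling buffer), and an entry marked `wait` is held until all earlier work has reached the operations queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    seq: int     # program-order position (bookkeeping for this sketch only)
    path: str    # "oc" = operation cache path, "ic" = instruction cache path
    uops: list   # decoded micro-operations produced for this entry
    wait: bool   # True: hold until the opposite path's prior work is written


def dispatch(blocks):
    """Split program-ordered work into per-path queues.

    Assumption: the wait flag is set exactly when the previous piece of
    work was routed to the opposite path.
    """
    queues = {"oc": deque(), "ic": deque()}
    prev_path = None
    for seq, (path, uops) in enumerate(blocks):
        wait = prev_path is not None and prev_path != path
        queues[path].append(Entry(seq, path, uops, wait))
        prev_path = path
    return queues


def drain(queues):
    """Let both paths make progress; return micro-ops in program order."""
    ops_queue = []
    written = 0  # number of entries already written to the operations queue
    while queues["oc"] or queues["ic"]:
        for path in ("oc", "ic"):
            q = queues[path]
            if not q:
                continue
            head = q[0]
            if head.wait and written < head.seq:
                continue  # held in the decoupling buffer until signalled
            q.popleft()
            ops_queue.extend(head.uops)
            written += 1
    return ops_queue


blocks = [("oc", ["a", "b"]), ("ic", ["c"]), ("ic", ["d"]), ("oc", ["e"])]
queues = dispatch(blocks)
wait_flags = sorted(
    (e.seq, e.wait) for q in queues.values() for e in q
)
ordered = drain(queues)
```

In this model only entries that follow a path switch ever stall; consecutive entries on the same path flow through without checking the opposite path, which is the behavior the abstract credits for the low switching latency.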
Opening claim text (preview).
What is claimed is:

1. A method for converting instruction addresses of a first predicted address block into decoded micro-operations for output to an operations queue that stores decoded micro-operations in program order, and for subsequent execution by a remainder of an instruction execution pipeline, the method comprising: providing an index associated with the first predicted address block to an operation cache tag array to obtain a first tag associated with a first operation cache tag array entry and a second tag associated with a second operation cache tag array entry; in a first computer clock cycle, determining that both the first tag and the second tag match a tag derived from the predicted address block and that an end address associated with the first operation cache tag array entry matches a start address associated with the second operation cache tag array entry; in the first computer clock cycle, identifying a set of instructions for which decoded micro-operations are stored in an operation cache data array of an operation cache path, as instructions at addresses associated with both the first operation cache tag array entry and the second operation cache tag array entry; storing a first operation cache queue entry for the set of instructions in an operation cache queue, the first operation cache queue entry including an indication indicating whether to wait to receive a signal from an instruction cache path to proceed; obtaining decoded micro-operations corresponding to the first operation cache queue entry from the operation cache; and outputting the decoded micro-operations corresponding to the first operation cache queue entry to the operations queue, at a time that is based on the indication of the first operation cache queue entry indicating whether to wait to receive the signal from the instruction cache path to proceed.

2. The method of claim 1, wherein: the indication indicates a need to wait for prior micro-operations from the instruction cache path to be written to or in-flight to the operations queue; and outputting the decoded micro-operations to the operations queue includes waiting until the prior micro-operations from the instruction cache path are written to or in-flight to the operations queue before outputting the decoded micro-operations corresponding to the first operation cache queue to the operations queue.

3. The method of claim 1, wherein: the indication does not indicate a need to wait for prior micro-operations from the instruction cache path to be written to or in-flight to the operations queue; and outputting the decoded micro-operations to the operations queue includes outputting the decoded micro-operations corresponding to the first operation cache queue entry without waiting for prior micro-operations from the instruction path.

4. The method of claim 1, further comprising: converting instruction addresses of a second predicted address block into decoded micro-operations for output to the operations queue and for subsequent execution by the remainder of an instruction execution pipeline, by: identifying that the second predicted address block includes at least one instruction for which decoded micro-operations are not stored in the operation cache; storing an instruction cache queue entry in an instruction cache queue; obtaining instruction bytes for the instruction cache queue entry in an instruction byte buffer, along with an indication indicating whether to wait for prior operations from the operation cache to be written to or in flight to the operations cache; decoding the instruction bytes to obtain decoded micro-operations corresponding to the instruction byte buffer entry, at a time that is based on the indication indicating whether to wait for the prior operations from the operations cache path; and outputting the decoded micro-operations corresponding to the instruction byte buffer entry to the operations queue for storage.

5. The method of claim 4, wherein: the indication indicating whether to wait for the prior operations from the operations cache path indicates a need to wait for the prior operations from the operations cache path; and decoding the instruction bytes comprises decoding the instruction bytes after the prior operations from the operations cache path are written to or in flight to the operations queue.

6. The method of claim 4, wherein: the indication indicating whether to wait for the prior operations from the operations cache path indicates no need to wait for the prior operations from the operations cache path; and decoding the instruction bytes comprises decoding the instruction bytes without waiting for prior operations from the operations cache path to be written to or in flight to the operations queue.

7. The method of claim 1, wherein: the first predicted address block includes at least one instruction for which decoded micro-operations are stored in an operation cache and at least one instruction for which decoded operations are not stored in the operation cache.

8. The method of claim 1, further comprising: executing the micro-operations stored in the operations queue.

9. An instruction fetch and decode unit for converting instruction addresses of a first predicted address block into decoded micro-operations for output to an operations queue that stores decoded micro-operations in program order, and for subsequent execution by a remainder of an instruction execution pipeline, the instruction fetch and decode unit comprising: a shared fetch logic configured to: provide an index associated with the first predicted address block to an operation cache tag array to obtain a first tag associated with a first operation cache tag array entry and a second tag associated with a second operation cache tag array entry, in a first computer clock cycle, determine that both the first tag and the second tag match a tag derived from the predicted address block and that an end address associated with the first operation cache tag array entry matches a start address associated with the second operation cache tag array entry, and in the first computer clock cycle, identify a set of instructions for which decoded micro-operations are stored in an operation cache data array of an operation cache path, as instructions at addresses associated with both the first operation cache tag array entry and the second operation cache tag array entry; an operation cache queue configured to store a first operation cache queue entry for the set of instructions, the first operation cache queue entry including an indication indicating whether to wait to receive a signal from an instruction cache path to proceed; and an operation cache data read logic configured to obtain decoded micro-operations corresponding to the first operation cache queue entry from the operation cache, and to output the decoded micro-operations corresponding to the first operation cache queue entry to the operations queue, at a time that is based on the indication of the first operation cache queue entry indicating whether to wait to receive the signal from the instruction cache path to proceed.

10. The instruction fetch and decode unit of claim 9, wherein: the indication indicates a need to wait for prior micro-operations from the instruction cache path to be written to or in-flight to the operations queue; and outputting the decoded micro-operations to the operations queue includes waiting until the prior micro-operations from the instruction cache path are written to or in-flight to the operations queue before outputting the decoded micro-operations corresponding to the first operation cache queue to the operations queue.

11. The instr
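The tag-matching step recited in claims 1 and 9 (two tag array entries that both match the block's tag, with the first entry's end address equal to the second entry's start address) can be sketched in software. This is an illustrative model only: `TagEntry`, `lookup`, the dictionary-of-sets layout, and the example addresses are all hypothetical, and the "single clock cycle" property of the claimed hardware is not something a sequential Python function can represent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagEntry:
    tag: int    # tag stored in the operation cache tag array entry
    start: int  # start address covered by this entry
    end: int    # end address; a sequential successor entry starts here


def lookup(tag_array, index, tag, start):
    """Return the chained hits (at most two) for one predicted address block.

    tag_array maps a set index to the entries (ways) stored in that set.
    A second hit is reported only if its start address equals the first
    hit's end address, mirroring the condition stated in the claims.
    """
    ways = [e for e in tag_array.get(index, []) if e.tag == tag]
    first = next((e for e in ways if e.start == start), None)
    if first is None:
        return []  # miss: this work would go to the instruction cache path
    second = next(
        (e for e in ways if e is not first and e.start == first.end), None
    )
    return [first] if second is None else [first, second]


# Hypothetical contents: set 3 holds two chained entries and one unrelated one.
tag_array = {
    3: [TagEntry(0x1A, 0, 16), TagEntry(0x1A, 16, 40), TagEntry(0x2B, 0, 8)],
}
double_hit = lookup(tag_array, 3, 0x1A, 0)
single_hit = lookup(tag_array, 3, 0x2B, 0)
miss = lookup(tag_array, 3, 0x1A, 8)
```

Reporting both hits from one lookup is what lets a single prediction queue entry cover the instructions of two tag array entries, rather than requiring a second lookup for the continuation.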
using decoder, e.g. decoder per instruction set, adaptable or programmable decoders · CPC title
Prefetch instructions; cache control instructions · CPC title
using dynamic branch prediction, e.g. using branch history tables · CPC title
Implementation provisions of instruction buffers, e.g. prefetch buffer; banks · CPC title
Pipelined decoding, e.g. using predecoding · CPC title