Branch target buffer arrangement for instruction prefetching
US-2021004233-A1 · Jan 7, 2021 · US
US11403103B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11403103-B2 |
| Application number | US-202017069217-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 13, 2020 |
| Priority date | Apr 14, 2020 |
| Publication date | Aug 2, 2022 |
| Grant date | Aug 2, 2022 |
Official abstract text for this publication.
A microprocessor is shown, in which a branch predictor and an instruction cache are decoupled by a fetch-target queue (FTQ). The branch predictor performs branch prediction for N instruction addresses in parallel in the same cycle, wherein N is an integer greater than 1. In the current cycle, the branch predictor finishes branch prediction for N instruction addresses in parallel and, among the N instruction addresses with finished branch prediction, those that are not bypassed and do not overlap previously-predicted instruction addresses are pushed into the fetch-target queue, to be read out later as an instruction-fetching address for the instruction cache. The previously-predicted instruction addresses are pushed into the fetch-target queue in a previous cycle.
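The push rule in the abstract can be illustrated with a minimal sketch. It assumes each cycle's N predicted addresses are PC, PC+M, ..., PC+(N-1)*M, as in the claims, and it models "bypassed" and "previously pushed" as plain sets. The function name, parameters, and the deque standing in for the FTQ are all hypothetical, and taken-branch handling is deliberately left out.

```python
from collections import deque


def select_for_ftq(pc, m, n, bypassed, previously_pushed):
    """Return the addresses from this cycle's N predictions that should be pushed.

    Hypothetical model: the N addresses are pc, pc+m, ..., pc+(n-1)*m.
    An address is pushed only if it is not bypassed and was not already
    pushed in a previous cycle (i.e. it does not overlap).
    """
    candidates = [pc + i * m for i in range(n)]
    return [a for a in candidates
            if a not in bypassed and a not in previously_pushed]


# Illustrative use: N = 3, M = 16 bytes, one overlapping address (0x1000)
# that was already pushed in the previous cycle.
ftq = deque()  # stand-in for the fetch-target queue
ftq.extend(select_for_ftq(0x1000, 16, 3,
                          bypassed=set(),
                          previously_pushed={0x1000}))
```

In this example only 0x1010 and 0x1020 enter the queue; the instruction cache would later pop them as instruction-fetching addresses, independently of the predictor's pace.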
Opening claim text (preview).
What is claimed is:

1. A microprocessor, comprising: an instruction cache, operated according to an instruction-fetching address for instruction fetching; a branch predictor, performing branch prediction for N instruction addresses in parallel in the same time, wherein N is an integer greater than 1; and a fetch-target queue, coupled between the branch predictor and the instruction cache, wherein: in a current cycle, the branch predictor finishes branch prediction for the N instruction addresses in parallel and, among the N instruction addresses with finished branch prediction, those that are not bypassed and which do not overlap previously-predicted instruction addresses are pushed into the fetch-target queue, to be read out later as the instruction-fetching address for the instruction cache; the previously-predicted instruction addresses are pushed into the fetch-target queue in a previous cycle; in the current cycle, when the branch predictor predicts that in N chunks indicated by the N instruction addresses with finished branch prediction, a branch is predicted to be taken, and the taken branch is called by a branch instruction across two adjacent chunks, an instruction address indicating a second chunk next to a first chunk corresponding to the branch instruction calling the taken branch is pushed into the fetch-target queue, to be read out later as the instruction-fetching address for the instruction cache; N is 3; each chunk indicated by each instruction address is M bytes, M is a number; in the current cycle, the branch predictor finishes branch prediction for instruction addresses PC, PC+M, and PC+2*M; in a first setting, there is one overlapping instruction address between branch prediction finished in the current cycle and branch prediction finished in the previous cycle; in the first setting, when the instruction address PC is not bypassed and has not been pushed into the fetch-target queue in the previous cycle: the fetch-target queue provides a first entry to store the instruction address PC; the fetch-target queue provides a second entry to store the instruction address PC+M when no branch is predicted to be taken in a chunk indicated by the instruction address PC, or when a branch is predicted to be taken in the chunk indicated by the instruction address PC and the taken branch is called by a branch instruction across two adjacent chunks; the fetch-target queue provides a third entry to store the instruction address PC+2*M when no branch is predicted to be taken in two chunks indicated by the instruction addresses PC and PC+M, or when no branch is predicted to be taken in the chunk indicated by the instruction address PC, a branch is predicted to be taken in the chunk indicated by the instruction address PC+M, and the taken branch is called by a branch instruction across two adjacent chunks; and the fetch-target queue provides a fourth entry to store an instruction address PC+3*M when no branch is predicted to be taken in the two chunks indicated by the instruction addresses PC and PC+M, a branch is predicted to be taken in a chunk indicated by the instruction address PC+2*M, and the taken branch is called by a branch instruction across two adjacent chunks.

2. The microprocessor as claimed in claim 1, wherein: the branch predictor involves multiple stages of first pipeline operations; and the instruction cache involves multiple stages of second pipeline operations.

3. The microprocessor as claimed in claim 1, wherein in the first setting, when the instruction addresses PC, PC+M are not bypassed and the instruction address PC has been pushed into the fetch-target queue in the previous cycle: the fetch-target queue provides a first entry to store the instruction address PC+M; the fetch-target queue provides a second entry to store the instruction address PC+2*M when no branch is predicted to be taken in a chunk indicated by the instruction address PC+M, or when a branch is predicted to be taken in the chunk indicated by the instruction address PC+M and the taken branch is called by a branch instruction across two adjacent chunks; the fetch-target queue provides a third entry to store an instruction address PC+3*M when no branch is predicted to be taken in the chunk indicated by the instruction address PC+M, a branch is predicted to be taken in a chunk indicated by the instruction address PC+2*M, and the taken branch is called by a branch instruction across two adjacent chunks.

4. The microprocessor as claimed in claim 1, wherein: in a second setting, there are two overlapping instruction addresses between the branch prediction finished in the current cycle and the branch prediction finished in the previous cycle.

5. The microprocessor as claimed in claim 4, wherein in the second setting, when the instruction address PC is not bypassed and has not been pushed into the fetch-target queue in the previous cycle: the fetch-target queue provides a first entry to store the instruction address PC; the fetch-target queue provides a second entry to store the instruction address PC+M when no branch is predicted to be taken in a chunk indicated by the instruction address PC, or when a branch is predicted to be taken in the chunk indicated by the instruction address PC and the taken branch is called by a branch instruction across two adjacent chunks; the fetch-target queue provides a third entry to store the instruction address PC+2*M when no branch is predicted to be taken in two chunks indicated by the instruction addresses PC and PC+M, or when no branch is predicted to be taken in the chunk indicated by the instruction address PC, a branch is predicted to be taken in the chunk indicated by the instruction address PC+M, and the taken branch is called by a branch instruction across two adjacent chunks; and the fetch-target queue provides a fourth entry to store an instruction address PC+3*M when no branch is predicted to be taken in the two chunks indicated by the instruction addresses PC and PC+M, a branch is predicted to be taken in a chunk indicated by the instruction address PC+2*M, and the taken branch is called by a branch instruction across two adjacent chunks.

6. The microprocessor as claimed in claim 4, wherein in the second setting, when the instruction addresses PC, PC+M are not bypassed and have been pushed into the fetch-target queue in the previous cycle: the fetch-target queue provides a first entry to store the instruction address PC+2*M; the fetch-target queue provides a second entry to store an instruction address PC+3*M when a branch is predicted to be taken in the chunk indicated by the instruction address PC+2*M and the taken branch is called by a branch instruction across two adjacent chunks.

7. The microprocessor as claimed in claim 1, wherein: in a third setting, there are no overlapping instruction addresses between the branch prediction finished in the current cycle and the branch prediction finished in the previous cycle.

8. The microprocessor as claimed in claim 7, wherein in the third setting, when the instruction address PC is not bypassed: the fetch-target queue provides a first entry to store the instruction address PC; the fetch-target queue provides a second entry to store the instruction address PC+M when no branch is predicted to be taken in a chunk indicated by the instruction address PC, or when a branch is predicted to be taken in the chunk indicated by the instruction address PC and the taken branch is called by a branch instruction across two adjacent chunks; the fetch-target queue provides a third entry to store the instruction address PC+2*M when no branch is predicted to be taken in two chunks indicated by the instruction addresses PC and PC+M, or when no branch is pred
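The entry-allocation conditions recited in claims 1, 3, and 6 appear to follow one pattern: walk the chunks in address order starting after those already pushed, stop at the first chunk with a predicted-taken branch, and push one extra chunk address when that branch instruction straddles two adjacent chunks. The following is a minimal sketch of that reading, not the patented circuit. The `ChunkPrediction` type, the `ftq_entries` function, and the `already_pushed` count are hypothetical names, and bypassing is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkPrediction:
    """Hypothetical per-chunk prediction result."""
    taken: bool = False          # a branch in this chunk is predicted taken
    crosses_chunk: bool = False  # that branch instruction spans two adjacent chunks


def ftq_entries(pc, m, predictions, already_pushed=0):
    """List the addresses stored in FTQ entries for one cycle.

    predictions[i] describes the chunk at pc + i*m (N = len(predictions)).
    already_pushed is the number of leading addresses that overlap the
    previous cycle's pushes (an assumption used to model the settings).
    """
    entries = []
    for i in range(already_pushed, len(predictions)):
        entries.append(pc + i * m)
        pred = predictions[i]
        if pred.taken:
            # A taken branch ends the sequential run. If the branch
            # instruction crosses into the next chunk, that chunk is
            # still needed to fetch the rest of the instruction.
            if pred.crosses_chunk:
                entries.append(pc + (i + 1) * m)
            break
    return entries


# Illustrative use with N = 3 and M = 16: no taken branch in the first two
# chunks, and a chunk-crossing taken branch in the third.
preds = [ChunkPrediction(), ChunkPrediction(), ChunkPrediction(True, True)]
entries = ftq_entries(0x1000, 16, preds)
```

Under these assumptions the example fills four entries (PC through PC+3*M), which corresponds to the fourth-entry condition of claim 1. Calling it with `already_pushed=2` yields PC+2*M and PC+3*M, matching the two entries described in claim 6.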
Recovery, e.g. branch miss-prediction, exception handling (error detection or correction G06F11/00) · CPC title
using a plurality of independent parallel functional units · CPC title
Prefetch instructions; cache control instructions · CPC title
with dedicated cache, e.g. instruction or stack · CPC title
for instruction reuse, e.g. trace cache, branch target cache · CPC title