Branch target look up suppression
US-11029959-B2 · Jun 8, 2021 · US
US11379243B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11379243-B2 |
| Application number | US-202017083652-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 29, 2020 |
| Priority date | Apr 7, 2020 |
| Publication date | Jul 5, 2022 |
| Grant date | Jul 5, 2022 |
Official abstract text for this publication.
A microprocessor with a multistep-ahead branch predictor is shown. The branch predictor is coupled to an instruction cache and has an N-stage pipelined architecture, which is configured to perform branch prediction to control the instruction fetching of the instruction cache. The branch predictor performs branch prediction for (N−1) instruction-address blocks in parallel, wherein the (N−1) instruction-address blocks include a starting instruction-address block and (N−2) subsequent instruction-address blocks. The branch predictor is thereby ahead of branch prediction of the starting instruction-address block. The branch predictor stores reference information about branch prediction in at least one memory and performs a parallel search of the memory for the branch prediction of the (N−1) instruction-address blocks.
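The parallel search described in the abstract can be pictured with a small software model. The following is a minimal sketch, not the patented circuit: it assumes a 16-byte instruction-address block, N = 4, and a branch-target memory split into four banks selected by the low bits of the block number (the claims mention banks of this kind). The names `BLOCK_BYTES`, `NUM_BANKS`, `blocks_for_round`, `bank_of`, and `parallel_lookup` are hypothetical.

```python
BLOCK_BYTES = 16   # assumed block size; the abstract does not state one
N_STAGES = 4       # N, the pipeline depth (the claims give N = 4 as one case)
NUM_BANKS = 4      # assumed power of two, at least N - 1


def blocks_for_round(start_addr):
    """Return the (N-1) block addresses handled in one round:
    the starting block plus the (N-2) blocks that follow it."""
    base = start_addr - (start_addr % BLOCK_BYTES)
    return [base + i * BLOCK_BYTES for i in range(N_STAGES - 1)]


def bank_of(block_addr):
    """Pick a bank from the low bits of the block number."""
    return (block_addr // BLOCK_BYTES) % NUM_BANKS


def parallel_lookup(banks, start_addr):
    """Model one parallel search: each block reads a different bank."""
    blocks = blocks_for_round(start_addr)
    selected = [bank_of(b) for b in blocks]
    # Consecutive blocks land in distinct banks, so hardware could
    # read all of them in the same cycle without a port conflict.
    assert len(set(selected)) == len(selected)
    return [banks[bank].get(block) for bank, block in zip(selected, blocks)]


# Usage: one stored target for the second block of the round.
banks = [dict() for _ in range(NUM_BANKS)]
banks[bank_of(0x1010)][0x1010] = 0x2000
targets = parallel_lookup(banks, 0x1008)
```

In this model, a miss is represented by `None`. Real hardware would also need tag comparison and a policy for choosing among several hits, which the sketch leaves out.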
Opening claim text (preview).
What is claimed is:

1. A microprocessor, comprising: an instruction cache; and a branch predictor with an N-stage pipelined architecture, coupled to the instruction cache and configured to perform branch prediction to control instruction fetching of the instruction cache, where N is a natural number that makes (N−1) greater than one, wherein: the branch predictor performs branch prediction for (N−1) instruction-address blocks in parallel to process multiple instruction-address blocks simultaneously in each stage, wherein the (N−1) instruction-address blocks include a starting instruction-address block and (N−2) subsequent instruction-address blocks; and the branch predictor stores reference information about branch prediction in at least one memory and searches the memory to get reference information in parallel for the branch prediction of the (N−1) instruction-address blocks.

2. The microprocessor as claimed in claim 1, wherein the branch predictor includes: a first memory, configured as a branch target address cache that outputs (N−1) pieces of branch destination information in parallel corresponding to the starting instruction-address block and the (N−2) subsequent instruction-address blocks; and (N−1) sets of branch logic circuits and a first multiplexer, wherein the (N−1) pieces of branch destination information are processed by the (N−1) sets of branch logic circuits and then sent to the first multiplexer, and a branch destination block is indicated by an output of the first multiplexer.

3. The microprocessor as claimed in claim 2, wherein: the first memory includes at least (N−1) memory banks, and the different memory banks correspond to different instruction-address blocks identified by lower bits of instruction-address block address; and the (N−1) pieces of branch destination information are stored in the different memory banks to be accessed in parallel.

4. The microprocessor as claimed in claim 2, wherein: the first memory includes (N−1) input ports to receive the starting instruction-address block and the (N−2) subsequent instruction-address blocks in parallel and thereby the (N−1) pieces of branch destination information are accessed in parallel.

5. The microprocessor as claimed in claim 2, wherein the branch predictor includes: a second memory, storing a branch history table, wherein: the branch history table is searched by using a plurality of calculated results as indexes, wherein calculations are performed on the starting instruction-address block and the (N−2) subsequent instruction-address blocks respectively with a corresponding history pattern to generate the calculated results; and by searching the branch history table, branch directions corresponding to the starting instruction-address block and the (N−2) subsequent instruction-address blocks are provided to control the (N−1) sets of branch logic circuits.

6. The microprocessor as claimed in claim 5, wherein the branch predictor further includes: a shift register, storing the corresponding history pattern; and the calculations involve a hash operation or a bitwise exclusive-or operation.

7. The microprocessor as claimed in claim 5, wherein: when the corresponding history pattern is incomplete for a multistep-ahead instruction-address block of the starting instruction-address block and the (N−2) subsequent instruction-address blocks, the branch predictor provides a plurality of possible branch directions corresponding to the multistep-ahead instruction-address block in parallel based on a plurality of history pattern assumptions; and after the incomplete history pattern is made up, the branch predictor selects a matched branch direction from the plurality of possible branch directions matched with the made up history pattern.

8. The microprocessor as claimed in claim 7, wherein: the history pattern assumptions consider a case wherein an earlier instruction-address block whose branch prediction is unfinished involves no branch and a case wherein the earlier instruction-address block involves a branch not to be taken.

9. The microprocessor as claimed in claim 7, wherein: when the starting instruction-address block and the (N−2) subsequent instruction-address blocks overlap (N−1) instruction-address blocks processed in a previous round of branch prediction, overlapped instruction-address blocks are omitted from processing by the branch predictor again, and each non-overlapped instruction-address block is processed by the branch predictor for parallel branch prediction based on the plurality of history pattern assumptions.

10. The microprocessor as claimed in claim 9, wherein: the second memory has 2 (N-2) input ports to receive 2 (N-2) instruction-address blocks in parallel.

11. The microprocessor as claimed in claim 1, wherein: N is 4; and the branch predictor includes a first pipeline stage, a second pipeline stage, a third pipeline stage and a fourth pipeline stage.

12. The microprocessor as claimed in claim 11, wherein: the branch predictor predicts a branch with a first instruction-address block as a destination, wherein subsequent to the first instruction-address block there are a second instruction-address block, a third instruction-address block and a fourth instruction-address block; the first instruction-address block, the second instruction-address block, and the third instruction-address block proceed to the first pipeline stage in a first timing cycle, proceed to the second pipeline stage in a second timing cycle, proceed to the third pipeline stage in a third timing cycle, and proceed to the fourth pipeline stage in a fourth timing cycle; in the second timing cycle, the fourth instruction-address block proceeds to the first pipeline stage while the first instruction-address block has not yet proceeded to the fourth pipeline stage; in the second timing cycle, a first history pattern assumption is made for the fourth instruction-address block to consider a no-branch case of the first instruction-address block; and in the second timing cycle, a second history pattern assumption is made for the fourth instruction-address block to consider a not-taken branch case of the first instruction-address block.

13. The microprocessor as claimed in claim 12, wherein: the branch predictor obtains a branch prediction result of the first instruction-address block in the fourth timing cycle; and in a fifth timing cycle following the fourth timing cycle, the branch predictor obtains a first possible branch prediction result for the fourth instruction-address block based on the first history pattern assumption and a second possible branch prediction result for the fourth instruction-address block based on the second history pattern assumption, and selects one of the first possible branch prediction result and the second possible branch prediction result according to the branch prediction result of the first instruction-address block as a branch prediction result of the fourth instruction-address block.

14. The microprocessor as claimed in claim 13, wherein: a fifth instruction-address block is subsequent to the fourth instruction-address block; in the third timing cycle, the fifth instruction-address block proceeds to the first pipeline stage while the first instruction-address block and the second instruction-address block have not yet proceeded to the fourth pipeline stage; in the third timing cycle, a third history pattern assumption is made for the fifth instruction-address block to consider a case wherein both the first instruction-address block and the second instruction-address block involve no branch;
for branches, e.g. hedging, branch folding · CPC title
Address formation of the next instruction, e.g. by incrementing the instruction counter (G06F9/38 takes precedence) · CPC title
using hybrid branch prediction, e.g. selection between prediction techniques · CPC title
with dedicated cache, e.g. instruction or stack · CPC title
using address prediction, e.g. return stack, branch history buffer · CPC title