Microprocessor with multi-step ahead branch predictor and having a fetch-target queue between the branch predictor and instruction cache

US11403103B2 · US · B2

Patent metadata
Publication number: US-11403103-B2
Application number: US-202017069217-A
Country: US
Kind code: B2
Filing date: Oct 13, 2020
Priority date: Apr 14, 2020
Publication date: Aug 2, 2022
Grant date: Aug 2, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A microprocessor is shown, in which a branch predictor and an instruction cache are decoupled by a fetch-target queue (FTQ). The branch predictor performs branch prediction for N instruction addresses in parallel in the same cycle, wherein N is an integer greater than 1. In the current cycle, the branch predictor finishes branch prediction for N instruction addresses in parallel and, among the N instruction addresses with finished branch prediction, those that are not bypassed and do not overlap previously-predicted instruction addresses are pushed into the fetch-target queue, to be read out later as an instruction-fetching address for the instruction cache. The previously-predicted instruction addresses are pushed into the fetch-target queue in a previous cycle.
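The filtering step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented hardware: the function name `push_predicted`, the chunk size `M = 16`, and the use of sets for bypassed and previously pushed addresses are all assumptions made for illustration. The sketch only shows that, of the N addresses predicted in one cycle, the ones that are bypassed or already queued in the previous cycle are skipped, and the rest are queued for the instruction cache to read later.

```python
from collections import deque

M = 16  # assumed chunk size in bytes; the abstract does not give a value


def push_predicted(ftq, predicted, bypassed, previously_pushed):
    """Push addresses that are neither bypassed nor pushed in the previous cycle.

    All parameter names are hypothetical; they model the abstract's wording only.
    """
    pushed = []
    for addr in predicted:
        if addr in bypassed or addr in previously_pushed:
            continue  # skip bypassed or overlapping addresses
        ftq.append(addr)
        pushed.append(addr)
    return pushed


ftq = deque()
pc = 0x1000
# N = 3 addresses finish prediction in the current cycle.
predicted = [pc, pc + M, pc + 2 * M]
# Assume the previous cycle already pushed pc (one overlapping address).
pushed = push_predicted(ftq, predicted, bypassed=set(), previously_pushed={pc})

# Later, the instruction cache reads the oldest queued address as its fetch address.
next_fetch = ftq.popleft()
```

Because the queue sits between the two units, the predictor can keep appending addresses while the instruction cache drains them at its own pace, which is the decoupling the abstract refers to.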

First claim

Opening claim text (preview).

What is claimed is:

1. A microprocessor, comprising:

    an instruction cache, operated according to an instruction-fetching address for instruction fetching;

    a branch predictor, performing branch prediction for N instruction addresses in parallel in the same time, wherein N is an integer greater than 1; and

    a fetch-target queue, coupled between the branch predictor and the instruction cache,

    wherein:

    in a current cycle, the branch predictor finishes branch prediction for the N instruction addresses in parallel and, among the N instruction addresses with finished branch prediction, those that are not bypassed and which do not overlap previously-predicted instruction addresses are pushed into the fetch-target queue, to be read out later as the instruction-fetching address for the instruction cache;

    the previously-predicted instruction addresses are pushed into the fetch-target queue in a previous cycle;

    in the current cycle, when the branch predictor predicts that in N chunks indicated by the N instruction addresses with finished branch prediction, a branch is predicted to be taken, and the taken branch is called by a branch instruction across two adjacent chunks, an instruction address indicating a second chunk next to a first chunk corresponding to the branch instruction calling the taken branch is pushed into the fetch-target queue, to be read out later as the instruction-fetching address for the instruction cache;

    N is 3;

    each chunk indicated by each instruction address is M bytes, M is a number;

    in the current cycle, the branch predictor finishes branch prediction for instruction addresses PC, PC+M, and PC+2*M;

    in a first setting, there is one overlapping instruction address between branch prediction finished in the current cycle and branch prediction finished in the previous cycle;

    in the first setting, when the instruction address PC is not bypassed and has not been pushed into the fetch-target queue in the previous cycle:

    the fetch-target queue provides a first entry to store the instruction address PC;

    the fetch-target queue provides a second entry to store the instruction address PC+M when no branch is predicted to be taken in a chunk indicated by the instruction address PC, or when a branch is predicted to be taken in the chunk indicated by the instruction address PC and the taken branch is called by a branch instruction across two adjacent chunks;

    the fetch-target queue provides a third entry to store the instruction address PC+2*M when no branch is predicted to be taken in two chunks indicated by the instruction addresses PC and PC+M, or when no branch is predicted to be taken in the chunk indicated by the instruction address PC, a branch is predicted to be taken in the chunk indicated by the instruction address PC+M, and the taken branch is called by a branch instruction across two adjacent chunks; and

    the fetch-target queue provides a fourth entry to store an instruction address PC+3*M when no branch is predicted to be taken in the two chunks indicated by the instruction addresses PC and PC+M, a branch is predicted to be taken in a chunk indicated by the instruction address PC+2*M, and the taken branch is called by a branch instruction across two adjacent chunks.

2. The microprocessor as claimed in claim 1, wherein:

    the branch predictor involves multiple stages of first pipeline operations; and

    the instruction cache involves multiple stages of second pipeline operations.

3. The microprocessor as claimed in claim 1, wherein in the first setting, when the instruction addresses PC, PC+M are not bypassed and the instruction address PC has been pushed into the fetch-target queue in the previous cycle:

    the fetch-target queue provides a first entry to store the instruction address PC+M;

    the fetch-target queue provides a second entry to store the instruction address PC+2*M when no branch is predicted to be taken in a chunk indicated by the instruction address PC+M, or when a branch is predicted to be taken in the chunk indicated by the instruction address PC+M and the taken branch is called by a branch instruction across two adjacent chunks;

    the fetch-target queue provides a third entry to store an instruction address PC+3*M when no branch is predicted to be taken in the chunk indicated by the instruction address PC+M, a branch is predicted to be taken in a chunk indicated by the instruction address PC+2*M, and the taken branch is called by a branch instruction across two adjacent chunks.

4. The microprocessor as claimed in claim 1, wherein:

    in a second setting, there are two overlapping instruction addresses between the branch prediction finished in the current cycle and the branch prediction finished in the previous cycle.

5. The microprocessor as claimed in claim 4, wherein in the second setting, when the instruction address PC is not bypassed and has not been pushed into the fetch-target queue in the previous cycle:

    the fetch-target queue provides a first entry to store the instruction address PC;

    the fetch-target queue provides a second entry to store the instruction address PC+M when no branch is predicted to be taken in a chunk indicated by the instruction address PC, or when a branch is predicted to be taken in the chunk indicated by the instruction address PC and the taken branch is called by a branch instruction across two adjacent chunks;

    the fetch-target queue provides a third entry to store the instruction address PC+2*M when no branch is predicted to be taken in two chunks indicated by the instruction addresses PC and PC+M, or when no branch is predicted to be taken in the chunk indicated by the instruction address PC, a branch is predicted to be taken in the chunk indicated by the instruction address PC+M, and the taken branch is called by a branch instruction across two adjacent chunks; and

    the fetch-target queue provides a fourth entry to store an instruction address PC+3*M when no branch is predicted to be taken in the two chunks indicated by the instruction addresses PC and PC+M, a branch is predicted to be taken in a chunk indicated by the instruction address PC+2*M, and the taken branch is called by a branch instruction across two adjacent chunks.

6. The microprocessor as claimed in claim 4, wherein in the second setting, when the instruction addresses PC, PC+M are not bypassed and have been pushed into the fetch-target queue in the previous cycle:

    the fetch-target queue provides a first entry to store the instruction address PC+2*M;

    the fetch-target queue provides a second entry to store an instruction address PC+3*M when a branch is predicted to be taken in the chunk indicated by the instruction address PC+2*M and the taken branch is called by a branch instruction across two adjacent chunks.

7. The microprocessor as claimed in claim 1, wherein:

    in a third setting, there are no overlapping instruction addresses between the branch prediction finished in the current cycle and the branch prediction finished in the previous cycle.

8. The microprocessor as claimed in claim 7, wherein in the third setting, when the instruction address PC is not bypassed:

    the fetch-target queue provides a first entry to store the instruction address PC;

    the fetch-target queue provides a second entry to store the instruction address PC+M when no branch is predicted to be taken in a chunk indicated by the instruction address PC, or when a branch is predicted to be taken in the chunk indicated by the instruction address PC and the taken branch is called by a branch instruction across two adjacent chunks;

    the fetch-target queue provides a third entry to store the instruction address PC+2*M when no branch is predicted to be taken in two chunks indicated by the instruction addresses PC and PC+M, or when no branch is pred…
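The entry-filling conditions recited in claims 1, 3, and 6 follow one pattern, which the Python sketch below tries to capture. It is an interpretation, not the patent's implementation: the function name `ftq_entries` and the parameters `taken`, `crosses`, and `start` are hypothetical, and bypass handling is left out. The idea is to walk the chunks in order, store each chunk's address, and stop at the first chunk with a predicted-taken branch, adding the next chunk only when the branch instruction spans two adjacent chunks.

```python
def ftq_entries(pc, m, taken, crosses, start=0):
    """Return the addresses to push this cycle, in fetch-target queue entry order.

    Hypothetical parameters (not named in the patent):
      taken[i]:   a branch is predicted taken in the chunk at pc + i*m
      crosses[i]: that branch instruction spans chunk i and chunk i+1
      start:      index of the first chunk not pushed in the previous cycle
    """
    entries = []
    for i in range(start, len(taken)):
        entries.append(pc + i * m)
        if taken[i]:
            if crosses[i]:
                # The branch instruction spills into the next chunk,
                # so that chunk's address is stored as well.
                entries.append(pc + (i + 1) * m)
            break  # chunks after a taken branch are not stored
    return entries


PC, M = 0x100, 16  # assumed example values

# Claim 1 case: PC not pushed previously, no taken branch in any chunk.
no_branch = ftq_entries(PC, M, [False] * 3, [False] * 3)

# Claim 1 fourth entry: taken branch in chunk PC+2*M that crosses into PC+3*M.
fourth = ftq_entries(PC, M, [False, False, True], [False, False, True])

# Claim 3 case: PC was pushed in the previous cycle, so start at PC+M.
overlap_one = ftq_entries(PC, M, [False, True, False], [False, False, False], start=1)

# Claim 6 case: PC and PC+M were pushed previously, so start at PC+2*M.
overlap_two = ftq_entries(PC, M, [False, False, True], [False, False, True], start=2)
```

With these inputs, `no_branch` holds PC, PC+M, and PC+2*M; `fourth` adds PC+3*M; `overlap_one` holds only PC+M; and `overlap_two` holds PC+2*M and PC+3*M.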

Assignees

Shanghai Zhaoxin Semiconductor Co Ltd

Inventors

Classifications

  • Recovery, e.g. branch miss-prediction, exception handling (error detection or correction G06F11/00)

  • using a plurality of independent parallel functional units

  • Prefetch instructions; cache control instructions

  • with dedicated cache, e.g. instruction or stack

  • for instruction reuse, e.g. trace cache, branch target cache

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11403103B2 cover?
It covers a microprocessor in which a fetch-target queue (FTQ) decouples the branch predictor from the instruction cache. The branch predictor finishes branch prediction for N instruction addresses in parallel each cycle (N greater than 1; N is 3 in claim 1). Addresses that are not bypassed and do not overlap addresses pushed in the previous cycle are pushed into the FTQ and later read out as instruction-fetching addresses for the instruction cache.
Who is the assignee on this patent?
Shanghai Zhaoxin Semiconductor Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F9/3802. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 2, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).