Macro-op fusion
US-2021255859-A1 · Aug 19, 2021 · US
US12487829B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12487829-B2 |
| Application number | US-202418428319-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 31, 2024 |
| Priority date | Feb 3, 2023 |
| Publication date | Dec 2, 2025 |
| Grant date | Dec 2, 2025 |
Official abstract text for this publication.
Systems and methods are disclosed for macro-op fusion in pipelined architectures. For example, some methods include detecting a sequence of macro-ops stored in an instruction decode buffer, the sequence of macro-ops including a first macro-op, followed by one or more intervening macro-ops, followed by a last macro-op; determining a micro-op that is equivalent to the first macro-op combined with the last macro-op; and forwarding the micro-op to one or more execution resource circuitries for execution.
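The detection step described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented decoder: the `MacroOp` and `MicroOp` types, the `try_fuse` function, and the RISC-V-style `lui`/`addi` pair fused into a single `li` micro-op are all hypothetical choices made for illustration. The sketch also ignores the sign extension of the `addi` immediate and any recovery needed when an intervening branch is mispredicted.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class MacroOp:
    """Hypothetical decoded macro-op held in the decode buffer."""
    opcode: str
    dst: Optional[str] = None
    srcs: Tuple[str, ...] = ()
    imm: int = 0


@dataclass(frozen=True)
class MicroOp:
    """Hypothetical fused micro-op sent to an execution resource."""
    opcode: str
    dst: Optional[str]
    imm: int


def touches(ops: Sequence[MacroOp], reg: Optional[str]) -> bool:
    """True if any op reads or writes the given register."""
    return any(op.dst == reg or reg in op.srcs for op in ops)


def try_fuse(buffer: Sequence[MacroOp]):
    """Look for `lui rd` ... intervening ops ... `addi rd, rd`.

    Returns (fused micro-op, intervening macro-ops) or None.
    Assumption: fusion is allowed only when no intervening op
    reads or writes rd, so the pair is independent of them.
    """
    for i, first in enumerate(buffer):
        if first.opcode != "lui":
            continue
        # j starts at i + 2 so that at least one op intervenes.
        for j in range(i + 2, len(buffer)):
            intervening = buffer[i + 1:j]
            if touches(intervening, first.dst):
                break  # any later candidate would hit the same conflict
            last = buffer[j]
            if (last.opcode == "addi" and last.dst == first.dst
                    and last.srcs == (first.dst,)):
                imm = (first.imm << 12) + last.imm
                return MicroOp("li", first.dst, imm), list(intervening)
    return None


buffer = [
    MacroOp("lui", "x5", (), 0x12345),
    MacroOp("beq", None, ("x1", "x2"), 8),   # intervening conditional branch
    MacroOp("addi", "x5", ("x5",), 0x678),
]
result = try_fuse(buffer)
```

In this model the branch that sits between the two fused macro-ops is returned unchanged, so a caller could issue it alongside the fused micro-op.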
Opening claim text (preview).
What is claimed is:
1. An integrated circuit comprising: one or more execution resource circuitries configured to execute micro-ops to support an instruction set including macro-ops, an instruction decode buffer configured to store macro-ops fetched from memory, and an instruction decoder circuitry configured to: detect a sequence of macro-ops stored in the instruction decode buffer, the sequence of macro-ops including a first macro-op, followed by one or more intervening macro-ops, followed by a last macro-op, wherein the one or more intervening macro-ops include a conditional branch macro-op; determine a micro-op that is equivalent to the first macro-op combined with the last macro-op; and forward the micro-op to at least one of the one or more execution resource circuitries for execution.
2. The integrated circuit of claim 1, in which the instruction decoder circuitry is configured to: check that the last macro-op is independent of the one or more intervening macro-ops.
3. The integrated circuit of claim 1, in which the instruction decoder circuitry is configured to: check that the one or more intervening macro-ops can be issued in a same clock cycle as the micro-op.
4. The integrated circuit of claim 1, in which the last macro-op is a control flow instruction.
5. The integrated circuit of claim 1, further comprising a branch speculation circuitry configured to: store a pointer to the first macro-op, associated with a prediction of whether the conditional branch macro-op will be taken; and responsive to detecting that the conditional branch macro-op has been mispredicted, flush a processor pipeline including the instruction decoder circuitry and the one or more execution resource circuitries to restart execution with the first macro-op.
6. The integrated circuit of claim 1, in which the one or more execution resource circuitries include an early execution resource circuitry and a late execution resource circuitry that is after the early execution resource circuitry in a processor pipeline and is configured to take output from the early execution resource circuitry as input, and in which the micro-op is executed by both the early execution resource circuitry and the late execution resource circuitry.
7. The integrated circuit of claim 6, in which the instruction decoder circuitry is configured to: determine a prediction of whether a delay caused by fusing the first macro-op with the last macro-op will be below a threshold, wherein the micro-op is determined responsive to the prediction indicating that the delay will be below the threshold.
8. The integrated circuit of claim 1, in which the one or more execution resource circuitries include a first execution resource circuitry in a first processor pipeline branch and a second execution resource circuitry in a second processor pipeline branch that operates in parallel with the first processor pipeline branch, and in which the micro-op is executed by both the first execution resource circuitry and the second execution resource circuitry.
9. The integrated circuit of claim 1, in which the one or more intervening macro-ops include at least two macro-ops.
10. The integrated circuit of claim 1, in which the one or more intervening macro-ops consist of a number of macro-ops equal to two less than the number of macro-ops that the instruction decode buffer is sized to store.
11. The integrated circuit of claim 1, comprising a fusion predictor circuitry configured to: detect a prefix of the sequence of macro-ops in the instruction decode buffer; determine a prediction of whether the sequence of macro-ops will be completed in a next fetch of macro-ops from memory and fused; and based on the prediction, delay execution of the prefix until after the next fetch to enable fusion of the sequence of macro-ops.
12. The integrated circuit of claim 11, in which the fusion predictor circuitry is configured to: maintain a table of prediction counters, wherein the table of prediction counters is used to determine the prediction.
13. The integrated circuit of claim 12, in which the fusion predictor circuitry is configured to update the table of prediction counters based on whether there are instructions in the next fetch that depend on instructions in the prefix.
14. A method comprising: detecting a sequence of macro-ops stored in an instruction decode buffer, the sequence of macro-ops including a first macro-op, followed by one or more intervening macro-ops, followed by a last macro-op, wherein the one or more intervening macro-ops include a conditional branch macro-op; determining a micro-op that is equivalent to the first macro-op combined with the last macro-op; and forwarding the micro-op to one or more execution resource circuitries for execution.
15. The method of claim 14, further comprising: storing a pointer to the first macro-op, associated with a prediction of whether the conditional branch macro-op will be taken; and responsive to detecting that the conditional branch macro-op has been mispredicted, flushing a processor pipeline including the one or more execution resource circuitries to restart execution with the first macro-op.
16. The method of claim 14, in which the one or more execution resource circuitries include an early execution resource circuitry and a late execution resource circuitry that is after the early execution resource circuitry in a processor pipeline and is configured to take output from the early execution resource circuitry as input, and in which the micro-op is executed by both the early execution resource circuitry and the late execution resource circuitry.
17. The method of claim 14, comprising: determining a prediction of whether a delay caused by fusing the first macro-op with the last macro-op will be below a threshold, wherein the micro-op is determined responsive to the prediction indicating that the delay will be below the threshold.
18. The method of claim 14, in which the one or more execution resource circuitries include a first execution resource circuitry in a first processor pipeline branch and a second execution resource circuitry in a second processor pipeline branch that operates in parallel with the first processor pipeline branch, and in which the micro-op is executed by both the first execution resource circuitry and the second execution resource circuitry.
19. The method of claim 14, comprising: detecting a prefix of the sequence of macro-ops in the instruction decode buffer; determining a prediction of whether the sequence of macro-ops will be completed in a next fetch of macro-ops from memory and fused; and based on the prediction, delaying execution of the prefix until after the next fetch to enable fusion of the sequence of macro-ops.
20. A non-transitory computer readable medium comprising a circuit representation that, when processed by a computer, is used to program or manufacture an integrated circuit comprising: one or more execution resource circuitries configured to execute micro-ops to support an instruction set including macro-ops, an instruction decode buffer configured to store macro-ops fetched from memory, and an instruction decoder circuitry configured to: detect a sequence of macro-ops stored in the instruction decode buffer, the sequence of macro-ops including a first macro-op, followed by one or more intervening macro-ops, followed by a last macro-op, wherein the one or more intervening macro-ops inclu
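The table of prediction counters mentioned in claims 12 and 13 can be pictured with a short software model. This is a rough sketch under assumptions the claims do not state: the `FusionPredictor` class, the 2-bit saturating counters, the 64-entry table, and indexing by the address of the prefix are all hypothetical. The claims say only that the table is updated based on whether instructions in the next fetch depend on the prefix, so the sketch takes a plain boolean outcome rather than modeling that check.

```python
class FusionPredictor:
    """Hypothetical table of 2-bit saturating prediction counters."""

    def __init__(self, entries: int = 64) -> None:
        self.entries = entries
        # Start every counter at 1: weakly predict "do not delay".
        self.counters = [1] * entries

    def _index(self, prefix_pc: int) -> int:
        # Assumed indexing scheme: word-aligned address modulo table size.
        return (prefix_pc >> 2) % self.entries

    def predict_delay(self, prefix_pc: int) -> bool:
        """Predict whether holding the prefix for the next fetch will pay off."""
        return self.counters[self._index(prefix_pc)] >= 2

    def update(self, prefix_pc: int, fusion_happened: bool) -> None:
        """Move the counter toward the observed outcome, saturating at 0 and 3."""
        i = self._index(prefix_pc)
        if fusion_happened:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


predictor = FusionPredictor()
before = predictor.predict_delay(0x100)
predictor.update(0x100, fusion_happened=True)
after = predictor.predict_delay(0x100)
```

Saturating counters are a common choice for this kind of table because a single unusual outcome does not immediately flip a well-established prediction.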
Loading of the microprogram · CPC title
using dynamic branch prediction, e.g. using branch history tables · CPC title
Speculative instruction execution · CPC title
Recovery, e.g. branch miss-prediction, exception handling (error detection or correction G06F11/00) · CPC title
using address prediction, e.g. return stack, branch history buffer · CPC title