Systems and methods for supporting a plurality of load accesses of a cache in a single cycle
US-2016041930-A1 · Feb 11, 2016 · US
US11656875B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11656875-B2 |
| Application number | US-202016928970-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 14, 2020 |
| Priority date | Mar 15, 2013 |
| Publication date | May 23, 2023 |
| Grant date | May 23, 2023 |
A method for emulating a guest centralized flag architecture by using a native distributed flag architecture. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein each of the instruction blocks comprises two half blocks; scheduling the instructions of the instruction block to execute in accordance with a scheduler; and using a distributed flag architecture to emulate a centralized flag architecture for the emulation of guest instruction execution.
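The abstract names the idea of emulating a centralized flag register with distributed flags but does not show a mechanism. As a rough illustration only, the toy Python model below assumes that each block writes its condition flags into its own slot keyed by a program-order sequence number, and that a guest read of the "centralized" flag register resolves to the most recent earlier writer. The names `Flags`, `DistributedFlags`, `write`, and `read_centralized` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Flags:
    """Hypothetical guest condition codes (e.g. carry and zero)."""
    carry: bool
    zero: bool


class DistributedFlags:
    """Toy model (assumption): one flag slot per producing block instead of one shared register."""

    def __init__(self) -> None:
        # Program-order sequence number of the producer -> flags it wrote.
        self._slots: Dict[int, Flags] = {}

    def write(self, seq: int, flags: Flags) -> None:
        # Each producer writes only its own slot, so producers never contend on one register.
        self._slots[seq] = flags

    def read_centralized(self, seq: int) -> Optional[Flags]:
        # Reconstruct the guest's single flag register as seen at `seq`:
        # the flags written by the latest producer earlier in program order.
        earlier = [s for s in self._slots if s < seq]
        return self._slots[max(earlier)] if earlier else None


# Usage: blocks 1 and 4 set flags; a reader at 3 sees block 1, a reader at 5 sees block 4.
flags = DistributedFlags()
flags.write(1, Flags(carry=True, zero=False))
flags.write(4, Flags(carry=False, zero=True))
seen_at_3 = flags.read_centralized(3)
seen_at_5 = flags.read_centralized(5)
```

The sketch deliberately ignores out-of-order arrival of flag writes, partial flag updates, and speculation recovery.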
What is claimed is:

1. A method of a processor, comprising: grouping an incoming instruction sequence to form instruction blocks, wherein each of the instruction blocks comprises two half blocks; dispatching the two half blocks of an instruction block independently or together as one instruction block to an execution unit based on dependency resolution between the two half blocks; and executing the instructions by at least one execution unit.

2. The method of claim 1, wherein the incoming instruction sequence is received using a global front end and wherein each half block is configured to be dispatched independently.

3. The method of claim 1, wherein the at least one execution unit has two slots to execute single or paired operations.

4. The method of claim 1, wherein the at least one execution unit supports four types of execution, where the four types of execution are parallel halves, atomic parallel halves, atomic serial halves, and sequential halves using the two half blocks in each slot of the at least one execution unit.

5. The method of claim 4, wherein the parallel halves executes each of the two half blocks independently once sources of each of the two half blocks are ready, and wherein atomic parallel halves executes each of the two half blocks in parallel and resources are shared between each of the two half blocks.

6. The method of claim 4, wherein atomic serial halves execution forwards data from one of the two half blocks to the other of the two half blocks, and wherein sequential halves the one of the two half blocks depends on the other of the two half blocks and is dispatched in a later cycle than the one of the two half blocks and data is forwarded via external storage to resolve dependency.

7. A processor comprising: a front end to group an incoming instruction sequence to form instruction blocks, wherein each of the instruction blocks comprises two half blocks; a dispatcher to dispatch the two half blocks of an instruction block independently or together as one instruction block to an execution unit based on dependency resolution between the two half blocks; and at least one execution unit to execute the instructions.

8. The processor of claim 7, wherein the front end is a global front end and wherein each half block is configured to be dispatched independently.

9. The processor of claim 7, wherein the at least one execution unit has two slots to execute single or paired operations.

10. The processor of claim 7, where the at least one execution unit supports four types of execution, where the four types of execution are parallel halves, atomic parallel halves, atomic serial halves, and sequential halves using the two half blocks in each slot of the at least one execution unit.

11. The processor of claim 10, wherein the parallel halves executes each of the two half blocks independently once sources of each of the two half blocks are ready, and wherein atomic parallel halves executes each of the two half blocks in parallel and resources are shared between each of the two half blocks.

12. The processor of claim 10, wherein atomic serial halves execution forwards data from one of the two half blocks to the other of the two half blocks, and wherein sequential halves the one of the two half blocks depends on the other of the two half blocks and is dispatched in a later cycle than the one of the two half blocks and data is forwarded via external storage to resolve dependency.

13. A system comprising at least one cache; and a processor coupled to the at least one cache, the processor including, a front end to group an incoming instruction sequence to form instruction blocks, wherein each of the instruction blocks comprises two half blocks, a dispatcher to dispatch the two half blocks of an instruction block independently or together as one instruction block to an execution unit based on dependency resolution between the two half blocks, and at least one execution unit to execute the instructions.

14. The system of claim 13, wherein the front end is a global front end and wherein each half block is configured to be dispatched independently.

15. The system of claim 13, wherein the at least one execution unit has two slots to execute single or paired operations.

16. The system of claim 13, where the at least one execution unit supports four types of execution, where the four types of execution are parallel halves, atomic parallel halves, atomic serial halves, and sequential halves using the two half blocks in each slot of the at least one execution unit.

17. The system of claim 16, wherein the parallel halves executes each of the two half blocks independently once sources of each of the two half blocks are ready, and wherein atomic parallel halves executes each of the two half blocks in parallel and resources are shared between each of the two half blocks.

18. The system of claim 16, wherein atomic serial halves execution forwards data from one of the two half blocks to the other of the two half blocks, and wherein sequential halves the one of the two half blocks depends on the other of the two half blocks and is dispatched in a later cycle than the one of the two half blocks and data is forwarded via external storage to resolve dependency.
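Claims 1 and 4 through 6 describe grouping an instruction sequence into blocks of two half blocks and choosing among four execution types according to whether one half depends on the other. The Python sketch below models one plausible reading: a read-after-write check between the halves selects serial versus parallel handling, and a hypothetical `atomic` flag stands in for whatever policy picks the atomic or non-atomic variant, which the claims do not specify. Which modes count as "dispatched together" is likewise an assumed mapping. The names `Instr`, `Mode`, `group_into_blocks`, `choose_mode`, `dispatched_together`, and `half_size` are illustrative, not terms from the patent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Instr:
    # Hypothetical register-level view of one instruction.
    name: str
    sources: FrozenSet[str]
    dests: FrozenSet[str]


class Mode(Enum):
    PARALLEL_HALVES = "parallel halves"
    ATOMIC_PARALLEL_HALVES = "atomic parallel halves"
    ATOMIC_SERIAL_HALVES = "atomic serial halves"
    SEQUENTIAL_HALVES = "sequential halves"


Half = List[Instr]


def group_into_blocks(seq: Sequence[Instr], half_size: int = 2) -> List[Tuple[Half, Half]]:
    """Split a sequence into blocks of two half blocks; half_size is an assumed parameter."""
    halves = [list(seq[i:i + half_size]) for i in range(0, len(seq), half_size)]
    if len(halves) % 2:
        halves.append([])  # pad the last block with an empty second half
    return [(halves[i], halves[i + 1]) for i in range(0, len(halves), 2)]


def writes(half: Half) -> FrozenSet[str]:
    return frozenset().union(*(ins.dests for ins in half))


def reads(half: Half) -> FrozenSet[str]:
    return frozenset().union(*(ins.sources for ins in half))


def choose_mode(first: Half, second: Half, atomic: bool) -> Mode:
    """Pick one of the four claimed execution types; `atomic` is a hypothetical policy input."""
    depends = bool(writes(first) & reads(second))  # read-after-write across the halves
    if depends:
        return Mode.ATOMIC_SERIAL_HALVES if atomic else Mode.SEQUENTIAL_HALVES
    return Mode.ATOMIC_PARALLEL_HALVES if atomic else Mode.PARALLEL_HALVES


def dispatched_together(mode: Mode) -> bool:
    # Assumed mapping: atomic modes issue as one block, the others issue each half separately.
    return mode in (Mode.ATOMIC_PARALLEL_HALVES, Mode.ATOMIC_SERIAL_HALVES)


# Usage: "mul" reads r3 and r6, which the first half writes, so the halves are dependent.
prog = [
    Instr("add", frozenset({"r1", "r2"}), frozenset({"r3"})),
    Instr("sub", frozenset({"r4", "r5"}), frozenset({"r6"})),
    Instr("mul", frozenset({"r3", "r6"}), frozenset({"r7"})),
]
blocks = group_into_blocks(prog, half_size=2)
first, second = blocks[0]
mode = choose_mode(first, second, atomic=False)
```

Here the dependent pair falls into sequential halves, so under the assumed mapping the second half would be dispatched on its own in a later cycle, matching the behavior claim 6 attributes to that mode.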
Result writeback, i.e. updating the architectural state or memory · CPC title
Instruction completion, e.g. retiring, committing or graduating · CPC title
from multiple instruction streams, e.g. multistreaming · CPC title
Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution · CPC title
Condition code generation, e.g. Carry, Zero flag · CPC title