Method and apparatus for compiling optimization using activation recalculation
US-2024303054-A1 · Sep 12, 2024 · US
US9292265B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9292265-B2 |
| Application number | US-201213467765-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 9, 2012 |
| Priority date | May 9, 2012 |
| Publication date | Mar 22, 2016 |
| Grant date | Mar 22, 2016 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Basic blocks within a thread program are characterized for convergence based on variance analysis or corresponding instructions. Each basic block is marked as divergent based on transitive control dependence on a block that is either divergent or comprising a variant branch condition. Convergent basic blocks that are defined by invariant instructions are advantageously identified as candidates for scalarization by a thread program compiler.
Opening claim text (preview).
What is claimed is: 1. A computer-implemented method for characterizing a thread program, the method comprising: marking each basic block associated with the thread program as being convergent, wherein each basic block includes a plurality of instructions and starts with a label instruction and is terminated by a control transfer instruction; marking a set of instructions associated with each basic block as being invariant; initializing a work list that includes instructions that are known to be variant relative to the set of instructions; selecting a first instruction from the work list; marking the first instruction as variant; adding successor instructions to the work list based on the first instruction; and propagating a divergence attribute to identify associated basic blocks as divergent, and to identify instructions within the associated basic blocks as variant. 2. The method of claim 1 , wherein initializing comprises: visiting each basic block associated with the thread program; for each basic block, visiting each instruction associated with the basic block; determining that the instruction is variant; and adding the instruction to the work list. 3. The method of claim 2 , wherein an instruction that is variant accesses a thread identification register. 4. The method of claim 2 , wherein an instruction that is variant performs an atomic operation. 5. The method of claim 1 , wherein propagating comprises: determining that the first instruction is a conditional branch instruction; marking as divergent a set of basic blocks having a control dependence on the selected instruction; and adding each instruction associated with the set of basic blocks to the work list. 6. The method of claim 1 , wherein selecting an instruction comprises popping the instruction from the work list. 7. The method of claim 1 , wherein adding an instruction comprises pushing a unique instance of the instruction onto the work list. 8. The method of claim 1 , further comprising determining that a first basic block is a candidate for scalarization based on the first basic block being convergent. 9. The method of claim 8 , further comprising generating scalarized code for the first basic block for scalar execution. 10. A non-transitory computer-readable storage medium including instructions that, when executed by a processing unit, cause the processing unit to characterize a thread program, by performing the steps of: marking each basic block associated with the thread program as being convergent, wherein each basic block includes a plurality of instructions and starts with a label instruction and is terminated by a control transfer instruction; marking a set of instructions associated with each basic block as being invariant; initializing a work list that includes instructions that are known to be variant relative to the set of instructions; selecting a first instruction from the work list; marking the first instruction as variant; adding successor instructions to the work list based on the first instruction; and propagating a divergence attribute to identify associated basic blocks as divergent, and to identify instructions within the associated basic blocks as variant. 11. The non-transitory computer-readable storage medium of claim 10 , wherein initializing comprises: visiting each basic block associated with the thread program; for each basic block, visiting each instruction associated with the basic block; determining that the instruction is variant; and adding the instruction to the work list. 12. The non-transitory computer-readable storage medium of claim 11 , wherein an instruction that is variant accesses a thread identification register. 13. The non-transitory computer-readable storage medium of claim 11 , wherein an instruction that is variant performs an atomic operation. 14. The non-transitory computer-readable storage medium of claim 10 , wherein propagating comprises: determining that the first instruction is a conditional branch instruction; marking as divergent a set of basic blocks having a control dependence on the selected instruction; and adding each instruction associated with the set of basic blocks to the work list. 15. The non-transitory computer-readable storage medium of claim 10 , wherein selecting an instruction comprises popping the instruction from the work list. 16. The non-transitory computer-readable storage medium of claim 10 , wherein adding an instruction comprises pushing a unique instance of the instruction onto the work list. 17. The non-transitory computer-readable storage medium of claim 10 , further comprising determining that a first basic block is a candidate for scalarization based on the first basic block being convergent and including only invariant instructions. 18. The non-transitory computer-readable storage medium of claim 17 , further comprising scheduling the first basic block for scalar execution. 19. A computing device, comprising: a mass storage system configured to store at least a thread program; a processing unit coupled to the mass storage system and configured to: mark each basic block associated with the thread program as being convergent, wherein each basic block includes a plurality of instructions and starts with a label instruction and is terminated by a control transfer instruction; mark a set of instructions associated with each basic block as being invariant; initialize a work list that includes instructions that are known to be variant relative to the set of instructions; select a first instruction from the work list; mark the first instruction as variant; add successor instructions to the work list based on the first instruction; and propagate a divergence attribute to identify associated basic blocks as divergent, and to identify instructions within the associated basic blocks as variant. 20. The computing device of claim 19 , wherein the processing unit is further configured to: determine that a first basic block is a candidate for scalarization based on the first basic block being convergent and including only invariant instructions; and schedule the first basic block for scalar execution within a thread program executable.
Optimisation · CPC title
Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions · CPC title
Parallelism detection · CPC title
Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution · CPC title
Compilation · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.