Methods and apparatus to validate translated guest code in a dynamic binary translator
US-9223553-B2 · Dec 29, 2015 · US
US9489180B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9489180-B1 |
| Application number | US-201213679861-A |
| Country | US |
| Kind code | B1 |
| Filing date | Nov 16, 2012 |
| Priority date | Nov 18, 2011 |
| Publication date | Nov 8, 2016 |
| Grant date | Nov 8, 2016 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least one vector execution unit that allow for parallel execution of tasks on constant-strided memory locations. The first custom computing apparatus optimizes the code for parallelism, locality of operations, constant-strided memory accesses and vectorized execution on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
Opening claim text (preview).
What is claimed is: 1. A method of scheduling operations of a program on a multi-execution unit computing apparatus, the method comprising: receiving in memory on a first computing apparatus, a computer program comprising a set of operations, the first computing apparatus comprising the memory and a processor; transforming the computer program for execution on a second computing apparatus, the second computing apparatus comprising at least one vector execution unit, the transformation comprising: optimizing a selective tradeoff of a cost of parallelism, locality, vectorization and data-layout transformations to orchestrate computations associated with the program, the selective tradeoff comprising trading off a penalty for performing a data-layout transformation so as to adjust a spacing between memory access with a performance improvement due to vectorization corresponding to the data-layout transformation; and producing an optimized computer program for execution on the second computing apparatus, wherein the step of optimizing the selective tradeoff comprises: determining an optimization problem representing: (i) each statement in a set of statements in the program, at least one statement in the set being associated with a multi-dimensional memory reference; (ii) a set of candidate schedules, each candidate schedule representing at least a partial order of execution of operations in the program on the second computing apparatus, and (iii) a set of aggregate costs, each aggregate cost being a cost of execution according to a corresponding candidate schedule, the aggregate cost comprising a parallelism cost and a locality cost, and at least one of a vectorization cost and a data-layout transformation cost; and optimizing the problem by selecting at least one of a loop transformation and a data-layout transformation to obtain a final schedule, such that the aggregate cost associated with the final schedule is minimized; and identifying at least one of the selected loop transformation and the selected data-layout transformation. 2. The method of claim 1 , wherein the identified data-layout transformation corresponds to any one dimension of the multi-dimensional memory reference. 3. The method of claim 1 , wherein the aggregate cost corresponding to a candidate schedule in the set of candidate schedules is based on, at least in part, a static evaluation of a model of cost of executing operations of the at least one statement according to the candidate schedule. 4. The method of claim 1 , wherein the aggregate cost corresponding to a candidate schedule in the set of candidate schedules is based on, at least in part, a dynamic evaluation of a model of cost of executing operations of the at least one statement according to the candidate schedule, the dynamic evaluation being based on, at least in part, the second computing apparatus. 5. The method of claim 4 further comprising iteratively refining the aggregate cost using at least one of a static evaluation of a model of cost of executing operations of the at least one statement according to the candidate schedule and the dynamic evaluation. 6. The method of claim 1 further comprising determining a search space to be used in the step of optimizing the optimization problem, wherein at least one candidate schedule in the set of candidate schedules is selected by traversing the search space. 7. The method of claim 6 , wherein the traversing step comprises exhaustively traversing the search space. 8. The method of claim 6 , wherein the set of candidate schedules comprises each schedule in the search space. 9. The method of claim 1 , wherein the set of operations comprises at least one loop nest. 10. The method of claim 9 , wherein: the transformation further comprises receiving a first schedule dimension within a plurality of schedule dimensions associated with the loop nest; and the optimizing step comprises solving, by a solver, the optimization problem to obtain the final schedule, wherein the final schedule corresponds to the first schedule dimension. 11. The method of claim 10 , wherein the first schedule dimension corresponds to any linear combination of any loops in the loop nest. 12. The method of claim 10 further comprising: receiving a second schedule dimension within the plurality of schedule dimensions associated with the loop nest; adding schedule orthogonality constraints based on the first schedule dimension; and repeating the solving and identifying steps based on the second schedule dimension such that at least one of a loop transformation and a data-layout transformation associated with the final schedule corresponds to the first or second schedule dimensions. 13. The method of claim 9 , wherein the optimizing step comprises computing a difference between: (a) speed of sequentially executing operations of a loop in the loop nest on a single execution unit in the second computing apparatus and (b) speed of executing those operations in parallel on a plurality of execution units in the second computing apparatus. 14. The method of claim 9 , wherein: the loop nest comprises first and second loops; and the optimizing step comprises computing a difference between: (a) speed of alternately executing an operation of the first loop and an operation of the second loop and (b) speed of executing all operations of the first loop followed by executing all operations of the second loop. 15. The method of claim 1 , wherein: the optimizing step comprises computing a difference between: (a) speed of executing operations of the at least one statement associated with the multi-dimensional memory reference such that a plurality of memory locations associated with the memory reference are accessed according to a uniform spacing between successive accesses and (b) speed of executing those operations such that the plurality of memory locations accessed are spaced apart nonuniformly. 16. The method of claim 1 , wherein: the penalty for performing a data-layout transformation so as to adjust a spacing between memory access comprises a penalty associated with multi-dimensional memory reference. 17. The method of claim 1 , wherein the optimizing step comprises assigning a decision variable corresponding to at least one of parallelism, locality, vectorization, and data-layout transformations. 18. The method of claim 17 , wherein the set of operations comprises at least one loop nest, and the decision variable specifies at least one of: (i) if a loop in the loop nest is to be executed in parallel by the second computing apparatus; (ii) if a pair of loops in the loop nest are to be executed together by the second computing apparatus; (iii) if a loop in the loop nest is to be executed as a vector loop; and (iv) if a memory reference associated with a loop within the loop nest requires a data-layout transformation allowing for vector access to the memory reference during execution of operations of the loop. 19. The method of claim 18 , wherein the optimizing step comprises optimizing a global weighted parametric function of: (i) the parallelism cost, (ii) the locality cost, (iii) the vectorization cost, (iv) the data-layout transformations cost, and (v) four decision variables. 20. The method of claim 1 further comprising identifying, by the at least one processor in the first computing apparatus and a dependence analysis module, a plurality of schedules of operations in the set of operations, at least one of the identified schedules improving at least one of par
Related publications grouped by family.
Answers are generated from the same data shown on this page.