Methods and apparatus for joint scheduling and layout optimization to enable multi-level vectorization

US9489180B1 · US · B1

Patent metadata
FieldValue
Publication numberUS-9489180-B1
Application numberUS-201213679861-A
CountryUS
Kind codeB1
Filing dateNov 16, 2012
Priority dateNov 18, 2011
Publication dateNov 8, 2016
Grant dateNov 8, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least one vector execution unit that allow for parallel execution of tasks on constant-strided memory locations. The first custom computing apparatus optimizes the code for parallelism, locality of operations, constant-strided memory accesses and vectorized execution on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.

First claim

Opening claim text (preview).

What is claimed is: 1. A method of scheduling operations of a program on a multi-execution unit computing apparatus, the method comprising: receiving in memory on a first computing apparatus, a computer program comprising a set of operations, the first computing apparatus comprising the memory and a processor; transforming the computer program for execution on a second computing apparatus, the second computing apparatus comprising at least one vector execution unit, the transformation comprising: optimizing a selective tradeoff of a cost of parallelism, locality, vectorization and data-layout transformations to orchestrate computations associated with the program, the selective tradeoff comprising trading off a penalty for performing a data-layout transformation so as to adjust a spacing between memory access with a performance improvement due to vectorization corresponding to the data-layout transformation; and producing an optimized computer program for execution on the second computing apparatus, wherein the step of optimizing the selective tradeoff comprises: determining an optimization problem representing: (i) each statement in a set of statements in the program, at least one statement in the set being associated with a multi-dimensional memory reference; (ii) a set of candidate schedules, each candidate schedule representing at least a partial order of execution of operations in the program on the second computing apparatus, and (iii) a set of aggregate costs, each aggregate cost being a cost of execution according to a corresponding candidate schedule, the aggregate cost comprising a parallelism cost and a locality cost, and at least one of a vectorization cost and a data-layout transformation cost; and optimizing the problem by selecting at least one of a loop transformation and a data-layout transformation to obtain a final schedule, such that the aggregate cost associated with the final schedule is minimized; and identifying at least one of the selected loop transformation and the selected data-layout transformation. 2. The method of claim 1 , wherein the identified data-layout transformation corresponds to any one dimension of the multi-dimensional memory reference. 3. The method of claim 1 , wherein the aggregate cost corresponding to a candidate schedule in the set of candidate schedules is based on, at least in part, a static evaluation of a model of cost of executing operations of the at least one statement according to the candidate schedule. 4. The method of claim 1 , wherein the aggregate cost corresponding to a candidate schedule in the set of candidate schedules is based on, at least in part, a dynamic evaluation of a model of cost of executing operations of the at least one statement according to the candidate schedule, the dynamic evaluation being based on, at least in part, the second computing apparatus. 5. The method of claim 4 further comprising iteratively refining the aggregate cost using at least one of a static evaluation of a model of cost of executing operations of the at least one statement according to the candidate schedule and the dynamic evaluation. 6. The method of claim 1 further comprising determining a search space to be used in the step of optimizing the optimization problem, wherein at least one candidate schedule in the set of candidate schedules is selected by traversing the search space. 7. The method of claim 6 , wherein the traversing step comprises exhaustively traversing the search space. 8. The method of claim 6 , wherein the set of candidate schedules comprises each schedule in the search space. 9. The method of claim 1 , wherein the set of operations comprises at least one loop nest. 10. The method of claim 9 , wherein: the transformation further comprises receiving a first schedule dimension within a plurality of schedule dimensions associated with the loop nest; and the optimizing step comprises solving, by a solver, the optimization problem to obtain the final schedule, wherein the final schedule corresponds to the first schedule dimension. 11. The method of claim 10 , wherein the first schedule dimension corresponds to any linear combination of any loops in the loop nest. 12. The method of claim 10 further comprising: receiving a second schedule dimension within the plurality of schedule dimensions associated with the loop nest; adding schedule orthogonality constraints based on the first schedule dimension; and repeating the solving and identifying steps based on the second schedule dimension such that at least one of a loop transformation and a data-layout transformation associated with the final schedule corresponds to the first or second schedule dimensions. 13. The method of claim 9 , wherein the optimizing step comprises computing a difference between: (a) speed of sequentially executing operations of a loop in the loop nest on a single execution unit in the second computing apparatus and (b) speed of executing those operations in parallel on a plurality of execution units in the second computing apparatus. 14. The method of claim 9 , wherein: the loop nest comprises first and second loops; and the optimizing step comprises computing a difference between: (a) speed of alternately executing an operation of the first loop and an operation of the second loop and (b) speed of executing all operations of the first loop followed by executing all operations of the second loop. 15. The method of claim 1 , wherein: the optimizing step comprises computing a difference between: (a) speed of executing operations of the at least one statement associated with the multi-dimensional memory reference such that a plurality of memory locations associated with the memory reference are accessed according to a uniform spacing between successive accesses and (b) speed of executing those operations such that the plurality of memory locations accessed are spaced apart nonuniformly. 16. The method of claim 1 , wherein: the penalty for performing a data-layout transformation so as to adjust a spacing between memory access comprises a penalty associated with multi-dimensional memory reference. 17. The method of claim 1 , wherein the optimizing step comprises assigning a decision variable corresponding to at least one of parallelism, locality, vectorization, and data-layout transformations. 18. The method of claim 17 , wherein the set of operations comprises at least one loop nest, and the decision variable specifies at least one of: (i) if a loop in the loop nest is to be executed in parallel by the second computing apparatus; (ii) if a pair of loops in the loop nest are to be executed together by the second computing apparatus; (iii) if a loop in the loop nest is to be executed as a vector loop; and (iv) if a memory reference associated with a loop within the loop nest requires a data-layout transformation allowing for vector access to the memory reference during execution of operations of the loop. 19. The method of claim 18 , wherein the optimizing step comprises optimizing a global weighted parametric function of: (i) the parallelism cost, (ii) the locality cost, (iii) the vectorization cost, (iv) the data-layout transformations cost, and (v) four decision variables. 20. The method of claim 1 further comprising identifying, by the at least one processor in the first computing apparatus and a dependence analysis module, a plurality of schedules of operations in the set of operations, at least one of the identified schedules improving at least one of par

Assignees

Inventors

Classifications

  • G06F8/447Primary

    Target code generation · CPC title

  • G06F8/443Primary

    Optimisation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9489180B1 cover?
Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution …
Who is the assignee on this patent?
Reservoir Labs Inc
What technology area does this patent fall under?
Primary CPC classification G06F8/447. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Nov 08 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).