Methods and apparatus for data transfer optimization
US-2018307470-A1 · Oct 25, 2018 · US
| Field | Value |
|---|---|
| Publication number | US-11200035-B1 |
| Application number | US-201715822996-A |
| Country | US |
| Kind code | B1 |
| Filing date | Nov 27, 2017 |
| Priority date | Dec 12, 2011 |
| Publication date | Dec 14, 2021 |
| Grant date | Dec 14, 2021 |
Official abstract text for this publication.
Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least one local memory unit that allows for data reuse opportunities. The first custom computing apparatus optimizes the code for reduced communication execution on the second computing apparatus.
Opening claim text (preview).
What is claimed is:

1. A method for improving data movements during parallelized execution of a program on a multi-execution unit computing apparatus, the method comprising:
   - receiving in memory on a first computing apparatus, a computer program comprising a loop nest, the first computing apparatus comprising the memory and a processor;
   - transforming the computer program for execution on a second computing apparatus, the second computing apparatus comprising a plurality of computation units, the transformation comprising:
     - selecting a communication statement within the loop nest, the communication statement transferring a data element from a first data structure to a second data structure;
     - identifying a candidate loop within the loop nest wherein a placement function for the communication statement, that designates execution of instances of the communication statement to the plurality of computation units, is invariant across an iteration domain of a sub-loop-nest within the candidate loop;
     - determining that: a plurality of memory accesses associated with the instances of the communication statement are invariant across the iteration domain of the sub-loop-nest; and the instances of the communication statement lack data dependencies with one or more instances of another statement; and
     - hoisting the communication statement outside the candidate loop or conditioning the communication statement on a particular iteration of the candidate loop.

2. The method of claim 1, wherein: the first data structure is formed within a global memory accessible to each of the plurality of computation units; and the second data structure is formed within a local memory of a first processing unit in the plurality of computation units.

3. The method of claim 2, wherein the local memory of the first processing unit is not accessible to any other processing unit in the plurality of computation units.

4. The method of claim 2, wherein the local memory of the first processing unit is accessible to at least one other processing unit but is not accessible to all processing units in the plurality of computation units.

5. The method of claim 1, wherein: the first data structure is formed within a local memory of a first processing unit in the plurality of computation units; and the second data structure is formed within a global memory accessible to each of the plurality of computation units.

6. The method of claim 1, wherein the data dependency comprises a read-after-write dependency or a write-after-read dependency.

7. The method of claim 1, wherein the transformation comprises, prior to the selecting, identifying, determining, and hoisting steps, tiling the loop nest.
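
The final step of claim 1 offers two alternatives: hoisting the communication statement out of the candidate loop, or conditioning it on a particular iteration. The following is a minimal illustrative sketch of both, not the patented implementation. It assumes a single computation unit (so the placement function is trivially invariant), models the global-to-local copy as a plain Python assignment, and uses hypothetical names throughout (`naive`, `hoisted`, `conditioned`, `global_a`, `local`, `transfers`). The read of `global_a[i]` does not depend on `j`, and nothing in the inner loop writes `global_a`, so the invariance and no-dependency conditions of the claim hold for the inner loop.

```python
def naive(global_a, b):
    """Baseline: the communication statement runs in every (i, j) iteration."""
    transfers = 0
    out = [0] * len(global_a)
    for i in range(len(global_a)):
        for j in range(len(b)):      # candidate loop
            local = global_a[i]      # communication: global -> local copy (assumed model)
            transfers += 1
            out[i] += local * b[j]
    return out, transfers


def hoisted(global_a, b):
    """Alternative 1: hoist the communication statement outside the candidate loop."""
    transfers = 0
    out = [0] * len(global_a)
    for i in range(len(global_a)):
        local = global_a[i]          # executed once per i instead of once per (i, j)
        transfers += 1
        for j in range(len(b)):
            out[i] += local * b[j]
    return out, transfers


def conditioned(global_a, b):
    """Alternative 2: keep the statement in place, guarded on one iteration (j == 0)."""
    transfers = 0
    out = [0] * len(global_a)
    local = None
    for i in range(len(global_a)):
        for j in range(len(b)):
            if j == 0:               # only the first iteration performs the transfer
                local = global_a[i]
                transfers += 1
            out[i] += local * b[j]
    return out, transfers


if __name__ == "__main__":
    a, b = [1, 2, 3], [4, 5]
    for variant in (naive, hoisted, conditioned):
        print(variant.__name__, variant(a, b))
```

All three variants compute the same result; only the number of modeled transfers differs. One difference between the two alternatives in this sketch: when the candidate loop has zero iterations, the hoisted form still performs a transfer while the conditioned form performs none.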