Methods and apparatus for automatic communication optimizations in a compiler based on a polyhedral representation

US11200035B1 · US · B1

Patent metadata
FieldValue
Publication numberUS-11200035-B1
Application numberUS-201715822996-A
CountryUS
Kind codeB1
Filing dateNov 27, 2017
Priority dateDec 12, 2011
Publication dateDec 14, 2021
Grant dateDec 14, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least one local memory unit that allows for data reuse opportunities. The first custom computing apparatus optimizes the code for reduced communication execution on the second computing apparatus.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for improving data movements during parallelized execution of a program on a multi-execution unit computing apparatus, the method comprising: receiving in memory on a first computing apparatus, a computer program comprising a loop nest, the first computing apparatus comprising the memory and a processor; transforming the computer program for execution on a second computing apparatus, the second computing apparatus comprising a plurality of computation units, the transformation comprising: selecting a communication statement within the loop nest, the communication statement transferring a data element from a first data structure to a second data structure; identifying a candidate loop within the loop nest wherein a placement function for the communication statement, that designates execution of instances of the communication statement to the plurality of computation units, is invariant across an iteration domain of a sub-loop-nest within the candidate loop; determining that: a plurality of memory accesses associated with the instances of the communication statement are invariant across the iteration domain of the sub-loop-nest; and the instances of the communication statement lack data dependencies with one or more instances of another statement; and hoisting the communication statement outside the candidate loop or conditioning the communication statement on a particular iteration of the candidate loop. 2. The method of claim 1 , wherein: the first data structure is formed within a global memory accessible to each of the plurality of computation units; and the second data structure is formed within a local memory of a first processing unit in the plurality of computation units. 3. The method of claim 2 , wherein the local memory of the first processing unit is not accessible to any other processing unit in the plurality of computation units. 4. The method of claim 2 , wherein the local memory of the first processing unit is accessible to at least one other processing unit but is not accessible to all processing units in the plurality of computation units. 5. The method of claim 1 , wherein: the first data structure is formed within a local memory of a first processing unit in the plurality of computation units; and the second data structure is formed within a global memory accessible to each of the plurality of computation units. 6. The method of claim 1 , wherein the data dependency comprises a read-after-write dependency or a write-after-read dependency. 7. The method of claim 1 , wherein the transformation comprises, prior to the selecting, identifying, determining, and hoisting steps, tiling the loop nest.

Assignees

Inventors

Classifications

  • G06F8/453Primary

    Data distribution · CPC title

  • Communication (intertask communication G06F9/54) · CPC title

  • G06F8/41Primary

    Compilation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11200035B1 cover?
Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution …
Who is the assignee on this patent?
Reservoir Labs Inc
What technology area does this patent fall under?
Primary CPC classification G06F8/453. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Dec 14 2021 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).