Memory-aware matrix factorization
US-2016371005-A1 · Dec 22, 2016 · US
US9424038B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9424038-B2 |
| Application number | US-201213710279-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 10, 2012 |
| Priority date | Dec 10, 2012 |
| Publication date | Aug 23, 2016 |
| Grant date | Aug 23, 2016 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A compiler-controlled technique for scheduling threads to execute different regions of a program. A compiler analyzes program code to determine a control flow graph for the program code. The control flow graph contains regions and directed edges between regions. The regions have associated execution priorities. The directed edges indicate the direction of program control flow. Each region has a thread frontier which contains one or more regions. The compiler inserts one or more update predicate mask variable instructions at the end of a region. The compiler also inserts one or more conditional branch instructions at the end of the region. The conditional branch instructions are arranged in order of execution priority of the regions in the thread frontier of the region, to enforce execution priority of the regions at runtime.
Opening claim text (preview).
The claimed invention is: 1. A method for scheduling threads to execute different regions of a program, the method comprising: analyzing a control flow graph that is based on program code and includes a plurality of regions, wherein each region represents a different portion of the program code, is assigned an execution priority, and is associated with a thread frontier that includes one or more thread frontier regions, each thread frontier region being a region in the control flow graph; inserting one or more update predicate mask variable instructions at the end of a first region included in the plurality of regions based on the control flow graph and the program code; and inserting one or more conditional branch instructions at the end of the first region that are arranged to reflect execution priority of the one or more thread frontier regions in the thread frontier of the first region. 2. The method of claim 1 , further comprising determining that a branch instruction is included at the end of the first region, and replacing the branch instruction with an instruction configured to calculate a branch condition bitmask variable for the first region. 3. The method of claim 2 , wherein each region in the control flow graph has one or more successor regions and one or more predecessor regions, each successor region being a region in the control flow graph, and each predecessor region being a region in the control flow graph, and wherein the branch instruction at the end of the first region has a branch-taken target region that is a region in the plurality of regions in the control flow graph and a branch-not-taken target region that is a region in the plurality of regions in the control flow graph, and inserting one or more update predicate mask variable instructions further comprises: determining that the branch-taken target region has multiple predecessor regions, and inserting instructions configured to merge threads that take the branch in the first region with threads waiting to execute the branch-taken target region; or determining that the branch-taken target region does not have multiple predecessor regions, and inserting instructions configured to assign threads that take the branch in the first region to the branch-taken target region. 4. The method of claim 3 , further comprising: determining that the branch-not-taken target region has multiple predecessor regions, and inserting instructions configured to merge threads not taking the branch in the first region with threads waiting to execute the branch-not-taken target region; or determining that the branch-not-taken target region does not have multiple predecessor regions, and inserting instructions configured to assign threads that do not take the branch in the first region to the branch-not-taken target region. 5. The method of claim 1 , wherein each region in the control flow graph has one or more successor regions and one or more predecessor regions, each successor region being a region in the control flow graph, and each predecessor region being a region in the control flow graph, and wherein inserting a first set of update predicate mask variable instructions further comprises: determining that a branch instruction is not included at the end of the first region; and either determining that a successor region of the first region has multiple predecessor regions, and inserting instructions configured to merge threads executing the first region with the threads waiting to execute the successor region, or determining that a successor region of the first region does not have multiple predecessor regions, and inserting instructions configured to assign threads executing the first region to the successor region of the first region. 6. The method of claim 1 , wherein inserting the one or more conditional branch instructions further comprises: inserting a plurality of conditional branch instructions at the end of the first region in order of execution priority of the thread frontier regions in the thread frontier of the first region, each conditional branch instruction having a respective target thread frontier region, each conditional branch instruction being configured to: determine whether threads are waiting to execute the respective target thread frontier region in the thread frontier of the first region and branch to the target thread frontier region if threads are waiting to execute the respective target thread frontier region. 7. The method of claim 1 , further comprising inserting one or more instructions at the beginning of the first region to set a predicate mask for the first region. 8. The method of claim 1 , further comprising optimizing the one or more update predicate mask variable instructions and the one or more conditional branch instructions by performing one or more of dead code elimination and loop-invariant code motion. 9. The method of claim 1 , further comprising: inserting one or more update predicate mask variable instructions at the end of a second region in the plurality of regions; and inserting one or more conditional branch instructions at the end of the second region, the conditional branch instructions being arranged in order of execution priority of the thread frontier regions in a thread frontier of the second region. 10. A non-transitory computer-readable medium storing instructions, that when executed by a processor, cause a computer system to schedule threads to execute different regions of a program, by performing the steps of: analyzing a control flow graph that is based on program code and includes a plurality of regions, wherein each region represents a different portion of the program code, is assigned an execution priority, and is associated with a thread frontier that includes one or more thread frontier regions, each thread frontier region being a region in the control flow graph; inserting one or more update predicate mask variable instructions at the end of a first region included in the plurality of regions based on the control flow graph and the program code; and inserting one or more conditional branch instructions at the end of the first region that are arranged to reflect execution priority of the one or more thread frontier regions in the thread frontier of the first region. 11. The non-transitory computer-readable medium of claim 10 , further comprising: determining that a branch instruction is included at the end of the first region, and replacing the branch instruction with an instruction configured to calculate a branch condition bitmask variable for the first region. 12. The non-transitory computer-readable medium of claim 11 , wherein: each region in the control flow graph has one or more successor regions and one or more predecessor regions, each successor region being a region in the control flow graph, and each predecessor region being a region in the control flow graph, and wherein the branch instruction at the end of the first region has a branch-taken target region that is a region in the plurality of regions in the control flow graph and a branch-not-taken target region that is a region in the plurality of regions in the control flow graph, and inserting one or more update predicate mask variable instructions further comprises: determining that the branch-taken target region has multiple predecessor regions, and inserting instructions configured to merge threads that take the branch in the first region with threads waiting to execute the branch-taken target region; or determining that the branch-taken target region does not have multiple predecessor regions, and inserting instructions configured to assign threads that take the branch in th
Data distribution · CPC title
Optimisation · CPC title
to perform miscellaneous control operations, e.g. NOP · CPC title
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title
from multiple instruction streams, e.g. multistreaming · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.