Computer system, method and computer-readable storage medium for tasks scheduling
US-2015135186-A1 · May 14, 2015 · US
US9727377B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9727377-B2 |
| Application number | US-201214232352-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 16, 2012 |
| Priority date | Jul 14, 2011 |
| Publication date | Aug 8, 2017 |
| Grant date | Aug 8, 2017 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A method for pipeline parallelizing a control program for multi-core execution includes using ( 12 ) data dependency analysis on a control program to identify tasks that can be performed in parallel, identifying ( 13 ) a largest task Tmax requiring the most execution time of the identified tasks, identifying ( 14 ) cut-points in the largest task Tmax where data dependency delays decouple the task, inserting ( 15 ) delayed data dependencies into cut-points of the largest task Tmax to create N pipeline sub-tasks, in which N is a number of cores available to a processor on which the control program will be executed, and scheduling ( 16 ) the tasks and pipeline sub-tasks to the available processor cores.
Opening claim text (preview).
What is claimed is: 1. A method for pipeline parallelizing a control program for multi-core execution, comprising: performing a data dependency analysis on the control program to identify a plurality of tasks, each task comprising a set of operations executable over a plurality of scan cycles independent from the other tasks; identifying the largest task requiring the most execution time of the identified tasks; identifying cut-points in the largest task where data dependency delays decouple the largest task; inserting delayed data dependencies into the cut-points of the largest task to create N pipeline sub-tasks, wherein N is a number of cores available to a processor on which the control program will be executed; and scheduling the pipeline sub-tasks to the available processor cores in a manner that enables multiple scan cycles of the largest task to be executed in parallel, wherein each pipeline sub-task is scheduled to a first core for a first scan cycle, and a parallel execution of each pipeline sub-task is scheduled to one or more second cores for a second scan cycle which is interleaved with the first scan cycle to generate an intermediate output for each pipeline sub-task on the one or more second cores simultaneous with a subsequent pipeline sub-task on the first core. 2. The method of claim 1 , wherein the data dependency analysis is performed on a low-level intermediate representation of the control program. 3. The method of claim 1 , further comprising identifying a read step, a process step, and a write step of the largest task wherein the cut-points are identified in the process step. 4. The method of claim 1 , wherein the pipeline sub-tasks are load-balanced. 5. The method of claim 1 , further comprising scheduling each pipeline sub-task in its own processor core, wherein output of a first pipeline sub-task is communicated to a processor core of a subsequent pipeline sub-task to be input to the subsequent pipeline sub-task, and wherein the first pipeline sub-task can be executed in its core while a second pipeline sub-task is executing to generate an intermediate output that can be communicated to the core of the subsequent pipeline sub-task. 6. The method of claim 1 , further comprising: using multiple cores of a multi-core processor to execute intermediate iterations of a first pipeline sub-task simultaneous with an iteration of a subsequent pipeline sub-task; privatizing memory locations of sensor data for every execution cycle of the pipeline sub-tasks to relax loop carried dependencies; committing a first computation result of an intermediate iteration to one of said privatized memory locations, wherein a most recent computation result is available to subsequent executions of the control program; and committing a second computation result of the largest task to a main memory after each pipeline sub-task of said largest task has finished executing. 7. A method for pipeline parallelizing a control program for execution on a multi-core processor comprising: translating source code of the control program into a lower-level intermediate representation; analyzing a workload of a process step of a largest task to identify one or more cut-points where the process step can be decoupled into stages; inserting a data dependency delay into each cut point to divide the largest task into pipelined stages; and scheduling the pipeline stages to the processor cores of the processor in a manner that enables multiple scan cycles of the largest task to be executed in parallel, wherein each pipelined stage is scheduled to a first core for a first scan cycle, and a parallel execution of each pipelined stage is scheduled to one or more second cores for a second scan cycle which is interleaved with the first scan cycle to generate an intermediate output for each pipelined stage on the one or more second cores simultaneous with a subsequent pipelined stage on the first core. 8. The method of claim 7 , wherein the scheduling is performed statically. 9. The method of claim 7 , wherein the scheduling is performed dynamically. 10. The method of claim 7 , further comprising: using data dependency analysis on the lower-level intermediate representation to identify tasks that can be performed in parallel; and identifying the largest task from the tasks that can be performed in parallel. 11. The method of claim 7 , further comprising: using multiple cores of a multi-core processor to execute intermediate iterations of a first pipeline stage simultaneous with an iteration of a subsequent pipeline stage; privatizing memory locations of sensor data for every execution cycle of the pipeline stages to relax loop carried dependencies; committing a first computation result of an intermediate iteration to one of said privatized memory locations, wherein a most recent computation result is available to subsequent executions of the control program; and committing a second computation result of the process step of the largest task to a main memory after each pipeline stage of said process step has finished executing. 12. A non-transitory program storage device readable by a computer, tangibly embodying a program of instructions executed by the computer to perform a method for pipeline parallelizing a control program for multi-core execution, the method comprising: performing a data dependency analysis on the control program to identify a plurality of tasks, each task comprising a set of operations executable over a plurality of scan cycles independent from the other tasks; identifying the largest task requiring the most execution time of the identified tasks; identifying cut-points in the largest task where data dependency delays decouple the largest task; inserting delayed data dependencies into the cut-points of the largest task to create N pipeline sub-tasks, wherein N is a number of cores available to a processor on which the control program will be executed; and scheduling the tasks and pipeline sub-tasks to the available processor cores in a manner that enables multiple scan cycles of the largest task to be executed in parallel, wherein each pipeline sub-task is scheduled to a first core for a first scan cycle, and a parallel execution of each pipeline sub-task is scheduled to one or more second cores for a second scan cycle which is interleaved with the first scan cycle to generate an intermediate output for each pipeline sub-task on the one or more second cores simultaneous with a subsequent pipeline sub-task on the first core. 13. The non-transitory program storage device of claim 12 , wherein the data dependency analysis is performed on a low-level intermediate representation of the control program. 14. The non-transitory program storage device of claim 12 , the method further comprising identifying a read step, a process step, and a write step of the largest task wherein the cut-points are identified in the process step. 15. The non-transitory program storage device of claim 12 , wherein the pipeline sub-tasks are load-balanced. 16. The non-transitory program storage device of claim 12 , the method further comprising scheduling each pipeline sub-task in its own processor core, wherein output of a first pipeline sub-task is communicated to a processor core of a subsequent pipeline sub-task to be input to the subsequent pipeline sub-task, and wherein the first pipeline sub-task can be executed in its core while a second pipeline sub-task is executing to generate an intermediate output that can be communicated to the core of the subsequent pipeline sub-task. 17. T
Code distribution (considering CPU load at run-time G06F9/505; load rebalancing G06F9/5083) · CPC title
Compilation · CPC title
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title
considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration (scheduling strategies G06F9/4881 and subgroups) · CPC title
Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.