General purpose software parallel task engine
US-9477452-B2 · Oct 25, 2016 · US
US9875138B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9875138-B2 |
| Application number | US-201615225362-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 1, 2016 |
| Priority date | Mar 14, 2006 |
| Publication date | Jan 23, 2018 |
| Grant date | Jan 23, 2018 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A software engine for decomposing work to be done into tasks, and distributing the tasks to multiple, independent CPUs for execution is described. The engine utilizes dynamic code generation, with run-time specialization of variables, to achieve high performance. Problems are decomposed according to methods that enhance parallel CPU operation, and provide better opportunities for specialization and optimization of dynamically generated code. A specific application of this engine, a software three dimensional (3D) graphical image renderer, is described.
Opening claim text (preview).
What is claimed is: 1. A system comprising: a processor having multiple processing cores; and a parallel task engine for performing graphic processing tasks on data, the parallel task engine comprising: an input for receiving graphic processing tasks for rendering 3D objects; a scheduler for decomposing the tasks at run-time into one or more new tasks; and a run-time dynamic code generator for the new tasks, operation routines, the run-time dynamic code generator including a dynamic compiler, the dynamic compiler being adapted to output the operation routines for execution, wherein the scheduler further is for distributing and assigning the new tasks to multiple processing cores of the processor for performing in parallel the new tasks on at least a portion of the data by executing the dynamically compiled operation routines. 2. The system of claim 1 , wherein the 3D objects are defined as a collection of geometric primitives. 3. The system of claim 1 , wherein at least a portion of the scheduler operations of decomposing the tasks and the distributing and assigning the new tasks are dependent on an operating characteristic of the processor. 4. The system of claim 3 , wherein the operating characteristic of the processor is the number of processing cores of the processor. 5. The system of claim 1 , wherein the scheduler is configured to make run-time decomposition choices based on a quality of the operation routines generated by the dynamic compiler. 6. The system of claim 5 , wherein the quality of the operation routines is determined by performing one or more of: measuring characteristics of the operation routines, and obtaining statistics about the operation routines from the dynamic compiler. 7. The system of claim 1 , wherein the decomposing is dependent on at least one policy selected from a given set of policies, wherein the scheduler makes the selection of the at least one policy as a function of characteristics of the operation routines. 8. The system of claim 7 , wherein the scheduler selects the policy for decomposition which yields the highest estimated performance, based on an estimated performance of the operation routines. 9. The system of claim 7 , wherein the given set of policies includes: decomposing a task into one or more new tasks by partitioning the data on which the task is to be performed into one or more subsets of that data, each new task being responsible for performing the same operation as the original task on a corresponding data subset; decomposing a task into one or more new tasks, each of which performs a different operation than the original task, but which performs this operation on the same data set as the original task; and decomposing a task into one or more new tasks, by partitioning an individual datum of the data on which the task is to be performed, into sub-components, each new task creating one sub-component of each resulting datum for all the data. 10. The system of claim 1 , wherein the run-time dynamic code generator further comprises an optimizer, the optimizer taking as input an operation routine from the operation routines, or a pointer to an operation routine from the operation routines, the optimizer producing as output an output operation routine, or a pointer to the output operation routine, which is semantically equivalent to the operation routine at the input. 11. In a computer system having a processor, the processor having multiple processing cores, a method for performing graphics processing tasks on data, the method comprising: receiving graphics processing tasks for rendering 3D objects; decomposing the tasks at run-time into one or more new tasks; generating for the new tasks at run-time, operation routines, the generating including outputting the operation routines for execution using a dynamic compiler; distributing and assigning the new tasks to multiple processing cores of the processor; and the multiple processing cores performing the new tasks in parallel on at least part of the data by executing the operation routines. 12. The method of claim 11 , wherein the 3D objects are defined as a collection of geometric primitives. 13. The method of claim 11 , wherein the operating characteristic of the processor is the number of processing cores of the processor. 14. The method of claim 11 , wherein the decomposing is dependent on at least one policy selected from a given set of policies, the method further comprising making the selection of the at least one policy as a function of characteristics of an operating characteristic of the operation routines. 15. The method of claim 14 , further comprising selecting the policy for decomposition which yields the highest estimated performance, based on an estimated performance of the operation routines. 16. The method of claim 11 , wherein the decomposing the tasks is performed according to at least one of the following policies: decomposing a task into one or more new tasks by partitioning the data on which the task is to be performed into one or more subsets of that data, each new task being responsible for performing the same operation as the original task on a corresponding data subset; decomposing a task into one or more new tasks, each of which performs a different operation than the original task, but which performs this operation on the same data set as the original task; and decomposing a task into one or more new tasks, by partitioning an individual datum of the data on which the task is to be performed, into sub-components, each new task creating one sub-component of each resulting datum for all the data. 17. The method of claim 11 , wherein the 3D objects are defined as a collection of geometric primitives. 18. The method of claim 17 , wherein the graphics processing tasks include pixel processing tasks that draw the 3D objects to a rendered image, wherein the decomposing includes decomposing the pixel processing tasks into one or more new pixel processing tasks whereby at least two of the new pixel processing tasks contain fragments of non-overlapping regions in the rendered image, and the new pixel processing tasks are assigned to at least two job loops. 19. The method of claim 11 , wherein the 3D objects are defined as a collection of geometric primitives and wherein the decomposing is dependent on a policy of drawing primitives in a first-in, first-out order.
General purpose rendering architectures · CPC title
Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes · CPC title
Program synchronisation; Mutual exclusion, e.g. by means of semaphores · CPC title
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.