Method for transforming a multithreaded program for general execution

US9367306B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9367306-B2
Application numberUS-201113076258-A
CountryUS
Kind codeB2
Filing dateMar 30, 2011
Priority dateMar 30, 2011
Publication dateJun 14, 2016
Grant dateJun 14, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A technique is disclosed for executing a program designed for multi-threaded operation on a general purpose processor. Original source code for the program is transformed from a multi-threaded structure into a computationally equivalent single-threaded structure. A transform operation modifies the original source code to insert code constructs for serial thread execution. The transform operation also replaces synchronization barrier constructs in the original source code with synchronization barrier code that is configured to facilitate serialization. The transformed source code may then be conventionally compiled and advantageously executed on the general purpose processor.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method for executing a multi-threaded program on a single-threaded processor core, the method comprising: identifying a kernel function included within the multi-threaded program; enumerating a plurality of barrier synchronization calls within the kernel function; modifying the kernel function by replacing each enumerated barrier synchronization call within the kernel function with a plurality of barrier commands and inserting a plurality of execution control commands into the kernel function, wherein the plurality of execution control commands includes a switch command within a nested loop, the switch command configured to include a different case for each of the enumerated barrier synchronization calls; and transferring the modified kernel function to a transformed source file. 2. The method of claim 1 , wherein the execution control commands comprise at least one of a while-loop, a nested for-loop, and a switch command within the nested for-loop. 3. The method of claim 2 , wherein the while-loop is configured to execute one iteration for each enumerated barrier synchronization call within the kernel function. 4. The method of claim 3 , wherein the nested for-loop is configured to execute a number of times within each iteration of the while-loop equal to the number of threads that are supposed to execute the kernel function concurrently. 5. The method of claim 4 , wherein the switch command includes a different case for each enumerated barrier synchronization call within the kernel function. 6. The method of claim 5 , wherein the execution control commands further comprise a first variable configured to store a current starting point for execution, and second variable configured to store a next starting point for execution, and wherein the first variable and the second variable are updated for each iteration of the while-loop. 7. The method of claim 6 , wherein the barrier commands related to a particular enumerated barrier synchronization call within the kernel function comprise an assignment to the second variable corresponding to an enumeration number for a barrier label, a thread-end goto command, and the barrier label. 8. The method of claim 7 , wherein the enumeration number corresponds to a case in the switch command associated with the particular enumerated barrier synchronization call within the kernel function and having an associated goto command that targets the barrier label. 9. The method of claim 7 , wherein the thread-end goto command jumps to a label in the nested for-loop that advances the nested for-loop by one iteration. 10. The method of claim 1 , further comprising compiling at least the transformed source file to generate an executable object for execution on the single-threaded processor core. 11. A non-transitory computer-readable storage medium including instructions that, when executed by a processing unit, cause the processing unit to execute a multi-threaded program on a single-threaded processor core, by performing the steps of: identifying a kernel function included within the multi-threaded program; enumerating a plurality of barrier synchronization calls within the kernel function; modifying the kernel function by replacing each enumerated barrier synchronization call within the kernel function with a plurality of barrier commands and inserting a plurality of execution control commands into the kernel function, wherein the plurality of execution control commands includes a switch command within a nested loop, the switch command configured to include a different case for each of the enumerated barrier synchronization calls; and transferring the modified kernel function to a transformed source file. 12. The non-transitory computer-readable storage medium of claim 11 , wherein the execution control commands comprise at least one of a while-loop, a nested for-loop, and a switch command within the nested for-loop. 13. The non-transitory computer-readable storage medium of claim 12 , wherein the while-loop is configured to execute one iteration for each enumerated barrier synchronization call within the kernel function. 14. The non-transitory computer-readable storage medium of claim 13 , wherein the nested for-loop is configured to execute a number of times within each iteration of the while-loop equal to the number of threads that are supposed to execute the kernel function concurrently. 15. The non-transitory computer-readable storage medium of claim 14 , wherein the switch command includes a different case for each enumerated barrier synchronization call within the kernel function. 16. The non-transitory computer-readable storage medium of claim 15 , wherein the execution control commands further comprise a first variable configured to store a current starting point for execution, and second variable configured to store a next starting point for execution, and wherein the first variable and the second variable are updated for each iteration of the while-loop. 17. The non-transitory computer-readable storage medium of claim 16 , wherein the barrier commands related to a particular enumerated barrier synchronization call within the kernel function comprise an assignment to the second variable corresponding to an enumeration number for a barrier label, a thread-end goto command that advances the nested for-loop by one iteration, and the barrier label. 18. The non-transitory computer-readable storage medium of claim 17 , wherein the enumeration number corresponds to a case in the switch command associated with the particular enumerated barrier synchronization call within the kernel function and having an associated goto command that targets the barrier label. 19. The non-transitory computer-readable storage medium of claim 11 , further comprising compiling at least the transformed source file to generate an executable object for execution on the single-threaded processor core. 20. A computing device, comprising: a mass storage system configured to store at least a multi-threaded program and a transformed source file; a processing unit coupled to the mass storage system and configured to: identify a kernel function included within the multi-threaded program; enumerate a plurality of barrier synchronization calls within the kernel function; modify the kernel function by replacing each enumerated barrier synchronization call within the kernel function with a plurality of barrier commands and inserting a plurality of execution control commands into the kernel function, wherein the plurality of execution control commands includes a switch command within a nested loop, the switch command configured to include a different case for each of the enumerated barrier synchronization calls; and transfer the modified kernel function to the transformed source file. 21. The computing device of claim 20 , wherein the execution control commands comprise at least one of a while-loop, a nested for-loop, and a switch command within the nested for-loop that is configured to include a different case for each enumerated barrier synchronization call within the kernel function. 22. A computing device, comprising: a mass storage system configured to store at least a multi-threaded program and a transformed source file; a processing unit coupled to the mass storage system and configured to: identify a kernel function included within the multi-threaded program; enumerate a plurality of barrier synchronization calls within t

Assignees

Inventors

Classifications

  • G06F8/72Primary

    Code refactoring · CPC title

  • Barrier synchronisation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9367306B2 cover?
A technique is disclosed for executing a program designed for multi-threaded operation on a general purpose processor. Original source code for the program is transformed from a multi-threaded structure into a computationally equivalent single-threaded structure. A transform operation modifies the original source code to insert code constructs for serial thread execution. The transform operatio…
Who is the assignee on this patent?
Marathe Jaydeep, Grover Vinod, Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F8/72. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jun 14 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).