Method for compiling a parallel thread execution program for general execution

US9361079B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9361079-B2
Application number  US-201213361408-A
Country             US
Kind code           B2
Filing date         Jan 30, 2012
Priority date       Jan 30, 2012
Publication date    Jun 7, 2016
Grant date          Jun 7, 2016

Abstract

Official abstract text for this publication.

A technique is disclosed for executing a compiled parallel application on a general purpose processor. The compiled parallel application comprises parallel thread execution code, which includes single-instruction multiple-data (SIMD) constructs, as well as references to intrinsic functions conventionally available in a graphics processing unit. The parallel thread execution code is transformed into an intermediate representation, which includes vector instruction constructs. The SIMD constructs are mapped to vector instructions available within the intermediate representation. Intrinsic functions are mapped to corresponding emulated runtime implementations. The technique advantageously enables parallel applications compiled for execution on a graphics processing unit to be executed on a general purpose central processing unit configured to support vector instructions.
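The abstract's two mappings (SIMD constructs to vector instructions, and GPU intrinsics to emulated runtime implementations) can be illustrated with a minimal Python sketch. This is not the patented implementation. The vector width of four lanes and every name here (`make_runtime`, `launch`, `sine_scale_kernel`, `thread_id`) are assumptions made for illustration and do not appear in the patent. Python lists stand in for CPU vector registers.

```python
import math

VECTOR_WIDTH = 4  # assumed lane count, e.g. four 32-bit floats in a 128-bit register


def make_runtime(lane_ids):
    """Emulated runtime: each GPU-style intrinsic is bound to a CPU-side
    implementation that works on a whole vector of lanes (hypothetical names)."""
    return {
        "thread_id": lambda: list(lane_ids),           # built-in variable access
        "sin": lambda xs: [math.sin(x) for x in xs],   # transcendental math operation
    }


def sine_scale_kernel(rt, data, out, factor):
    """Per-thread body 'out[tid] = sin(data[tid]) * factor', rewritten so that
    one call handles a vector of logical threads instead of a single thread."""
    tids = rt["thread_id"]()
    loaded = [data[t] for t in tids]         # vector load
    sines = rt["sin"](loaded)                # intrinsic call routed to the runtime
    scaled = [s * factor for s in sines]     # one vector multiply covers all lanes
    for t, v in zip(tids, scaled):           # vector store
        out[t] = v


def launch(kernel, num_threads, *args):
    """Run num_threads logical threads, VECTOR_WIDTH lanes per iteration.
    The last iteration may be narrower when num_threads is not a multiple."""
    for base in range(0, num_threads, VECTOR_WIDTH):
        lanes = range(base, min(base + VECTOR_WIDTH, num_threads))
        kernel(make_runtime(lanes), *args)


data = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
out = [0.0] * len(data)
launch(sine_scale_kernel, len(data), data, out, 2.0)
```

Six logical threads are covered by two loop iterations (four lanes, then two), which is the sense in which work written for many GPU threads collapses onto a few CPU vector operations.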

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for executing a multi-threaded program on a general purpose processor, the method comprising: translating the multi-threaded program into an intermediate representation including at least one parallel function; partitioning each parallel function within the intermediate representation into at least one operation group; classifying each operation group as either scalar or vectorizable; transforming each vectorizable operation group into vector instructions corresponding to computations performed by two or more threads when executing the multi-threaded program; binding each of one or more intrinsic functions included within the multi-threaded program to a different runtime implementation corresponding to the intrinsic function; and generating, based on the vector instructions, native executable code for the general purpose processor to process.

2. The method of claim 1, wherein the multi-threaded program comprises a plurality of single-instruction multiple-data instruction constructs, and the intermediate representation comprises a plurality of vector instruction constructs.

3. The method of claim 2, wherein the intermediate representation is defined by a low-level virtual machine instruction set architecture.

4. The method of claim 1, wherein at least one of the one or more intrinsic function comprises a built-in variable access function, a transcendental math operation, a thread synchronization operation, a texture sampling operation, or an atomic memory access operation.

5. The method of claim 1, wherein generating native executable code comprises transforming each intermediate representation into at least one corresponding processor-dependent machine instruction.

6. The method of claim 5, wherein generating native executable code further comprises transforming at least one intrinsic function call residing within the multi-threaded program into a runtime function call.

7. The method of claim 5, wherein generating native executable code further comprises transforming the at least one vectorizable operation into an equivalent loop construct within the native executable code.

8. The method of claim 7, wherein the equivalent loop construct includes at least one vector instruction configured to perform a particular operation on at least two different pairs of input data.

9. The method of claim 8, wherein the at least one vector instruction comprises a streaming single-instruction, multiple-data extension instruction.

10. A non-transitory computer-readable storage medium including instructions that, when executed by a processor, cause the processor to execute a multi-threaded program, by performing the steps of: translating the multi-threaded program into an intermediate representation including at least one parallel function; partitioning each parallel function within the intermediate representation into at least one operation group; classifying each operation group as either scalar or vectorizable; transforming each vectorizable operation group into vector instructions corresponding to computations performed by two or more threads when executing the multi-threaded program; binding each of one or more intrinsic functions included within the multi-threaded program to a different runtime implementation corresponding to the intrinsic function; and generating, based on the vector instructions, native executable code for the general purpose processor to process.

11. The non-transitory computer-readable storage medium of claim 10, wherein the multi-threaded program comprises a plurality of single-instruction multiple-data instruction constructs, and the intermediate representation comprises a plurality of vector instruction constructs.

12. The non-transitory computer-readable storage medium of claim 11, wherein the intermediate representation is defined by a low-level virtual machine instruction set architecture.

13. The non-transitory computer-readable storage medium of claim 10, wherein at least one of the one or more intrinsic functions comprise a built-in variable access function, a transcendental math operation, a thread synchronization operation, a texture sampling operation, or an atomic memory access operation.

14. The non-transitory computer-readable storage medium of claim 10, wherein generating native executable code comprises transforming each intermediate representation into at least one corresponding processor-dependent machine instruction.

15. The non-transitory computer-readable storage medium of claim 14, wherein generating native executable code further comprises transforming at least one intrinsic function call residing within the multi-threaded program into a runtime function call.

16. The non-transitory computer-readable storage medium of claim 14, wherein generating native executable code further comprises transforming the at least one vectorizable operation into an equivalent loop construct within the native executable code.

17. The non-transitory computer-readable storage medium of claim 16, wherein the equivalent loop construct includes at least one vector instruction configured to perform a particular operation on at least two different pairs of input data.

18. The non-transitory computer-readable storage medium of claim 17, wherein the at least one vector instruction comprises a streaming single-instruction, multiple-data extension instruction.

19. A computing device comprising: a memory that stores a code translator; and a processor that is coupled to the memory and, upon executing the code translator, is configured to: translate a multi-threaded program into an intermediate representation; partition the intermediate representation into one or more groups of parallel operations associated with a kernel function; classify operations within the intermediate representation as either scalar operations or vectorizable operations, wherein each group of parallel operations comprises a set of vectorizable operations; transform each set of vectorizable operations into a loop construct corresponding to operations performed on equivalent data performed by two or more threads when executing the multi-threaded program; and generate, based on the loop constructs, native executable code for the processor to process.

20. The system of claim 19, wherein the processor is further configured to bind one or more intrinsic functions specified in the multi-threaded program to corresponding run-time implementations.

21. The system of claim 20, wherein generating the native executable code is further based on the corresponding run-time implementations.
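The classify-then-transform steps recited in claims 1, 7, and 19 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed method itself. The claims do not say how an operation is judged vectorizable, so the sketch assumes one plausible rule: an operation is vectorizable if its inputs depend, directly or transitively, on the thread index `tid`. It also treats every operation as its own operation group. The `Op` class, the `classify` and `run` helpers, and the vector width of four are all hypothetical.

```python
from dataclasses import dataclass

VECTOR_WIDTH = 4  # assumed number of logical threads packed into one vector


@dataclass
class Op:
    dest: str     # name of the value this operation defines
    fn: object    # Python callable standing in for an IR opcode
    srcs: tuple   # names of the input values


def classify(ops):
    """Label an op 'vectorizable' if it depends on 'tid', else 'scalar'.
    Ops are assumed to be listed in definition-before-use order."""
    varying = {"tid"}
    labels = {}
    for op in ops:
        if any(s in varying for s in op.srcs):
            varying.add(op.dest)
            labels[op.dest] = "vectorizable"
        else:
            labels[op.dest] = "scalar"
    return labels


def run(ops, labels, uniforms, num_threads, result):
    """Evaluate scalar ops once, then evaluate vectorizable ops once per
    chunk of VECTOR_WIDTH threads, and collect the value named `result`."""
    env = dict(uniforms)
    for op in ops:
        if labels[op.dest] == "scalar":
            env[op.dest] = op.fn(*(env[s] for s in op.srcs))
    collected = []
    for base in range(0, num_threads, VECTOR_WIDTH):  # the loop construct
        width = min(VECTOR_WIDTH, num_threads - base)
        vec = {"tid": [base + i for i in range(width)]}
        for op in ops:
            if labels[op.dest] == "vectorizable":
                # Broadcast uniform inputs so every lane sees the same value.
                cols = [vec[s] if s in vec else [env[s]] * width for s in op.srcs]
                vec[op.dest] = [op.fn(*lane) for lane in zip(*cols)]
        collected.extend(vec[result])
    return collected


# Per-thread computation: y = a * b + tid * stride
ops = [
    Op("k", lambda a, b: a * b, ("a", "b")),
    Op("off", lambda tid, stride: tid * stride, ("tid", "stride")),
    Op("y", lambda k, off: k + off, ("k", "off")),
]
labels = classify(ops)
ys = run(ops, labels, {"a": 2, "b": 3, "stride": 10}, 6, "y")
print(labels)
print(ys)
```

Here `k` is identical for every thread, so it is computed once as a scalar, while `off` and `y` differ per thread and are computed lane by lane inside the loop. A real code generator would emit native vector instructions for the inner list comprehensions rather than interpreting them.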

Classifications

  • G06F8/53 (Primary): Decompilation; Disassembly

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9361079B2 cover?
A technique is disclosed for executing a compiled parallel application on a general purpose processor. The compiled parallel application comprises parallel thread execution code, which includes single-instruction multiple-data (SIMD) constructs, as well as references to intrinsic functions conventionally available in a graphics processing unit. The parallel thread execution code is transformed …
Who is the assignee on this patent?
Grover Vinod, Kerr Andrew, Lee Sean, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F8/53. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 7, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).