US9760376B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9760376-B1 |
| Application number | US-201715422285-A |
| Country | US |
| Kind code | B1 |
| Filing date | Feb 1, 2017 |
| Priority date | Feb 1, 2016 |
| Publication date | Sep 12, 2017 |
| Grant date | Sep 12, 2017 |
Official abstract text for this publication.
An apparatus may include a processor and storage to store instructions that cause the processor to perform operations including: in response to a determination that a GPU of a node device is available, determine whether a task routine can be compiled to generate a GPU task routine for execution by the GPU to cause performance of multiple instances of a task of the task routine at least partially in parallel without dependencies thereamong; and in response to a determination that the task routine is able to be compiled to generate the GPU task routine: employ a conversion rule to convert the task routine into the GPU task routine; compile the GPU task routine for execution by the GPU; and assign performance of the task with a data set partition to the node device to enable performance of the multiple instances with the data set partition by the GPU.
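The decision flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The `NodeStatus` class, the `CONVERSION_RULES` table, the instruction names, and the `plan_task` function are all hypothetical, and the real compilation step is omitted. Falling back to the CPU when a GPU is available but the routine cannot be converted is also an assumption; the abstract does not describe that branch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeStatus:
    """Hypothetical status record reported by a node device."""
    node_id: str
    gpu_available: bool


# Hypothetical conversion rule: CPU-side instruction name -> GPU-side equivalent.
CONVERSION_RULES: Dict[str, str] = {
    "loop_map": "gpu_parallel_map",
    "add": "gpu_add",
}


def can_compile_for_gpu(task_routine: List[str], has_dependencies: bool) -> bool:
    """Return True when instances are independent and every instruction converts."""
    if has_dependencies:
        return False
    return all(op in CONVERSION_RULES for op in task_routine)


def plan_task(node: NodeStatus, task_routine: List[str],
              has_dependencies: bool, partition_id: int) -> dict:
    """Decide whether a task runs on the node's GPU or its CPU."""
    # The compilability check is only made when a GPU is available.
    if node.gpu_available and can_compile_for_gpu(task_routine, has_dependencies):
        # Apply the conversion rule instruction by instruction.
        gpu_routine = [CONVERSION_RULES[op] for op in task_routine]
        return {"node": node.node_id, "target": "gpu",
                "routine": gpu_routine, "partition": partition_id}
    # Assumed fallback: keep the original CPU routine.
    return {"node": node.node_id, "target": "cpu",
            "routine": list(task_routine), "partition": partition_id}
```

In this sketch, a routine made only of convertible instructions and free of dependencies is planned for the GPU with its partition; anything else stays on the CPU.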
Opening claim text (preview).
The invention claimed is:
1. An apparatus comprising a processor and a storage to store instructions that, when executed by the processor, cause the processor to perform operations comprising: analyze a current status of resources of at least one node device of a plurality of node devices to determine an availability of at least one graphics processing unit (GPU) of the at least one node device to be assigned to perform a first task of an analysis routine, wherein: operation of the plurality of node devices is coordinated to perform tasks of analysis routines at least partially in parallel; the analysis routine is generated for execution by at least one central processing unit (CPU) of the at least one node; and the resources of the at least one node device are selected from a group consisting of the at least one CPU, the at least one GPU, and storage space within at least one storage of the at least one node device; in response to a determination that the at least one GPU is available to be assigned to perform the first task of the analysis routine: analyze a first task routine of the analysis routine to determine whether the first task routine is able to be compiled to generate a GPU task routine for execution by the at least one GPU to cause the at least one GPU to perform multiple instances of the first task of the analysis routine at least partially in parallel without a dependency among inputs and outputs of the multiple instances of the first task, wherein: the first task routine is generated for execution by the at least one CPU to perform the first task of the analysis routine; and the determination of whether the first task routine is able to be compiled to generate the GPU task routine comprises a determination of whether the first task routine includes an instruction that prevents the compilation to generate the GPU task routine and a determination of whether inputs and outputs of the first task routine are defined to not require the dependency; and in response to a determination that the first task routine is able to be compiled to generate the GPU task routine: assign a data set partition of a plurality of data set partitions of a data set to the at least one node device to enable access to the data set partition by the at least one GPU; employ a conversion rule to convert at least one instruction of the first task routine into at least one corresponding instruction of the GPU task routine; compile the at least one corresponding instruction of the GPU task routine for execution by the at least one GPU; and assign a performance of the first task of the analysis routine with the data set partition to the at least one node device to enable performance of the multiple instances of the first task with the data set partition by the at least one GPU.
2. The apparatus of claim 1, wherein to determine whether the first task routine includes an instruction that prevents the compilation to generate the GPU task routine, the processor is caused to: determine whether the instruction of the first task routine is included in a set of instructions that cannot be converted into at least one instruction able to be executed by the at least one GPU; and in response to a determination that the instruction of the first task routine is not included in the set of instructions, determine whether the instruction of the first task routine is used in the first task routine in a manner that prevents conversion into at least one instruction able to be executed by the at least one GPU.
3. The apparatus of claim 1, wherein to convert the at least one instruction of the first task routine into the at least one corresponding instruction of the GPU task routine, the processor is caused to convert the at least one instruction of the first task routine from a first programming language into the at least one corresponding instruction in a second programming language in accordance with the conversion rule.
4. The apparatus of claim 1, wherein: the at least one storage of the at least one node device comprises a first volatile storage communicatively coupled to the at least one CPU, and a second volatile storage communicatively coupled to the at least one GPU; assigning the data set partition to the at least one node device to enable access to the data set partition by the at least one GPU comprises causing the data set partition to be stored within the second volatile storage; and in response to a determination that the at least one GPU is not available to be assigned to perform the first task of the analysis routine, the processor is caused to perform operations comprising: refrain from analyzing the first task routine to determine whether the first task routine is able to be compiled to generate the GPU task routine; assign the data set partition to the at least one node device to cause storage of the data set partition within the first volatile storage to enable access to the data set partition by the at least one CPU; compile the first task routine for execution by the at least one CPU; and assign the performance of the first task of the analysis routine with the data set partition to the at least one node device to enable performance of the first task with the data set partition by the at least one CPU.
5. The apparatus of claim 1, wherein: the apparatus comprises a coordinating device that coordinates the operation of the plurality of node devices; the processor is caused to recurringly receive updates to the current status from each node device of the plurality of node devices; and to analyze the current status to determine availability of the at least one GPU of the at least one node device, the processor is caused to identify a node device of the plurality of node devices that incorporates a GPU indicated by the current status as available.
6. The apparatus of claim 5, wherein to assign the data set partition of the data set to the at least one node device, the processor is caused to perform operations comprising: analyze a metadata indicative of structural features of the data set to identify a restriction in a manner in which the data set is able to be divided into the plurality of data set partitions, wherein the restriction is selected from a group consisting of an indication of a smallest atomic unit of data within the data set, and a specification of a partitioning scheme; and derive a division of the data set into the plurality of data set partitions based at least partially on the restriction.
7. The apparatus of claim 6, wherein the processor is caused to perform operations comprising: retrieve the metadata from at least one storage device at which the data set is stored; and transmit an indication of the assignment of the data set partition to the at least one node device or the at least one storage device to cause a transmission of the data set partition from the at least one storage device to the at least one node device.
8. The apparatus of claim 1, wherein: the apparatus comprises a node device of the at least one node device; the node device comprises a GPU of the at least one GPU; the processor comprises a CPU of the at least one CPU; and to analyze the current status to determine availability of the at least one GPU of the at least one node device, the CPU is caused to determine whether the GPU of the node device is indicated by the current status as available.
9. The apparatus of claim 1, wherein the processor is caused to perform operations comprising: analyze a second task routine of the analysis routine to determine whether the second task routine is able to be compiled to generate another GPU task routine for execution by the at least one GPU to cause the at least one GPU to per
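Claim 6 describes dividing a data set into partitions subject to a restriction such as a smallest atomic unit of data. One way to honor that restriction is sketched below in Python. The sketch assumes the data set is a sequence of rows and that the atomic unit is a fixed number of rows. The function name `derive_partitions` and its parameters are hypothetical, and the claims do not prescribe this particular balancing strategy.

```python
from typing import List, Tuple


def derive_partitions(total_rows: int, atomic_unit_rows: int,
                      num_partitions: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) row ranges that never split an atomic unit."""
    if total_rows < 0 or atomic_unit_rows <= 0 or num_partitions <= 0:
        raise ValueError("sizes must be positive (total_rows may be zero)")
    # Count atomic units, rounding up because the last unit may be short.
    units = -(-total_rows // atomic_unit_rows)
    # Spread units as evenly as possible; the first `extra` partitions get one more.
    base, extra = divmod(units, num_partitions)
    ranges: List[Tuple[int, int]] = []
    start_unit = 0
    for i in range(num_partitions):
        count = base + (1 if i < extra else 0)
        if count == 0:
            break  # fewer units than partitions: stop rather than emit empty ranges
        start = start_unit * atomic_unit_rows
        end = min((start_unit + count) * atomic_unit_rows, total_rows)
        ranges.append((start, end))
        start_unit += count
    return ranges


if __name__ == "__main__":
    print(derive_partitions(100, 8, 3))  # [(0, 40), (40, 72), (72, 100)]
```

Every partition boundary except the final end falls on a multiple of the atomic unit, so no unit is split across node devices. A partitioning scheme specified in metadata, the other restriction named in claim 6, would replace the even-split logic.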
Task transfer initiation or dispatching · CPC title
using a plurality of independent parallel functional units · CPC title
Instruction analysis, e.g. decoding, instruction word fields · CPC title
Instruction operation extension or modification · CPC title
LOAD or STORE instructions; Clear instruction · CPC title