Compilation for node device GPU-based parallel processing

US9760376B1 · US · B1

Patent metadata
Field               Value
Publication number  US-9760376-B1
Application number  US-201715422285-A
Country             US
Kind code           B1
Filing date         Feb 1, 2017
Priority date       Feb 1, 2016
Publication date    Sep 12, 2017
Grant date          Sep 12, 2017

Abstract

Official abstract text for this publication.

An apparatus may include a processor and storage to store instructions that cause the processor to perform operations including: in response to a determination that a GPU of a node device is available, determine whether a task routine can be compiled to generate a GPU task routine for execution by the GPU to cause performance of multiple instances of a task of the task routine at least partially in parallel without dependencies thereamong; and in response to a determination that the task routine is able to be compiled to generate the GPU task routine: employ a conversion rule to convert the task routine into the GPU task routine; compile the GPU task routine for execution by the GPU; and assign performance of the task with a data set partition to the node device to enable performance of the multiple instances with the data set partition by the GPU.
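
The decision flow in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `Node`, `CONVERSION_RULES`, `can_compile_for_gpu`, and `plan_task`, and the toy instruction strings, are all hypothetical, and conversion is reduced to a table lookup. The CPU fallback for a routine that cannot be converted is an assumption; the abstract itself only describes the GPU path.

```python
from dataclasses import dataclass

# Hypothetical conversion rules: CPU-style instruction -> GPU-style instruction.
CONVERSION_RULES = {"loop_add": "parallel_add", "loop_mul": "parallel_mul"}


@dataclass
class Node:
    name: str
    gpu_available: bool


def can_compile_for_gpu(task_routine, independent_io):
    """True if every instruction has a conversion rule and the routine's
    inputs and outputs are declared free of cross-instance dependencies."""
    return independent_io and all(op in CONVERSION_RULES for op in task_routine)


def plan_task(node, task_routine, independent_io, partition):
    """Return a plan describing where the task would run and with which routine."""
    if node.gpu_available and can_compile_for_gpu(task_routine, independent_io):
        # Apply the conversion rule to each instruction to get the GPU task routine.
        gpu_routine = [CONVERSION_RULES[op] for op in task_routine]
        return {"node": node.name, "target": "gpu",
                "routine": gpu_routine, "partition": partition}
    # Assumed fallback: leave the routine unchanged and run it on the CPU.
    return {"node": node.name, "target": "cpu",
            "routine": list(task_routine), "partition": partition}


plan = plan_task(Node("node-1", True), ["loop_add", "loop_mul"], True, "partition-0")
print(plan["target"], plan["routine"])
```

Note the ordering: the compilability analysis runs only after the GPU is found to be available, which mirrors the "in response to a determination" structure of the abstract.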

First claim

Opening claim text (preview).

The invention claimed is:

1. An apparatus comprising a processor and a storage to store instructions that, when executed by the processor, cause the processor to perform operations comprising: analyze a current status of resources of at least one node device of a plurality of node devices to determine an availability of at least one graphics processing unit (GPU) of the at least one node device to be assigned to perform a first task of an analysis routine, wherein: operation of the plurality of node devices is coordinated to perform tasks of analysis routines at least partially in parallel; the analysis routine is generated for execution by at least one central processing unit (CPU) of the at least one node; and the resources of the at least one node device are selected from a group consisting of the at least one CPU, the at least one GPU, and storage space within at least one storage of the at least one node device; in response to a determination that the at least one GPU is available to be assigned to perform the first task of the analysis routine: analyze a first task routine of the analysis routine to determine whether the first task routine is able to be compiled to generate a GPU task routine for execution by the at least one GPU to cause the at least one GPU to perform multiple instances of the first task of the analysis routine at least partially in parallel without a dependency among inputs and outputs of the multiple instances of the first task, wherein: the first task routine is generated for execution by the at least one CPU to perform the first task of the analysis routine; and the determination of whether the first task routine is able to be compiled to generate the GPU task routine comprises a determination of whether the first task routine includes an instruction that prevents the compilation to generate the GPU task routine and a determination of whether inputs and outputs of the first task routine are defined to not require the dependency; and in response to a determination that the first task routine is able to be compiled to generate the GPU task routine: assign a data set partition of a plurality of data set partitions of a data set to the at least one node device to enable access to the data set partition by the at least one GPU; employ a conversion rule to convert at least one instruction of the first task routine into at least one corresponding instruction of the GPU task routine; compile the at least one corresponding instruction of the GPU task routine for execution by the at least one GPU; and assign a performance of the first task of the analysis routine with the data set partition to the at least one node device to enable performance of the multiple instances of the first task with the data set partition by the at least one GPU.

2. The apparatus of claim 1, wherein to determine whether the first task routine includes an instruction that prevents the compilation to generate the GPU task routine, the processor is caused to: determine whether the instruction of the first task routine is included in a set of instructions that cannot be converted into at least one instruction able to be executed by the at least one GPU; and in response to a determination that the instruction of the first task routine is not included in the set of instructions, determine whether the instruction of the first task routine is used in the first task routine in a manner that prevents conversion into at least one instruction able to be executed by the at least one GPU.

3. The apparatus of claim 1, wherein to convert the at least one instruction of the first task routine into the at least one corresponding instruction of the GPU task routine, the processor is caused to convert the at least one instruction of the first task routine from a first programming language into the at least one corresponding instruction in a second programming language in accordance with the conversion rule.

4. The apparatus of claim 1, wherein: the at least one storage of the at least one node device comprises a first volatile storage communicatively coupled to the at least one CPU, and a second volatile storage communicatively coupled to the at least one GPU; assigning the data set partition to the at least one node device to enable access to the data set partition by the at least one GPU comprises causing the data set partition to be stored within the second volatile storage; and in response to a determination that the at least one GPU is not available to be assigned to perform the first task of the analysis routine, the processor is caused to perform operations comprising: refrain from analyzing the first task routine to determine whether the first task routine is able to be compiled to generate the GPU task routine; assign the data set partition to the at least one node device to cause storage of the data set partition within the first volatile storage to enable access to the data set partition by the at least one CPU; compile the first task routine for execution by the at least one CPU; and assign the performance of the first task of the analysis routine with the data set partition to the at least one node device to enable performance of the first task with the data set partition by the at least one CPU.

5. The apparatus of claim 1, wherein: the apparatus comprises a coordinating device that coordinates the operation of the plurality of node devices; the processor is caused to recurringly receive updates to the current status from each node device of the plurality of node devices; and to analyze the current status to determine availability of the at least one GPU of the at least one node device, the processor is caused to identify a node device of the plurality of node devices that incorporates a GPU indicated by the current status as available.

6. The apparatus of claim 5, wherein to assign the data set partition of the data set to the at least one node device, the processor is caused to perform operations comprising: analyze a metadata indicative of structural features of the data set to identify a restriction in a manner in which the data set is able to be divided into the plurality of data set partitions, wherein the restriction is selected from a group consisting of an indication of a smallest atomic unit of data within the data set, and a specification of a partitioning scheme; and derive a division of the data set into the plurality of data set partitions based at least partially on the restriction.

7. The apparatus of claim 6, wherein the processor is caused to perform operations comprising: retrieve the metadata from at least one storage device at which the data set is stored; and transmit an indication of the assignment of the data set partition to the at least one node device or the at least one storage device to cause a transmission of the data set partition from the at least one storage device to the at least one node device.

8. The apparatus of claim 1, wherein: the apparatus comprises a node device of the at least one node device; the node device comprises a GPU of the at least one GPU; the processor comprises a CPU of the at least one CPU; and to analyze the current status to determine availability of the at least one GPU of the at least one node device, the CPU is caused to determine whether the GPU of the node device is indicated by the current status as available.

9. The apparatus of claim 1, wherein the processor is caused to perform operations comprising: analyze a second task routine of the analysis routine to determine whether the second task routine is able to be compiled to generate another GPU task routine for execution by the at least one GPU to cause the at least one GPU to per
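
Two of the dependent claims describe small procedures that are easy to picture in code: the two-stage instruction check of claim 2 and the restriction-aware partitioning of claim 6. The sketch below is only an illustration under stated assumptions. `NON_CONVERTIBLE`, `blocks_gpu_compilation`, and `derive_partitions` are hypothetical names, the instruction strings are invented, and "reads the output of a previous iteration" is an assumed example of a use that prevents conversion; the claims do not specify any of these details.

```python
# Hypothetical set of instructions that can never be converted for the GPU.
NON_CONVERTIBLE = {"file_write", "network_send"}


def blocks_gpu_compilation(instruction, reads_previous_output):
    """Two-stage check in the spirit of claim 2.

    Stage 1: is the instruction in the set that cannot be converted?
    Stage 2: only if it is not, is it used in a manner that prevents conversion?
    """
    if instruction in NON_CONVERTIBLE:
        return True
    return reads_previous_output


def derive_partitions(num_records, atomic_unit, num_partitions):
    """Split records into contiguous (start, stop) ranges whose internal
    boundaries fall on multiples of atomic_unit, in the spirit of claim 6."""
    units = -(-num_records // atomic_unit)  # ceiling division: count of atomic units
    base, extra = divmod(units, num_partitions)
    ranges, start_unit = [], 0
    for i in range(num_partitions):
        size = base + (1 if i < extra else 0)  # spread leftover units over the first ranges
        start = min(start_unit * atomic_unit, num_records)
        stop = min((start_unit + size) * atomic_unit, num_records)
        ranges.append((start, stop))
        start_unit += size
    return ranges


print(blocks_gpu_compilation("file_write", False))
print(derive_partitions(num_records=10, atomic_unit=4, num_partitions=2))
```

With 10 records and an atomic unit of 4, the only internal boundary lands at record 8, so no atomic unit is split between partitions.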

Assignees

Sas Inst Inc

Inventors

Classifications

  • Task transfer initiation or dispatching

  • Using a plurality of independent parallel functional units

  • Instruction analysis, e.g. decoding, instruction word fields

  • Instruction operation extension or modification

  • LOAD or STORE instructions; Clear instruction

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9760376B1 cover?
An apparatus may include a processor and storage to store instructions that cause the processor to perform operations including: in response to a determination that a GPU of a node device is available, determine whether a task routine can be compiled to generate a GPU task routine for execution by the GPU to cause performance of multiple instances of a task of the task routine at least partiall…
Who is the assignee on this patent?
Sas Inst Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/30145. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 12, 2017 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).