Data layout transformation for workload distribution

US9720708B2 · US · B2

Patent metadata
Publication number: US-9720708-B2
Application number: US-201113214083-A
Country: US
Kind code: B2
Filing date: Aug 19, 2011
Priority date: Aug 19, 2011
Publication date: Aug 1, 2017
Grant date: Aug 1, 2017

Abstract

Techniques are disclosed relating to data transformation for distributing workloads between processors or cores within a processor. In various embodiments, a first processing element receives a set of bytecode. The set of bytecode specifies a set of tasks and a first data structure that specifies data to be operated on during performance of the set of tasks. The first data structure is stored non-contiguously in memory of the computer system. In response to determining to offload the set of tasks to a second processing element of the computer system, the first processing element generates a second data structure that specifies the data. The second data structure is stored contiguously in memory of the computer system. The first processing element provides the second data structure to the second processing element for performance of the set of tasks.
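The core transformation in the abstract can be illustrated with a minimal Python sketch. It assumes a Python list of objects as a stand-in for a virtual machine's array of object references (each element lives in its own heap allocation, so the field values are non-contiguous) and an `array.array` as the contiguous second data structure. `Particle` and `flatten` are hypothetical names, and the fields to copy are passed in explicitly rather than identified by analyzing bytecode as the patent describes.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Particle:
    # Hypothetical element type. A list of Particle instances holds references
    # to separately allocated objects, so the field values are not contiguous.
    x: float
    y: float
    mass: float


def flatten(objects, field_names, typecode="d"):
    """Copy the named fields of each object, object by object, into one
    contiguous typed buffer (an interleaved, array-of-structs layout)."""
    out = array(typecode)
    for obj in objects:
        for name in field_names:
            out.append(getattr(obj, name))
    return out


particles = [Particle(0.0, 1.0, 2.0), Particle(3.0, 4.0, 5.0)]
buf = flatten(particles, ("x", "y"))  # mass is not needed, so it is not copied
print(buf.tolist())                # [0.0, 1.0, 3.0, 4.0]
print(memoryview(buf).contiguous)  # True: one flat block of C doubles
```

Copying fields object by object yields an interleaved layout; one buffer per field (structure of arrays) would be an equally reasonable choice that this sketch does not explore.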

Claims

What is claimed is:

1. A non-transitory computer readable medium having program instructions stored thereon that are executable on a first processing element of a computer system to perform:
    receiving a set of bytecode and a first data structure, wherein the set of bytecode specifies a set of tasks, and wherein the first data structure includes data to be operated on during performance of the set of tasks;
    reifying the set of bytecode to produce an intermediary form of the set of bytecode;
    determining whether to offload the set of tasks to a second processing element of the computer system;
    in response to determining to offload the set of tasks, determining whether the first data structure includes data stored contiguously or non-contiguously in memory of the computer system;
    in response to determining that the data in the first data structure is stored non-contiguously:
        identifying, using the intermediate form, data values associated with fields of the first data structure;
        extracting the data values from the first data structure; and
        generating a second data structure that includes the extracted data values, wherein the second data structure is an array of the data values and is stored contiguously in the memory of the computer system; and
    providing the second data structure to the second processing element for performance of the set of tasks.

2. The computer readable medium of claim 1, wherein the first data structure includes an array of pointers to the included data.

3. The computer readable medium of claim 1, wherein generating the second data structure includes: requesting an allocation of contiguous memory locations for the second data structure; and inserting the identified data values into the contiguous memory locations for the second data structure.

4. The computer readable medium of claim 1, wherein the program instructions are further executable to create a plurality of threads, and to use the plurality of threads to extract the identified data values from the first data structure.

5. The computer readable medium of claim 1, wherein the program instructions are further executable to perform: in response to determining to offload the set of tasks to the second processing element, generating a set of domain-specific instructions from the set of bytecode, wherein the set of domain-specific instructions have a domain-specific language format and specify layout information for the second data structure; and providing the set of domain-specific instructions to a driver executable to generate a set of instructions for the second processing element to perform the set of tasks using the data.

6. The computer readable medium of claim 1, wherein the first data structure includes other data that is unrelated to the performance of the offloaded tasks, and wherein the program instructions are further executable to determine whether data included in the first data structure is relevant to the performance of the offloaded tasks.

7. The computer readable medium of claim 1, wherein the program instructions are further executable to perform: receiving a set of results from the second processing element for performance of the set of tasks; and updating data included in the first data structure based on the received set of results.

8. The computer readable medium of claim 1, wherein the program instructions are further executable to prevent a garbage collector from reallocating memory locations used by the first and second data structures while the set of tasks are being performed by the second processing element.

9. The computer readable medium of claim 1, wherein the program instructions are interpretable by a control program on the first processing element to produce instructions within an instruction set architecture (ISA) of the first processing element, and wherein the control program is executable to implement a virtual machine.

10. The computer readable medium of claim 1, wherein the second processing element is a graphics processor.

11. A method, comprising:
    a first processing element receiving a set of instructions specifying a data parallel problem and a first data structure having a set of data values to be operated on during performance of the data parallel problem;
    reifying the set of instructions to produce an intermediary form of the instructions;
    determining that the set of data values of the first data structure is stored non-contiguously in a memory of a computer system;
    in response to the determining and determining to offload the data parallel problem to a second processing element, the first processing element generating a second data structure having the set of data values, wherein the second data structure is an array stored contiguously in the memory, and wherein the generating includes:
        using the intermediary form to identify data values in the first data structure; and
        extracting the identified data values to include in the second data structure; and
    the first processing element providing the second data structure to the second processing element for performance of the data parallel problem.

12. The method of claim 11, wherein the providing includes providing an address of the set of data values to a driver associated with the second processing element.

13. The method of claim 11, further comprising: the first processing element executing an interpreter to interpret at least a portion of the set of instructions.

14. The method of claim 11, wherein the first and second processing elements are separate cores within a processor.

15. A non-transitory computer readable medium, comprising: source program instructions of a library routine that are compilable by a compiler for inclusion in compiled code as a compiled library routine; wherein the compiled library routine is executable on a first processing element of a computer system to perform:
    receiving a first set of bytecode, wherein the first set of bytecode specifies a set of tasks and a first data structure that specifies data to be operated on during performance of the set of tasks;
    reifying the first set of bytecode to produce an intermediary form of the first set of bytecode;
    determining that the data specified by the first data structure is stored non-contiguously in memory of the computer system;
    in response to the determining and determining to offload the set of tasks to a second processing element of the computer system, generating a second data structure that includes the data, wherein the second data structure is stored contiguously in memory of the computer system, and wherein the generating includes:
        analyzing the intermediary form to identify data values included in fields of the first data structure;
        extracting the data values from the fields; and
        including the extracted data values in the second data structure; and
    providing the second data structure to the second processing element for performance of the set of tasks.

16. The computer readable medium of claim 15, wherein the compiled library routine is interpretable by a virtual machine for the first processing element, wherein the virtual machine is executable to interpret compiled instructions to produce instructions within an instruction set architecture (ISA) of the first processing element.

17. The computer readable medium of claim 15, wherein the first set of bytecode specifies the set of tasks by extending a base class defined in the library routine, and wherein the extend class specifies the first data structure.
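The claimed round trip can be sketched end to end. The Python below is a minimal illustration, not AMD's implementation: `Body`, `is_contiguous`, `gather`, `device_kernel`, `scatter`, and `offload` are hypothetical names; `device_kernel` simply runs on the CPU as a stand-in for the second processing element; and fields are named explicitly instead of being identified from an intermediary form of the bytecode. It mirrors the contiguity check of claim 1, the up-front allocation of claim 3, the multi-threaded extraction of claim 4, and the write-back of claim 7. Garbage-collector pinning (claim 8) and domain-specific code generation (claim 5) are not modeled.

```python
from array import array
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Body:
    # Hypothetical element type; a list of these is non-contiguous data.
    x: float
    vx: float
    label: str  # not needed by the task, so it is never copied


def is_contiguous(data):
    """True if `data` exposes one contiguous buffer (e.g. array.array)."""
    try:
        return memoryview(data).contiguous
    except TypeError:
        return False  # a list of objects does not support the buffer protocol


def gather(objects, fields, workers=4):
    """Copy the named fields into one preallocated contiguous array, splitting
    the objects across worker threads that write disjoint index ranges."""
    n, k = len(objects), len(fields)
    out = array("d", [0.0]) * (n * k)  # allocate the contiguous block up front

    def fill(start, stop):
        for i in range(start, stop):
            for j, name in enumerate(fields):
                out[i * k + j] = getattr(objects[i], name)

    # Under CPython's GIL these threads show the structure, not a speedup.
    chunk = max(1, -(-n // workers))  # ceiling division
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fill, s, min(s + chunk, n)) for s in range(0, n, chunk)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return out


def device_kernel(buf, k):
    """Hypothetical stand-in for the second processing element:
    one time step, x += vx, computed in place on the flat buffer."""
    for base in range(0, len(buf), k):
        buf[base] += buf[base + 1]


def scatter(buf, objects, fields):
    """Write results from the flat buffer back into the original objects."""
    k = len(fields)
    for i, obj in enumerate(objects):
        for j, name in enumerate(fields):
            setattr(obj, name, buf[i * k + j])


def offload(data, fields):
    if is_contiguous(data):
        device_kernel(data, len(fields))  # already flat: hand it over as-is
        return data
    buf = gather(data, fields)
    device_kernel(buf, len(fields))
    scatter(buf, data, fields)
    return buf


bodies = [Body(0.0, 1.5, "a"), Body(10.0, -2.0, "b")]
offload(bodies, ("x", "vx"))
print([b.x for b in bodies])  # [1.5, 8.0]
```

Because each worker fills a disjoint slice of the preallocated buffer, no locking is needed; ceiling-division chunking is just one simple way to partition the work.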

Classifications

  • Offload (CPC title)

  • G06F9/445 (primary): Program loading or initiating (bootstrapping G06F9/4401; security arrangements for program loading or initiating G06F21/57)

  • considering software capabilities, i.e. software resources associated or available to the machine (CPC title)

Frequently asked questions

Who is the assignee on this patent?
Caspole Eric R, Advanced Micro Devices Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/445. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 1, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.