Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program to multiple parallel threads

US9672019B2 · US · B2

Patent metadata
Publication number: US-9672019-B2
Application number: US-97855710-A
Country: US
Kind code: B2
Filing date: Dec 25, 2010
Priority date: Nov 24, 2008
Publication date: Jun 6, 2017
Grant date: Jun 6, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of original code decomposition and/or generated thread execution.

First claim

Opening claim text (preview).

We claim:

1. A method comprising: executing original code on a first processor core; and during execution of the original code, placing a second processor core into a detect phase, wherein in the detect phase the second processor core is to detect an entry point in the original code running on the first processor core, indicating to switch into a different, cooperative execution mode with the first processor core, wherein the entry point is a beginning point in the original code which corresponds to a part of dynamic execution of the original code, profiling the original code in the first processor core, generating, in the second processor core, cooperative code from the original code to be cooperatively executed by the first and second processor cores, wherein the cooperative code is a threaded version of the original code along with possible entry points, detecting, by the second processor core, the entry point, and executing the generated cooperative code in the first and second processor cores.

2. The method of claim 1, further comprising: arming the first processor core to enter into a different execution mode upon hitting the indication to switch.

3. The method of claim 1, wherein profiling the original code comprises gathering information about loads, stores, and branches for a set amount of instructions.

4. The method of claim 1, further comprising: halting execution of the generated cooperative code in the first and second processor cores upon a successful completion of the generated cooperative code.

5. The method of claim 1, wherein executing the generated cooperative code in the first and second processor cores comprises: executing two threads in separation; buffering memory loads and stores using wrapper hardware; checking the buffered memory loads and stores for possible violations; and atomically committing a state to provide forward process while maintaining memory ordering.

6. The method of claim 5, further comprising: halting execution of the generated cooperative code in the first and second processor cores upon a violation and rolling back to a last commit point.

7. The method of claim 6, further comprising: upon halting execution of the generated cooperative code in the first and second processor cores upon a violation, executing the original code in the first processor core, and placing the second processor core into a detect phase, wherein in the detect phase the second processor core is to detect an indication to switch into a different, cooperative execution mode with the first processor core.

8. An apparatus comprising: a first processor core and a second processor core to execute cooperative code upon a detection of an entry point in original code running on the first processor core, wherein the entry point is a beginning point in the original code which corresponds to a part of dynamic execution of the original code and wherein the cooperative code is a threaded version of the original code along with possible entry points; and a hardware wrapper to: detect a hot region of the original code, wherein a hot region of code is a portion of code which corresponds to the part of dynamic execution of the original code, profile the hot region of code of the original code, in the first processor core, to generate the cooperative code, buffer memory loads and stores executed by the first and second processing cores, check the buffered memory loads and stores for possible violations, and atomically commit a state to provide forward progress while maintaining memory ordering.

9. The apparatus of claim 8, further comprising: a mid-level cache to merge an execution state of the cooperative code.

10. The apparatus of claim 8, further comprising: a last level cache.

11. The apparatus of claim 8, wherein the hardware wrapper is to discard the buffered memory loads and stores upon an abort.

12. The apparatus of claim 11, wherein the abort is found upon a store or store violation.

13. The apparatus of claim 11, wherein the abort is found upon a load or store violation.

14. The apparatus of claim 11, wherein upon the abort the first processing core is rolled back to a last commit point.

15. The apparatus of claim 8, wherein the first processing core is to execute the original code until the entry point is reached.

16. The apparatus of claim 8, wherein the first processing core is armed after the hardware wrapper has profiled the original code to enter into a different execution mode upon hitting an indication to switch.

17. The apparatus of claim 8, wherein the hardware wrapper is to profile the original code by gathering information about loads, stores, and branches for a set amount of instructions.

18. The apparatus of claim 8, wherein the hardware wrapper is to detect the hot region of the original code by detecting an instruction pointer of the hot region in a hardware table of accessed hot region instruction pointers.
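Illustrative sketch (not part of the patent text). Claims 5, 6, 8, and 11 describe buffering loads and stores from two threads, checking them for violations, and then either committing atomically or discarding the buffers and rolling back to the last commit point. The Python below is a minimal software toy of that idea, not the claimed hardware. The class name SpeculativeWrapper, its methods, and the address-level conflict rule (two threads store to the same address, or one thread loads an address the other stores to) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SpeculativeWrapper:
    """Hypothetical toy model of a load/store buffering wrapper."""
    memory: dict                                  # committed state (last commit point)
    reads: dict = field(default_factory=dict)     # thread id -> addresses loaded from memory
    writes: dict = field(default_factory=dict)    # thread id -> {address: buffered value}

    def load(self, tid, addr):
        # A thread sees its own buffered store first, else committed memory.
        own = self.writes.setdefault(tid, {})
        if addr in own:
            return own[addr]
        self.reads.setdefault(tid, set()).add(addr)
        return self.memory.get(addr, 0)

    def store(self, tid, addr, value):
        # Stores stay in the per-thread buffer until commit.
        self.writes.setdefault(tid, {})[addr] = value

    def violations(self):
        # Assumed rule: conflicts are detected per address between thread pairs.
        found = []
        tids = sorted(set(self.reads) | set(self.writes))
        for i, a in enumerate(tids):
            for b in tids[i + 1:]:
                wa, wb = set(self.writes.get(a, {})), set(self.writes.get(b, {}))
                ra, rb = self.reads.get(a, set()), self.reads.get(b, set())
                if wa & wb:
                    found.append(("store/store", a, b))
                if (ra & wb) or (rb & wa):
                    found.append(("load/store", a, b))
        return found

    def try_commit(self):
        # Commit all buffered stores together, or discard them all on a violation.
        ok = not self.violations()
        if ok:
            for buffered in self.writes.values():
                self.memory.update(buffered)
        self.reads.clear()
        self.writes.clear()
        return ok


mem = {"x": 1, "y": 2}
w = SpeculativeWrapper(mem)

# Two threads touch disjoint addresses: no violation, so the state commits.
w.store(0, "x", w.load(0, "x") + 10)
w.store(1, "y", w.load(1, "y") * 2)
committed = w.try_commit()

# Thread 1 loads an address that thread 0 has a buffered store to: abort.
w.store(0, "x", 100)
w.load(1, "x")
aborted = not w.try_commit()
```

After the first commit, mem holds x = 11 and y = 4. The second attempt aborts, so mem is unchanged and both buffers are empty, which corresponds to falling back to the last commit point.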

Assignees

Inventors

Classifications

  • Speculative instruction execution · CPC title

  • by runtime analysis (performance monitoring G06F11/3466) · CPC title

  • by tracing the execution of the program · CPC title

  • G06F8/4442 (Primary)

    Reducing the number of cache misses; Data prefetching (cache prefetching G06F12/0862) · CPC title

  • Interprogram communication · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9672019B2 cover?
Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of original code decomposition and/or generated thread execution.
Who is the assignee on this patent?
This page does not name an assignee. The names listed, Sager David J, Sasanka Ruchira, Gabor Ron, and 14 more, appear to be the credited inventors.
What technology area does this patent fall under?
Primary CPC classification G06F8/4442. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 6, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).