Adaptive Contention-Aware Thread Placement for Parallel Runtime Systems

US2016246647A1 · US · A1

Patent metadata
Publication number: US-2016246647-A1
Application number: US-201514626754-A
Country: US
Kind code: A1
Filing date: Feb 19, 2015
Priority date: Feb 19, 2015
Publication date: Aug 25, 2016
Grant date: (not listed)

Abstract

Official abstract text for this publication.

An adaptive contention-aware thread scheduler may place software threads for pairs of applications on the same socket of a multi-socket machine for execution in parallel. Initial placements may be based on profile data that characterizes the machine and its behavior when multiple applications execute on the same socket. The profile data may be collected during execution of other applications. It may identify performance counters within the cores of the processor sockets whose values are suitable for use in predicting whether the performance of a pair of applications will suffer when they are executed together on the same socket (e.g., values indicative of their demands for particular shared resources). During execution, the scheduler may examine the performance counters (or performance metrics derived therefrom) and determine different placement decisions (e.g., placing an application with high demand for resources of one type together with an application with low demand for those resources).
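The pairing idea in the abstract (co-locating a high-demand application with a low-demand one) can be illustrated with a minimal sketch. This is not the patent's algorithm: the `AppProfile` type, the normalized `memory_demand` metric, and the greedy high-with-low pairing in `pair_for_sockets` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppProfile:
    name: str
    # Hypothetical normalized demand for one shared resource type,
    # from 0.0 (low) to 1.0 (high). The patent does not define this scale.
    memory_demand: float


def pair_for_sockets(apps):
    """Greedily pair the highest-demand app with the lowest-demand app.

    Returns a list of tuples, one per socket. An odd app out is placed alone.
    """
    ordered = sorted(apps, key=lambda a: a.memory_demand)
    pairs = []
    lo, hi = 0, len(ordered) - 1
    while lo < hi:
        pairs.append((ordered[hi].name, ordered[lo].name))
        lo += 1
        hi -= 1
    if lo == hi:
        pairs.append((ordered[lo].name,))
    return pairs


# Illustrative application names and demand values (made up for this example).
apps = [
    AppProfile("stream", 0.9),
    AppProfile("crypto", 0.1),
    AppProfile("graph", 0.7),
    AppProfile("render", 0.3),
]
placement = pair_for_sockets(apps)
```

Here `placement` pairs "stream" with "crypto" and "graph" with "render", so no socket hosts two high-demand applications. A real scheduler would presumably weigh several resource types at once; this sketch considers only one.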

First claim

Opening claim text (preview).

What is claimed:

1. A method, comprising: performing, by a computer that includes multiple processor sockets, each of which includes one or more processor cores: receiving an application that is configured for parallel execution on the computer; determining, dependent on profile data that characterizes the behavior of the computer when multiple applications are executed in parallel on a single one of the processor sockets, that the application is to be executed on a given one of the multiple processor sockets while a particular other application is also executing on the given one of the multiple processor sockets; beginning execution of the given application, wherein execution of the given application comprises executing program instructions that perform work on behalf of the given application and that cause a respective value of each of one or more performance counters in one or more processor cores of the given one of the multiple processor sockets on which respective software threads of the given application are executing to be updated; and determining, prior to completing execution of the given application or the particular other application, and dependent on the updated values of the one or more performance counters, that execution of the given application or execution of the particular other application is to continue on a different one of the multiple processor sockets.

2. The method of claim 1, wherein the updated values of the performance counters are indicative of the extent to which the given application and the particular other application compete for a resource of a given type on the given processor socket that is shared by the given application and the particular other application.

3. The method of claim 2, wherein said determining that execution of the given application or execution of the particular other application is to continue on a different one of the multiple processor sockets comprises determining that demand for the shared resource by both the given application and the particular other application is high.

4. The method of claim 3, further comprising: selecting the different one of the multiple processor sockets, wherein selecting the different one of the multiple processor sockets comprises identifying a processor socket on which is executing an application for which demand for resources of the given type is low.

5. The method of claim 2, wherein the updated values of the performance counters are indicative of the demand by the given application for shared memory resources.

6. The method claim 2, wherein the updated values of the performance counters are indicative of a cache miss rate for the given application or a rate at which load instructions are attempted by the given application.

7. The method of claim 1, wherein the profile data that characterizes the behavior of the computer when multiple applications are executed in parallel on a single one of the multiple processor sockets of the computer indicates that contention for a shared resource of a given type by multiple applications executing in parallel on a single one of the multiple processor sockets negatively impacts the performance of the multiple applications.

8. The method of claim 1, wherein the profile data that characterizes the behavior of the computer when multiple applications are executed in parallel on a single one of the multiple processor sockets of the computer indicates that values of the one or more performance counters in the one or more processor cores of the given one of the multiple processor sockets on which respective software threads of the given application are executing are suitable for use in predicting whether or not the performance of a pair of applications executed in parallel on the same one of the multiple sockets will be significantly lower than the performance of the pair of applications when executed in parallel on respective different ones of the multiple sockets.

9. The method of claim 1, wherein the method further comprises, prior to said receiving, performing an operation to characterize the behavior of the computer when multiple applications are executed in parallel on a single one of the multiple processor sockets of the computer; wherein characterizing the behavior of the computer comprises one or more of: identifying the one or more performance counters whose values are suitable for use in predicting whether or not the performance of a pair of applications executed in parallel on the same one of the multiple sockets will be significantly lower than the performance of the pair of applications when executed in parallel on respective different ones of the multiple socket; or identifying the given type of the shared resources for which contention by multiple applications executing in parallel on a single one of the multiple processor sockets negatively impacts the performance of the multiple applications.

10. The method of claim 1, further comprising: performing, periodically, at pre-determined time intervals, an operation to determine whether or not execution of the given application and the particular other application should continue in parallel on the given one of the multiple processor sockets; wherein performing the operation comprises comparing the updated values of the one or more performance counters with values of one or more performance counters in one or more other processor cores of the given one of the multiple processor sockets on which respective software threads of the particular other application are executing.

11. The method of claim 1, further comprising: performing, in response to a context switch between software threads of the applications executing on the given one of the multiple processor sockets, an operation to determine whether or not execution of the given application and the particular other application should continue in parallel on the given one of the multiple processor sockets; wherein performing the operation comprises comparing the updated values of the one or more performance counters with values of one or more performance counters in one or more other processor cores of the given one of the multiple processor sockets on which respective software threads of the particular other application are executing.

12. The method of claim 1, wherein at least one of said determining that the application is to be executed on a given one of the multiple processor sockets or said determining that execution of the given application or execution of the particular other application is to continue on a different one of the multiple processor sockets is performed by a process or thread of an operating system executing on the computer or a resource-management-enabled parallel runtime system executing on the computer.

13. The method of claim 1, wherein the method further comprises: aggregating the updated values of the one or more performance counters to generate one or more performance metrics for the given application; aggregating values of one or more performance counters in one or more other processor cores of the given one of the multiple processor sockets on which respective software threads of the particular other application are executing to generate one or more performance metrics for the particular other application; aggregating values of one or more performance counters in one or more processor cores of the different one of the multiple processor sockets on which respective software threads of a third application are executing to generate one or more performance metrics for the third application; and wherein said determining that execution of th
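As a rough illustration of the run-time step described in claims 3, 4, 6 and 13, the sketch below aggregates per-core counter samples into a per-application cache miss rate and proposes moving an application when both co-located applications show high demand. Everything here is an assumption for illustration: the `HIGH_DEMAND` threshold, the `(misses, loads)` sample format, and the function names `miss_rate` and `choose_migration` do not come from the patent, and reading real hardware counters is out of scope.

```python
# Hypothetical threshold separating "high" from "low" demand (miss rate).
HIGH_DEMAND = 0.05


def miss_rate(core_samples):
    """Aggregate per-core (cache_misses, loads) samples into one miss rate."""
    misses = sum(m for m, _ in core_samples)
    loads = sum(l for _, l in core_samples)
    return misses / loads if loads else 0.0


def choose_migration(sockets, counters):
    """Propose one migration, or None if no socket is contended.

    sockets:  dict mapping socket id -> list of application names on it
    counters: dict mapping application name -> list of (misses, loads) samples
    Returns (app_to_move, source_socket, target_socket) or None.
    """
    demand = {app: miss_rate(samples) for app, samples in counters.items()}
    for src, apps in sockets.items():
        # Contended socket: two co-located apps that both show high demand.
        if len(apps) == 2 and all(demand[a] > HIGH_DEMAND for a in apps):
            for dst, others in sockets.items():
                # Target: a different socket whose apps all show low demand.
                if dst != src and others and all(
                    demand[o] <= HIGH_DEMAND for o in others
                ):
                    mover = max(apps, key=lambda a: demand[a])
                    return (mover, src, dst)
    return None


# Made-up sample data: A and B are both cache-hungry, C is not.
sockets = {0: ["A", "B"], 1: ["C"]}
counters = {
    "A": [(80, 1000), (120, 1000)],  # aggregated miss rate 0.10
    "B": [(60, 1000)],               # miss rate 0.06
    "C": [(5, 1000)],                # miss rate 0.005
}
decision = choose_migration(sockets, counters)
```

With this data, `decision` proposes moving "A" from socket 0 to socket 1. A scheduler could run such a check at fixed intervals or on context switches, as claims 10 and 11 describe; the sketch only returns a decision and does not perform the move.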

Assignees

Oracle Int Corp

Inventors

Classifications

  • G06F9/5027 (Primary)

    the resource being a machine, e.g. CPUs, Servers, Terminals · CPC title

  • Monitor · CPC title

  • Performance criteria · CPC title

  • Techniques for rebalancing the load in a distributed system · CPC title

  • involving deadlines, e.g. rate based, periodic · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016246647A1 cover?
An adaptive contention-aware thread scheduler may place software threads for pairs of applications on the same socket of a multi-socket machine for execution in parallel. Initial placements may be based on profile data that characterizes the machine and its behavior when multiple applications execute on the same socket. The profile data may be collected during execution of other applications. It…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/5027. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 25, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).