Hardware and runtime coordinated load balancing for parallel applications

US2016259667A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2016259667-A1
Application numberUS-201514641220-A
CountryUS
Kind codeA1
Filing dateMar 6, 2015
Priority dateMar 6, 2015
Publication dateSep 8, 2016
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of balancing execution rates for a plurality of parallel program loops being executed concurrently by a processor may include estimating a completion time for each program loop of the plurality of program loops, determining a difference between the estimated completion time of a first program loop of the plurality of program loops and the estimated completion time of a second program loop of the plurality of program loops, and decreasing the difference by adjusting an execution rate of the first program loop.

First claim

Opening claim text (preview).

What is claimed is: 1 . A method, comprising: estimating a completion time for each program loop of a plurality of program loops executed concurrently by a processor; determining a difference between the estimated completion time of a first program loop of the plurality of program loops and the estimated completion time of a second program loop of the plurality of program loops; and decreasing the difference by adjusting an execution rate of the first program loop. 2 . The method of claim 1 , further comprising storing loop progress information for each of the plurality of program loops in one or more loop progress buffers, wherein the estimated completion time for each program loop of the plurality of program loops is determined based on the loop progress information. 3 . The method of claim 2 , wherein the plurality of program loops is defined in an executable program executed by the processor, and wherein the method further comprises: identifying a dependent instruction block in the executable program to be executed after completion of all of the program loops of the plurality of program loops; and in response to identifying the dependent instruction block and in response to a backwards branch in the program corresponding to the first program loop, storing a program counter value of the backwards branch in the one or more loop progress buffers. 4 . The method of claim 2 , wherein for each program loop of the plurality of program loops, the loop progress information comprises a number of instructions, a number of completed cycles, and a number of expected total cycles for the program loop. 5 . The method of claim 2 , wherein for each program loop of the plurality of program loops, the loop progress information comprises a power state of a processing core executing the program loop and an estimated power consumption for the program loop. 6 . The method of claim 2 , further comprising: for each program loop of the plurality of program loops, storing values in one or more loop termination buffers including a speculative iteration count, a non-speculative iteration count, a trip count, and a confidence bit, wherein estimating the completion time is based on the values stored in the one or more loop termination buffers. 7 . The method of claim 1 , wherein the processor comprises a plurality of processing cores, and wherein adjusting the execution rate of the first program loop comprises adjusting an amount of power supplied to a processing core executing the first program loop. 8 . The method of claim 1 , further comprising: comparing the difference with a sleep threshold; and transitioning a processing core assigned to execute the first program loop to a sleep state after completion of the first program loop if the difference is greater than the sleep threshold, wherein the first program loop has an earlier estimated completion time than the second program loop. 9 . The method of claim 8 , wherein the sleep threshold is longer than a duration for the first program loop to enter the sleep state plus a duration for the first program loop to exit the sleep state. 10 . The method of claim 8 , further comprising causing the processing core assigned to execute the first program loop to exit the sleep state prior to the estimated completion time of the second loop. 11 . An apparatus, comprising: one or more loop progress buffers configured to store loop progress information for each of a plurality of program loops executed concurrently by a set of processing cores; a runtime system coupled with one or more loop progress buffers and configured to: estimate a completion time for each of the plurality of program loops based on the loop progress information; determine a difference between the estimated completion time of a first program loop of the plurality of program loops and the estimated completion time of a second program loop of the plurality of program loops; and a power management unit coupled with the runtime system, wherein the power management unit is configured to decrease the difference by adjusting an execution rate of the first program loop. 12 . The apparatus of claim 11 , wherein the set of processing cores comprises multiple processing cores each configured to execute one of the plurality of program loops. 13 . The apparatus of claim 12 , wherein the power management unit is further configured to adjust the execution rate of the first program loop by redistributing power between two or more of the multiple processing cores. 14 . The apparatus of claim 12 , wherein the power management unit is further configured to cause at least one of the multiple cores to enter a sleep state based on the difference between the estimated completion time of the first program loop and the estimated completion time of the second program loop. 15 . The apparatus of claim 11 , wherein for each program loop of the plurality of program loops, the loop progress information comprises a number of instructions, a number of completed cycles, and a number of expected total cycles, a power state of a processing core executing the program loop, and an estimated power consumption for the program loop. 16 . The apparatus of claim 11 , further comprising one or more loop termination buffers coupled with the runtime system, wherein the one or more loop termination buffers are configured to, for each of the plurality of program loops, store loop termination information comprising a speculative iteration count, a non-speculative iteration count, a trip count, and a confidence bit, wherein the runtime system is further configured to estimate the completion time based on the loop termination information. 17 . A non-transitory computer-readable medium storing instructions that when executed by a processor, cause the processor to perform a method comprising: estimating a completion time for each program loop of a plurality of program loops executed concurrently by the processor; determining a difference between the estimated completion time of a first program loop of the plurality of program loops and the estimated completion time of a second program loop of the plurality of program loops; and decreasing the difference by adjusting an execution rate of the first program loop. 18 . The non-transitory computer-readable medium of claim 17 , wherein the method further comprises storing loop progress information for each of the plurality of program loops in one or more loop progress buffers, wherein the estimated completion time for each program loop of the plurality of program loops is determined based on the loop progress information. 19 . The non-transitory computer-readable medium of claim 17 , wherein the processor comprises a plurality of processing cores, and wherein adjusting the execution rate of the first program loop comprises adjusting an amount of power supplied to a processing core executing the first program loop. 20 . The non-transitory computer-readable medium of claim 17 , wherein the method further comprises: comparing the difference with a sleep threshold; and transitioning a processing core assigned to execute the first program loop to a sleep state after completion of the first program loop if the difference is greater than the sleep threshold, wherein the first program loop has an earlier estimated completion time than the second program loop.

Assignees

Inventors

Classifications

  • G06F1/329Primary

    by task scheduling · CPC title

  • where the allocation takes into account power or heat criteria (power management in computers in general G06F1/3203; thermal management in computers in general G06F1/206) · CPC title

  • G06F9/505Primary

    considering the load · CPC title

  • Power saving in microcontroller unit · CPC title

  • taking into account power or heat criteria (power management in computers in general G06F1/3203; thermal management in computers in general G06F1/206) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016259667A1 cover?
A method of balancing execution rates for a plurality of parallel program loops being executed concurrently by a processor may include estimating a completion time for each program loop of the plurality of program loops, determining a difference between the estimated completion time of a first program loop of the plurality of program loops and the estimated completion time of a second program l…
Who is the assignee on this patent?
Advanced Micro Devices Inc
What technology area does this patent fall under?
Primary CPC classification G06F1/329. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Sep 08 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).