Apparatus and method for improving power-performance using a software analysis routine

US2018173291A1 · US · A1

Patent metadata
Field | Value
Publication number | US-2018173291-A1
Application number | US-201615385184-A
Country | US
Kind code | A1
Filing date | Dec 20, 2016
Priority date | Dec 20, 2016
Publication date | Jun 21, 2018
Grant date | (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments described herein relate to improving processor power-performance using a binary analyzer routine. In one example, a processor includes a memory interface to couple to a memory, at least one hardware accelerator circuit, and an execution pipeline including at least fetch, decode, and execute stages, wherein the processor, in response to a hot-spot hardware event indicating presence of a hot-spot sequence, is to switch context to a binary analyzer routine stored in the memory, the binary analyzer routine including instructions that, when fetched, decoded, and executed by the processor, cause the processor to analyze a region in the memory containing the hot-spot sequence, analyze hardware metrics relating to execution of the hot-spot sequence, and generate, based on the analyses, a recommendation for the at least one hardware accelerator circuit to improve at least one of power consumption and performance.
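The flow in the abstract (hot-spot event, context switch to the analyzer, two analyses, recommendation) can be pictured with a minimal Python sketch. It is a software toy, not the patented hardware: the names `HotSpotEvent`, `Recommendation`, `binary_analyzer`, and `on_hot_spot_event`, the metric keys `invariant_loads` and `est_power_mw`, the mnemonic strings, and the 100 mW threshold are all assumptions made for illustration and do not appear in the publication.

```python
from dataclasses import dataclass, field

POWER_THRESHOLD_MW = 100  # hypothetical threshold, not taken from the patent


@dataclass
class HotSpotEvent:
    start: int                      # first index of the hot-spot region in memory
    end: int                        # one past the last index of the region
    metrics: dict = field(default_factory=dict)  # hardware metrics for the sequence


@dataclass
class Recommendation:
    target: str   # accelerator circuit or pipeline stage being advised
    action: str   # what that circuit is advised to do


def binary_analyzer(memory, event):
    """Analyze the code region and the hardware metrics, then return recommendations."""
    # Analysis 1: inspect the memory region that holds the hot-spot sequence.
    region = memory[event.start:event.end]
    loads = sum(1 for insn in region if insn.startswith("load"))

    recs = []
    # Analysis 2: combine the region contents with the gathered metrics.
    # Assumed rule: if every load in the region returned an invariant value,
    # advise replacing the memory reads with register reads.
    if loads and event.metrics.get("invariant_loads", 0) == loads:
        recs.append(Recommendation("register/memory read stage",
                                   "convert memory reads to register reads"))
    # Assumed rule: a low estimated power draw leads to a lower-power state.
    if event.metrics.get("est_power_mw", float("inf")) < POWER_THRESHOLD_MW:
        recs.append(Recommendation("power control circuit",
                                   "enter lower-power state"))
    return recs


def on_hot_spot_event(memory, event, saved_context):
    """Model the context switch as a call: run the analyzer, then resume."""
    recs = binary_analyzer(memory, event)
    return saved_context, recs


if __name__ == "__main__":
    mem = ["load r1, [a]", "add r2, r1", "load r3, [a]", "jmp top"]
    ev = HotSpotEvent(0, 4, {"invariant_loads": 2, "est_power_mw": 50})
    for rec in on_hot_spot_event(mem, ev, saved_context={"pc": 0})[1]:
        print(rec.target, "->", rec.action)
```

The two rules inside `binary_analyzer` loosely mirror the invariant-read and low-power recommendations in the dependent claims; a real analyzer would operate on binary code and hardware counters rather than strings and a dictionary.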

First claim

Opening claim text (preview).

1. A processor comprising: a memory interface to couple to a memory; at least one hardware accelerator circuit; and an execution pipeline comprising at least fetch, decode, and execute stages; wherein the processor, in response to a hot-spot hardware event indicating presence of a hot-spot sequence, is to switch context to a binary analyzer routine stored in the memory, the binary analyzer routine comprising instructions that, when fetched, decoded, and executed by the processor, cause the processor to analyze a region in the memory containing the hot-spot sequence; analyze hardware metrics relating to execution of the hot-spot sequence; and generate, based on the analyses, a recommendation for the at least one hardware accelerator circuit to improve at least one of power consumption and performance.

2. The processor of claim 1, further comprising a hot-spot detector circuit to monitor the execution pipeline, detect the hot-spot sequence, gather the hardware metrics relating to execution of the hot-spot sequence, and generate the hot-spot hardware event.

3. The processor of claim 1, wherein the hot-spot sequence comprises at least one of a branch instruction, a loop instruction, a memory access to at least one of a loop index, a loop constant, and a loop invariant, and an instruction that has repeated at least a threshold number of times.

4. The processor of claim 1, wherein the processor, when executing the binary analyzer routine, uses a memory protection mechanism to define protected memory regions in which to store a code segment, a data segment, and a stack segment of the binary analyzer routine.

5. The processor of claim 4, wherein the processor is further to store the recommendation in the data segment for future use, and wherein the recommendation is generated once and used to generate recommendations for future occurrences of the hot-spot sequence.

6. The processor of claim 4, wherein when the processor, during execution of the binary analyzer routine, determines the hot-spot sequence receives an invariant value in response to a plurality of memory read requests, the processor is to generate a recommendation that a register/memory read stage of the execution pipeline convert the plurality of memory read requests into register read requests, and to store the invariant value in a register.

7. The processor of claim 4, wherein when the processor, during execution of the binary analyzer routine, determines that an instruction source operand value is predictable, the processor is further to generate a recommendation that a register/memory read stage of the execution pipeline use a predicted value for the instruction source operand.

8. The processor of claim 4, wherein the processor, during execution of the binary analyzer routine, is to generate a recommendation to a schedule stage of the execution pipeline to conduct a speculative execution of the hot-spot sequence, and to prepare to roll back the speculative execution.

9. The processor of claim 8, wherein the processor, during execution of the binary analyzer routine, is to generate a recommendation that the schedule stage begin speculative execution at a first linear instruction address, and to stop speculative execution at a second linear instruction access.

10. The processor of claim 4, wherein when the processor, during execution of the binary analyzer routine, identifies underused registers, the processor is further to generate a recommendation to a register allocate stage of the execution pipeline to reallocate the underused registers.

11. The processor of claim 4, wherein when the processor, during execution of the binary analyzer routine, determines that the hot-spot sequence is to utilize less than a threshold amount of power, the processor is further to generate a recommendation to a power control circuit of the processor to enter into a lower-power power state.

12. A system comprising: a memory interface to couple to a memory; at least one hardware accelerator circuit; and a processing core comprising an execution pipeline comprising at least fetch, decode, and execute stages; wherein the processing core, in response to a hot-spot hardware event indicating presence of a hot-spot sequence, is to switch context to a binary analyzer routine stored in the memory, the binary analyzer routine comprising instructions that, when fetched, decoded, and executed by the processing core, cause the processing core to analyze a region in the memory containing the hot-spot sequence; analyze hardware metrics relating to execution of the hot-spot sequence; and generate, based on the analyses, a recommendation for the at least one hardware accelerator circuit to improve at least one of power consumption and performance.

13. The system of claim 12, further comprising a hot-spot detector to monitor the execution pipeline, to detect the hot-spot sequence, to gather the hardware metrics, and to generate the hot-spot hardware event.

14. The system of claim 12, wherein when the processing core, during execution of the binary analyzer routine, determines the hot-spot sequence receives an invariant value in response to a plurality of memory read requests, the processing core is further to generate a recommendation that a register/memory read stage of the execution pipeline convert the plurality of memory read requests into register read requests, and to store the invariant value in a register.

15. The system of claim 12, wherein when the processing core, during execution of the binary analyzer routine, determines that an instruction source operand value is predictable, the processing core is further to generate a recommendation that a register/memory read stage of the execution pipeline use a predicted value for the instruction source operand.

16. The system of claim 12, wherein the processing core, during execution of the binary analyzer routine, is to generate a recommendation to a schedule stage of the execution pipeline to conduct a speculative execution of the hot-spot sequence, and to prepare to roll back the speculative execution.

17. The system of claim 12, wherein when the processing core, during execution of the binary analyzer routine, identifies underused registers, the processing core is further to generate a recommendation to a register allocate stage of the execution pipeline to reallocate the underused registers.

18. The system of claim 12, wherein when the processing core, during execution of the binary analyzer routine, determines that the hot-spot sequence is to utilize less than a threshold amount of power, the processing core is further to generate a recommendation to a power control circuit to enter into a lower-power power state.

19. A non-transitory computer-readable storage medium having stored therein instructions, which, when executed by a processor comprising a memory interface to couple to a memory, at least one hardware accelerator circuit, and an execution pipeline comprising at least fetch, decode, and execute stages, cause the processor to: fetch, decode, and execute instructions; and switch context, in response to a hot-spot hardware event indicating presence of a hot-spot sequence, to a binary analyzer routine stored in the memory, the binary analyzer routine comprising instructions that, when fetched, decoded, and executed by the processor, cause the processor to analyze a region in the memory containing the hot-spot sequence; analyze hardware metrics relating to execution of the hot-spot sequence; and generat…
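Two ideas from the dependent claims lend themselves to a small illustration: detecting a hot spot when an instruction has repeated at least a threshold number of times (claims 2 and 3), and generating a recommendation once and reusing it for later occurrences (claim 5). The Python sketch below is an assumption-laden model, not the claimed circuit: `HotSpotDetector`, `RecommendationCache`, `run_trace`, and the choice to re-raise the event at every multiple of the threshold are hypothetical.

```python
from collections import Counter


class HotSpotDetector:
    """Counts executions per instruction address and flags repeated ones."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()

    def observe(self, address):
        """Record one executed instruction.

        Returns True whenever the count for this address reaches a multiple
        of the threshold (an assumed policy, so later occurrences of the
        same hot spot raise the event again).
        """
        self.counts[address] += 1
        return self.counts[address] % self.threshold == 0


class RecommendationCache:
    """Runs the analysis once per hot spot and reuses the stored result."""

    def __init__(self, analyze):
        self.analyze = analyze      # callable: address -> recommendation
        self.store = {}             # stands in for the analyzer's data segment
        self.analyses_run = 0

    def lookup(self, address):
        if address not in self.store:
            self.analyses_run += 1
            self.store[address] = self.analyze(address)
        return self.store[address]


def run_trace(trace, threshold, analyze):
    """Feed a trace of instruction addresses through detector and cache."""
    detector = HotSpotDetector(threshold)
    cache = RecommendationCache(analyze)
    events = []
    for address in trace:
        if detector.observe(address):
            events.append((address, cache.lookup(address)))
    return events, cache.analyses_run


if __name__ == "__main__":
    # A three-instruction loop body executed six times.
    trace = [0x10, 0x14, 0x18] * 6
    events, analyses = run_trace(trace, 3, lambda a: f"speculate from {a:#x}")
    print(len(events), "events,", analyses, "analyses")
```

In this model each of the three addresses raises the event twice over the trace, but the analysis callable runs only once per address because the second event is served from the stored recommendation.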

Assignees

Intel Corp

Inventors

Classifications

  • Loop control instructions; iterative instructions, e.g. LOOP, REPEAT · CPC title

  • using instruction pipelines · CPC title

  • G06F1/329 · Primary

    by task scheduling · CPC title

  • Monitoring task completion, e.g. by use of idle timers, stop commands or wait commands · CPC title

  • Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F1/329. Mapped technology areas include Physics.
When was this patent published?
Published Jun 21, 2018 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).