Hardware profiling mechanism to enable page level automatic binary translation

US9542191B2 · US · B2

Patent metadata
Publication number: US-9542191-B2
Application number: US-201213993792-A
Country: US
Kind code: B2
Filing date: Mar 30, 2012
Priority date: Mar 30, 2012
Publication date: Jan 10, 2017
Grant date: Jan 10, 2017

Abstract

Official abstract text for this publication.

A hardware profiling mechanism implemented by performance monitoring hardware enables page level automatic binary translation. The hardware during runtime identifies a code page in memory containing potentially optimizable instructions. The hardware requests allocation of a new page in memory associated with the code page, where the new page contains a collection of counters and each of the counters corresponds to one of the instructions in the code page. When the hardware detects a branch instruction having a branch target within the code page, it increments one of the counters that has the same position in the new page as the branch target in the code page. The execution of the code page is repeated and the counters are incremented when branch targets fall within the code page. The hardware then provides the counter values in the new page to a binary translator for binary translation.
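The counter scheme in the abstract can be modelled in software as a rough sketch. The code below is not the patented hardware; it assumes a 4096-byte page, byte-granular instructions, and one-byte counters that hold at their maximum value (the claims describe saturating counters no larger than the instruction granularity). The names `CounterPage`, `record_branch`, and `hot_targets` are hypothetical.

```python
from __future__ import annotations

PAGE_SIZE = 4096  # assumed page size; the abstract does not fix one
COUNTER_MAX = 0xFF  # assumed one-byte counters


class CounterPage:
    """Software model of the 'new page' of counters paired with one code page."""

    def __init__(self, code_page_base: int) -> None:
        if code_page_base % PAGE_SIZE != 0:
            raise ValueError("code page base must be page-aligned")
        self.code_page_base = code_page_base
        # One counter per byte offset, so each counter has the same position
        # in this page as the instruction it tracks has in the code page.
        self.counters = bytearray(PAGE_SIZE)

    def record_branch(self, target: int) -> bool:
        """Count a branch if its target lies inside the profiled code page."""
        offset = target - self.code_page_base
        if not 0 <= offset < PAGE_SIZE:
            return False  # target is outside the code page; nothing to count
        if self.counters[offset] < COUNTER_MAX:
            self.counters[offset] += 1  # saturate instead of wrapping to zero
        return True

    def hot_targets(self, threshold: int) -> list[tuple[int, int]]:
        """Return (offset, count) pairs a binary translator might prioritise."""
        return [
            (offset, count)
            for offset, count in enumerate(self.counters)
            if count >= threshold
        ]
```

In this model, repeated execution of the code page simply means calling `record_branch` for every observed branch; the counter values handed to the translator are whatever `hot_targets` (or the raw `counters` array) contains afterwards.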

Claims

Full claim text (claims 1 to 20).

What is claimed is:

1. An apparatus comprising: one or more processor cores, each of the one or more of the processor cores including performance monitoring hardware; and cache units coupled to the one or more processor cores, wherein the performance monitoring hardware is configured to: identify a code page in memory containing potentially optimizable instructions; request allocation of a new page in the memory, wherein the new page is associated with the code page, and wherein the new page contains a collection of counters and the counters correspond to instructions in the code page; detect a branch instruction having a branch target within the code page; increment one of the counters that has a same position in the new page as the branch target in the code page; repeat execution of code in the code page and incrementing the counters when branch targets fall within the code page; and provide values of the counters in the new page to a binary translator for binary translation.

2. The apparatus of claim 1, wherein the new page is used by the binary translator to hold code translated from the code page, thereby replacing the values of the counters.

3. The apparatus of claim 2, wherein the code translated from the code page is sharable among different threads.

4. The apparatus of claim 1, wherein the performance monitoring hardware is further configured to: after identifying the code page, pass a physical address identifying the code page to the binary translator to thereby allow the binary translator to determine whether the code page has been translated before; and in response to a determination that the code page has been translated before, obtain a physical address of a translated code page and execute code in the translated code page without requesting the new page to be allocated.

5. The apparatus of claim 1, wherein a size of each counter is not larger than the granularity of instructions in the code page.

6. The apparatus of claim 1, wherein each of the counters saturates at a maximum value and does not roll over back to zero.

7. The apparatus of claim 1, wherein the code page is translated into position independent code.

8. A method comprising: identifying, by performance monitoring hardware during runtime, a code page in memory containing potentially optimizable instructions; requesting allocation of a new page in the memory, wherein the new page is associated with the code page, and wherein the new page contains a collection of counters and the counters correspond to instructions in the code page; detecting a branch instruction having a branch target within the code page; incrementing one of the counters that has a same position in the new page as the branch target in the code page; repeating execution of code in the code page and incrementing the counters when branch targets fall within the code page; and providing values of the counters in the new page to a binary translator for binary translation.

9. The method of claim 8, wherein the new page is used by the binary translator to hold code translated from the code page, thereby replacing the values of the counters.

10. The method of claim 9, wherein the code translated from the code page is sharable among different threads.

11. The method of claim 8, further comprising: after identifying the code page, passing a physical address identifying the code page to the binary translator to thereby allow the binary translator to determine whether the code page has been translated before; and in response to a determination that the code page has been translated before, obtaining a physical address of a translated code page and executing the translated code page without requesting the new page to be allocated.

12. The method of claim 8, wherein a size of each counter is not larger than the granularity of instructions in the code page.

13. The method of claim 8, wherein each of the counters saturates at a maximum value and does not roll over back to zero.

14. The method of claim 8, wherein the code page is translated into position independent code.

15. A system comprising: memory to store a plurality of code pages; a processor coupled to the memory, the processor including performance monitoring hardware configured to: identify, during runtime, one of the code pages containing potentially optimizable instructions; request allocation of a new page in the memory, wherein the new page is associated with the identified code page, and wherein the new page contains a collection of counters and the counters correspond to instructions in the identified code page; detect a branch instruction having a branch target within the identified code page; increment one of the counters that has a same position in the new page as the branch target in the identified code page; repeat execution of code in the identified code page and incrementing the counters when branch targets fall within the identified code page; and provide values of the counters in the new page to a binary translator for binary translation.

16. The system of claim 15, wherein the new page is used by the binary translator to hold code translated from the code page, thereby replacing the values of the counters.

17. The system of claim 16, wherein the code translated from the code page is sharable among different threads.

18. The system of claim 15, wherein the performance monitoring hardware is further configured to: after identifying the one of the code pages, pass a physical address identifying the code page to the binary translator to thereby allow the binary translator to determine whether the identified code page has been translated before; and in response to a determination that the code page has been translated before, obtain a physical address of a translated code page and execute the code in the translated code page without requesting the new page to be allocated.

19. The system of claim 15, wherein each of the counters saturates at a maximum value and does not roll over back to zero.

20. The system of claim 15, wherein the identified code page is translated into position independent code.
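The "translated before" check in claims 4, 11, and 18, together with the reuse of the counter page for translated code in claims 2, 9, and 16, can be illustrated with a small model. This is a sketch under stated assumptions, not the claimed hardware or any real translator API: `TranslationDirectory`, `on_candidate_page`, and the `allocate_page` callback are hypothetical names, and physical addresses are plain integers.

```python
from typing import Callable, Dict, Optional, Tuple


class TranslationDirectory:
    """Hypothetical translator-side map from a code page's physical address
    to the physical address of its translated page."""

    def __init__(self) -> None:
        self._translated: Dict[int, int] = {}

    def lookup(self, code_page_pa: int) -> Optional[int]:
        """Return the translated page address, or None if never translated."""
        return self._translated.get(code_page_pa)

    def publish(self, code_page_pa: int, counter_page_pa: int) -> None:
        # Modelled on claim 2: the page that held the counters is reused to
        # hold the translated code, so its address becomes the translation.
        self._translated[code_page_pa] = counter_page_pa


def on_candidate_page(
    directory: TranslationDirectory,
    code_page_pa: int,
    allocate_page: Callable[[], int],
) -> Tuple[str, int]:
    """Decide what to do once a code page is identified as optimizable."""
    translated_pa = directory.lookup(code_page_pa)
    if translated_pa is not None:
        # Already translated: run the translated page, allocate nothing.
        return ("run_translated", translated_pa)
    # Not translated yet: request a new page to hold the counters.
    return ("profile", allocate_page())
```

Keying the directory by physical address rather than virtual address is what lets the same translation be found again regardless of which thread or mapping reaches the code page, which fits the claims' statement that translated code is sharable among threads.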

Classifications

  • in a memory management context, e.g. virtual memory or cache management (memory management G06F12/00; testing of static memory units G11C29/00) · CPC title

  • G06F8/52 (Primary)

    Binary to binary · CPC title

  • Involving translation to a different instruction set architecture, e.g. just-in-time translation in a JVM · CPC title

  • using software metrics · CPC title

  • G06F9/3842 (Primary)

    Speculative instruction execution · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9542191B2 cover?
A hardware profiling mechanism implemented by performance monitoring hardware enables page level automatic binary translation. The hardware during runtime identifies a code page in memory containing potentially optimizable instructions. The hardware requests allocation of a new page in memory associated with the code page, where the new page contains a collection of counters and each of the cou…
Who is the assignee on this patent?
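Listed parties: Caprioli Paul, Merten Matthew C, Al-Otoom Muawya M, and 7 more. These are individual names, so they likely identify the inventors; no corporate assignee is shown on this page.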
What technology area does this patent fall under?
Primary CPC classification G06F8/52. Mapped technology areas include Physics.
When was this patent published?
Publication date: January 10, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).