Approximate computation in digital systems using bit partitioning
US-11914447-B1 · Feb 27, 2024 · US
US2018173291A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2018173291-A1 |
| Application number | US-201615385184-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 20, 2016 |
| Priority date | Dec 20, 2016 |
| Publication date | Jun 21, 2018 |
| Grant date | — |
Official abstract text for this publication.
Embodiments described herein relate to improving processor power-performance using a binary analyzer routine. In one example, a processor includes a memory interface to couple to a memory, at least one hardware accelerator circuit, and an execution pipeline including at least fetch, decode, and execute stages, wherein the processor, in response to a hot-spot hardware event indicating presence of a hot-spot sequence, is to switch context to a binary analyzer routine stored in the memory, the binary analyzer routine including instructions that, when fetched, decoded, and executed by the processor, cause the processor to analyze a region in the memory containing the hot-spot sequence, analyze hardware metrics relating to execution of the hot-spot sequence, and generate, based on the analyses, a recommendation for the at least one hardware accelerator circuit to improve at least one of power consumption and performance.
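The flow in the abstract (a hot-spot event triggers a switch to an analyzer routine, which inspects the code region and the hardware metrics and then emits a recommendation) can be pictured with a small software model. This is only an illustrative sketch: the patent describes processor hardware, and every name here (`HotSpotEvent`, `HotSpotDetector`, `binary_analyzer`, the instruction strings, the threshold of 3) is a hypothetical stand-in, not something taken from the publication.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class HotSpotEvent:
    """Hypothetical stand-in for the hot-spot hardware event."""
    start: int            # first instruction address of the hot region
    end: int              # last instruction address of the hot region
    metrics: dict = field(default_factory=dict)


class HotSpotDetector:
    """Toy model: flags a backward branch once it has been taken `threshold` times."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.taken = Counter()

    def observe_branch(self, pc: int, target: int):
        if target >= pc:          # only backward branches are treated as loops here
            return None
        self.taken[(pc, target)] += 1
        if self.taken[(pc, target)] == self.threshold:
            return HotSpotEvent(start=target, end=pc,
                                metrics={"iterations": self.threshold})
        return None


def binary_analyzer(event: HotSpotEvent, memory: dict) -> dict:
    """Toy analyzer: looks at the region and the metrics, returns a recommendation."""
    region = [memory[a] for a in range(event.start, event.end + 1) if a in memory]
    loads = sum(1 for op in region if op.startswith("load"))
    return {
        "region": (event.start, event.end),
        "instructions": len(region),
        "loads": loads,
        "iterations_seen": event.metrics["iterations"],
        "suggest": "review loads for invariants" if loads else "none",
    }


# Usage: a four-instruction loop at (assumed) addresses 10..13.
memory = {10: "load r1, [r2]", 11: "add r3, r1", 12: "dec r4", 13: "jnz 10"}
detector = HotSpotDetector(threshold=3)
recommendation = None
for _ in range(5):
    event = detector.observe_branch(pc=13, target=10)
    if event is not None:     # models the context switch to the analyzer routine
        recommendation = binary_analyzer(event, memory)
print(recommendation)
```

In this model the event fires exactly once, on the third taken branch, and the analyzer receives both the address range and the gathered metrics, mirroring the two analyses named in the abstract.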
Opening claim text (preview).
1. A processor comprising: a memory interface to couple to a memory; at least one hardware accelerator circuit; and an execution pipeline comprising at least fetch, decode, and execute stages; wherein the processor, in response to a hot-spot hardware event indicating presence of a hot-spot sequence, is to switch context to a binary analyzer routine stored in the memory, the binary analyzer routine comprising instructions that, when fetched, decoded, and executed by the processor, cause the processor to analyze a region in the memory containing the hot-spot sequence; analyze hardware metrics relating to execution of the hot-spot sequence; and generate, based on the analyses, a recommendation for the at least one hardware accelerator circuit to improve at least one of power consumption and performance.

2. The processor of claim 1, further comprising a hot-spot detector circuit to monitor the execution pipeline, detect the hot-spot sequence, gather the hardware metrics relating to execution of the hot-spot sequence, and generate the hot-spot hardware event.

3. The processor of claim 1, wherein the hot-spot sequence comprises at least one of a branch instruction, a loop instruction, a memory access to at least one of a loop index, a loop constant, and a loop invariant, and an instruction that has repeated at least a threshold number of times.

4. The processor of claim 1, wherein the processor, when executing the binary analyzer routine, uses a memory protection mechanism to define protected memory regions in which to store a code segment, a data segment, and a stack segment of the binary analyzer routine.

5. The processor of claim 4, wherein the processor is further to store the recommendation in the data segment for future use, and wherein the recommendation is generated once and used to generate recommendations for future occurrences of the hot-spot sequence.

6. The processor of claim 4, wherein when the processor, during execution of the binary analyzer routine, determines the hot-spot sequence receives an invariant value in response to a plurality of memory read requests, the processor is to generate a recommendation that a register/memory read stage of the execution pipeline convert the plurality of memory read requests into register read requests, and to store the invariant value in a register.

7. The processor of claim 4, wherein when the processor, during execution of the binary analyzer routine, determines that an instruction source operand value is predictable, the processor is further to generate a recommendation that a register/memory read stage of the execution pipeline use a predicted value for the instruction source operand.

8. The processor of claim 4, wherein the processor, during execution of the binary analyzer routine, is to generate a recommendation to a schedule stage of the execution pipeline to conduct a speculative execution of the hot-spot sequence, and to prepare to roll back the speculative execution.

9. The processor of claim 8, wherein the processor, during execution of the binary analyzer routine, is to generate a recommendation that the schedule stage begin speculative execution at a first linear instruction address, and to stop speculative execution at a second linear instruction access.

10. The processor of claim 4, wherein when the processor, during execution of the binary analyzer routine, identifies underused registers, the processor is further to generate a recommendation to a register allocate stage of the execution pipeline to reallocate the underused registers.

11. The processor of claim 4, wherein when the processor, during execution of the binary analyzer routine, determines that the hot-spot sequence is to utilize less than a threshold amount of power, the processor is further to generate a recommendation to a power control circuit of the processor to enter into a lower-power power state.

12. A system comprising: a memory interface to couple to a memory; at least one hardware accelerator circuit; and a processing core comprising an execution pipeline comprising at least fetch, decode, and execute stages; wherein the processing core, in response to a hot-spot hardware event indicating presence of a hot-spot sequence, is to switch context to a binary analyzer routine stored in the memory, the binary analyzer routine comprising instructions that, when fetched, decoded, and executed by the processing core, cause the processing core to analyze a region in the memory containing the hot-spot sequence; analyze hardware metrics relating to execution of the hot-spot sequence; and generate, based on the analyses, a recommendation for the at least one hardware accelerator circuit to improve at least one of power consumption and performance.

13. The system of claim 12, further comprising a hot-spot detector to monitor the execution pipeline, to detect the hot-spot sequence, to gather the hardware metrics, and to generate the hot-spot hardware event.

14. The system of claim 12, wherein when the processing core, during execution of the binary analyzer routine, determines the hot-spot sequence receives an invariant value in response to a plurality of memory read requests, the processing core is further to generate a recommendation that a register/memory read stage of the execution pipeline convert the plurality of memory read requests into register read requests, and to store the invariant value in a register.

15. The system of claim 12, wherein when the processing core, during execution of the binary analyzer routine, determines that an instruction source operand value is predictable, the processing core is further to generate a recommendation that a register/memory read stage of the execution pipeline use a predicted value for the instruction source operand.

16. The system of claim 12, wherein the processing core, during execution of the binary analyzer routine, is to generate a recommendation to a schedule stage of the execution pipeline to conduct a speculative execution of the hot-spot sequence, and to prepare to roll back the speculative execution.

17. The system of claim 12, wherein when the processing core, during execution of the binary analyzer routine, identifies underused registers, the processing core is further to generate a recommendation to a register allocate stage of the execution pipeline to reallocate the underused registers.

18. The system of claim 12, wherein when the processing core, during execution of the binary analyzer routine, determines that the hot-spot sequence is to utilize less than a threshold amount of power, the processing core is further to generate a recommendation to a power control circuit to enter into a lower-power power state.

19. A non-transitory computer-readable storage medium having stored therein instructions, which, when executed by a processor comprising a memory interface to couple to a memory, at least one hardware accelerator circuit, and an execution pipeline comprising at least fetch, decode, and execute stages, cause the processor to: fetch, decode, and execute instructions; and switch context, in response to a hot-spot hardware event indicating presence of a hot-spot sequence, to a binary analyzer routine stored in the memory, the binary analyzer routine comprising instructions that, when fetched, decoded, and executed by the processor, cause the processor to analyze a region in the memory containing the hot-spot sequence; analyze hardware metrics relating to execution of the hot-spot sequence; and generat
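Two of the dependent claims describe analyses that lend themselves to a small illustration: detecting that repeated memory reads return an invariant value (claims 6 and 14) and detecting that a source operand is predictable (claims 7 and 15). The sketch below is a toy model under stated assumptions. The claims do not say how invariance or predictability is determined; treating "same value on every read" as invariant and "constant stride" as predictable are assumptions made here, and all function names and the trace format are hypothetical.

```python
from collections import defaultdict


def find_invariant_loads(read_trace, min_reads=2):
    """Return {address: value} for addresses read at least `min_reads` times
    that returned the same value every time (assumed definition of invariant)."""
    seen = defaultdict(list)
    for address, value in read_trace:
        seen[address].append(value)
    return {a: vals[0] for a, vals in seen.items()
            if len(vals) >= min_reads and len(set(vals)) == 1}


def predict_next_operand(values):
    """Return the next value if the sequence has a constant stride, else None.
    Constant stride is an assumed stand-in for 'predictable'."""
    if len(values) < 3:
        return None
    strides = {b - a for a, b in zip(values, values[1:])}
    if len(strides) != 1:
        return None
    return values[-1] + strides.pop()


def recommend(read_trace, operand_values):
    """Collect recommendations as plain tuples (hypothetical format)."""
    recs = []
    for address, value in sorted(find_invariant_loads(read_trace).items()):
        # Claim 6 idea: serve these reads from a register holding `value`.
        recs.append(("convert_loads_to_register_reads", address, value))
    predicted = predict_next_operand(operand_values)
    if predicted is not None:
        # Claim 7 idea: let the read stage use a predicted operand value.
        recs.append(("use_predicted_operand", predicted))
    return recs


# Usage: address 0x100 always returns 7; address 0x200 changes between reads.
trace = [(0x100, 7), (0x200, 1), (0x100, 7), (0x200, 2), (0x100, 7)]
recs = recommend(trace, operand_values=[4, 8, 12, 16])
print(recs)
```

Only address `0x100` qualifies as invariant in this trace, and the operand sequence advances by a fixed stride of 4, so the model yields one recommendation of each kind.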
Loop control instructions; iterative instructions, e.g. LOOP, REPEAT · CPC title
using instruction pipelines · CPC title
by task scheduling · CPC title
Monitoring task completion, e.g. by use of idle timers, stop commands or wait commands · CPC title
Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines · CPC title