Method and apparatus for compiling optimization using activation recalculation
US-2024303054-A1 · Sep 12, 2024 · US
US12020033B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12020033-B2 |
| Application number | US-202017133899-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 24, 2020 |
| Priority date | Dec 24, 2020 |
| Publication date | Jun 25, 2024 |
| Grant date | Jun 25, 2024 |
Official abstract text for this publication.
Apparatus and method for memorizing repeat function calls are described herein. An apparatus embodiment includes: uop buffer circuitry to identify a function for memorization based on retiring micro-operations (uops) from a processing pipeline; memorization retirement circuitry to generate a signature of the function which includes input and output data of the function; a memorization data structure to store the signature; and predictor circuitry to detect an instance of the function to be executed by the processing pipeline and to responsively exclude a first subset of uops associated with the instance from execution when a confidence level associated with the function is above a threshold. One or more instructions that are data-dependent on execution of the instance is then provided with the output data of the function from the memorization data structure.
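The flow in the abstract can be modelled in software as a small lookup table, which is the technique usually called memoization. The sketch below is a toy Python model, not the claimed circuitry. The names `MemoEntry`, `MemoTable`, `record`, `predict`, and `CONFIDENCE_THRESHOLD` are hypothetical, the threshold value is invented, and raising confidence each time a matching signature retires is an assumption; the abstract only says that uops are excluded once confidence is above a threshold.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 2  # hypothetical value; the source does not give one


@dataclass
class MemoEntry:
    inputs: tuple        # input data recorded in the signature
    outputs: tuple       # output data recorded in the signature
    confidence: int = 0  # how often this signature has been seen again


class MemoTable:
    """Toy stand-in for the memorization data structure."""

    def __init__(self):
        self.entries = {}

    def record(self, func_id, inputs, outputs):
        """Model of retirement: store a new signature or reinforce a matching one."""
        entry = self.entries.get(func_id)
        if entry is not None and (entry.inputs, entry.outputs) == (inputs, outputs):
            entry.confidence += 1  # assumption: repeats raise confidence
        else:
            self.entries[func_id] = MemoEntry(inputs, outputs)

    def predict(self, func_id):
        """Return the stored outputs if the function may be skipped, else None."""
        entry = self.entries.get(func_id)
        if entry is not None and entry.confidence > CONFIDENCE_THRESHOLD:
            return entry.outputs
        return None


table = MemoTable()
for _ in range(4):
    table.record("square", (3,), (9,))
print(table.predict("square"))  # prints (9,)
```

A dependent instruction would consume the tuple returned by `predict` instead of waiting for the function body to execute. A `None` result means the function runs normally.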
Opening claim text (preview).
The invention claimed is:
1. An apparatus comprising: a micro-operation (uop) buffer circuitry to identify a function for memorization based on retiring uops from a processing pipeline, the function associated with a function block of a plurality of uops; memorization retirement buffer circuitry to generate a signature of the function, the signature comprising input and output data of the function and an ordered sequence of the plurality of uops in the function block; a memorization data structure to store in an entry the signature associated with the function; and predictor circuitry to detect an instance of the function to be executed by the processing pipeline and responsively exclude a first subset of uops associated with the instance from execution when a confidence level associated with the function is above a threshold, wherein one or more instructions that are data-dependent on execution of the instance is provided with the output data of the function from the memorization data structure.
2. The apparatus of claim 1, wherein the uop buffer circuitry is coupled to a re-order buffer (ROB) of the processing pipeline to store the retiring uops.
3. The apparatus of claim 2, wherein the ROB is associated with a retirement width and the uop buffer circuitry comprises a storage structure that is sized at twice the retirement width of the ROB.
4. The apparatus of claim 3, wherein the uop buffer circuitry is to track occurrences of a call uop in the storage structure to identify the function for memorization.
5. The apparatus of claim 1, wherein the function is excluded from memorization when the function contains a system call or a floating-point calculation.
6. The apparatus of claim 1, wherein a second subset of the uops of the instance remains in the processing pipeline for execution.
7. The apparatus of claim 6, wherein the second subset comprises store uops and/or global load uops.
8. The apparatus of claim 1, wherein the predictor circuitry is further to insert a dummy uop in the processing pipeline in place of the first subset of the plurality of uops excluded from the processing pipeline, the dummy uop usable to collect input and output values associated with the instance, the input and output values usable for validating memorization of the instance.
9. The apparatus of claim 8, further comprising a validation circuitry to validate the input and output values collected by the dummy uop against the input and output data of the function stored in the memorization data structure.
10. The apparatus of claim 9, wherein the dummy uop comprises a memorization data structure identifier for locating the entry in the memorization data structure.
11. The apparatus of claim 9, wherein responsive to a positive validation by the validation circuitry, the dummy uop is removed from the processing pipeline and a confidence level of the instance tracked by the predictor circuitry is incremented.
12. The apparatus of claim 9, wherein responsive to a negative validation by the validation circuitry, the instance of the function is re-inserted into the processing pipeline for execution.
13. The apparatus of claim 1, wherein the input and output data of the function comprise registers and/or memory locations accessed by uops in the function block associated with the function.
14. A method comprising: identifying a function for memorization based on retiring micro-operations (uops) from a processing pipeline, the function associated with a function block of a plurality of uops; generating a signature of the function, the signature comprising input and output data of the function and an ordered sequence of the plurality of uops in the function block; storing in an entry the signature associated with the function in a memorization data structure; and detecting an instance of the function to be executed by the processing pipeline and responsively excluding a first subset of uops associated with the instance from execution when a confidence level associated with the function is above a threshold; and providing one or more instructions that are data-dependent on execution of the instance with the output data of the function from the memorization data structure.
15. The method of claim 14, further comprises storing the retiring uops in a uop buffer which is coupled to a re-order buffer (ROB) of the processing pipeline.
16. The method of claim 15, wherein the ROB is associated with a retirement width and the method further comprises sizing the uop buffer at twice the retirement width of the ROB.
17. The method of claim 15, further comprises tracking occurrences of a call uop in the uop buffer to identify the function for memorization.
18. The method of claim 14, further comprises excluding the function from memorization when the function contains a system call or a floating-point calculation.
19. The method of claim 14, further comprises leaving a second subset of the uops of the instance in the processing pipeline for execution.
20. The method of claim 19, wherein the second subset comprises store uops and/or global load uops.
21. The method of claim 14, further comprises inserting a dummy uop in the processing pipeline in place of the first subset of the plurality of uops excluded from the processing pipeline, the dummy uop usable to collect input and output values associated with the instance, the input and output values usable for validating memorization of the instance.
22. The method of claim 21, further comprises validating the input and output values collected by the dummy uop against the input and output data of the function stored in the memorization data structure.
23. The method of claim 22, further comprises using a memorization data structure identifier in the dummy uop to locate the entry in the memorization data structure.
24. The method of claim 22, further comprises removing the dummy uop from the processing pipeline and incrementing a confidence level of the instance responsive to a positive validation.
25. The method of claim 22, further comprises re-inserting the instance of the function into the processing pipeline for execution responsive to a negative validation.
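Claims 5 and 8 to 12 (with method counterparts 18 and 21 to 25) describe an eligibility filter and a validation loop around the skipped uops. The following is a rough Python model under stated assumptions: `Entry`, `DummyUop`, `is_eligible`, and `validate` are hypothetical names, the pipeline is a plain list, and uop kinds are plain strings. The claims do not say what happens to the dummy uop on a negative validation; this sketch assumes the re-inserted uops take its place.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    inputs: tuple        # stored input data of the function
    outputs: tuple       # stored output data of the function
    confidence: int = 0


@dataclass
class DummyUop:
    entry_id: int            # identifier used to locate the entry (claim 10)
    seen_inputs: tuple = ()  # input values collected for this instance
    seen_outputs: tuple = () # output values collected for this instance


def is_eligible(uop_kinds):
    """Claims 5/18: a function with a system call or floating-point work is excluded."""
    return not ({"syscall", "fp"} & set(uop_kinds))


def validate(dummy, table, pipeline, function_uops):
    """Compare the collected values with the stored signature data.

    Positive: remove the dummy uop and increment confidence (claim 11).
    Negative: re-insert the function's uops for execution (claim 12).
    """
    entry = table[dummy.entry_id]
    ok = (dummy.seen_inputs, dummy.seen_outputs) == (entry.inputs, entry.outputs)
    idx = pipeline.index(dummy)
    if ok:
        del pipeline[idx]
        entry.confidence += 1
    else:
        # Assumption: the re-inserted uops replace the dummy uop in place.
        pipeline[idx:idx + 1] = function_uops
    return ok


table = {0: Entry(inputs=(3,), outputs=(9,))}
dummy = DummyUop(entry_id=0, seen_inputs=(3,), seen_outputs=(9,))
pipeline = ["load", dummy, "add"]
validate(dummy, table, pipeline, ["call", "mul", "ret"])
print(pipeline, table[0].confidence)  # prints ['load', 'add'] 1
```

With mismatched collected values, the same call would leave `["load", "call", "mul", "ret", "add"]` in the list and keep the confidence unchanged.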
Execution means for microinstructions irrespective of the microinstruction function, e.g. decoding of microinstructions and nanoinstructions; timing of microinstructions; programmable logic arrays; delays and fan-out problems · CPC title
Dependency mechanisms, e.g. register scoreboarding · CPC title
using dynamic branch prediction, e.g. using branch history tables · CPC title
Recovery, e.g. branch miss-prediction, exception handling (error detection or correction G06F11/00) · CPC title
using instruction pipelines · CPC title