Processor with hardware supported memory buffer overflow detection
US-11868774-B2 · Jan 9, 2024 · US
US10241793B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10241793-B2 |
| Application number | US-201414200788-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 7, 2014 |
| Priority date | Mar 15, 2013 |
| Publication date | Mar 26, 2019 |
| Grant date | Mar 26, 2019 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
In one particular example, this disclosure provides an efficient mechanism to determine the degree of parallelization possible for a loop in the presence of possible memory aliases that cannot be resolved at compile-time. Hardware instructions are provided that test memory addresses at run-time and set a mode or register that enables a single instance of a loop to run the maximum number of SIMD (Single Instruction, Multiple Data) lanes to run in parallel that obey the semantics of the original scalar loop. Other hardware features that extend applicability or performance of such instructions are enumerated.
Opening claim text (preview).
What is claimed is: 1. A method to determine a degree of parallelization possible for a loop in a presence of possible memory aliases, comprising: providing hardware instructions that test memory addresses for pointer aliasing at run-time, at least in part, by determining a distance between the memory addresses, wherein the distance is measured in multiples of a data type; providing a scalar code that, when executed, performs a function; generating, from the scalar code, a vectorized code containing a same number of loops as the scalar code; and setting a mode or register that specifies a maximum number of Single Instruction, Multiple Data (SIMD) lanes on which to execute the vectorized code in parallel to perform the function of the scalar code, wherein the maximum number of SIMD lanes is the lesser of the distance and a number of SIMD lanes that are enabled. 2. The method of claim 1 , further comprising enabling the maximum number of SIMD lanes to obtain a maximum speed-up subject to obeying loop-carried dependencies of the scalar code. 3. The method of claim 1 , further comprising, determining, by a processor, that the maximum number of SIMD lanes is to perform the function of the scalar code. 4. The method of claim 1 , further comprising providing a hardware mechanism to specify the number of SIMD lanes that are enabled, wherein the hardware mechanism is one selected from the group consisting of: a mode register that contains the number of SIMD lanes that are enabled and a mask register that controls which SIMD lanes are enabled. 5. The method of claim 4 , further comprising setting the hardware mechanism to enable the maximum number of SIMD lanes dependent on a test of memory addresses at run-time. 6. The method of claim 4 , further comprising altering a number of iterations of the vectorized code dependent on the number of SIMD lanes that are enabled. 7. The method of claim 1 , wherein a SIMD Staged loop allows the vectorized code to run over vector lengths that are not multiples of a vectorization factor. 8. The method of claim 1 , wherein increments of induction variables are scaled by a number of vector lanes that are enabled in the vectorized code. 9. The method of claim 1 , wherein the method includes performing reductions, which further include summation, bitwise operations, minimum or maximum, across vector lanes currently enabled according to a mode register. 10. The method of claim 1 , wherein the method includes copying a value from a last vector lane used in a last iteration of the vectorized code to memory or to a known register location. 11. An apparatus for determining a degree of parallelization possible for a loop in a presence of possible memory aliases, the apparatus comprising: at least one processing element for executing hardware instructions that test memory addresses for pointer aliasing at run-time, at least in part, by determining a distance between the memory addresses, wherein the distance is measured in multiples of a data type; a scalar code that, when executed, performs a function; a vectorized code containing a same number of loops as the scalar code; and a mode or register that is set to specify a maximum number of Single Instruction, Multiple Data (SIMD) lanes on which to execute the vectorized code in parallel to perform the function of the scalar code, wherein the maximum number of SIMD lanes is the lesser of the distance and a number of SIMD lanes that are enabled. 12. The apparatus of claim 11 , wherein the mode or register specifies how many SIMD lanes are enabled, and the mode or register is selected from the group consisting of a mode register that contains the number of SIMD lanes that are enabled and a mask register that controls which SIMD lanes are enabled. 13. The apparatus of claim 12 , wherein the at least one processing element is operable to set the mode or register to enable the maximum number of SIMD lanes dependent on a test of memory addresses at run-time. 14. The apparatus of claim 12 , wherein the at least one processing element is operable to alter a number of iterations of the vectorized code dependent on the number of SIMD lanes that are enabled. 15. The apparatus of claim 11 , wherein the at least one processing element performs reductions, which further include summation, bitwise operations, minimum or maximum, across vector lanes currently enabled according to a mode register. 16. The apparatus of claim 11 , wherein the at least one processing element copies a value from a last vector lane used in a last iteration of the vectorized code to memory or to a known register location. 17. At least one machine readable non-transitory storage medium having instructions stored thereon for determining a degree of parallelization possible for a loop in a presence of possible memory aliases, wherein the instructions, when executed by at least one processor, causes the at least one processor to perform the following operations: execute hardware instructions that test memory addresses for pointer aliasing at run-time, at least in part, by determining a distance between the memory addresses, wherein the distance is measured in multiples of a data type; access a scalar code that, when executed, performs a function; generate, from the scalar code, a vectorized code containing a same number of loops as the scalar code; and set a mode or register that specifies a maximum number of Single Instruction, Multiple Data (SIMD) lanes on which to execute the vectorized code in parallel to perform the function of the scalar code, wherein the maximum number of SIMD lanes is the lesser of the distance and a number of SIMD lanes that are enabled. 18. The method of claim 1 , wherein the hardware instructions that test the memory addresses for pointer aliasing at run-time comprise instructors for: determining, at run-time, a difference between the memory addresses; and dividing the difference by a memory size of the data type to determine the distance, wherein each of the memory addresses holds a value of the data type. 19. The apparatus of claim 11 , wherein the hardware instructions that test the memory addresses for pointer aliasing at run-time comprise instructors for: determining, at run-time, a difference between the memory addresses; and dividing the difference by a memory size of the data type to determine the distance, wherein each of the memory addresses holds a value of the data type. 20. The at least one machine readable non-transitory storage medium of claim 17 , wherein the hardware instructions that test the memory addresses for pointer aliasing at run-time comprise instructors for: determining, at run-time, a difference between the memory addresses; and dividing the difference by a memory size of the data type to determine the distance, wherein each of the memory addresses holds a value of the data type.
Instruction analysis, e.g. decoding, instruction word fields · CPC title
Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
Software pipelining · CPC title
Iterative single instructions for multiple data lanes [SIMD] · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.