Scalar optimizations for shaders
US-9430199-B2 · Aug 30, 2016 · US
US9740493B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9740493-B2 |
| Application number | US-201213994549-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 28, 2012 |
| Priority date | Sep 28, 2012 |
| Publication date | Aug 22, 2017 |
| Grant date | Aug 22, 2017 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Loop vectorization methods and apparatus are disclosed. An example method includes generating a first control mask for a set of iterations of a loop by evaluating a condition of the loop, wherein generating the first control mask includes setting a bit of the control mask to a first value when the condition indicates that an operation of the loop is to be executed, and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed. The example method also includes compressing indexes corresponding to the first set of iterations of the loop according to the first control mask.
Opening claim text (preview).
The invention claimed is: 1. A method comprising: generating a first control mask for a first set of iterations of a loop by evaluating a condition of the loop, wherein generating the first control mask includes: setting a bit of the first control mask to a first value when the condition indicates that an operation of the loop is to be executed; and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed; compressing, by executing an instruction with a processor, indexes corresponding to the first set of iterations of the loop according to the first control mask by: maintaining a first one of the indexes when a first bit of the first control mask associated with the first one of the indexes has the first value; and discarding, before the operation is executed, a second one of the indexes when a second bit of the first control mask associated with the second one of the indexes has the second value; populating an array with the compressed indexes, wherein the operation of the loop is to be performed on data elements corresponding to the compressed indexes in the array; and shifting indexes at higher order positions of the array to lower order positions of the array when the operation of the loop has been performed on data elements corresponding to the indexes at lower order positions of the array. 2. The method of claim 1 , further including compressing data elements corresponding to the indexes, wherein the indexes indicate at which memory locations results of the operation are to be stored for respective ones of the data elements. 3. The method of claim 2 , further including: loading the data elements into a first register; and loading the indexes corresponding to the first set of iterations of the loop into a second register. 4. The method of claim 1 , further including when a number of indexes that have been entered into the array meets a threshold, loading data elements corresponding to an amount of the indexes in the array into a register on which the operation is to be performed. 5. The method of claim 4 , further including, when the number of indexes that have been entered into the array does not meet the threshold, evaluating the condition for a second set of iterations of the loop without performing the operation on the data elements corresponding to the amount of the indexes in the array. 6. At least one tangible machine readable storage medium comprising instructions that, when executed, cause a machine to at least: generate a first control mask for a first set of iterations of a loop by evaluating a condition of the loop, wherein generating the first control mask includes: setting a bit of the first control mask to a first value when the condition indicates that an operation of the loop is to be executed; and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed; compress indexes corresponding to the first set of iterations of the loop according to the first control mask by: maintaining a first one of the indexes when a first bit of the first control mask associated with the first one of the indexes has the first value; and discarding, before the operation is executed, a second one of the indexes when a second bit of the first control mask associated with the second one of the indexes has the second value; populate an array with the compressed indexes, wherein the operation of the loop is to be performed on data elements corresponding to the compressed indexes in the array; and shift indexes at higher order positions of the array to lower order positions of the array when the operation of the loop has been performed on data elements corresponding to the indexes at lower order positions of the array. 7. At least one tangible machine readable storage medium as defined in claim 6 , wherein the instructions, when executed, cause the machine to: load data elements into a first register; and load the indexes corresponding to the first set of iterations of the loop into a second register. 8. At least one tangible machine readable storage medium as defined in claim 6 , wherein the instructions, when executed, cause the machine to compress data elements corresponding to the indexes, wherein the indexes indicate at which memory locations results of the operation are to be stored for respective ones of the data elements. 9. At least one tangible machine readable storage medium as defined in claim 6 , wherein the instructions, when executed, cause the machine to, when a number of indexes that have been entered into the array meets a threshold, load data elements corresponding to an amount of the indexes in the array into a register on which the operation is to be performed. 10. At least one tangible machine readable storage medium as defined in claim 9 , wherein the instructions, when executed, cause the machine to, when the number of indexes that have been entered into the array does not meet the threshold, evaluate the condition for a second set of iterations of the loop without performing the operation on the data elements corresponding to the amount of the indexes in the array. 11. An apparatus comprising: a control mask generator to generate a first control mask for a first set of iterations of a loop by evaluating a condition of the loop, the control mask generator to generate the first control mask by: setting a bit of the first control mask to a first value when the condition indicates that an operation of the loop is to be executed; and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed; an index loader to load indexes corresponding to the first set of iterations of the loop into a first register; a data compressor to compress the indexes in the first register according to the first control mask by: maintaining a first one of the indexes when a first bit of the first control mask associated with the first one of the indexes has the first value; and discarding, before the operation is executed, a second one of the indexes when a second bit of the first control mask associated with the second one of the indexes has the second value; an array populater to populate an array with the compressed indexes; an array evaluator to determine whether a number of indexes that have been entered into the array meets a threshold; and a register loader to load data elements corresponding to an amount of the indexes in the array into a register on which the operation is to be performed, by computation performer circuitry, when the number of indexes meets the threshold, at least one of the control mask generator, the index loader, the data compressor, the array populater, and the array evaluator is implemented by hardware. 12. The apparatus of claim 11 , further including a data loader to load data elements corresponding to the indexes into a second register, wherein the data compressor is to compress the data elements in the second register according to the first control mask, and the indexes indicate memory locations at which results of the operation are to be stored for respective ones of the data elements. 13. The apparatus as defined in claim 11 , wherein the control mask generator is to evaluate the condition for a second set of iterations of the loop without performing the operation on the data elements corresponding to the amount of the indexes in the array when the number of indexes that have been entered into the array does not meet the threshold. 14. Th
Loops · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
Special arrangements thereof, e.g. mask or switch · CPC title
Reducing the execution time required by the program code · CPC title
Concurrent instruction execution, e.g. pipeline or look ahead · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.