Adaptive shading in a graphics processing pipeline
US-2015170408-A1 · Jun 18, 2015 · US
US11734788B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11734788-B2 |
| Application number | US-202117515278-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 29, 2021 |
| Priority date | Dec 18, 2013 |
| Publication date | Aug 22, 2023 |
| Grant date | Aug 22, 2023 |
Official abstract text for this publication.
A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.
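The effect described in the abstract can be illustrated with a small model. The sketch below is not the patent's implementation. It assumes four work items per block, one block per lane, and one work item per lane per cycle, and it assumes that alignment is achieved by reordering the items inside each block so that valid items come first. The names `cycles_needed` and `align_invalid` are hypothetical, and the recorded permutation is only a stand-in for whatever record of the assembly a real control module would keep.

```python
from typing import List, Tuple

Quad = List[bool]  # validity flags for the four work items of one block


def cycles_needed(task: List[Quad]) -> int:
    """Count cycles that cannot be skipped.

    Each lane runs one block over four consecutive cycles, so cycle c is
    only needed when at least one lane holds a valid work item at position c.
    """
    return sum(1 for c in range(4) if any(quad[c] for quad in task))


def align_invalid(task: List[Quad]) -> Tuple[List[Quad], List[List[int]]]:
    """Hypothetical alignment step: move valid items to the front of each block.

    Returns the reordered task plus, per block, the original positions of the
    items in their new order, so results can be written back correctly.
    """
    aligned, indicators = [], []
    for quad in task:
        # Stable sort: valid items (key False) precede invalid items (key True).
        order = sorted(range(4), key=lambda i: not quad[i])
        aligned.append([quad[i] for i in order])
        indicators.append(order)
    return aligned, indicators


# Four lanes, each with a single valid item in a different position.
task = [
    [True, False, False, False],
    [False, True, False, False],
    [False, False, True, False],
    [False, False, False, True],
]

before = cycles_needed(task)               # 4: every cycle has one valid item
aligned, indicators = align_invalid(task)
after = cycles_needed(aligned)             # 1: all valid items share cycle 0
```

In this toy case the unaligned task occupies all four cycles while each cycle does useful work in only one lane. After alignment, three of the four cycles contain only invalid work items and could be skipped.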
Opening claim text (preview).
What is claimed is:

1. A single instruction multiple data (SIMD) processing unit configured to process a plurality of tasks which each include up to a predetermined maximum number of work items, wherein the work items of a task are arranged for executing a common sequence of instructions on respective data items, wherein blocks of work items within a task relate to respective blocks of data items, each block of data items being a pixel quad, the SIMD processing unit comprising: a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles, wherein each of the processing lanes of the group is configured to execute instructions of a respective block of work items over a plurality of consecutive processing cycles; and logic coupled to the group of processing lanes configured to cause the group of processing lanes to skip a particular processing cycle in response to making a determination that there are no valid work items scheduled for execution over the group of processing lanes in the particular processing cycle.
2. The SIMD processing unit of claim 1, wherein the logic is configured to set indicators to indicate how the work items have been assembled into the tasks.
3. The SIMD processing unit of claim 2, further comprising: a store configured to store the processed data items output from the group of processing lanes; and storing logic configured to determine addresses for storing the processed data items in the store based on the indicators.
4. The SIMD processing unit of claim 1, wherein the logic is configured to assemble the work items into the tasks such that work items of a block of work items are grouped together into the same task.
5. The SIMD processing unit of claim 1, wherein the work items are assembled into blocks of work items such that each work item within a block of work items can be used to perform a pre-processing operation on the block of work items before it is passed to the group of processing lanes.
6. The SIMD processing unit of claim 5, wherein the pre-processing operation is a gradient operation configured to determine the rate of change of a varying quantity between different pixels in a pixel quad.
7. The SIMD processing unit of claim 1, wherein there are no valid work items scheduled for execution over the group of processing lanes in the particular processing cycle if all of the work items which are scheduled for execution over the group of processing lanes in the particular processing cycle are invalid work items.
8. The SIMD processing unit of claim 1, wherein there is not a valid work item scheduled for execution in a processing lane in a particular processing cycle if there is not a work item which is scheduled for execution in the processing lane in the particular processing cycle.
9. The SIMD processing unit of claim 8, wherein work items which are not ready for execution when the task is due to be sent to the group of parallel processing lanes are not scheduled for execution.
10. The SIMD processing unit of claim 1, wherein some of the tasks comprise fewer than the predetermined maximum number of work items, and wherein the SIMD processing unit comprises a plurality of parallel groups of processing lanes, each group being configured to execute instructions of work items of a respective task over a plurality of processing cycles.
11. The SIMD processing unit of claim 10, wherein the logic coupled to the groups of processing lanes is further configured to cause a particular group of processing lanes to skip a particular processing cycle, independently of the other groups of processing lanes, if there are no valid work items scheduled for execution in any of the processing lanes of the particular group in the particular processing cycle.
12. The SIMD processing unit of claim 11, wherein the logic is configured to cause a particular group of processing lanes to skip a particular processing cycle whilst work items are scheduled to execute in the other groups of processing lanes in the particular processing cycle.
13. The SIMD processing unit of claim 1, wherein there are three levels of validity for pixels of a pixel quad, a first level of validity being full validity, a second level of validity being partial invalidity and a third level of validity being full invalidity, and wherein the logic is configured to: skip a first particular processing cycle comprising work items corresponding to pixels of the third level of validity when instructions are to be executed on pixels of the first and second levels of validity, but instructions are not to be executed on pixels of the third level of validity; and skip a second particular processing cycle comprising work items corresponding to pixels of the second level of validity when instructions are to be executed on pixels of the first level of validity, but instructions are not to be executed on pixels of the second level of validity.
14. The SIMD processing unit of claim 13, wherein the instructions to be executed on pixels of the first and second levels of validity form part of a texturing operation to be performed on a particular pixel of a pixel quad, and wherein the particular pixel is of the first level of validity, the neighbouring pixels in the pixel quad excluding the diagonal neighbour of the particular pixel are of the second level of validity, and the diagonal neighbour in the pixel quad is of the third level of validity.
15. A method of using a single instruction multiple data (SIMD) processing unit to process a plurality of tasks which each include up to a predetermined maximum number of work items, wherein the work items of a task are arranged for executing a common sequence of instructions on respective data items, wherein blocks of work items within a task relate to respective blocks of data items, each block of data items being a pixel quad, wherein the SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles, the method comprising: executing instructions of work items of the particular task using the group of processing lanes, wherein each of the processing lanes of the group executes instructions of a respective block of work items over a plurality of consecutive processing cycles; and causing the group of processing lanes to skip a particular processing cycle in response to making a determination that there are no valid work items scheduled for execution over the group of processing lanes in the particular processing cycle.
16. The method of claim 15, further comprising setting indicators to indicate how the work items have been assembled into the tasks.
17. The method of claim 16, further comprising: determining addresses for storing the processed data items in a store based on the indicators; and storing, at the determined addresses in the store, the processed data items output from the group of processing lanes.
18. The method of claim 15, wherein said assembling the work items into the tasks comprises grouping work items of a block of work items relating to a pixel quad together into the same task.
19. The method of claim 15, wherein some of the tasks comprise fewer than the predetermined maximum number of work items, and wherein the SIMD processing unit comprises a plurality of parallel groups of processing lanes, the method comprising: executing, at each group of processing lanes, instructions of work items of a respective task over a plural
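
The cycle-skipping condition of claims 1, 7, 8, 13 and 14 can be sketched as a small decision function. This is an illustrative reading of the claim language, not the patented logic. The `Validity` enum, the `skipped_cycles` function, the use of `None` for a slot with no scheduled work item, and the pixel ordering within a quad (position 0 being the pixel of interest and position 3 its diagonal neighbour) are all assumptions made for the example.

```python
from enum import IntEnum
from typing import List, Optional


class Validity(IntEnum):
    FULL = 1     # first level: fully valid
    PARTIAL = 2  # second level: partially invalid
    INVALID = 3  # third level: fully invalid


# A lane's schedule: one entry per cycle; None means no work item is scheduled.
Lane = List[Optional[Validity]]


def skipped_cycles(group: List[Lane], max_level: Validity) -> List[int]:
    """Return the cycles one group of lanes can skip for one instruction.

    The instruction executes on pixels whose validity level is at most
    max_level. A cycle is skipped when no lane holds such a work item in it.
    """
    skipped = []
    for cycle in range(len(group[0])):
        needed = any(
            lane[cycle] is not None and lane[cycle] <= max_level
            for lane in group
        )
        if not needed:
            skipped.append(cycle)
    return skipped


F, P, I = Validity.FULL, Validity.PARTIAL, Validity.INVALID

# Two lanes, each holding one quad laid out as in claim 14. The second lane
# has nothing scheduled in cycle 2 (the situation described in claim 8).
group = [
    [F, P, P, I],
    [F, P, None, I],
]

texturing = skipped_cycles(group, Validity.PARTIAL)  # [3]
full_only = skipped_cycles(group, Validity.FULL)     # [1, 2, 3]
```

Calling the function once per group of lanes would model the independent skipping of claims 11 and 12, since the decision for one group does not depend on the schedules of the others.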
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
using a mask · CPC title
Divergence aspects · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
Processor architectures; Processor configuration, e.g. pipelining · CPC title