Method and apparatus for parallel pixel shading
US-2015348222-A1 · Dec 3, 2015 · US
US9799094B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9799094-B1 |
| Application number | US-201615162198-A |
| Country | US |
| Kind code | B1 |
| Filing date | May 23, 2016 |
| Priority date | May 23, 2016 |
| Publication date | Oct 24, 2017 |
| Grant date | Oct 24, 2017 |
Abstract
A method for processing data in a graphics processing unit (GPU) includes: receiving an instance identifier for an instance and a shader program comprising a preamble code block and a main shader code block; assigning the instance identifier to a general purpose register at wave creation; allocating address space within the constant memory for instance uniforms; and determining that the preamble code block has not been executed and that the wave is the first wave of the instance to be executed. Based on that determination, the preamble code block is executed to store the plurality of instance uniforms in the constant memory, and, based at least in part on executing the preamble code block, the wave of the plurality of waves is executed using at least one of the plurality of instance constants stored in constant memory.
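The gating rule described in the abstract and in claims 1, 3, 4 and 8 (the first wave of an instance runs the preamble; later waves either wait for it or go straight to the main block, depending on a flag) can be illustrated with a toy, single-threaded Python model. This is a sketch under stated assumptions, not the patented hardware: the names `PreambleScheduler`, `create_wave` and `finish_preamble`, the dictionary standing in for a general purpose register, the bump allocator for constant memory, and the contiguous uniform-buffer layout (`instance_id * n`) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceState:
    base: int                       # start of this instance's slot in constant memory
    first_wave_seen: bool = False
    preamble_done: bool = False     # flag; False means the preamble has not run (cf. claim 8)
    waiting: list = field(default_factory=list)  # waves whose main block must wait


class PreambleScheduler:
    """Toy single-threaded model of per-instance preamble gating (hypothetical names)."""

    def __init__(self, ubo, uniforms_per_instance, const_size=64):
        self.ubo = ubo                      # stand-in for the uniform buffer object
        self.n = uniforms_per_instance
        self.constant_memory = [None] * const_size
        self.next_base = 0
        self.instances = {}
        self.executed = []                  # (wave_id, uniforms seen by its main block)

    def _state_for(self, instance_id):
        st = self.instances.get(instance_id)
        if st is None:
            # Allocate constant-memory space once per instance (simple bump allocator).
            st = InstanceState(base=self.next_base)
            self.next_base += self.n
            self.instances[instance_id] = st
        return st

    def create_wave(self, wave_id, instance_id):
        gpr = {"instance_id": instance_id}  # identifier written to a "GPR" at wave creation
        st = self._state_for(gpr["instance_id"])
        if not st.preamble_done and not st.first_wave_seen:
            st.first_wave_seen = True       # first wave of the instance: it runs the preamble
            st.waiting.append(wave_id)      # its main block follows once the preamble ends
            return "preamble"
        if not st.preamble_done:
            st.waiting.append(wave_id)      # later wave, preamble still running: wait
            return "wait"
        self._run_main(wave_id, st)         # later wave, preamble finished: run main block
        return "main"

    def finish_preamble(self, instance_id):
        """Model completion of the preamble: copy uniforms, set the flag, release waves."""
        st = self.instances[instance_id]
        src = instance_id * self.n          # assumed UBO layout: n uniforms per instance
        for k in range(self.n):
            self.constant_memory[st.base + k] = self.ubo[src + k]
        st.preamble_done = True
        for wave_id in st.waiting:
            self._run_main(wave_id, st)
        st.waiting.clear()

    def _run_main(self, wave_id, st):
        # The main block reads the instance uniforms from constant memory.
        self.executed.append((wave_id, self.constant_memory[st.base:st.base + self.n]))


if __name__ == "__main__":
    sched = PreambleScheduler(ubo=[10, 11, 20, 21], uniforms_per_instance=2)
    print(sched.create_wave(0, instance_id=1))  # preamble
    print(sched.create_wave(1, instance_id=1))  # wait
    sched.finish_preamble(1)
    print(sched.create_wave(2, instance_id=1))  # main
    print(sched.executed)
```

In hardware the "wait" case would stall the wave's main block rather than append to a Python list; the queue only makes the ordering of claims 3 and 4 visible.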
Claims (preview)
What is claimed is:
1. A method of operating a graphics processing unit (GPU), the method comprising: receiving, by the GPU from a shader compiler, an instance identifier for an instance and a shader program, the shader program comprising a preamble code block and a main shader code block, the preamble code block being executable to store a plurality of instance uniforms in a constant memory; assigning, by the GPU, the instance identifier to a general purpose register at a creation of a wave of a plurality of waves; allocating, by the GPU, address space within the constant memory for the plurality of instance uniforms; determining, by the GPU, the preamble code block has not been executed and the wave is a first wave of the instance to be executed; based, at least in part, on determining the preamble code block has not been executed and the wave is the first wave to be executed, executing, by the GPU, the preamble code block to store the plurality of instance uniforms in the constant memory; and based, at least in part, on executing the preamble code block, executing, by the GPU, the main shader code block for the wave of the plurality of waves using at least one of the plurality of instance constants stored in the constant memory.
2. The method of claim 1, further comprising: assigning, by the GPU, the instance identifier to the general purpose register at a creation of a second wave of the plurality of waves.
3. The method of claim 2, further comprising: determining, by the GPU, the second wave is not the first wave; based, at least in part, on determining the second wave is not the first wave and determining the preamble block has not been executed, waiting for the preamble block to complete execution before executing the main shader code block for the second wave.
4. The method of claim 2, further comprising: determining, by the GPU, the second wave is not the first wave; based, at least in part, on determining the second wave is not the first wave and determining the preamble block has been executed, executing, by the GPU, the main shader code block for the second wave of the plurality of waves using instance constants stored in the constant memory.
5. The method of claim 1, wherein the constant memory comprises a wrap-around ring buffer and storage of the instance uniforms in the constant memory comprises storage of the instance uniforms in the wrap-around ring buffer.
6. The method of claim 5, further comprising storing, in a uniform general purpose register, an instance offset in the wrap-around ring buffer, the instance offset configured to locate the plurality of instance uniforms in the constant memory of the instance.
7. The method of claim 1, wherein executing, by the GPU, the preamble code block to store the plurality of instance uniforms in the constant memory further comprises: determining a source address of an instance uniform of the plurality of instance uniforms in a uniform buffer object based on the instance identifier and a number of uniforms in the instance; determining a destination address of the instance uniform of the plurality of instance uniforms in the constant memory; based, at least in part, on the determined source address and the determined destination address, storing the instance uniform of the plurality of instance uniforms.
8. The method of claim 1, wherein determining, by the GPU, the preamble code block has not been executed is based on the value of a flag being false.
9. The method of claim 1, wherein executing the preamble code block comprises: executing a per_instance_preamble_start instruction configured to delineate a start of the preamble code block; and executing a per_instance_preamble_end instruction configured to delineate an end of the preamble code block.
10. A device for processing data, the device comprising: a graphics processing unit (GPU), the GPU comprising a constant memory and a shader core, the shader core comprising a control unit, a plurality of processing elements, and a general purpose register (GPR), wherein the control unit is configured to: receive, from a shader compiler, an instance identifier for an instance and a shader program, the shader program comprising a preamble code block and a main shader code block, the preamble code block being executable to store a plurality of instance uniforms in the constant memory; assign the instance identifier to the GPR at a creation of a wave of a plurality of waves; allocate address space within the constant memory for the plurality of instance uniforms; determine the preamble code block has not been executed and the wave is a first wave of the instance to be executed; based, at least in part, on the determination that the preamble code block has not been executed and the wave is the first wave to be executed, direct at least one of the plurality of processing elements to execute the preamble code block to store the plurality of instance uniforms in the constant memory; and based, at least in part, on the execution of the preamble code block, direct at least one of the plurality of processing elements to execute the main shader code block for the wave of the plurality of waves using at least one of the plurality of instance constants stored in the constant memory.
11. The device of claim 10, wherein the control unit is further configured to: assign the instance identifier to the general purpose register at a creation of a second wave of the plurality of waves.
12. The device of claim 11, wherein the control unit is further configured to: determine the second wave is not the first wave; based, at least in part, on determining the second wave is not the first wave and determining the preamble block has not been executed, wait for the preamble block to complete execution before execution of the main shader code block for the second wave.
13. The device of claim 11, wherein the control unit is further configured to: determine the second wave is not the first wave; based, at least in part, on determining the second wave is not the first wave and determining the preamble block has been executed, execute the main shader code block for the second wave of the plurality of waves using instance constants stored in the constant memory.
14. The device of claim 10, wherein the constant memory comprises a wrap-around ring buffer and storage of the instance uniforms in the constant memory comprises storage of the instance uniforms in the wrap-around ring buffer.
15. The device of claim 14, wherein: the shader core further comprises a uniform general purpose register, and the control unit is further configured to store an instance offset in the wrap-around ring buffer, the instance offset configured to locate the plurality of instance uniforms in the constant memory of the instance.
16. The device of claim 10, wherein the control unit configured to execute the preamble code block to store the plurality of instance uniforms in the constant memory further comprises the control unit configured to: determine a source address of an instance uniform of the plurality of instance uniforms in a uniform buffer object based on the instance identifier and a number of uniforms in the instance; determine a destination address of the instance uniform of the plurality of instance uniforms in the constant memory; based, at least in part, on the determined source address and the determined destination address, store the instance uniform of the plurality of instance uniforms.
17. The device of claim 10, wherein the control unit configured to determine the
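Claims 5 to 7 (and their device counterparts 14 to 16) describe where the instance uniforms land: a wrap-around ring buffer in constant memory, an instance offset that locates an instance's uniforms, and a per-uniform source address in the uniform buffer object plus a destination address in constant memory. The following is a minimal Python sketch under stated assumptions: the uniform buffer object is assumed to hold each instance's uniforms contiguously, so the source address is `ubo_base + instance_id * n + k`; the names `ConstantRing`, `run_preamble`, `load_uniform`, `source_address` and `destination_address` are hypothetical; and the ring does not track when an old slot may safely be overwritten, which a real allocator would need to do.

```python
class ConstantRing:
    """Hypothetical wrap-around ring buffer carved out of constant memory."""

    def __init__(self, size):
        self.size = size
        self.mem = [0] * size
        self.head = 0  # next free slot; wraps to 0 past the end

    def allocate(self, count):
        if count > self.size:
            raise ValueError("instance needs more uniforms than the ring holds")
        offset = self.head
        self.head = (self.head + count) % self.size
        # This offset is what the sketch treats as the per-instance value held in a
        # uniform general purpose register (cf. claim 6).
        return offset


def source_address(ubo_base, instance_id, uniforms_per_instance, k):
    # Assumed contiguous UBO layout: instance i's uniforms start at i * n.
    return ubo_base + instance_id * uniforms_per_instance + k


def destination_address(ring_size, instance_offset, k):
    # Destination in the ring; the modulo implements the wrap-around.
    return (instance_offset + k) % ring_size


def run_preamble(ring, ubo, instance_id, n, ubo_base=0):
    """Copy one instance's n uniforms from the UBO into the ring; return its offset."""
    offset = ring.allocate(n)
    for k in range(n):
        src = source_address(ubo_base, instance_id, n, k)
        dst = destination_address(ring.size, offset, k)
        ring.mem[dst] = ubo[src]
    return offset


def load_uniform(ring, instance_offset, k):
    """What a main-block read of uniform k would resolve to."""
    return ring.mem[destination_address(ring.size, instance_offset, k)]


if __name__ == "__main__":
    ring = ConstantRing(size=5)
    ubo = list(range(100, 112))
    off0 = run_preamble(ring, ubo, instance_id=0, n=3)  # slots 0, 1, 2
    off1 = run_preamble(ring, ubo, instance_id=1, n=3)  # slots 3, 4, then wraps to 0
    print(off0, off1, [load_uniform(ring, off1, k) for k in range(3)])
```

In this example the second instance wraps into slot 0 and overwrites the first instance's uniform stored there, which is the hazard that the omitted slot-reclamation logic would have to prevent.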
Memory management · CPC title
General purpose rendering architectures · CPC title
Reducing the number of cache misses; Data prefetching (cache prefetching G06F12/0862) · CPC title
Image coding (bandwidth or redundancy reduction for static pictures H04N1/41; coding or decoding of static colour picture signals H04N1/64; methods or arrangements for coding, decoding, compressing or decompressing digital video signals H04N19/00) · CPC title
Processor architectures; Processor configuration, e.g. pipelining · CPC title