Per-instance preamble for graphics processing

US9799094B1 · US · B1

Patent metadata
Field               Value
Publication number  US-9799094-B1
Application number  US-201615162198-A
Country             US
Kind code           B1
Filing date         May 23, 2016
Priority date       May 23, 2016
Publication date    Oct 24, 2017
Grant date          Oct 24, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for processing data in a graphics processing unit (GPU) including receiving an instance identifier for an instance and a shader program comprising a preamble code block and a main shader code block, assigning, the instance identifier to a general purpose register at wave creation, allocating address space within the constant memory for instance uniforms, and determining the preamble code block has not been executed and the wave is a first wave of the instance to be executed, based on determining the preamble code block has not been executed and the wave is the first wave to be executed, executing the preamble code block to store the plurality of instance uniforms in the constant memory and based, at least in part, on executing the preamble code block, executing the wave of the plurality of waves using at least one of the plurality of instance constants stored in constant memory.
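
The control flow in the abstract can be illustrated with a small sequential model. This is only a sketch under stated assumptions, not the patented implementation: the names `InstanceState`, `run_wave`, `preamble`, and `main_shader` are hypothetical, the "shader" is a made-up scale-and-bias operation, and the uniform buffer is assumed to be a flat list holding each instance's uniforms contiguously.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceState:
    """Hypothetical per-instance bookkeeping (not from the patent text)."""
    instance_id: int
    preamble_done: bool = False                     # flag: has the preamble run?
    constants: list = field(default_factory=list)   # stands in for constant memory
    waves_launched: int = 0


def preamble(state, uniform_buffer, uniforms_per_instance):
    """Copy this instance's uniforms into 'constant memory' once."""
    # Assumed layout: instance i owns a contiguous slice of the buffer.
    start = state.instance_id * uniforms_per_instance
    state.constants = list(uniform_buffer[start:start + uniforms_per_instance])
    state.preamble_done = True


def main_shader(state, lane_values):
    """Made-up main shader body: scale and bias each lane's value."""
    scale, bias = state.constants[0], state.constants[1]
    return [v * scale + bias for v in lane_values]


def run_wave(state, uniform_buffer, uniforms_per_instance, lane_values):
    """Run one wave; only the first wave of the instance executes the preamble."""
    is_first_wave = state.waves_launched == 0
    state.waves_launched += 1

    ran_preamble = False
    if not state.preamble_done and is_first_wave:
        preamble(state, uniform_buffer, uniforms_per_instance)
        ran_preamble = True

    # On real hardware a later wave could arrive before the preamble finishes
    # and would have to wait. This sequential model cannot reach that state,
    # so the wait is reduced to a sanity check.
    if not state.preamble_done:
        raise RuntimeError("preamble has not completed")

    return ran_preamble, main_shader(state, lane_values)
```

Calling `run_wave` twice on the same `InstanceState` shows the intended behaviour: the first call reports that it ran the preamble, and the second reuses the constants already stored.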

First claim

Opening claim text (preview).

What is claimed is:

1. A method of operating a graphic processing unit (GPU), the method comprising: receiving, by the GPU from a shader compiler, an instance identifier for an instance and a shader program, the shader program comprising a preamble code block and a main shader code block, the preamble code block being executable to store a plurality of instance uniforms in a constant memory; assigning, by the GPU, the instance identifier to a general purpose register at a creation of a wave of a plurality of waves; allocating, by the GPU, address space within the constant memory for the plurality of instance uniforms; determining, by the GPU, the preamble code block has not been executed and the wave is a first wave of the instance to be executed; based, at least in part, on determining the preamble code block has not been executed and the wave is the first wave to be executed, executing, by the GPU, the preamble code block to store the plurality of instance uniforms in the constant memory; and based, at least in part, on executing the preamble code block, executing, by the GPU, the main shader code block for the wave of the plurality of waves using at least one of the plurality of instance constants stored in the constant memory.

2. The method of claim 1, further comprising: assigning, by the GPU, the instance identifier to the general purpose register at a creation of a second wave of the plurality of waves.

3. The method of claim 2, further comprising: determining, by the GPU, the second wave is not the first wave; based, at least in part, on determining the second wave is not the first wave and determining the preamble block has not been executed, waiting for the preamble block to complete execution before executing the main shader code block for the second wave.

4. The method of claim 2, further comprising: determining, by the GPU, the second wave is not the first wave; based, at least in part, on determining the second wave is not the first wave and determining the preamble block has been executed, executing, by the GPU, the main shader code block for the second wave of the plurality of waves using instance constants stored in the constant memory.

5. The method of claim 1, wherein, the constant memory comprises a wrap-around ring buffer and storage of the instance uniforms in the constant memory comprises storage of the instance uniforms in the wrap-around ring buffer.

6. The method of claim 5, further comprising storing, in a uniform general purpose register, an instance offset in the wrap-around ring buffer, the instance offset configured to locate the plurality of instance uniforms in the constant memory of the instance.

7. The method of claim 1, wherein executing, by the GPU, the preamble code block to store the plurality instance uniforms in the constant memory further comprises: determining a source address of an instance uniform of the plurality of instance uniforms in a uniform buffer object based on the instance identifier and a number uniforms in the instance; determining a destination address of the instance uniform of the plurality of instance uniforms in the constant memory; based, at least in part, on the determined source address and the determined destination address, storing the instance uniform of the plurality of instance uniforms.

8. The method of claim 1, wherein, determining, by the GPU, the preamble code block has not been executed is based on the value of a flag being false.

9. The method of claim 1, wherein, executing the preamble code block comprises: executing a per_instance_preamble_start instruction configured to delineate a start of the preamble code block; and executing a per_instance_preamble_end instruction configured to delineate an end of the preamble code block.

10. A device for processing data, the device comprising: a graphics processing unit (GPU), the GPU comprising a constant memory and a shader core, the shader core comprising a control unit, a plurality of processing elements, and a general purpose register (GPR), wherein the control unit is configured to: receive, from a shader compiler, an instance identifier for an instance and a shader program, the shader program comprising a preamble code block and a main shader code block, the preamble code block being executable to store a plurality of instance uniforms in the constant memory; assign the instance identifier to the GPR at a creation of a wave of a plurality of waves; allocate address space within the constant memory for the plurality of instance uniforms; determine the preamble code block has not been executed and the wave is a first wave of the instance to be executed; based, at least in part, on the determination that the preamble code block has not been executed and the wave is the first wave to be executed, direct at least one of the plurality of processing elements to execute the preamble code block to store the plurality of instance uniforms in the constant memory; and based, at least in part, on the execution of the preamble code block, direct at least one of the plurality of processing elements to execute the main shader code block for the wave of the plurality of waves using at least one of the plurality of instance constants stored in the constant memory.

11. The device of claim 10, wherein the control unit is further configured to: assign the instance identifier to the general purpose register at a creation of a second wave of the plurality of waves.

12. The device of claim 11, wherein the control unit is further configured to: determine the second wave is not the first wave; based, at least in part, on determining the second wave is not the first wave and determining the preamble block has not been executed, wait for the preamble block to complete execution before execution of the main shader code block for the second wave.

13. The device of claim 11, wherein the control unit is further configured to: determine the second wave is not the first wave; based, at least in part, on determining the second wave is not the first wave and determining the preamble block has been executed, execute the main shader code block for the second wave of the plurality of waves using instance constants stored in the constant memory.

14. The device of claim 10 wherein, the constant memory comprises a wrap-around ring buffer and storage of the instance uniforms in the constant memory comprises storage of the instance uniforms in the wrap-around ring buffer.

15. The device of claim 14, wherein: the shader core further comprises a uniform general purpose register, and the control unit is further configured store an instance offset in the wrap-around ring buffer, the instance offset configured to locate the plurality of instance uniforms in the constant memory of the instance.

16. The device of claim 10, wherein the control unit configured to execute the preamble code block to store the plurality instance uniforms in the constant memory further comprises the control unit configured to: determine a source address of an instance uniform of the plurality of instance uniforms in a uniform buffer object based on the instance identifier and a number uniforms in the instance; determine a destination address of the instance uniform of the plurality of instance uniforms in the constant memory; based, at least in part, on the determined source address and the determined destination address, store the instance uniform of the plurality of instance uniforms.

17. The device of claim 10, wherein the control unit configured to determine the
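
Claims 5 to 7 (and their device counterparts 14 to 16) describe copying instance uniforms from a uniform buffer object into a wrap-around ring buffer, located by an instance offset. The following is a rough sketch of that idea under several assumptions that the claims do not state: addresses are modelled as list indices, the source index is taken to be `instance_id * n + k`, and the class `ConstantRing` and the helper names are hypothetical.

```python
class ConstantRing:
    """Hypothetical wrap-around ring buffer standing in for constant memory."""

    def __init__(self, size):
        self.slots = [None] * size
        self.next_free = 0  # where the next instance's uniforms will start

    def allocate(self, count):
        """Reserve `count` slots and return their starting (instance) offset."""
        if count > len(self.slots):
            raise ValueError("instance uniforms do not fit in the ring")
        offset = self.next_free
        self.next_free = (self.next_free + count) % len(self.slots)
        return offset

    def store(self, offset, index, value):
        """Write uniform `index` of the instance that starts at `offset`."""
        self.slots[(offset + index) % len(self.slots)] = value

    def load(self, offset, index):
        """Read uniform `index` of the instance that starts at `offset`."""
        return self.slots[(offset + index) % len(self.slots)]


def source_index(instance_id, uniforms_per_instance, k):
    """Assumed layout: each instance's uniforms are contiguous in the UBO."""
    return instance_id * uniforms_per_instance + k


def copy_instance_uniforms(ring, ubo, instance_id, uniforms_per_instance):
    """Preamble-style copy; returns the instance offset to keep in a register."""
    offset = ring.allocate(uniforms_per_instance)
    for k in range(uniforms_per_instance):
        src = source_index(instance_id, uniforms_per_instance, k)
        ring.store(offset, k, ubo[src])
    return offset
```

Because the ring wraps, a later instance can overwrite the slots of an earlier one. This sketch does not track whether older constants are still in use, which a real design would need to handle.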

Assignees

Qualcomm Inc

Inventors

Classifications

  • G06T1/60 (Primary)

    Memory management · CPC title

  • G06T15/005 (Primary)

    General purpose rendering architectures · CPC title

  • Reducing the number of cache misses; Data prefetching (cache prefetching G06F12/0862) · CPC title

  • Image coding (bandwidth or redundancy reduction for static pictures H04N1/41; coding or decoding of static colour picture signals H04N1/64; methods or arrangements for coding, decoding, compressing or decompressing digital video signals H04N19/00) · CPC title

  • Processor architectures; Processor configuration, e.g. pipelining · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9799094B1 cover?
A method for processing data in a graphics processing unit (GPU) including receiving an instance identifier for an instance and a shader program comprising a preamble code block and a main shader code block, assigning, the instance identifier to a general purpose register at wave creation, allocating address space within the constant memory for instance uniforms, and determining the preamble co…
Who is the assignee on this patent?
Qualcomm Inc
What technology area does this patent fall under?
Primary CPC classification G06T1/60. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 24, 2017 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).