Techniques to manage execution of divergent shaders
US-2022068005-A1 · Mar 3, 2022 · US
US12354205B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12354205-B2 |
| Application number | US-202117484060-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 24, 2021 |
| Priority date | Sep 24, 2021 |
| Publication date | Jul 8, 2025 |
| Grant date | Jul 8, 2025 |
Official abstract text for this publication.
Embodiments described herein are generally directed to a local cache structure within a shared function of a 3D pipeline that facilitates efficient caching of resource state. In an example, the cache structure is maintained within a sub-core of a GPU. The local cache structure includes (i) an SC having entries each containing a state of a binded resource, and (ii) a DSAT having entries each containing an index into the SC. The DSAT is tagged by SBTO values representing addresses of entries of a binding table. A request, including information indicative of an SBTO pointing to an entry within the binding table, is received for a state of a particular binded resource being accessed by a shared function of the 3D pipeline. Based on the SBTO and during a single access to the cache structure, a determination is made regarding whether the state of the particular binded resource is present.
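The hit path described in the abstract can be illustrated with a small software model. This is a minimal sketch, not the patented hardware: the class name `StateCacheModel` and the methods `install` and `lookup` are hypothetical, a Python dict stands in for the CAM match on SBTO tags, and replacement, eviction, and pending fills are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StateCacheModel:
    """Toy model (assumption): DSAT maps SBTO -> SLID, SC is indexed by SLID."""
    sc: List[bytes] = field(default_factory=list)       # SC entries: resource state
    dsat: Dict[int, int] = field(default_factory=dict)  # DSAT: SBTO tag -> SLID

    def install(self, sbto: int, state: bytes) -> int:
        """Place a state in the SC and record its SLID in the DSAT under the SBTO tag."""
        self.sc.append(state)
        slid = len(self.sc) - 1
        self.dsat[sbto] = slid
        return slid

    def lookup(self, sbto: int) -> Optional[bytes]:
        """Single access: match the SBTO in the DSAT, then index the SC by SLID."""
        slid = self.dsat.get(sbto)  # stands in for the CAM tag match
        if slid is None:
            return None             # DSAT miss: state not known to be cached
        return self.sc[slid]


cache = StateCacheModel()
cache.install(0x40, b"sampler-state-A")
print(cache.lookup(0x40))  # state found via SBTO -> SLID -> SC
print(cache.lookup(0x80))  # None: no DSAT entry tagged with this SBTO
```

The point of the indirection is that a request carrying only an SBTO can be resolved without first reading the binding table entry from memory to learn where the state lives.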
Opening claim text (preview).
What is claimed is:

1. A graphics processing unit (GPU) comprising: a three-dimensional (3D) pipeline operable to perform 3D operations; a sub-core operable to maintain a cache structure including (i) a state cache (SC) having a plurality of SC entries each containing a state of a binded resource, and (ii) a direct state access table (DSAT) having a plurality of DSAT entries each containing a state location identifier (SLID) representing an index into the SC, wherein the DSAT comprises a content-addressable memory (CAM) tagged by state binding table offset (SBTO) values representing addresses of entries of a binding table stream associated with the 3D pipeline and stored within a memory subsystem associated with the GPU; and a shared function operable to issue a request to the cache structure for a state of a particular binded resource being accessed by the shared function, wherein the request includes information indicative of an SBTO pointing to an entry within the binding table stream; and wherein responsive to the request the cache structure is operable to make a determination regarding whether the state of the particular binded resource is present within the cache structure during a single access to the cache structure based on the SBTO.

2. The GPU of claim 1, wherein responsive to the determination being affirmative, the DSAT is operable to cause the SC to output the state of the particular binded resource by indexing the SC based on the SLID of a particular DSAT entry of the plurality of DSAT entries that matched the SBTO.

3. The GPU of claim 1, wherein responsive to the SBTO representing a cache hit within the DSAT and the determination being negative, the SC is operable to allocate a new SC entry of the plurality of SC entries and issue a request to the memory subsystem for the state of the particular binded resource.

4. The GPU of claim 1, wherein the cache structure further includes a binding table cache (BTC) including a plurality of BTC entries each containing a cacheline of state offsets cached from the binding table stream, wherein the BTC comprises a CAM tagged by a portion of the SBTO and wherein responsive to the SBTO representing a cache miss within the DSAT, the DSAT is further operable to allocate a new DSAT entry of the plurality of DSAT entries.

5. The GPU of claim 4, wherein the SC comprises a CAM tagged by a state offset field and wherein responsive to the SBTO representing a cache hit within the BTC, the DSAT is further operable to causing the BTC to return a state offset associated with the SBTO from the cacheline of state offsets stored within a particular BTC entry of the plurality of BTC entries that matched the portion of the SBTO and to attempt to locate the state of the particular binded resource within the SC based on the state offset.

6. The GPU of claim 4, wherein the cache structure further includes an out-of-order (OOO) tracking table including a plurality of tracking entries each containing a pointer to a particular BTC entry of the plurality of BTC entries and an offset representing a selector among the cacheline of state offsets contained within the particular BTC entry and wherein responsive to the SBTO representing a cache miss within the BTC, the BTC is operable to: allocate a new BTC entry of the plurality of BTC entries; issue a request to the memory subsystem for the cacheline of state offsets associated with the new BTC entry; and facilitate out-of-order processing of outstanding requests to the memory subsystem by causing the OOO tracking table to allocate a new pending tracking entry of the plurality of tracking entries.

7. The GPU of claim 1, wherein the shared function comprises a texture sampler.

8. A method comprising: maintaining locally within a sub-core of a graphics processing unit (GPU) a cache structure including (i) a state cache (SC) having a plurality of SC entries each containing a state of a binded resource, and (ii) a direct state access table (DSAT) having a plurality of DSAT entries each containing a state location identifier (SLID) representing an index into the SC, wherein the DSAT comprises a content-addressable memory (CAM) tagged by state binding table offset (SBTO) values representing addresses of entries of a binding table stream associated with a three dimensional (3D) pipeline of the GPU and stored within a memory subsystem associated with the GPU; receiving a request for a state of a particular binded resource being accessed by a shared function of the 3D pipeline, wherein the request includes information indicative of an SBTO pointing to an entry within the binding table stream; and determining whether the state of the particular binded resource is present within the cache structure during a single access to the cache structure based on the SBTO.

9. The method of claim 8, further comprising responsive to said determining being affirmative, causing the SC to output the state of the particular binded resource by indexing the SC based on the SLID of a particular DSAT entry of the plurality of DSAT entries that matched the SBTO.

10. The method of claim 8, further comprising responsive to the SBTO representing a cache hit within the DSAT and said determining being negative: allocating a new SC entry of the plurality of SC entries; and issuing a request to the memory subsystem for the state of the particular binded resource.

11. The method of claim 8, wherein the cache structure further includes a binding table cache (BTC) including a plurality of BTC entries each containing a cacheline of state offsets cached from the binding table stream, wherein the BTC comprises a CAM tagged by a portion of the SBTO and wherein the method further comprises responsive to the SBTO representing a cache miss within the DSAT allocating a new DSAT entry of the plurality of DSAT entries.

12. The method of claim 11, wherein the SC comprises a CAM tagged by a state offset field and wherein the method further comprises responsive to the SBTO representing a cache hit within the BTC: causing the BTC to return a state offset associated with the SBTO from the cacheline of state offsets stored within a particular BTC entry of the plurality of BTC entries that matched the portion of the SBTO; and attempting to locate the state of the particular binded resource within the SC based on the state offset.

13. The method of claim 11, wherein the cache structure further includes an out-of-order (OOO) tracking table including a plurality of tracking entries each containing a pointer to a particular BTC entry of the plurality of BTC entries and an offset representing a selector among the cacheline of state offsets contained within the particular BTC entry and wherein the method further comprises responsive to the SBTO representing a cache miss within the BTC: allocating a new BTC entry of the plurality of BTC entries; issuing a request to the memory subsystem for the cacheline of state offsets associated with the new BTC entry; and facilitating out-of-order processing of outstanding requests to the memory subsystem by allocating a new pending tracking entry of the plurality of tracking entries.

14. The method of claim 8, wherein the shared function comprises a texture sampler.

15. A graphics resource cache for a shared function of a three-dimensional (3D) pipeline of a graphics processing unit, the graphics resource cache comprising: a state cache (SC) having a plurality of SC entries each containing a state of a binded resource; and a direct state access table (DSAT) having a plurality of DSAT entries each containing a state location identifier (SLID) represe
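The miss handling recited in the dependent claims (DSAT miss, then BTC lookup, then SC lookup by state offset) can be sketched as a synchronous software model. Everything here is an illustrative assumption rather than the claimed hardware: the class `ResourceStateCache`, the constant `OFFSETS_PER_LINE`, and the `memory_reads` counter are hypothetical; the SBTO is treated as an entry index instead of a byte address; dicts stand in for CAMs; and the OOO tracking table, entry replacement, and pending fills are left out because memory is modeled as answering immediately.

```python
from typing import Dict, List

OFFSETS_PER_LINE = 16  # assumption: state offsets held in one binding-table cacheline


class ResourceStateCache:
    """Toy three-level model: DSAT (SBTO -> SLID), BTC (line tag -> offsets), SC."""

    def __init__(self, binding_table: List[int], state_memory: Dict[int, str]):
        self.binding_table = binding_table    # memory: entry index -> state offset
        self.state_memory = state_memory      # memory: state offset -> state
        self.dsat: Dict[int, int] = {}        # tagged by SBTO, holds SLID
        self.btc: Dict[int, List[int]] = {}   # tagged by upper portion of SBTO
        self.sc_tags: Dict[int, int] = {}     # SC tag: state offset -> SLID
        self.sc: List[str] = []               # SC data, indexed by SLID
        self.memory_reads = 0                 # counts simulated memory requests

    def _fetch_line(self, tag: int) -> List[int]:
        self.memory_reads += 1
        start = tag * OFFSETS_PER_LINE
        return self.binding_table[start:start + OFFSETS_PER_LINE]

    def _fetch_state(self, state_offset: int) -> str:
        self.memory_reads += 1
        return self.state_memory[state_offset]

    def get_state(self, sbto: int) -> str:
        slid = self.dsat.get(sbto)
        if slid is not None:                  # DSAT hit: index the SC directly
            return self.sc[slid]
        # DSAT miss: resolve the state offset through the BTC.
        tag, selector = divmod(sbto, OFFSETS_PER_LINE)
        line = self.btc.get(tag)
        if line is None:                      # BTC miss: fetch the cacheline
            line = self._fetch_line(tag)
            self.btc[tag] = line
        state_offset = line[selector]
        # Look for the state in the SC by its state offset tag.
        slid = self.sc_tags.get(state_offset)
        if slid is None:                      # SC miss: allocate and fill
            self.sc.append(self._fetch_state(state_offset))
            slid = len(self.sc) - 1
            self.sc_tags[state_offset] = slid
        self.dsat[sbto] = slid                # new DSAT entry for later fast hits
        return self.sc[slid]


table = [100, 200, 100] + [0] * 13            # entries 0 and 2 share one state
cache = ResourceStateCache(table, {0: "null", 100: "texA", 200: "texB"})
print(cache.get_state(0), cache.memory_reads)  # two reads: cacheline, then state
print(cache.get_state(2), cache.memory_reads)  # BTC and SC hit: no further reads
```

In this model two binding table entries that point at the same state offset share one SC entry, which is why the second request above needs no memory traffic even though its SBTO has never been seen by the DSAT.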
of parts of caches, e.g. directory or tag array · CPC title
Memory management · CPC title
using instruction pipelines · CPC title
from multiple instruction streams, e.g. multistreaming · CPC title
General purpose rendering architectures · CPC title