Efficient caching of resource state for a shared function of a three-dimensional pipeline of a graphics processing unit

US12354205B2 · US · B2

Patent metadata
Publication number: US-12354205-B2
Application number: US-202117484060-A
Country: US
Kind code: B2
Filing date: Sep 24, 2021
Priority date: Sep 24, 2021
Publication date: Jul 8, 2025
Grant date: Jul 8, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments described herein are generally directed to a local cache structure within a shared function of a 3D pipeline that facilitates efficient caching of resource state. In an example, the cache structure is maintained within a sub-core of a GPU. The local cache structure includes (i) an SC having entries each containing a state of a binded resource, and (ii) a DSAT having entries each containing an index into the SC. The DSAT is tagged by SBTO values representing addresses of entries of a binding table. A request, including information indicative of an SBTO pointing to an entry within the binding table, is received for a state of a particular binded resource being accessed by a shared function of the 3D pipeline. Based on the SBTO and during a single access to the cache structure, a determination is made regarding whether the state of the particular binded resource is present.
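The single-access lookup described in the abstract can be illustrated with a toy model. This is a minimal sketch, not the patented hardware: a Python dict stands in for the content-addressable DSAT, a list stands in for the SC, and the names `LocalStateCache`, `install`, and `lookup` are hypothetical. Capacity limits and replacement are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LocalStateCache:
    """Toy model: a DSAT (SBTO -> SLID) in front of a state cache (SC)."""
    sc: List[bytes] = field(default_factory=list)       # SC entries, indexed by SLID
    dsat: Dict[int, int] = field(default_factory=dict)  # CAM stand-in: SBTO tag -> SLID

    def install(self, sbto: int, state: bytes) -> int:
        """Place a state in the SC and tag a DSAT entry with its SBTO."""
        self.sc.append(state)
        slid = len(self.sc) - 1
        self.dsat[sbto] = slid
        return slid

    def lookup(self, sbto: int) -> Optional[bytes]:
        """One probe keyed by SBTO: match the DSAT tag, then index the SC by SLID."""
        slid = self.dsat.get(sbto)
        if slid is None:
            return None       # DSAT miss: the state is not known to be cached
        return self.sc[slid]  # hit: the SC is indexed directly, no second tag search


# Illustrative values only; the SBTO numbers and state bytes are made up.
cache = LocalStateCache()
cache.install(sbto=0x40, state=b"sampler-state-A")
hit = cache.lookup(0x40)
miss = cache.lookup(0x80)
```

The point of the indirection is that the requester only presents an SBTO; the DSAT turns it into an SC index in one step, so a hit does not need to read the binding table first.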

First claim

Opening claim text (preview).

What is claimed is:

1. A graphics processing unit (GPU) comprising: a three-dimensional (3D) pipeline operable to perform 3D operations; a sub-core operable to maintain a cache structure including (i) a state cache (SC) having a plurality of SC entries each containing a state of a binded resource, and (ii) a direct state access table (DSAT) having a plurality of DSAT entries each containing a state location identifier (SLID) representing an index into the SC, wherein the DSAT comprises a content-addressable memory (CAM) tagged by state binding table offset (SBTO) values representing addresses of entries of a binding table stream associated with the 3D pipeline and stored within a memory subsystem associated with the GPU; and a shared function operable to issue a request to the cache structure for a state of a particular binded resource being accessed by the shared function, wherein the request includes information indicative of an SBTO pointing to an entry within the binding table stream; and wherein responsive to the request the cache structure is operable to make a determination regarding whether the state of the particular binded resource is present within the cache structure during a single access to the cache structure based on the SBTO.

2. The GPU of claim 1, wherein responsive to the determination being affirmative, the DSAT is operable to cause the SC to output the state of the particular binded resource by indexing the SC based on the SLID of a particular DSAT entry of the plurality of DSAT entries that matched the SBTO.

3. The GPU of claim 1, wherein responsive to the SBTO representing a cache hit within the DSAT and the determination being negative, the SC is operable to allocate a new SC entry of the plurality of SC entries and issue a request to the memory subsystem for the state of the particular binded resource.

4. The GPU of claim 1, wherein the cache structure further includes a binding table cache (BTC) including a plurality of BTC entries each containing a cacheline of state offsets cached from the binding table stream, wherein the BTC comprises a CAM tagged by a portion of the SBTO and wherein responsive to the SBTO representing a cache miss within the DSAT, the DSAT is further operable to allocate a new DSAT entry of the plurality of DSAT entries.

5. The GPU of claim 4, wherein the SC comprises a CAM tagged by a state offset field and wherein responsive to the SBTO representing a cache hit within the BTC, the DSAT is further operable to causing the BTC to return a state offset associated with the SBTO from the cacheline of state offsets stored within a particular BTC entry of the plurality of BTC entries that matched the portion of the SBTO and to attempt to locate the state of the particular binded resource within the SC based on the state offset.

6. The GPU of claim 4, wherein the cache structure further includes an out-of-order (OOO) tracking table including a plurality of tracking entries each containing a pointer to a particular BTC entry of the plurality of BTC entries and an offset representing a selector among the cacheline of state offsets contained within the particular BTC entry and wherein responsive to the SBTO representing a cache miss within the BTC, the BTC is operable to: allocate a new BTC entry of the plurality of BTC entries; issue a request to the memory subsystem for the cacheline of state offsets associated with the new BTC entry; and facilitate out-of-order processing of outstanding requests to the memory subsystem by causing the OOO tracking table to allocate a new pending tracking entry of the plurality of tracking entries.

7. The GPU of claim 1, wherein the shared function comprises a texture sampler.

8. A method comprising: maintaining locally within a sub-core of a graphics processing unit (GPU) a cache structure including (i) a state cache (SC) having a plurality of SC entries each containing a state of a binded resource, and (ii) a direct state access table (DSAT) having a plurality of DSAT entries each containing a state location identifier (SLID) representing an index into the SC, wherein the DSAT comprises a content-addressable memory (CAM) tagged by state binding table offset (SBTO) values representing addresses of entries of a binding table stream associated with a three dimensional (3D) pipeline of the GPU and stored within a memory subsystem associated with the GPU; receiving a request for a state of a particular binded resource being accessed by a shared function of the 3D pipeline, wherein the request includes information indicative of an SBTO pointing to an entry within the binding table stream; and determining whether the state of the particular binded resource is present within the cache structure during a single access to the cache structure based on the SBTO.

9. The method of claim 8, further comprising responsive to said determining being affirmative, causing the SC to output the state of the particular binded resource by indexing the SC based on the SLID of a particular DSAT entry of the plurality of DSAT entries that matched the SBTO.

10. The method of claim 8, further comprising responsive to the SBTO representing a cache hit within the DSAT and said determining being negative: allocating a new SC entry of the plurality of SC entries; and issuing a request to the memory subsystem for the state of the particular binded resource.

11. The method of claim 8, wherein the cache structure further includes a binding table cache (BTC) including a plurality of BTC entries each containing a cacheline of state offsets cached from the binding table stream, wherein the BTC comprises a CAM tagged by a portion of the SBTO and wherein the method further comprises responsive to the SBTO representing a cache miss within the DSAT allocating a new DSAT entry of the plurality of DSAT entries.

12. The method of claim 11, wherein the SC comprises a CAM tagged by a state offset field and wherein the method further comprises responsive to the SBTO representing a cache hit within the BTC: causing the BTC to return a state offset associated with the SBTO from the cacheline of state offsets stored within a particular BTC entry of the plurality of BTC entries that matched the portion of the SBTO; and attempting to locate the state of the particular binded resource within the SC based on the state offset.

13. The method of claim 11, wherein the cache structure further includes an out-of-order (OOO) tracking table including a plurality of tracking entries each containing a pointer to a particular BTC entry of the plurality of BTC entries and an offset representing a selector among the cacheline of state offsets contained within the particular BTC entry and wherein the method further comprises responsive to the SBTO representing a cache miss within the BTC: allocating a new BTC entry of the plurality of BTC entries; issuing a request to the memory subsystem for the cacheline of state offsets associated with the new BTC entry; and facilitating out-of-order processing of outstanding requests to the memory subsystem by allocating a new pending tracking entry of the plurality of tracking entries.

14. The method of claim 8, wherein the shared function comprises a texture sampler.

15. A graphics resource cache for a shared function of a three-dimensional (3D) pipeline of a graphics processing unit, the graphics resource cache comprising: a state cache (SC) having a plurality of SC entries each containing a state of a binded resource; and a direct state access table (DSAT) having a plurality of DSAT entries each containing a state location identifier (SLID) represe
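The miss handling recited in the dependent claims (DSAT miss, then BTC lookup, then SC lookup by state offset) can be walked through in a small simulation. The sketch below is illustrative only and rests on several assumptions that are not in the claim text: the SBTO is treated as a plain entry index, it is split into a BTC line tag and a selector with `divmod`, a cacheline holds `OFFSETS_PER_LINE` state offsets, memory responds synchronously, and nothing is ever evicted. Because memory is synchronous here, the tracking table is only a log of BTC misses; the case in claims 3 and 10 (DSAT hit with the state still absent) is not modeled. All class, method, and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

OFFSETS_PER_LINE = 4  # assumed number of state offsets per binding-table cacheline


class ResourceStateCache:
    def __init__(self, binding_table: List[int], state_memory: Dict[int, str]):
        self.binding_table = binding_table      # stand-in for the binding table stream
        self.state_memory = state_memory        # state offset -> resource state
        self.dsat: Dict[int, int] = {}          # SBTO -> SLID
        self.btc: Dict[int, List[int]] = {}     # line tag -> cacheline of state offsets
        self.sc_tags: Dict[int, int] = {}       # state offset -> SLID (SC tag match)
        self.sc: List[str] = []                 # SC data, indexed by SLID
        self.tracking: List[Tuple[int, int]] = []  # (line tag, selector) per BTC miss

    def request(self, sbto: int) -> Tuple[str, str]:
        """Return (path taken, state) for the binding-table entry at index `sbto`."""
        if sbto in self.dsat:                   # DSAT hit: resolved in a single access
            return "dsat-hit", self.sc[self.dsat[sbto]]
        line_tag, selector = divmod(sbto, OFFSETS_PER_LINE)
        if line_tag in self.btc:
            path = "btc-hit"
        else:                                   # BTC miss: fetch the cacheline of offsets
            path = "btc-miss"
            self.tracking.append((line_tag, selector))
            start = line_tag * OFFSETS_PER_LINE
            self.btc[line_tag] = self.binding_table[start:start + OFFSETS_PER_LINE]
        state_offset = self.btc[line_tag][selector]
        if state_offset in self.sc_tags:        # state already cached under another SBTO
            path += "+sc-hit"
        else:                                   # SC miss: allocate an entry, fetch the state
            self.sc.append(self.state_memory[state_offset])
            self.sc_tags[state_offset] = len(self.sc) - 1
            path += "+sc-miss"
        self.dsat[sbto] = self.sc_tags[state_offset]  # new DSAT entry for the next access
        return path, self.sc[self.dsat[sbto]]


# Made-up binding table: entries 0 and 2 point at the same state offset.
binding_table = [0x100, 0x200, 0x100, 0x300, 0x400]
state_memory = {0x100: "tex0", 0x200: "tex1", 0x300: "tex2", 0x400: "tex3"}
cache = ResourceStateCache(binding_table, state_memory)
first = cache.request(1)   # cold: misses everywhere
second = cache.request(1)  # repeat: DSAT hit
third = cache.request(2)   # same cacheline, state not yet in the SC
fourth = cache.request(0)  # same cacheline, state already in the SC
```

In this model the four requests take the paths `btc-miss+sc-miss`, `dsat-hit`, `btc-hit+sc-miss`, and `btc-hit+sc-hit`, which shows how each level absorbs a different kind of reuse: the DSAT catches repeated SBTOs, the BTC catches neighbouring binding-table entries, and the offset-tagged SC catches different SBTOs that resolve to the same state.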

Assignees

Intel Corp

Inventors

Classifications

  • of parts of caches, e.g. directory or tag array · CPC title

  • Memory management · CPC title

  • using instruction pipelines · CPC title

  • from multiple instruction streams, e.g. multistreaming · CPC title

  • G06T15/005 (Primary)

    General purpose rendering architectures · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12354205B2 cover?
Embodiments described herein are generally directed to a local cache structure within a shared function of a 3D pipeline that facilitates efficient caching of resource state. In an example, the cache structure is maintained within a sub-core of a GPU. The local cache structure includes (i) an SC having entries each containing a state of a binded resource, and (ii) a DSAT having entries each con…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T15/005. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 8, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).