Graphics processing unit processing and caching improvements

US12572997B2 · US · B2

Patent metadata
Publication number: US-12572997-B2
Application number: US-202318305904-A
Country: US
Kind code: B2
Filing date: Apr 24, 2023
Priority date: Nov 15, 2019
Publication date: Mar 10, 2026
Grant date: Mar 10, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments described herein are generally directed to improvements relating to power, latency, bandwidth and/or performance issues relating to GPU processing/caching. According to one embodiment, a state of multiple intellectual property (IP) cores that have access to a common cache via a central fabric is observed. Responsive to the observed state being indicative of performance of a standalone workload by a first IP core of the multiple IP cores, the common cache is treated as a local cache of the first IP core by powering off the central fabric and causing the first IP core to access the common cache via a low power access path between the first IP core and the common cache that is outside of the central fabric.
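The decision described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the `IPCore` class, the `has_low_power_path` flag, and the `select_path` function are hypothetical names, and the sketch assumes that "one core active, all others inactive" is the observed state that signals a standalone workload.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class AccessPath(Enum):
    FABRIC = "central fabric"
    LOW_POWER = "low-power path"


@dataclass
class IPCore:
    name: str
    active: bool = False
    # Hypothetical flag: True if this core has a direct path to the
    # common cache that bypasses the central fabric.
    has_low_power_path: bool = False


def select_path(cores: List[IPCore]) -> Tuple[bool, Dict[str, AccessPath]]:
    """Return (fabric_powered_on, access path for each active core)."""
    active = [c for c in cores if c.active]
    # Assumed trigger: exactly one core is active and it owns a bypass
    # path, so the common cache can serve as that core's local cache.
    if len(active) == 1 and active[0].has_low_power_path:
        return False, {active[0].name: AccessPath.LOW_POWER}
    # Otherwise the fabric stays on and every active core goes through it.
    return True, {c.name: AccessPath.FABRIC for c in active}


cores = [
    IPCore("media", active=True, has_low_power_path=True),
    IPCore("render"),
    IPCore("compute"),
]
fabric_on, paths = select_path(cores)
print(fabric_on, paths)
```

With only the media core active, the sketch reports the fabric as powered off and routes the media core over the low-power path. Activating a second core switches every active core back to the fabric.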

First claim

Opening claim text (preview).

What is claimed is:

1. A graphics processing unit (GPU) comprising: a fabric; a cache coupled to the fabric; a plurality of intellectual property (IP) cores coupled to the cache via the fabric; a low-power access path outside of the fabric coupling a first IP core of the plurality of IP cores to the cache; and wherein the fabric is operable to be selectively powered off or powered on depending upon (i) a status of each of the plurality of IP cores or (ii) a workload being processed by the first IP core.
2. The GPU of claim 1, the fabric is powered off when the status of the first IP core is active and the status of all other of the plurality of IP cores is inactive or when the workload comprises a standalone workload that does not involve communication between the first IP core and any other of the plurality of IP cores.
3. The GPU of claim 2, wherein first IP core comprises a media IP core.
4. The GPU of claim 3, wherein the standalone workload comprises media decoding.
5. The GPU of claim 3, wherein the standalone workload comprises media encoding.
6. The GPU of claim 3, wherein the standalone workload comprises media transcoding.
7. A method comprising: determining, by a graphics processing unit (GPU), a state of a plurality of intellectual property (IP) cores that have access to a common cache via a central fabric, wherein the state is indicative of performance of a standalone workload by a first IP core of the plurality of IP cores; and treating the common cache as a local cache of the first IP core by powering off the central fabric and causing the first IP core to access the common cache via a low-power access path between the first IP core and the common cache, wherein the low-power access path is outside of the central fabric.
8. The method of claim 7, wherein the standalone workload does not involve communication between the first IP core and any other of the plurality of IP cores.
9. The method of claim 7, wherein the first IP core comprises a media IP core and wherein the standalone workload comprises media decoding, media encoding or media transcoding.
10. The method of claim 7, wherein the state comprises the first IP core being active and all other IP cores of the plurality of IP cores being inactive.
11. A graphics processing unit (GPU) comprising: a plurality of cache banks; a plurality of shader modules coupled to the plurality of cache banks; and wherein each cache bank of the plurality of cache banks is reconfigurable to operate as part of a global last level cache or as a local cache for a particular shader module of the plurality of shader modules based on at least one of a workload demand on the particular shader module and a distance between the cache bank and the particular shader module.
12. The GPU of claim 11, wherein the GPU comprises a die-stacked GPU including one or more top dies and a base die.
13. The GPU of claim 12, wherein the one or more top dies include a plurality of chiplets containing the shader modules.
14. The GPU of claim 12, wherein the base die includes the plurality of cache banks.
15. The GPU of claim 11, wherein the plurality of cache banks comprise level 2 (L2) cache banks.
16. The GPU of claim 11, wherein the plurality of cache banks comprise level 3 (L3) cache banks.
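The per-bank reconfiguration recited in claim 11 can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration, not the patented implementation: the one-dimensional `position` coordinate, the normalized `demand` value, the `demand_threshold` and `max_distance` parameters, and the `reconfigure` function name are all hypothetical. The sketch combines both criteria, although the claim requires only "at least one of" them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheBank:
    bank_id: int
    position: int                # hypothetical 1-D coordinate on the die
    owner: Optional[str] = None  # None means part of the global last level cache


@dataclass
class ShaderModule:
    name: str
    position: int
    demand: float  # hypothetical normalized workload demand in [0, 1]


def reconfigure(banks: List[CacheBank], shaders: List[ShaderModule],
                demand_threshold: float = 0.75,
                max_distance: int = 1) -> List[CacheBank]:
    """Make banks near high-demand shader modules local; keep the rest global."""
    for bank in banks:
        bank.owner = None  # default: bank serves the global last level cache
        candidates = [
            s for s in shaders
            if s.demand >= demand_threshold
            and abs(s.position - bank.position) <= max_distance
        ]
        if candidates:
            # Assign the bank to the closest qualifying shader module.
            nearest = min(candidates, key=lambda s: abs(s.position - bank.position))
            bank.owner = nearest.name
    return banks


banks = [CacheBank(i, position=i) for i in range(4)]
shaders = [ShaderModule("A", position=0, demand=0.9),
           ShaderModule("B", position=3, demand=0.2)]
print([b.owner for b in reconfigure(banks, shaders)])
```

In this example, the two banks closest to the busy shader module A become its local cache, while the banks near the lightly loaded module B stay in the global pool.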

Assignees

Intel Corp

Inventors

Classifications

  • Inference or reasoning models · CPC title

  • Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches · CPC title

  • Local memory within processor subsystem · CPC title

  • G06T1/20 · Primary

    Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • Machine learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12572997B2 cover?
Embodiments described herein are generally directed to improvements relating to power, latency, bandwidth and/or performance issues relating to GPU processing/caching. According to one embodiment, a state of multiple intellectual property (IP) cores that have access to a common cache via a central fabric is observed. Responsive to the observed state being indicative of performance of a standalo…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 10, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).