US12554674B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12554674-B2 |
| Application number | US-202418915492-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 15, 2024 |
| Priority date | Mar 15, 2019 |
| Publication date | Feb 17, 2026 |
| Grant date | Feb 17, 2026 |
Official abstract text for this publication.
Methods and apparatus relating to techniques for multi-tile memory management. In an example, a graphics processor includes an interposer, a first chiplet coupled with the interposer, the first chiplet including a graphics processing resource and an interconnect network coupled with the graphics processing resource, cache circuitry coupled with the graphics processing resource via the interconnect network, and a second chiplet coupled with the first chiplet via the interposer, the second chiplet including a memory-side cache and a memory controller coupled with the memory-side cache. The memory controller is configured to enable access to a high-bandwidth memory (HBM) device, the memory-side cache is configured to cache data associated with a memory access performed via the memory controller, and the cache circuitry is logically positioned between the graphics processing resource and a chiplet interface.
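The access path described in the abstract can be pictured with a small software model. The following is only an illustrative sketch, not the patented hardware: the class names (`HBMDevice`, `MemoryChiplet`, `ComputeChiplet`), the dictionary-based caches, and the absence of any eviction or coherence policy are all assumptions made for this example. It shows a load checking the cache circuitry on the first chiplet, then crossing the chiplet interface to the memory-side cache on the second chiplet, and only then reaching the HBM device through the memory controller.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class HBMDevice:
    """Hypothetical stand-in for the high-bandwidth memory device."""
    data: Dict[int, int] = field(default_factory=dict)

    def read(self, addr: int) -> int:
        return self.data.get(addr, 0)


@dataclass
class MemoryChiplet:
    """Second chiplet: a memory-side cache in front of a memory controller."""
    hbm: HBMDevice
    memory_side_cache: Dict[int, int] = field(default_factory=dict)

    def read(self, addr: int) -> Tuple[int, str]:
        if addr in self.memory_side_cache:
            return self.memory_side_cache[addr], "memory-side cache"
        value = self.hbm.read(addr)           # memory controller accesses HBM
        self.memory_side_cache[addr] = value  # cache data for this memory access
        return value, "HBM"


@dataclass
class ComputeChiplet:
    """First chiplet: cache circuitry sits between compute and the chiplet interface."""
    memory: MemoryChiplet
    cache_circuitry: Dict[int, int] = field(default_factory=dict)

    def load(self, addr: int) -> Tuple[int, str]:
        if addr in self.cache_circuitry:
            return self.cache_circuitry[addr], "cache circuitry"
        # Miss: the request leaves the chiplet via the chiplet interface (assumed
        # here to be a plain method call standing in for the interposer link).
        value, source = self.memory.read(addr)
        self.cache_circuitry[addr] = value
        return value, source


gpu = ComputeChiplet(MemoryChiplet(HBMDevice({0x40: 7})))
first = gpu.load(0x40)        # served from HBM, fills both caches
second = gpu.load(0x40)       # served from the cache circuitry
gpu.cache_circuitry.clear()   # simulate an eviction on the first chiplet
third = gpu.load(0x40)        # served from the memory-side cache
print(first, second, third)
```

The model keeps the memory-side cache on the same object as the memory controller to mirror the claim language that both reside on the second chiplet; real hardware would add capacity limits, replacement, and write handling that this sketch omits.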
Opening claim text (preview).
What is claimed is:

1. A graphics processing unit comprising: an interposer; a first chiplet coupled with the interposer, the first chiplet including graphics processing circuitry and a switched interconnect network coupled with the graphics processing circuitry; cache circuitry coupled with the graphics processing circuitry via the switched interconnect network; and a second chiplet coupled with the first chiplet via the interposer, the second chiplet including a memory-side cache and a memory controller coupled with the memory-side cache, wherein the memory controller is configured to enable access to a high-bandwidth memory (HBM) device, the memory-side cache is configured to cache data associated with a memory access performed via the memory controller, and the cache circuitry is logically positioned between the graphics processing circuitry and a chiplet interface.

2. The graphics processing unit of claim 1, wherein the cache circuitry includes a level-2 (L2) cache.

3. The graphics processing unit of claim 2, wherein the L2 cache is a distributed cache having nodes interconnected via the switched interconnect network.

4. The graphics processing unit of claim 1, wherein the memory-side cache includes a level-3 (L3) cache.

5. The graphics processing unit of claim 1, wherein the interposer, first chiplet, and second chiplet have a 2.5 dimension (2.5D) arrangement and the interposer includes a first standardized slot configured to accept the first chiplet and a second standardized slot configured to accept the second chiplet.

6. The graphics processing unit of claim 5, wherein the interposer is an active interposer.

7. The graphics processing unit of claim 1, wherein the graphics processing circuitry includes a plurality of functional units having a single instruction multiple thread (SIMT) architecture.

8. The graphics processing unit of claim 7, wherein the plurality of functional units includes a general-purpose graphics processing unit core and a tensor core.

9. The graphics processing unit of claim 7, wherein the plurality of functional units includes a ray-tracing core.

10. The graphics processing unit of claim 1, wherein the graphics processing circuitry includes a plurality of functional units having a single instruction multiple data (SIMD) architecture.

11. A system comprising: an interconnect to a system interface; and a multi-die graphics processing unit coupled with the interconnect, the multi-die graphics processing unit comprising: an interposer; a first chiplet coupled with the interposer, the first chiplet including a graphics processing circuitry and a switched interconnect network coupled with the graphics processing circuitry; cache circuitry coupled with the graphics processing circuitry via the switched interconnect network; and a second chiplet coupled with the first chiplet via the interposer, the second chiplet including a memory-side cache and a memory controller coupled with the memory-side cache, wherein the memory controller is configured to enable access to a high-bandwidth memory (HBM) device, the memory-side cache is configured to cache data associated with a memory access performed via the memory controller, and the cache circuitry is logically positioned between the graphics processing circuitry and a chiplet interface.

12. The system of claim 11, wherein the cache circuitry includes a level-2 (L2) cache.

13. The system of claim 12, wherein the L2 cache is a distributed cache having nodes interconnected via the switched interconnect network.

14. The system of claim 11, wherein the memory-side cache includes a level-3 (L3) cache.

15. The system of claim 11, wherein the interposer, first chiplet, and second chiplet have a 2.5 dimension (2.5D) arrangement and the interposer includes a first standardized slot configured to accept the first chiplet and a second standardized slot configured to accept the second chiplet.

16. The system of claim 15, wherein the interposer is an active interposer.

17. The system of claim 11, wherein the graphics processing circuitry includes a plurality of functional units having a single instruction multiple thread (SIMT) architecture.

18. The system of claim 17, wherein the plurality of functional units includes a general-purpose system core and a tensor core.

19. The system of claim 17, wherein the plurality of functional units includes a ray-tracing core.

20. The system of claim 11, wherein the graphics processing circuitry includes a plurality of functional units having a single instruction multiple data (SIMD) architecture.
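Because each dependent claim incorporates every limitation of the claim it refers to, it can help to trace a claim back to its independent claim before reading it. The sketch below is a reading aid only: the `DEPENDS_ON` table is transcribed from the "of claim N" references in claims 1 to 20 above, and the helper name `claim_chain` is a hypothetical choice for this example.

```python
from typing import Dict, List

# Parent claim for each dependent claim, transcribed from the claim text.
# Claims 1 and 11 are independent and therefore have no entry.
DEPENDS_ON: Dict[int, int] = {
    2: 1, 3: 2, 4: 1, 5: 1, 6: 5, 7: 1, 8: 7, 9: 7, 10: 1,
    12: 11, 13: 12, 14: 11, 15: 11, 16: 15, 17: 11, 18: 17, 19: 17, 20: 11,
}


def claim_chain(claim: int) -> List[int]:
    """Return the claim followed by each claim it depends on, ending at the independent claim."""
    chain = [claim]
    while chain[-1] in DEPENDS_ON:
        chain.append(DEPENDS_ON[chain[-1]])
    return chain


print(claim_chain(3))   # [3, 2, 1]
print(claim_chain(16))  # [16, 15, 11]
print(claim_chain(11))  # [11]
```

For example, claim 3 (the distributed L2 cache) must be read together with claim 2 and claim 1, while claim 16 (the active interposer) carries the limitations of claims 15 and 11.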
by reordering requests · CPC title
Ray-tracing · CPC title
Loop control instructions; iterative instructions, e.g. LOOP, REPEAT · CPC title
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title
LOAD or STORE instructions; Clear instruction · CPC title