Multi-tile graphics processing unit

US12367540B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12367540-B2
Application number  US-202016951217-A
Country             US
Kind code           B2
Filing date         Nov 18, 2020
Priority date       Nov 18, 2020
Publication date    Jul 22, 2025
Grant date          Jul 22, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus to facilitate processing in a multi-tile device is disclosed. The apparatus comprises a plurality of processing tiles, each including a memory device and a plurality of processing resources, coupled to the device memory, and a memory management unit to manage the memory devices in each of the plurality of tiles to perform allocation of memory resources among the memory devices for execution by the plurality of processing resources.

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus to facilitate processing in a multi-tile device, comprising: a plurality of distinct chiplets and a plurality of interconnect structures, respective chiplets of the plurality of distinct chiplets including a processing tile of a plurality of processing tiles, the plurality of distinct chiplets having a 2.5-dimensional (2.5D) arrangement, and respective processing tiles of the plurality of processing tiles include: a memory device; a plurality of processing resources, coupled to the memory device; and a memory management unit to manage the memory device in each of the plurality of processing tiles to perform allocation of memory resources of a workload among respective memory devices of the plurality of processing tiles to facilitate execution of the workload by the plurality of processing resources of the plurality of processing tiles of the plurality of distinct chiplets, wherein the memory management unit is configured to replicate a shared memory resource having a virtual address to respective memory devices of the plurality of processing tiles and enable work items of the workload executed at different processing tiles to access respective copies of the shared memory resource at different physical addresses via the virtual address.

2. The apparatus of claim 1, wherein the memory management unit includes a page table associated with the respective memory devices of the plurality of processing tiles, each page table to store a different physical address associated with the virtual address of the shared memory resource.

3. The apparatus of claim 2, wherein the workload is partitioned into a plurality of virtual partitions and the plurality of processing tiles are to respectively retrieve a virtual partition for execution based on a counter value that indicates a virtual partition identifier, the counter value to be incremented after retrieval of a virtual partition by a processing tile.

4. The apparatus of claim 3, wherein a first processing tile is configured to retrieve a first virtual partition based on a first virtual partition identifier indicated by the counter value and a second processing tile is configured to retrieve a second virtual partition based on a second virtual partition identifier indicated by the counter value.

5. The apparatus of claim 4, wherein the plurality of virtual partitions have associated dispatch parameters associated with work items to be dispatched on behalf of the virtual partition, the dispatch parameters including a global work size, a local work size and a work group count.

6. The apparatus of claim 1, wherein the memory management unit is to distribute the memory resources among respective memory devices of the plurality of processing tiles using one of a plurality of memory distributions based on memory access characteristics of the workload.

7. The apparatus of claim 1, wherein the memory management unit is to distribute the memory resources of the workload via assignment of a contiguous virtual address range to the memory resources across a portion of respective memory devices of the plurality of processing tiles.

8. The apparatus of claim 1, wherein the plurality of processing resources are synchronized.

9. A method to facilitate processing in a multi-tile device, comprising: receiving a workload to be processed at a graphics processing unit including a plurality of distinct chiplets and a plurality of interconnect structures, respective chiplets of the plurality of distinct chiplets including a processing tile of a plurality of processing tiles, the plurality of distinct chiplets having a 2.5-dimensional (2.5D) arrangement; generating a plurality of virtual partitions to process the workload; retrieving virtual partitions of the plurality of virtual partitions by respective processing tiles of the plurality of processing tiles based on a virtual partition identifier provided by circuitry configured to indicate a virtual partition available for retrieval; and scheduling the plurality of virtual partitions for execution at a plurality of processing resources included in the plurality of processing tiles of the plurality of distinct chiplets.

10. The method of claim 9, wherein a first virtual partition is executed at a first plurality of resources at a first processing tile and a second virtual partition is executed at a second plurality of resources at a second processing tile.

11. The method of claim 10, further comprising synchronizing the first virtual partition and the second virtual partition.

12. The method of claim 11, further comprising generating a command buffer upon receiving the workload.

13. The method of claim 12, wherein the plurality of virtual partitions comprises a plurality of dispatch parameters.

14. The method of claim 13, wherein the plurality of dispatch parameters comprise a global work size, a local work size and a work group count.

15. At least one non-transitory computer readable medium having instructions, which when executed by one or more processors, causes the one or more processors to: receive a workload to be processed at a graphics processing unit including a plurality of distinct chiplets and a plurality of interconnect structures, respective chiplets of the plurality of distinct chiplets including a processing tile of a plurality of processing tiles, the plurality of distinct chiplets having a 2.5-dimensional (2.5D) arrangement; generate a plurality of virtual partitions to process the workload; retrieving virtual partitions of the plurality of virtual partitions by respective processing tiles of the plurality of processing tiles based on a virtual partition identifier provided by circuitry configured to indicate a virtual partition available for retrieval; and schedule the plurality of virtual partitions for execution at a plurality of processing resources included in the plurality of processing tiles of the plurality of distinct chiplets.

16. The at least one non-transitory computer readable medium of claim 15, wherein a first virtual partition is executed at a first plurality of resources at a first processing tile and a second virtual partition is executed at a second plurality of resources at a second processing tile.

17. The at least one non-transitory computer readable medium of claim 16, having instructions, which when executed by one or more processors, further causes the one or more processors to synchronize the first virtual partition and the second virtual partition.

18. A graphics processing unit (GPU), comprising: a plurality of distinct chiplets and a plurality of interconnect structures, each of the plurality of distinct chiplets including a processing tile of a plurality of processing tiles, the plurality of distinct chiplets having a 2.5-dimensional (2.5D) arrangement, and respective processing tiles of the plurality of processing tiles include: a memory device; a plurality of processing resources, coupled to the memory device; an interface coupled between the plurality of processing tiles; and a memory management unit to manage the memory device in each of the plurality of processing tiles to perform allocation of memory resources among respective memory devices of the plurality of processing tiles to facilitate execution of a workload by the plurality of processing resources of the plurality of processing tiles of the plurality of distinct chiplets, wherein the memory management unit is configured to replicate a shared memory resource having a virtual address to respective memory devices of the plurality of
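
Claims 1 and 2 describe a memory management unit that replicates a shared memory resource into the memory device of every tile, with a page table per tile that maps the same virtual address to a different physical address. The Python sketch below is one way to picture that behavior. The `Tile` and `MemoryManagementUnit` classes, the `base` address field, and the dictionary-backed memory are illustrative assumptions and are not structures named in the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Tile:
    """Hypothetical model of one processing tile with local memory."""
    tile_id: int
    base: int  # assumed start of this tile's physical address range
    memory: Dict[int, bytes] = field(default_factory=dict)    # physical address -> data
    page_table: Dict[int, int] = field(default_factory=dict)  # virtual -> physical address
    next_offset: int = 0

    def allocate(self, data: bytes) -> int:
        """Store data at the next free local physical address and return it."""
        phys = self.base + self.next_offset
        self.memory[phys] = data
        self.next_offset += len(data)
        return phys


class MemoryManagementUnit:
    """Hypothetical MMU that keeps one page table per tile."""

    def __init__(self, tiles: List[Tile]) -> None:
        self.tiles = {tile.tile_id: tile for tile in tiles}

    def replicate(self, virtual_addr: int, data: bytes) -> None:
        """Copy a shared resource into every tile under one virtual address."""
        for tile in self.tiles.values():
            tile.page_table[virtual_addr] = tile.allocate(data)

    def read(self, tile_id: int, virtual_addr: int) -> bytes:
        """Resolve the virtual address through the page table of the given tile."""
        tile = self.tiles[tile_id]
        return tile.memory[tile.page_table[virtual_addr]]


tiles = [Tile(tile_id=0, base=0x0000), Tile(tile_id=1, base=0x1000)]
mmu = MemoryManagementUnit(tiles)
mmu.replicate(0xABC0, b"lookup-table")
```

In this model a work item on either tile reads virtual address `0xABC0` and gets the copy held in its own tile's memory, so the read never has to cross to another tile.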

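Claims 3 to 5 describe tiles that retrieve virtual partitions according to a counter value, which is incremented after each retrieval, and partitions that carry a global work size, a local work size and a work group count. The following Python sketch illustrates that scheme under stated assumptions: `VirtualPartition`, `make_partitions` and `PartitionCounter` are hypothetical names, the even split of work groups is only one possible policy, and the lock stands in for whatever atomic mechanism real hardware would use.

```python
import threading
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VirtualPartition:
    """Hypothetical partition record holding the claimed dispatch parameters."""
    partition_id: int
    global_work_size: int
    local_work_size: int
    work_group_count: int


def make_partitions(total_work_items: int, local_work_size: int,
                    num_partitions: int) -> List[VirtualPartition]:
    """Split a workload into partitions of whole work groups (assumed policy)."""
    if total_work_items % local_work_size != 0:
        raise ValueError("total_work_items must be a multiple of local_work_size")
    total_groups = total_work_items // local_work_size
    base, extra = divmod(total_groups, num_partitions)
    partitions = []
    for pid in range(num_partitions):
        groups = base + (1 if pid < extra else 0)  # spread any remainder
        partitions.append(
            VirtualPartition(pid, groups * local_work_size, local_work_size, groups)
        )
    return partitions


class PartitionCounter:
    """Shared counter whose value identifies the next partition to retrieve."""

    def __init__(self, partitions: List[VirtualPartition]) -> None:
        self._partitions = partitions
        self._next_id = 0
        self._lock = threading.Lock()  # assumption: models an atomic counter

    def retrieve(self) -> Optional[VirtualPartition]:
        """Return the partition named by the counter, then increment the counter."""
        with self._lock:
            if self._next_id >= len(self._partitions):
                return None  # no partitions left
            partition = self._partitions[self._next_id]
            self._next_id += 1
            return partition


partitions = make_partitions(total_work_items=1024, local_work_size=64, num_partitions=4)
counter = PartitionCounter(partitions)
first = counter.retrieve()   # e.g. retrieved by a first tile
second = counter.retrieve()  # e.g. retrieved by a second tile
```

Because each retrieval advances the counter, two tiles calling `retrieve` never receive the same partition identifier, which matches the first-tile and second-tile behavior recited in claim 4.
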
Assignees

Intel Corp

Inventors

Classifications

  • Improving or facilitating administration, e.g. storage management

  • Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices

  • Single storage device

  • Processor architectures; Processor configuration, e.g. pipelining

  • Allocation or management of cache space

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12367540B2 cover?
An apparatus to facilitate processing in a multi-tile device is disclosed. The apparatus comprises a plurality of processing tiles, each including a memory device and a plurality of processing resources, coupled to the device memory, and a memory management unit to manage the memory devices in each of the plurality of tiles to perform allocation of memory resources among the memory devices for …
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T1/60. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 22, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).