Scalable memory interface for graphical processor unit

US10552937B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10552937-B2
Application numberUS-201815867688-A
CountryUS
Kind codeB2
Filing dateJan 10, 2018
Priority dateJan 10, 2018
Publication dateFeb 4, 2020
Grant dateFeb 4, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are generally directed to a scalable memory interface for a graphical processor unit. An embodiment of an apparatus includes a graphical processing unit (GPU) including multiple autonomous engines; a common memory interface for the autonomous engines; and a memory management unit for the common memory interface, the memory management unit including multiple engine modules, wherein each of the engine modules includes a translation-lookaside buffer (TLB) that is dedicated to providing address translation for memory requests for a respective autonomous engine of the plurality of autonomous engines, and a TLB miss tracking mechanism that provides tracking for the respective autonomous engine.

First claim

Opening claim text (preview).

What is claimed is: 1. An apparatus comprising: a graphical processing unit (GPU) including: a plurality of autonomous engines; a common memory interface for the plurality of autonomous engines; and a memory management unit (MMU) for the common memory interface, the MMU including a plurality of engine modules, wherein each of the plurality of engine modules includes: a translation-lookaside buffer (TLB) that is dedicated to providing address translation for memory requests for a respective autonomous engine of the plurality of autonomous engines, and a TLB miss tracking mechanism that provides tracking for the respective autonomous engine; wherein the MMU includes a plurality of MMU sub-units coupled by one or more communication links, each of the MMU sub-units including one or more engine modules to support one or more autonomous engines. 2. The apparatus of claim 1 , wherein the MMU is to route each received memory request to the respective engine module for an autonomous engine based on an engine identification (ID) for the memory request. 3. The apparatus of claim 1 , wherein the MMU further includes shared resources to support the plurality of autonomous engines. 4. The apparatus of claim 3 , wherein the shared resources include a second level TLB (STLB) to provide address translations when a miss occurs in the TLB of any of the engine modules. 5. The apparatus of claim 4 , wherein the shared resources further include a page walker to walk a page table when a miss occurs in the STLB. 6. The apparatus of claim 1 , wherein a first MMU sub-unit of the plurality of MMU sub-units is a master MMU sub-unit to provide a single point of contact for the MMU. 7. The apparatus of claim 1 , wherein a memory request received by the MMU is directed to one of the plurality of MMU sub-units via the one or more communication links between the MMU sub-units. 8. The apparatus of claim 1 , wherein the MMU includes two or more copies of a same MMU sub-unit. 9. A non-transitory computer-readable storage medium having stored thereon data representing sequences of instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: receiving a memory request for a first autonomous engine of a plurality of autonomous engines supported by a graphics processing unit (GPU) at a memory management unit (MMU) for a GPU shared memory interface, a memory address for the memory request including a virtual address (VA), wherein the MMU includes a plurality of MMU sub-units coupled by one or more link elements, each of the MMU sub-units including one or more engine modules to support one or more autonomous engines; directing the memory request to a first engine module of a plurality of engine modules, the first engine module being designated to provide translation-lookaside buffer (TLB) services and miss tracking for the first autonomous engine, wherein directing the memory request to a first engine module includes directing the memory request to an MMU unit of the plurality of MMU units that includes the first engine module; determining whether the VA is a hit or miss in a TLB of the first engine module; and upon determining that the VA is a hit in the TLB of the first engine module, providing physical address (PA) translation for the virtual address and directing the memory request to system memory. 10. The medium of claim 9 , further comprising instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: upon determining that the VA is a miss in the TLB of the first engine module, directing the memory request to a shared TLB, the shared TLB to support the plurality of engine modules; determining whether the VA is a hit or miss in the shared TLB; and upon determining that the VA is a hit in the shared TLB, providing PA translation for the virtual address, returning the PA to the engine module, and directing the memory request to system memory. 11. The medium of claim 10 , further comprising instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: upon determining that the VA is a miss in the shared TLB, directing the memory request to a page walker element to walk a page table to determine the PA for the VA. 12. A processing system comprising: a central processing unit (CPU); a system memory; and a graphical processing unit (GPU) including: a plurality of autonomous engines; a common memory interface to the system memory for the plurality of autonomous engines; and a memory management unit (MMU) for the common memory interface, the MMU including a plurality of engine modules, wherein each of the plurality of engine modules includes: a translation-lookaside buffer (TLB) that is dedicated to providing address translation for memory requests for a respective autonomous engine of the plurality of autonomous engines, and a TLB miss tracking mechanism that provides tracking for the respective autonomous engine; wherein the MMU includes a plurality of MMU sub-units coupled by one or more communication links, each of the MMU sub-units including one or more engine modules to support one or more autonomous engines. 13. The system of claim 12 , wherein the MMU is to route each received memory request to the respective engine module for an autonomous engine based on an engine identification (ID) for the memory request. 14. The system of claim 12 , wherein the MMU includes a single contact point for communication. 15. The system of claim 12 , wherein the MMU further includes shared resources to support the plurality of autonomous engines. 16. The system of claim 15 , wherein the shared resources include a second level TLB (STLB) to provide address translations when a miss occurs in the TLB of any of the engine modules. 17. The system of claim 16 , wherein the shared resources further include a page walker to walk a page table when a miss occurs in the STLB. 18. The system of claim 12 , wherein a first MMU sub-unit of the plurality of MMU sub-units is a master MMU sub-unit to provide a single point of contact for the MMU. 19. The system of claim 12 , wherein a memory request received by the MMU is directed to one of the plurality of MMU sub-units via the one or more communication links between the MMU sub-units. 20. The system of claim 12 , wherein the MMU includes two or more copies of a same MMU sub-unit.

Assignees

Inventors

Classifications

  • TLB miss handling · CPC title

  • using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] · CPC title

  • Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • Details of translation look-aside buffer [TLB] · CPC title

  • G06T1/60Primary

    Memory management · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10552937B2 cover?
Embodiments are generally directed to a scalable memory interface for a graphical processor unit. An embodiment of an apparatus includes a graphical processing unit (GPU) including multiple autonomous engines; a common memory interface for the autonomous engines; and a memory management unit for the common memory interface, the memory management unit including multiple engine modules, wherein e…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T1/60. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Feb 04 2020 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).