Graphics processor operation scheduling for deterministic latency

US12079155B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12079155-B2
Application number  US-202017428216-A
Country             US
Kind code           B2
Filing date         Mar 14, 2020
Priority date       Mar 15, 2019
Publication date    Sep 3, 2024
Grant date          Sep 3, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments described herein include software, firmware, and hardware that provides techniques to enable deterministic scheduling across multiple general-purpose graphics processing units. One embodiment provides a multi-GPU architecture with uniform latency. One embodiment provides techniques to distribute memory output based on memory chip thermals. One embodiment provides techniques to enable thermally aware workload scheduling. One embodiment provides techniques to enable end to end contracts for workload scheduling on multiple GPUs.

First claim

Opening claim text (preview).

What is claimed is:

1. A general-purpose graphics processor comprising: a memory access pipeline configured to access a memory system having physically interleaved memory addressing, the memory system including: a first memory device that is local to the general-purpose graphics processor; and a second memory device that is remote to the general-purpose graphics processor and local to a first remote general-purpose graphics processor, wherein the memory access pipeline includes hardware to facilitate access to physically interleaved memory pages of the memory system, the physically interleaved memory pages include a first physical memory page and a second physical memory page, the first physical memory page to be stored on the first memory device and the second physical memory page to be stored on the second memory device, and wherein the first physical memory page is to be contiguous with the second physical memory page and the hardware of the memory access pipeline is to satisfy memory access requests for contiguous physical memory pages from the first memory device and the second memory device to cause memory access latency for the memory access requests to be an average of the memory access latency of the first memory device and the second memory device.

2. The general-purpose graphics processor as in claim 1, wherein the memory system includes a third memory device that is local to a second remote general-purpose graphics processor and remote to the general-purpose graphics processor and the first remote general-purpose graphics processor, the third memory device to include a third physical memory page configured to be contiguous with the first physical memory page and the second physical memory page.

3. The general-purpose graphics processor as in claim 2, wherein the memory access pipeline includes hardware to: determine that physical memory addresses are to be interleaved across multiple memory devices including the first memory device, the second memory device, and the third memory device; map multiple contiguous physical memory pages to the multiple memory devices; and satisfy memory access requests for the multiple contiguous physical memory pages from the multiple memory devices to average memory access latency for the access requests.

4. The general-purpose graphics processor as in claim 3, wherein the hardware of the memory access pipeline includes a memory management and mapping unit to map the multiple contiguous physical memory pages to multiple memory devices.

5. The general-purpose graphics processor as in claim 4, wherein hardware of the memory management and mapping unit is included in one or more of a memory controller, memory management unit, or address generation unit.

6. The general-purpose graphics processor as in claim 1, additionally comprising a point-to-point interconnect to the first remote general-purpose graphics processor.

7. The general-purpose graphics processor as in claim 6, wherein the access to the second physical memory page stored on the second memory device is to traverse the point-to-point interconnect.

8. The general-purpose graphics processor as in claim 7, wherein access latency to the first memory device via the memory access pipeline is lower than the access latency to the second memory device.

9. A method comprising: on graphics processing system having multiple general-purpose graphics processing units (GPGPUs): initializing a memory management system for two or more of the multiple GPGPUs; determining that physical memory addresses for the two or more of the multiple GPGPUs are to be interleaved across multiple memory devices; mapping physical memory pages for the physical memory addresses across the multiple memory devices; and satisfying an access request for multiple contiguous physical memory pages from the multiple memory devices to cause memory access latency for the access request to be an average of the memory access latencies of the multiple memory devices.

10. The method as in claim 9, wherein determining that physical memory addresses for the two or more of the multiple GPGPUs are to be interleaved across multiple memory devices includes reading a set of configuration registers associated with the multiple memory devices.

11. The method as in claim 9, wherein determining that physical memory addresses for the two or more of the multiple GPGPUs are to be interleaved across multiple memory devices includes determining that physical memory addresses for a first GPGPU are to be interleaved across a first memory device coupled with the first GPGPU and a second memory device coupled with a second GPGPU.

12. The method as in claim 11, wherein mapping physical memory pages for the physical memory addresses across the multiple memory devices includes mapping a first memory page for the first GPGPU to the first memory device and mapping a second contiguous physical memory page for the first GPGPU to the second memory device.

13. A graphics processing system comprising: a first memory device; and a first general-purpose graphics processor coupled with the first memory device, the first general-purpose graphics processor comprising a memory access pipeline configured to access the first memory device and an interconnect to couple with a second general-purpose graphics processor, the second general-purpose graphics processor coupled with a second memory device, wherein: the memory access pipeline of the first general-purpose graphics processor includes hardware to facilitate access to a first physical memory page and a second physical memory page, the first physical memory page is to be stored on the first memory device, the second physical memory page is to be stored on the second memory device, the first physical memory page is to be contiguous with the second physical memory page and; the hardware of the memory access pipeline is to satisfy memory access requests for contiguous physical memory pages from the first memory device and the second memory device to cause memory access latency for the memory access requests to be an average of the memory access latency of the first memory device and the second memory device.

14. The graphics processing system as in claim 13, comprising a third memory device coupled with a third general-purpose graphics processor, the third memory device to include a third physical memory page to be contiguous with the first physical memory page and the second physical memory page.

15. The graphics processing system as in claim 14, wherein the memory access pipeline includes hardware to: determine that physical memory addresses are to be interleaved across multiple memory devices including the first memory device the second memory device, and the third memory device; map multiple contiguous physical memory pages to the multiple memory devices; and satisfy memory access requests for the multiple contiguous physical memory pages from the multiple memory devices to average memory access latency for the access requests.

16. The graphics processing system as in claim 15, wherein the hardware of the memory access pipeline includes a memory management and mapping unit to map the multiple contiguous physical memory pages to multiple memory devices.

17. The graphics processing system as in claim 16, wherein hardware of the memory management and mapping unit is included in one or more of a memory controller, a memory management unit, or an address generation unit.

18. The graphics processing system as in claim 13, wherein the interconnect is a point-t
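
The interleaving idea in claims 1, 9, and 13 can be illustrated with a small software model. The sketch below is not the claimed hardware and is not taken from the patent description. It assumes a round-robin mapping at page granularity, a 4096-byte page size, and made-up latency figures, none of which appear in the claim text. The names `MemoryDevice`, `device_for_address`, and `mean_latency_for_range` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumed page size in bytes; the claims do not state one.
PAGE_SIZE = 4096


@dataclass(frozen=True)
class MemoryDevice:
    name: str
    latency_ns: float  # hypothetical per-access latency, for illustration only


def device_for_address(phys_addr: int, devices: Sequence[MemoryDevice]) -> MemoryDevice:
    """Map a physical address to a device by interleaving pages round-robin.

    Contiguous physical pages land on successive devices, so page 0 is on
    devices[0], page 1 on devices[1], and so on, wrapping around.
    Round-robin is an assumption; the claims only require that contiguous
    pages be stored on different devices.
    """
    page_number = phys_addr // PAGE_SIZE
    return devices[page_number % len(devices)]


def mean_latency_for_range(start: int, length: int,
                           devices: Sequence[MemoryDevice]) -> float:
    """Average the per-page latency over every page touched by a request."""
    first_page = start // PAGE_SIZE
    last_page = (start + length - 1) // PAGE_SIZE
    latencies = [
        devices[page % len(devices)].latency_ns
        for page in range(first_page, last_page + 1)
    ]
    return sum(latencies) / len(latencies)


# One local and one remote device, with invented latencies.
devices = [MemoryDevice("local", 100.0), MemoryDevice("remote", 300.0)]

print(device_for_address(0, devices).name)          # local
print(device_for_address(PAGE_SIZE, devices).name)  # remote
print(mean_latency_for_range(0, 2 * PAGE_SIZE, devices))  # 200.0
```

In this model a request that spans an equal number of local and remote pages sees the mean of the two device latencies, which mirrors the "average of the memory access latency" language in the claims. Adding a third device to the list corresponds to the three-device arrangement in claims 2 and 14.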

Assignees

Intel Corp

Inventors

Classifications

  • Page size control · CPC title

  • Details relating to cache mapping · CPC title

  • Prefetching based on hints or prefetch instructions · CPC title

  • Prefetching based on access pattern detection, e.g. stride based prefetch · CPC title

  • Reconfiguration of cache memory · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12079155B2 cover?
Embodiments described herein include software, firmware, and hardware that provides techniques to enable deterministic scheduling across multiple general-purpose graphics processing units. One embodiment provides a multi-GPU architecture with uniform latency. One embodiment provides techniques to distribute memory output based on memory chip thermals. One embodiment provides techniques to enabl…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F12/0862. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 3, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).