Prioritizing local and remote memory access in a non-uniform memory access architecture

US10838864B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10838864-B2
Application number: US-201815992885-A
Country: US
Kind code: B2
Filing date: May 30, 2018
Priority date: May 30, 2018
Publication date: Nov 17, 2020
Grant date: Nov 17, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A miss in a cache by a thread in a wavefront is detected. The wavefront includes a plurality of threads that are executing a memory access request concurrently on a corresponding plurality of processor cores. A priority is assigned to the thread based on whether the memory access request is addressed to a local memory or a remote memory. The memory access request for the thread is performed based on the priority. In some cases, the cache is selectively bypassed depending on whether the memory access request is addressed to the local or remote memory. A cache block is requested in response to the miss. The cache block is biased towards a least recently used position in response to requesting the cache block from the local memory and towards a most recently used position in response to requesting the cache block from the remote memory.
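
The insertion bias described in the abstract can be illustrated with a minimal sketch of a single cache set. This is not the patented implementation: the class name `BiasedCacheSet`, the list-based recency order, and the choice to insert exactly at the LRU or MRU end (rather than merely "towards" it) are assumptions made for illustration.

```python
class BiasedCacheSet:
    """One cache set; self.blocks is ordered from LRU (front) to MRU (back)."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = []

    def access(self, tag, is_remote):
        """Return True on a hit, False on a miss (the block is then inserted)."""
        if tag in self.blocks:
            # Hit: conventional promotion to the MRU position.
            self.blocks.remove(tag)
            self.blocks.append(tag)
            return True
        if len(self.blocks) >= self.ways:
            # Set is full: evict the block at the LRU position.
            self.blocks.pop(0)
        if is_remote:
            # Block fetched from remote memory: insert at the MRU end so it
            # survives longer (a remote refetch is assumed to be costly).
            self.blocks.append(tag)
        else:
            # Block fetched from local memory: insert at the LRU end so it
            # is the first candidate for eviction.
            self.blocks.insert(0, tag)
        return False


cache_set = BiasedCacheSet(ways=2)
cache_set.access("A", is_remote=True)   # miss, remote block
cache_set.access("B", is_remote=False)  # miss, local block
cache_set.access("C", is_remote=False)  # miss, evicts local block "B"
```

In this sketch the remote block "A" stays resident while the two local blocks compete for the remaining way, which is the effect the abstract attributes to the bias.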

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: detecting a miss in a cache by a thread in a wavefront comprising a plurality of threads that are each executing a corresponding memory access request concurrently on a corresponding plurality of processor cores; assigning a priority to the thread based on whether the memory access request is addressed to a local memory or a remote memory; and performing the memory access request for the thread based on the priority.

2. The method of claim 1, further comprising: determining whether the memory access request is addressed to the local memory or the remote memory based on an address interleaving scheme for a virtual-to-physical address mapping or information stored in an entry of a translation lookaside buffer.

3. The method of claim 1, wherein assigning the priority to the thread further comprises assigning the priority to the thread based on a first number of the plurality of threads that miss in the cache and a second number of the plurality of threads that hit in the cache.

4. The method of claim 3, wherein assigning the priority to the thread further comprises: assigning a first priority to the thread in response to the miss being addressed to the remote memory and the second number being above a first threshold; assigning a second priority, lower than the first priority, to the thread in response to the miss being addressed to the remote memory and the first number being above a second threshold; assigning a third priority, lower than the second priority, to the thread in response to the miss being addressed to the remote memory and a fraction of other threads in the wavefront that also miss in the cache and are addressed to the remote memory being above a third threshold; assigning a fourth priority, lower than the third priority, to the thread in response to the miss being addressed to the local memory and the second number being above a fourth threshold; assigning a fifth priority, lower than the fourth priority, to the thread in response to the miss being addressed to the local memory and a fraction of other threads in the wavefront that also miss in the cache and are addressed to the local memory being above a fifth threshold; and assigning a sixth priority, lower than the fifth priority, to the thread in response to the miss being addressed to the local memory and a fraction of other threads in the wavefront that also miss in the cache and are addressed to the remote memory being above a sixth threshold.

5. The method of claim 1, further comprising: allocating the thread to one of a plurality of queues maintained in a local memory controller for the local memory and a remote memory controller for the remote memory based on the priority.

6. The method of claim 5, wherein the plurality of queues is associated with a corresponding plurality of priorities, and wherein performing the memory access request comprises servicing the queues based on the plurality of priorities.

7. The method of claim 1, further comprising: bypassing the cache in response to the memory access request being addressed to the local memory; and accessing the cache in response to the memory access request being addressed to the remote memory.

8. The method of claim 1, further comprising: requesting a cache block in response to the miss; biasing the cache block towards a least recently used (LRU) position in the cache in response to requesting the cache block from the local memory and in response to the requested cache block being inserted into the cache; and biasing the cache block towards a most recently used (MRU) position in the cache in response to requesting the cache block from the remote memory and in response to the requested cache block being inserted into the cache.

9. The method of claim 1, further comprising: sending memory access requests to the cache in an order that is determined based on whether the memory access request is addressed to the local memory or the remote memory.

10. An apparatus comprising: a plurality of processor cores configured to execute a wavefront including a plurality of threads that perform a memory access request; and a cache to store information for at least one of the plurality of processor cores, wherein a priority is assigned to a thread in response to the memory access request performed by the thread missing in the cache, wherein the priority is determined based on whether the memory access request is addressed to a local memory or a remote memory, and wherein a corresponding one of the plurality of processor cores performs the memory access request for the thread based on the priority.

11. The apparatus of claim 10, further comprising: a translation lookaside buffer configured to store an entry that indicates a virtual-to-physical address mapping for the memory access request, wherein an address interleaving scheme for the virtual-to-physical address mapping or information stored in the entry of the translation lookaside buffer indicates whether the memory access request is addressed to the local memory or the remote memory.

12. The apparatus of claim 10, wherein the priority is assigned to the thread based on a first number of the plurality of threads that miss in the cache and a second number of the plurality of threads that hit in the cache.

13. The apparatus of claim 12, wherein: a first priority is assigned to the thread in response to the miss being addressed to the remote memory and the second number being above a first threshold; a second priority, lower than the first priority, is assigned to the thread in response to the miss being addressed to the remote memory and the first number being above a second threshold; a third priority, lower than the second priority, is assigned to the thread in response to the miss being addressed to the remote memory and a fraction of other threads in the wavefront that also miss in the cache and are addressed to the remote memory being above a third threshold; a fourth priority, lower than the third priority, is assigned to the thread in response to the miss being addressed to the local memory and the second number being above a fourth threshold; a fifth priority, lower than the fourth priority, is assigned to the thread in response to the miss being addressed to the local memory and a fraction of other threads in the wavefront that also miss in the cache and are addressed to the local memory being above a fifth threshold; and a sixth priority, lower than the fifth priority, is assigned to the thread in response to the miss being addressed to the local memory and a fraction of other threads in the wavefront that also miss in the cache and are addressed to the remote memory being above a sixth threshold.

14. The apparatus of claim 10, further comprising: a local memory controller for the local memory, wherein the local memory controller maintains a plurality of first queues; and a remote memory controller for the remote memory, wherein the remote memory controller maintains a plurality of second queues, and wherein the thread is allocated to one of the plurality of first queues or the plurality of second queues based on the priority.

15. The apparatus of claim 14, wherein the plurality of first queues and the plurality of second queues are associated with a corresponding plurality of priorities, and wherein the plurality of first queues and the plurality of second queues are serviced based on the plurality of priorities.

16. The apparatus of claim 10, wherein: the cache is bypassed in response to the memory access request being ad
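
The six priority tiers of claims 4 and 13 and the priority queues of claims 5, 6, 14 and 15 can be sketched as follows. This is an illustrative reading of the claim language, not the patented implementation. The names `ThreadAccess`, `assign_priority`, and `PriorityQueues`, the threshold values, the order in which conditions are tested, the fallback tier when no condition holds, and strict priority-order servicing are all assumptions; the claims do not specify them.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ThreadAccess:
    hit: bool             # True if the thread's request hit in the cache
    remote: bool = False  # for a miss: True if addressed to remote memory


# Hypothetical thresholds (not from the patent): thread counts for
# tiers 1, 2 and 4, fractions of other threads for tiers 3, 5 and 6.
THRESHOLDS = {1: 24, 2: 24, 3: 0.5, 4: 24, 5: 0.5, 6: 0.5}


def assign_priority(index, wavefront, t=THRESHOLDS):
    """Return a tier from 1 (highest) to 6 (lowest) for a thread that missed."""
    me = wavefront[index]
    if me.hit:
        raise ValueError("priority is only assigned to threads that miss")
    hits = sum(a.hit for a in wavefront)   # the claims' "second number"
    misses = len(wavefront) - hits         # the claims' "first number"
    others = [a for i, a in enumerate(wavefront) if i != index]
    n = max(len(others), 1)
    frac_remote = sum(not a.hit and a.remote for a in others) / n
    frac_local = sum(not a.hit and not a.remote for a in others) / n
    if me.remote:
        rules = [(1, hits > t[1]), (2, misses > t[2]), (3, frac_remote > t[3])]
    else:
        rules = [(4, hits > t[4]), (5, frac_local > t[5]), (6, frac_remote > t[6])]
    # Assumption: conditions are tested from the highest tier downwards.
    for tier, holds in rules:
        if holds:
            return tier
    # Assumed fallback: the lowest tier for this memory type.
    return rules[-1][0]


class PriorityQueues:
    """One FIFO queue per tier, as a memory controller might maintain."""

    def __init__(self, tiers=6):
        self.queues = {p: deque() for p in range(1, tiers + 1)}

    def enqueue(self, thread_id, priority):
        self.queues[priority].append(thread_id)

    def drain(self):
        """Service all queues in strict priority order (an assumption)."""
        order = []
        for p in sorted(self.queues):
            while self.queues[p]:
                order.append(self.queues[p].popleft())
        return order
```

With these assumed thresholds, a 32-thread wavefront in which 30 threads hit gives a remote miss tier 1 and a local miss tier 4, so a controller that drains the queues in tier order would service the remote miss first.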

Assignees

Advanced Micro Devices Inc

Classifications

  • with priority control · CPC title

  • Virtual address space management · CPC title

  • using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] · CPC title

  • Correctness of operation, e.g. memory ordering · CPC title

  • based on priority control (G06F13/1605 takes precedence) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10838864B2 cover?
A miss in a cache by a thread in a wavefront is detected. The wavefront includes a plurality of threads that are executing a memory access request concurrently on a corresponding plurality of processor cores. A priority is assigned to the thread based on whether the memory access request is addressed to a local memory or a remote memory. The memory access request for the thread is performed bas…
Who is the assignee on this patent?
Advanced Micro Devices Inc
What technology area does this patent fall under?
Primary CPC classification G06F12/1027. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 17, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).