Method and apparatus to use DRAM as a cache for slow byte-addressible memory for efficient cloud applications
US-12174739-B2 · Dec 24, 2024 · US
US9830276B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9830276-B2 |
| Application number | US-201715437400-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 20, 2017 |
| Priority date | Mar 15, 2013 |
| Publication date | Nov 28, 2017 |
| Grant date | Nov 28, 2017 |
Official abstract text for this publication.
One embodiment of the present invention is a parallel processing unit (PPU) that includes one or more streaming multiprocessors (SMs) and implements a replay unit per SM. Upon detecting a page fault associated with a memory transaction issued by a particular SM, the corresponding replay unit causes the SM, but not any unaffected SMs, to cease issuing new memory transactions. The replay unit then stores the faulting memory transaction and any faulting in-flight memory transaction in a replay buffer. As page faults are resolved, the replay unit replays the memory transactions in the replay buffer—removing successful memory transactions from the replay buffer—until all of the stored memory transactions have successfully executed. Advantageously, the overall performance of the PPU is improved compared to conventional PPUs that, upon detecting a page fault, stop performing memory transactions across all SMs included in the PPU until the fault is resolved.
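The per-SM behavior described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `ReplayUnit`, its methods `handle_fault` and `replay`, and the use of a set of resident page addresses to stand in for resolved page faults are all hypothetical names chosen for illustration. The patent describes hardware, not this code.

```python
from dataclasses import dataclass, field


@dataclass
class ReplayUnit:
    """Hypothetical model of one replay unit attached to one SM."""
    sm_id: int
    stalled: bool = False
    replay_buffer: list = field(default_factory=list)

    def handle_fault(self, faulting_page, in_flight, resident):
        """Stall this SM only and store the faulting transactions."""
        self.stalled = True  # the SM stops issuing new memory transactions
        self.replay_buffer.append(faulting_page)
        # In-flight transactions that also fault are stored for replay.
        self.replay_buffer.extend(p for p in in_flight if p not in resident)

    def replay(self, resident):
        """Replay stored transactions; drop the ones that now succeed.

        Returns True while the SM must remain stalled.
        """
        self.replay_buffer = [p for p in self.replay_buffer if p not in resident]
        if not self.replay_buffer:
            self.stalled = False  # buffer drained, the SM may issue again
        return self.stalled


# Usage: SM 0 faults on page 0x2000 while 0x1000 and 0x3000 are in flight.
resident = {0x1000}
sm0, sm1 = ReplayUnit(0), ReplayUnit(1)
sm0.handle_fault(0x2000, in_flight=[0x1000, 0x3000], resident=resident)
print(sm0.stalled, sm1.stalled)   # SM 1 is unaffected by SM 0's fault

resident.add(0x2000)              # first fault resolved
sm0.replay(resident)              # 0x3000 is still pending
resident.add(0x3000)              # second fault resolved
sm0.replay(resident)              # buffer is now empty, stall is lifted
```

Keeping one replay unit per SM is what lets the second SM continue issuing transactions in this model; a single shared stall flag would reproduce the conventional behavior the abstract contrasts against.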
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method, comprising: receiving a first virtual memory transaction from a first processor; attempting to execute the first virtual memory transaction; detecting a first page fault related to the first virtual memory transaction; and causing a stall condition that inhibits the first processor from generating subsequent virtual memory transactions until the first page fault has been resolved.
2. The method of claim 1, further comprising re-executing the first virtual memory transaction once the stall condition has been resolved.
3. The method of claim 2, further comprising invalidating a translation lookaside buffer prior to re-executing the first virtual memory transaction.
4. The method of claim 2, wherein re-executing the first virtual memory transaction comprises: determining whether a translation lookaside buffer includes an entry corresponding to the first virtual memory transaction; and if the translation lookaside buffer includes the entry, then completing a virtual memory translation for the first virtual memory transaction, or if the translation lookaside buffer does not include the entry, then storing the first virtual memory transaction in a replay buffer.
5. The method of claim 2, wherein the first virtual memory transaction is re-executed along with at least one other virtual memory transaction stored in the replay buffer.
6. The method of claim 1, further comprising determining that the replay buffer is empty and enabling the first processor to generate subsequent virtual memory transactions.
7. The method of claim 1, further comprising receiving a second virtual memory transaction from a second processor while the first page fault remains unresolved, and successfully executing the second virtual memory transaction.
8. The method of claim 1, further comprising: receiving a second virtual memory transaction from the first processor prior to detecting the first page fault; detecting a second page fault related to the second virtual memory transaction; and storing the second virtual memory transaction in the replay buffer.
9. The method of claim 1, wherein resolving the first page fault comprises: locating a memory page related to the first virtual memory transaction within a first memory based on a global translation table; and adding a virtual mapping for the memory page to a translation lookaside buffer.
10. The method of claim 9, wherein resolving the first page fault further comprises copying the memory page from the first memory to a second memory.
11. The method of claim 10, wherein the first memory comprises a system memory coupled to a central processing unit, and the second memory comprises a memory coupled to a multithreaded processing unit.
12. A non-transitory computer-readable storage medium including instructions that, when executed by a multithreaded processing unit, cause the multithreaded processing unit to perform the steps of: receiving a first virtual memory transaction from a first processor; attempting to execute the first virtual memory transaction; detecting a first page fault related to the first virtual memory transaction; and causing a stall condition that inhibits the first processor from generating subsequent virtual memory transactions until the first page fault has been resolved.
13. A system, comprising: a memory; and a multithreaded processing unit coupled to the memory and configured to: receive a first virtual memory transaction from a first processor; attempt to execute the first virtual memory transaction; detect a first page fault related to the first virtual memory transaction; and cause a stall condition that inhibits the first processor from generating subsequent virtual memory transactions until the first page fault has been resolved.
14. The system of claim 13, wherein the multithreaded processor is further configured to re-execute the first virtual memory transaction once the stall condition has been resolved.
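The re-execution path of claim 4 and the fault-resolution steps of claims 9 and 10 can be sketched in a few lines. Everything here is an illustrative assumption and not taken from the patent: the 4096-byte `PAGE_SIZE`, the function names `re_execute` and `resolve_fault`, the use of plain dictionaries for the translation lookaside buffer, the global translation table and both memories, and the toy frame allocator.

```python
PAGE_SIZE = 4096  # assumed page size; the claims do not specify one


def re_execute(vaddr, tlb, replay_buffer):
    """Claim 4 sketch: translate on a TLB hit, otherwise store for replay."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:
        return tlb[vpn] * PAGE_SIZE + offset  # translation completes
    replay_buffer.append(vaddr)               # no entry: keep for a later replay
    return None


def resolve_fault(vpn, global_table, first_memory, second_memory, tlb):
    """Claims 9-10 sketch: locate the page, copy it, add a TLB mapping."""
    src_frame = global_table[vpn]              # locate page in the first memory
    dst_frame = len(second_memory)             # toy allocator: next unused frame
    second_memory[dst_frame] = first_memory[src_frame]  # copy the page
    tlb[vpn] = dst_frame                       # add the virtual mapping


# Usage: the first attempt misses, the fault is resolved, the retry succeeds.
tlb, replay_buffer = {}, []
global_table = {2: 7}            # virtual page 2 lives in frame 7 of the first memory
first_memory = {7: b"page contents"}
second_memory = {}

re_execute(0x2010, tlb, replay_buffer)   # miss, transaction is buffered
resolve_fault(2, global_table, first_memory, second_memory, tlb)
physical = re_execute(replay_buffer.pop(0), tlb, replay_buffer)
```

In the terms of claim 11, `first_memory` would stand for system memory coupled to a central processing unit and `second_memory` for memory coupled to the multithreaded processing unit.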
Transactional memory (G06F9/528 takes precedence) · CPC title
In special purpose processing node, e.g. vector processor · CPC title
in hierarchically structured memory systems, e.g. virtual memory systems · CPC title
TLB miss handling · CPC title
using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] · CPC title