US2019042425A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2019042425-A1 |
| Application number | US-201815948569-A |
| Country | US |
| Kind code | A1 |
| Filing date | Apr 9, 2018 |
| Priority date | Apr 9, 2018 |
| Publication date | Feb 7, 2019 |
| Grant date | — |
Official abstract text for this publication.
Techniques for managing multi-level memory and coherency using a unified page granular controller can simplify software programming of both file system handling for persistent memory and parallel programming of host and accelerator and enable better software utilization of host processors and accelerators. As part of the management techniques, a line granular controller cooperates with a page granular controller to support both fine grain and coarse grain coherency and maintain overall system inclusion property. In one example, a controller to manage coherency in a system includes a memory data structure and on-die tag cache to store state information to indicate locations of pages in a memory hierarchy and an ownership state for the pages, the ownership state indicating whether the pages are owned by a host processor, owned by an accelerator device, or shared by the host processor and the accelerator device. The controller can also include logic to, in response to a memory access request from the host processor or the accelerator to access a cacheline in a page in a state indicating ownership by a device other than the requesting device, cause the page to transition to a state in which the requesting device owns or shares the page.
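The ownership transition described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the disclosed hardware: the class name `PageGranularController`, the 4 KiB page size, the first-touch rule for untracked pages, and the `share_on_conflict` switch are all hypothetical choices made for illustration. The abstract says only that a request to a page owned by the other device causes a transition to a state in which the requester "owns or shares" the page.

```python
from enum import Enum


class Owner(Enum):
    HOST = "host"
    ACCELERATOR = "accelerator"
    SHARED = "shared"


class PageGranularController:
    """Hypothetical software model of per-page ownership tracking."""

    def __init__(self, page_size=4096, share_on_conflict=False):
        self.page_size = page_size                  # assumed page size
        self.share_on_conflict = share_on_conflict  # assumed policy switch
        self.state = {}                             # page number -> Owner
        self.transitions = 0                        # count of ownership changes

    def access(self, requester, address):
        """Handle a cacheline access and return the resulting page state."""
        if requester is Owner.SHARED:
            raise ValueError("requester must be HOST or ACCELERATOR")
        page = address // self.page_size
        current = self.state.get(page)
        if current is None:
            # Assumption: the first device to touch a page becomes its owner.
            self.state[page] = requester
        elif current is not Owner.SHARED and current is not requester:
            # Page is owned by the other device: move it to a state in which
            # the requester owns or shares the page.
            self.state[page] = (
                Owner.SHARED if self.share_on_conflict else requester
            )
            self.transitions += 1
        # Otherwise the requester already owns or shares the page.
        return self.state[page]


ctrl = PageGranularController()
ctrl.access(Owner.HOST, 0x1000)         # host takes ownership of page 1
ctrl.access(Owner.ACCELERATOR, 0x1080)  # same page, other device: transition
```

In this model an access by the current owner, or to a page already shared, changes nothing, which mirrors the idea that a device may access a page it owns or shares without a transition.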
Opening claim text (preview).
What is claimed is:
1. An apparatus to manage coherency in a system, the apparatus comprising: hardware storage to store information to indicate locations of pages in a memory hierarchy and an ownership state for the pages, the ownership state indicating whether the pages are owned by a host processor, owned by an accelerator device, or shared by the host processor and the accelerator device; and logic to: in response to a memory access request from the host processor or the accelerator to access a cacheline in a page in a state indicating ownership by a device other than the requesting device, cause the page to transition to a state in which the requesting device owns or shares the page.
2. The apparatus of claim 1, wherein the logic is to allow an access from a device to a page when the page is in a state indicating the device owns or shares the page.
3. The apparatus of claim 1, wherein: the hardware storage comprises on-die or on-package storage to store the information to indicate locations of pages in the memory hierarchy and the ownership state for the pages.
4. The apparatus of claim 1, wherein the logic is to: in response to a memory request from the host processor or the accelerator to access a cacheline in a page that resides in far memory or remote memory, cause the page to be allocated to a near memory cache.
5. The apparatus of claim 4, wherein the logic is to: in response to an access that hits a full set in the near memory cache, de-allocate a least recently used victim page from the near memory cache and write modified data of the victim page to the far memory or the remote memory.
6. The apparatus of claim 1, wherein: the memory hierarchy includes a near memory cache and a far memory; wherein the information indicates locations in the near memory cache for far memory pages; and wherein the far memory is to store pages owned by the host processor, pages owned by the accelerator device, and shared memory pages.
7. The apparatus of claim 1, wherein: the memory hierarchy includes byte-addressable persistent memory; and wherein the persistent memory is to store pages owned by the host processor, pages owned by the accelerator device, and shared memory pages.
8. The apparatus of claim 3, wherein: the near memory cache comprises volatile memory; and wherein the far memory comprises non-volatile byte addressable storage.
9. The apparatus of claim 6, wherein: the logic is to cause the information to be stored to the hardware storage, to a structure in near memory, and to a structure in the far memory; wherein the hardware storage is to store the information for recently accessed pages, the structure in the near memory is to store information for pages allocated to the near memory cache, and the structure in the far memory is to store information for all memory pages; and wherein the information to be stored in the hardware storage, the structure in the near memory, and the structure in the far memory is to indicate locations in the near memory cache for far memory pages and the ownership state.
10. The apparatus of claim 6, wherein: the memory hierarchy includes a memory coupled with the accelerator device; wherein the state information indicates locations for pages stored in the memory coupled with the accelerator device; and wherein the memory coupled with the accelerator device is to store pages owned by the host processor, pages owned by the accelerator device, and shared memory pages.
11. The apparatus of claim 1, wherein: the state information is to further indicate whether copies of cachelines of the page is to be in one or more of: a host processor-side cache, a near memory cache, a filtered portion of an accelerator-side cache that is tracked in a host processor-side snoop filter, and a non-filtered portion of an accelerator-side cache that is not tracked in the host processor-side snoop filter.
12. The apparatus of claim 1, wherein: the hardware storage is to store one or more bits to indicate whether the page is mapped to a domain or shared by multiple domains; and wherein domains include: a first domain to indicate a page is owned by the host processor and a second domain to indicate a page is owned by the accelerator device.
13. The apparatus of claim 12, wherein the system includes multiple accelerator devices, and wherein the domains include domains for groups of accelerator devices or a single domain for the multiple accelerator devices.
14. The apparatus of claim 12, wherein the system includes multiple host processors, and wherein the domains include domains for groups of host processors or a single domain for the multiple host processors.
15. The apparatus of claim 12, wherein the logic to cause a page to transition to another state is to: update the state information for the page in the hardware storage; and cause a cache flush of any cachelines in the page having copies in a cache that is not mapped to the domain being transitioned to.
16. The apparatus of claim 15, wherein the logic to cause a page to transition to another state is to: update the information to indicate location and ownership state in a structure stored in memory.
17. The apparatus of claim 1, wherein the logic is to: receive a snoop filter miss to access a cacheline in a page; and in response to receipt of the snoop filter miss, determine a state of the page based on the stored state information.
18. The apparatus of claim 15, wherein the logic is to: in response to transition of the page to a state indicating ownership by the host processor or a shared state, cause one or more cachelines in the page to be allocated in a host processor-side snoop filter; and in response to transition of the page to a state indicating ownership by the accelerator device, cause cachelines in the page to not be allocated in the host processor-side snoop filter.
19. The apparatus of claim 1, wherein the logic is to: in response to detection of concurrent memory access requests from both the host processor and the accelerator to access cachelines in a same page, cause the page to transition to a state in which the host processor and the accelerator share the page.
20. The apparatus of claim 19, wherein the logic is to: in response to the detection of concurrent memory access requests to access cachelines in the same page, store information indicating a conflict for the page.
21. The apparatus of claim 20, wherein the logic is to: store the information indicating the conflict for the page comprises allocating the page in a translation lookaside buffer (TLB) or FIFO (first in first out) of recent page conflicts; and in response to eviction of the page from the TLB or FIFO, cause a transition back to the page's pre-conflict state or other pre-defined conflict exit state.
22. The apparatus of claim 21, wherein the logic is to: in response to a determination that the page is in the TLB or FIFO, determine the page is in a shared state; and in response to determination that the page is not in the TLB or FIFO, determine the state of the page based on the stored state information for the page.
23. The apparatus of claim 21, wherein the logic is to: de-allocate a page from the TLB or FIFO in response to detection of one or more conditions including: detection that the page is evicted from a near memory cache, and for a TLB, a determination
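The conflict handling recited in claims 19 to 22 can be sketched as a small FIFO of recently conflicting pages. This is a hedged illustration only: the class `ConflictFifo`, its `capacity` parameter, and the string state labels are hypothetical, and the sketch restores the pre-conflict state on eviction rather than some other pre-defined exit state, which the claims also allow.

```python
from collections import OrderedDict


class ConflictFifo:
    """Hypothetical FIFO of pages with recent host/accelerator conflicts."""

    def __init__(self, capacity=4):
        self.capacity = capacity      # assumed number of tracked conflicts
        self.entries = OrderedDict()  # page -> pre-conflict state

    def record_conflict(self, page, pre_conflict_state):
        """Track a conflicting page.

        Returns (evicted_page, state_to_restore) when an older entry is
        pushed out to make room, otherwise None.
        """
        if page in self.entries:
            return None               # already tracked; keep FIFO order
        evicted = None
        if len(self.entries) >= self.capacity:
            # Oldest entry leaves the FIFO and returns to its earlier state.
            evicted = self.entries.popitem(last=False)
        self.entries[page] = pre_conflict_state
        return evicted

    def effective_state(self, page, stored_state):
        """A page in the FIFO is treated as shared; else use stored state."""
        return "shared" if page in self.entries else stored_state


fifo = ConflictFifo(capacity=2)
fifo.record_conflict(1, "host")         # page 1 becomes shared
fifo.record_conflict(2, "accelerator")  # page 2 becomes shared
fifo.record_conflict(3, "host")         # evicts page 1, which reverts to "host"
```

Keeping the pre-conflict state alongside each entry lets the model answer the lookup order in claim 22: check the FIFO first, and fall back to the stored page state only when the page is not present.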
with multilevel cache hierarchies · CPC title
using a bus scheme, e.g. with bus monitoring or watching means · CPC title
using page tables, e.g. page table structures · CPC title
with age lists, e.g. queue, most recently used [MRU] list or least recently used [LRU] list · CPC title
using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] · CPC title