Hybrid directory and snoopy-based coherency to reduce directory update overhead in two-level memory

US11669454B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11669454-B2
Application number  US-201916405691-A
Country             US
Kind code           B2
Filing date         May 7, 2019
Priority date       May 7, 2019
Publication date    Jun 6, 2023
Grant date          Jun 6, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processor includes one or more cores having cache, a cache home agent (CHA), a near memory controller, to near memory, and a far memory controller, which is to: receive a first memory read operation from the CHA directed at a memory address; detect a miss for the first memory address at the near memory; issue a second memory read operation to the far memory controller to retrieve a cache line, having first data, from the memory address of far memory; receive the cache line from the far memory controller in response to the second memory read operation; and send the cache line to the CHA with a forced change to a directory state of the cache line at the CHA, the forced change to cause the CHA to snoop remote sockets to maintain data coherence for the cache line in an absence of directory state in the far memory.
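
The read-miss path in the abstract can be sketched as a small model. This is only an illustration under stated assumptions, not the patented implementation: the class and field names (`NearMemoryController`, `Response`, `snoop_remote`) are hypothetical, the `I` and `S` directory states are placeholders, and the hit-path behavior is assumed. Only the forced state on a far-memory fill comes from the text; claim 3 names it the "any" (A) state.

```python
from dataclasses import dataclass
from enum import Enum


class DirState(Enum):
    INVALID = "I"  # assumed placeholder: no remote socket holds the line
    SHARED = "S"   # assumed placeholder: remote sockets may share the line
    ANY = "A"      # "any" state from claim 3: remote ownership is unknown


@dataclass
class Response:
    data: bytes
    dir_state: DirState
    snoop_remote: bool  # whether the CHA should snoop remote sockets


class NearMemoryController:
    def __init__(self, far_memory, near_memory=None):
        # Far memory stores data only; it keeps no directory state.
        self.far_memory = far_memory
        # Near memory stores (data, directory state) per address.
        self.near_memory = dict(near_memory or {})

    def read(self, addr):
        if addr in self.near_memory:
            # Hit: return the stored directory state (assumed behavior).
            data, state = self.near_memory[addr]
            return Response(data, state, state is not DirState.INVALID)
        # Miss: issue the second read to far memory, then force the
        # directory state so the CHA snoops remote sockets.
        data = self.far_memory[addr]
        return Response(data, DirState.ANY, snoop_remote=True)


nmc = NearMemoryController(
    far_memory={0x80: b"line"},
    near_memory={0x10: (b"hit", DirState.INVALID)},
)
miss = nmc.read(0x80)
hit = nmc.read(0x10)
print(miss.dir_state, miss.snoop_remote, hit.snoop_remote)
```

The sketch leaves out the fill of near memory on a miss; the claims describe that step together with the metadata bits.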

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: one or more cores, each comprising cache; a cache home agent (CHA) coupled to the cache; and a near memory controller coupled to the CHA, to near memory, and to a far memory controller, wherein the near memory controller is to: receive a first memory read operation from the CHA directed at a memory address; detect a miss for the memory address at the near memory; issue, in response to detection of the miss, a second memory read operation to the far memory controller to retrieve a cache line, comprising first data, from the memory address of far memory; receive the cache line from the far memory controller in response to completion of the second memory read operation; and in response to the cache line being retrieved from the far memory, send the cache line to the CHA with a change to a directory state of the cache line at the CHA, the changed directory state to cause the CHA to snoop remote sockets to maintain data coherence for the cache line, and write the cache line to the near memory with a set of metadata bits indicating that the cache line is clean and directory bits in the cache line are dirty, wherein based on the set of metadata bits, the cache line is not written to the far memory in response to the cache line being selected to be evicted from the near memory, wherein the near memory maintains the data coherence using directory state in an absence of directory state in the far memory.

2. The processor of claim 1, further comprising the far memory controller to: retrieve the cache line from the far memory in response to the second memory read operation; and send the cache line to the near memory controller.

3. The processor of claim 1, wherein the changed directory state comprises an “any” (A) state, and wherein the CHA is to snoop the remote sockets for the cache line to maintain data coherence for the cache line.

4. The processor of claim 1, wherein the near memory controller is further to: update the directory bits in the cache line to a directory state consistent with a read opcode of the first memory read operation.

5. The processor of claim 4, wherein the cache line written to the near memory further comprises the set of metadata bits comprising: a first bit to indicate whether the first data in the cache line is dirty; and a second bit to indicate whether the directory bits are dirty.

6. The processor of claim 5, wherein, to evict the cache line, the near memory controller is further to: determine that the first bit of the set of metadata bits indicates that the first data is clean; and not issue a memory write operation to write the first data back to the far memory.

7. The processor of claim 5, wherein the near memory controller is further to: receive a memory write operation from the CHA directed to the memory address; determine whether the memory write operation is a directory-only write or includes a data write; and in response to being a directory-only write, set the second bit, but not the first bit, of the set of metadata bits, which avoids a write-back to the far memory.

8. The processor of claim 7, wherein, to determine that the memory write operation is the directory-only write, the near memory controller is to determine that an update to the directory bits is necessary based on a current directory state read from the near memory and on an opcode received in the first memory read operation, and wherein the near memory controller is further to clear the second bit of the set of metadata bits to indicate the first data is clean.

9. The processor of claim 7, wherein, to determine that the memory write operation is the directory-only write, the near memory controller is to receive a direct indication of the directory-only write in the memory write operation from the CHA, wherein the direct indication is based in part on a previous indication received by the CHA from the near memory controller, in response to the first memory read operation, that the first bit indicated the first data was clean.

10. A method comprising: receiving, by a near memory controller of a processing system, a first memory read operation from a cache home agent (CHA) directed at a memory address; detecting, by the near memory controller, a miss for the memory address at near memory of a two-level memory system; issuing, by the near memory controller in response to detection of the miss, a second memory read operation to a far memory controller to retrieve a cache line, comprising first data, from the memory address of far memory; receiving, by the near memory controller, the cache line from the far memory controller in response to completion of the second memory read operation; and in response to the cache line being retrieved from the far memory, sending, by the near memory controller, the cache line to the CHA with a change to a directory state of the cache line at the CHA, the changed directory state to cause the CHA to snoop remote sockets to maintain data coherence for the cache line, and write the cache line to the near memory with a set of metadata bits indicating that the cache line is clean and directory bits in the cache line are dirty, wherein based on the set of metadata bits, the cache line is not written to the far memory in response to the cache line being selected to be evicted from the near memory, wherein the near memory maintains the data coherence using directory state in an absence of directory state in the far memory.

11. The method of claim 10, wherein the changed directory state comprises an “any” (A) state, the method further comprising: retrieving, by the far memory controller, the cache line from the far memory in response to the second memory read operation; sending, by the far memory controller, the cache line to the near memory controller; and snooping, by the CHA, the remote sockets for the cache line to maintain data coherence for the cache line.

12. The method of claim 10, further comprising: updating, by the near memory controller, the directory bits in the cache line to a directory state consistent with a read opcode of the first memory read operation.

13. The method of claim 12, wherein the cache line written to the near memory further comprises the set of metadata bits comprising: a first bit to indicate whether the first data in the cache line is dirty; and a second bit to indicate whether the directory bits are dirty.

14. The method of claim 13, further comprising evicting the cache line, wherein evicting comprises: determining, by the near memory controller, that the first bit of the set of metadata bits indicates that the first data is clean; and not issuing, by the near memory controller, a memory write operation to write the first data back to the far memory.

15. The method of claim 13, further comprising: receiving, by the near memory controller, a memory write operation from the CHA directed to the memory address; determining, by the near memory controller, whether the memory write operation is a directory-only write or includes a data write; and in response to being a directory-only write, setting, by the near memory controller, the second bit, but not the first bit, of the set of metadata bits, which avoids a write-back to the far memory.

16. The method of claim 15, wherein determining that the memory write operation is the directory-only write comprises determining, by the near memory controller, that an update to the directory bits is necessary based on a current directory state read from the near memory and on an opcode received in the first memory read operation, the met
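
The two metadata bits described in claims 5 to 7 (and 13 to 15) can be illustrated with a minimal sketch. It assumes a dictionary-backed far memory and hypothetical names (`NearLine`, `NearMemory`, `far_writes`); the directory values `"S"` and `"I"` are placeholders. Only the bit semantics come from the claim text: a line filled from far memory is data-clean and directory-dirty, a directory-only write sets the second bit but not the first, and eviction skips the far-memory write-back while the data bit is clean.

```python
from dataclasses import dataclass


@dataclass
class NearLine:
    data: bytes
    dir_bits: str             # directory state kept only in near memory
    data_dirty: bool = False  # "first bit": data differs from far memory
    dir_dirty: bool = False   # "second bit": directory bits were updated


class NearMemory:
    def __init__(self, far_memory):
        self.far_memory = far_memory  # address -> data, no directory state
        self.lines = {}               # address -> NearLine
        self.far_writes = 0           # counts write-backs to far memory

    def fill_from_far(self, addr, dir_bits):
        # Fill after a miss: data is clean, directory bits are dirty.
        self.lines[addr] = NearLine(self.far_memory[addr], dir_bits,
                                    data_dirty=False, dir_dirty=True)

    def write(self, addr, dir_bits, data=None):
        # data=None models a directory-only write: only the second bit is set.
        line = self.lines[addr]
        line.dir_bits = dir_bits
        line.dir_dirty = True
        if data is not None:
            line.data = data
            line.data_dirty = True

    def evict(self, addr):
        # Write back only when the data itself is dirty; dirty directory
        # bits alone are dropped because far memory holds no directory.
        line = self.lines.pop(addr)
        if line.data_dirty:
            self.far_memory[addr] = line.data
            self.far_writes += 1
            return True
        return False


far = {0x40: b"old"}
nm = NearMemory(far)
nm.fill_from_far(0x40, "S")
nm.write(0x40, "I")               # directory-only update
print(nm.evict(0x40), nm.far_writes)
```

In this model a directory-only update never costs a far-memory write, which is the overhead reduction the title refers to.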

Assignees

Intel Corp

Inventors

Classifications

  • Performance improvement

  • Hybrid memory, e.g. using both volatile and non-volatile memory

  • Copy directories (local copy tags for implementing a bus snooping protocol G06F12/0831)

  • for multiprocessing or multitasking

  • with cache invalidating means (G06F12/0815 takes precedence)

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11669454B2 cover?
A processor includes one or more cores having cache, a cache home agent (CHA), a near memory controller, to near memory, and a far memory controller, which is to: receive a first memory read operation from the CHA directed at a memory address; detect a miss for the first memory address at the near memory; issue a second memory read operation to the far memory controller to retrieve a cache line…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F12/0822. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 6, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).