Who is the assignee on this patent?

Ecole Polytechnique Fed Lausanne Epfl

What technology area does this patent fall under?

Primary CPC classification G06F15/17331. Mapped technology areas include Physics.

When was this patent published?

Publication date Jun 21, 2018 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Atomic Object Reads for In-Memory Rack-Scale Computing

US2018173673A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2018173673-A1
Application number	US-201715838514-A
Country	US
Kind code	A1
Filing date	Dec 12, 2017
Priority date	Dec 15, 2016
Publication date	Jun 21, 2018
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A distributed memory system including a plurality of chips, a plurality of nodes that are distributed across the plurality of chips such that each node is comprised within a chip, each node includes a dedicated local memory and a processor core, and each local memory is configured to be accessible over network communication, a network interface for each node, the network interface configured such that a corresponding network interface of each node is integrated in a coherence domain of the chip of the corresponding node, wherein each of the network interfaces are configured to support a one-sided operation, the network interface directly reading or writing in the dedicated local memory of the corresponding node without involving a processor core, and wherein the one-sided operation is configured such that the processor core of a corresponding node uses a protocol to directly inject a remote memory access for read or write request to the network interface of the node, the remote memory access request allowing to read or write an arbitrarily long region of a memory of a remote node,

First claim

Opening claim text (preview).

1. A distributed memory system comprising:
a plurality of chips;
a plurality of nodes that are distributed across the plurality of chips such that each node is comprised within a chip, each node includes a dedicated local memory and a processor core, and each local memory is configured to be accessible over network communication;
a network interface for each node, the network interface configured such that a corresponding network interface of each node is integrated in a coherence domain of the chip of the corresponding node;
wherein each of the network interfaces are configured to support a one-sided operation, the network interface directly reading or writing in the dedicated local memory of the corresponding node without involving a processor core,
wherein the one-sided operation is configured such that the processor core of a corresponding node uses a protocol to directly inject a remote memory access for read or write request to the network interface of the node, the remote memory access request allowing to read or write an arbitrarily long region of a memory of a remote node,
wherein a network interface of a requesting node further includes a parser configured to parse the request and send the request to a network interface of a target remote servicing node,
wherein the network interface of the target remote servicing node is configured to directly operate on an associated local memory according to the received request, without an involvement of a processor core of the target remote servicing node, and to reply to the requesting network interface with the requested data, if the request was a read, or a write acknowledgement, if the request was a write, and
wherein a plurality of regions of the dedicated local memory of each node are organized by software as a set of data objects, each of the data objects having a standardized layout, including a header having a lock or a version, followed by a data of a data object, the network interface relying on a standardized data object memory layout and the integration of the network interface in a local coherence domain of the network interface, to identify a potential atomicity violation by snooping on coherence messages, thus enabling the network interface to perform one-sided atomic data object read operations.

2. A network interface in a distributed memory system according to claim 1, comprising:
a plurality of object buffers, wherein each object buffer includes an object address field, followed by a plurality of object buffer entries, and
wherein each object buffer entry is two bits intended to encode any one of four possible states from the following list: unused, used, pending, done, a default state being unused.

3. A method for implementing a lightweight mechanism configured to extend a network interface to provide atomic reads of arbitrarily long data objects, each object comprising a header followed by data of the object stored in object buffers, the header including a lock or a version, the method comprising:
integrating the network interface in a local coherence domain of the network interface;
snooping on the network interface on coherence messages, to identify potential atomicity violations while a data object is being read;
assigning by the network interface one of the available object buffers to the remote object read request, whenever the network interface receives a remote object read request from the network the network interface stores a base address of the object in an object address field of the object buffer, and the first N entries of the object buffer are marked as used, where N is the total number of cache blocks that the object requested by the remote object read request is comprised of, wherein an object buffer entry (i) corresponds to the cache block (i) of the requested object;
speculatively sending by the network interface read requests for the cache blocks of the object and assessing a state of the lock or a state of the version, marking corresponding object buffer entries of the object as pending, consequently marking them as done as the data replies from memory arrive and sending the data replies back to the original requester through the network, wherein all data cache block reads completed by the network interface are speculative until the header of the object is retrieved and assessed, wherein when the state of the lock or the state of the version indicates a free object, a cache block read that has been speculatively completed prior to assessing the header of the object qualifies as valid; else, the speculative cache blocks reads fail, and the network interface performs a failure sequence, which involves either one of reading the object again, or sending a failure notification to a network interface that originally sent the remote object read request;
in case the network interface receives a coherence invalidation message for an address that belongs in an address range of the data object, the network interface checks whether the invalidation matches a first entry of the object buffer, which corresponds to a header of the data object, and if this is the case, the object read request fails and the network interface performs a failure sequence for that object read request;
checks whether the invalidation matches an entry of the object buffer that is not unused and is not the first, and if that object buffer's first entry is in the done state, the invalidation is ignored; otherwise, the reception of an invalidation for an entry of the object buffer that is in the done state results in the network interface performing a failure sequence for the object read request corresponding to that object buffer;
in any other case, the reception of an invalidation is ignored;
wherein a one-sided atomic data object read request successfully completes when all the cache blocks comprising the requested object have been read from the memory, and wherein the network interface frees the object buffer used for the request by resetting all of the entries of the object buffer to unused.

4. The method of claim 3, further comprising checking a number of entries in an object buffer, if the object buffer features fewer entries than a total of cache lines of the requested data object, binding the number of speculative cache block reads that the network interface can issue for that data object, by the available entries;
if the access to the header of the data object has not completed, the network interface stalls the processing of the atomic read request of the data object until the first access completes, thereby only having a negative impact on performance, but not introducing a functionality limitation, such that the maximum size of the data object that the network interface can read atomically is not limited by the number of entries of the object buffers of the network interface.
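
Illustrative sketch (not part of the patent text). The object buffer of claim 2 and the invalidation rules of claim 3 can be modeled in software roughly as follows. This is a minimal sketch under stated assumptions, not the claimed hardware: the class and method names (EntryState, ObjectBuffer, invalidation_fails_read, packed) are hypothetical, the 64-byte cache block size is assumed because the claims do not fix one, and the assessment of the header's lock or version and the stalling behavior of claim 4 are left out.

```python
from enum import IntEnum

CACHE_BLOCK_BYTES = 64  # assumption: the claims do not specify a block size


class EntryState(IntEnum):
    """The four entry states named in claim 2; each value fits in two bits."""
    UNUSED = 0
    USED = 1
    PENDING = 2
    DONE = 3


class ObjectBuffer:
    """Hypothetical software model of one object buffer."""

    def __init__(self, num_entries: int) -> None:
        self.object_address = None  # object address field
        self.entries = [EntryState.UNUSED] * num_entries

    def assign(self, base_address: int, object_bytes: int) -> None:
        """Bind the buffer to a read request and mark the first N entries used."""
        n = -(-object_bytes // CACHE_BLOCK_BYTES)  # ceiling division
        if n > len(self.entries):
            # Claim 4 handles this case by bounding speculation; not modeled.
            raise ValueError("object needs more entries than this buffer has")
        self.object_address = base_address
        for i in range(n):
            self.entries[i] = EntryState.USED

    def issue_read(self, i: int) -> None:
        """A speculative read for cache block i has been sent to memory."""
        assert self.entries[i] == EntryState.USED
        self.entries[i] = EntryState.PENDING

    def complete_read(self, i: int) -> None:
        """The data reply for cache block i has arrived."""
        assert self.entries[i] == EntryState.PENDING
        self.entries[i] = EntryState.DONE

    def invalidation_fails_read(self, address: int) -> bool:
        """Return True if a snooped invalidation must fail the object read."""
        if self.object_address is None or address < self.object_address:
            return False
        i = (address - self.object_address) // CACHE_BLOCK_BYTES
        if i >= len(self.entries) or self.entries[i] == EntryState.UNUSED:
            return False  # outside the tracked object: ignored
        if i == 0:
            return True  # header block invalidated: the read fails
        if self.entries[0] == EntryState.DONE:
            return False  # header already read: ignored
        # Header not yet read: only an already completed block is a violation.
        return self.entries[i] == EntryState.DONE

    def is_complete(self) -> bool:
        """True when every cache block of the object has been read."""
        used = [s for s in self.entries if s != EntryState.UNUSED]
        return bool(used) and all(s == EntryState.DONE for s in used)

    def free(self) -> None:
        """Release the buffer by resetting every entry to unused."""
        self.object_address = None
        self.entries = [EntryState.UNUSED] * len(self.entries)

    def packed(self) -> int:
        """Pack the entries two bits each, entry 0 in the lowest bits."""
        return sum(int(s) << (2 * i) for i, s in enumerate(self.entries))
```

For example, after assign(0x1000, 192) on a four-entry buffer, completing the read of block 1 before the header and then snooping an invalidation for address 0x1040 makes invalidation_fails_read return True, because a speculatively completed data block changed before the header was assessed.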

Assignees

Ecole Polytechnique Fed Lausanne Epfl

Inventors

Classifications

G06F2212/507
using speculative control · CPC title
G06F12/0815
Cache consistency protocols · CPC title
G06F2212/1041
Resource optimization · CPC title
G06F9/544
Buffers; Shared memory; Pipes · CPC title
G06F9/528
by using speculative mechanisms · CPC title

Patent family

Related publications grouped by family.

View patent family 57629305
