Reliable replication mechanisms based on active-passive HFI protocols built on top of non-reliable multicast fabric implementations

US10320710B2 · US · B2

Patent metadata
Publication number: US-10320710-B2
Application number: US-201514865162-A
Country: US
Kind code: B2
Filing date: Sep 25, 2015
Priority date: Sep 25, 2015
Publication date: Jun 11, 2019
Grant date: Jun 11, 2019

Abstract

Methods, apparatus, and systems for reliable replication mechanisms based on active-passive HFI protocols built on top of non-reliable multicast fabric implementations. Under a first hardware-based scheme, a reliable replication mechanism is (primarily) implemented via Host Fabric Interfaces (HFIs) coupled to (or integrated in) nodes coupled to a non-reliable fabric. Under this approach, the HFIs take an active role in ensuring reliable delivery of multicast messages to each of multiple target nodes. Under a second hybrid software/hardware scheme, software running on nodes is responsible for determining whether target nodes have confirmed delivery of multicast messages and sending retry messages for cases in which delivery is not acknowledged within a timeout period. At the same time, the HFIs on the target nodes are responsible for generating reply messages containing acknowledgements rather than software running on the target nodes.
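The division of labour in the hybrid scheme can be illustrated with a small simulation. This is a sketch under stated assumptions, not Intel's implementation: `TargetHFI`, `LossyFabric`, `reliable_multicast`, and the `drop_once` loss model are hypothetical names, and one loop round stands in for one timeout period. The switch replicates the multicast into per-target unicasts, the target-side HFI returns an acknowledgement carrying the multicast ID without involving target software, and originator-side software resends unicast copies to any target whose acknowledgement is still missing.

```python
from dataclasses import dataclass, field


@dataclass
class TargetHFI:
    """Hypothetical target-side HFI: acknowledges delivery itself,
    without involving software on the target node."""
    node_id: int
    received: list = field(default_factory=list)

    def deliver(self, msg_id, payload):
        # Store the payload (stands in for handing it to the node) and
        # return an acknowledgement tagged with the multicast ID.
        self.received.append((msg_id, payload))
        return ("ACK", msg_id, self.node_id)


class LossyFabric:
    """Hypothetical non-reliable fabric: drops the first delivery attempt
    to any node listed in drop_once."""

    def __init__(self, targets, drop_once):
        self.targets = {t.node_id: t for t in targets}
        self.drop_once = set(drop_once)

    def unicast(self, node_id, msg_id, payload):
        if node_id in self.drop_once:
            self.drop_once.discard(node_id)
            return None  # message lost, so no acknowledgement comes back
        return self.targets[node_id].deliver(msg_id, payload)

    def multicast(self, node_ids, msg_id, payload):
        # The switch replicates the multicast into one unicast per target.
        return [self.unicast(n, msg_id, payload) for n in node_ids]


def reliable_multicast(fabric, node_ids, msg_id, payload, max_rounds=5):
    """Originator-side software loop: send once as a multicast, then resend
    as unicasts to every target whose ACK did not arrive. One round models
    one timeout period. Returns (still-pending targets, rounds used)."""
    pending = set(node_ids)
    replies = fabric.multicast(sorted(pending), msg_id, payload)
    rounds = 1
    while True:
        for reply in replies:
            if reply is not None and reply[1] == msg_id:
                pending.discard(reply[2])
        if not pending or rounds >= max_rounds:
            return pending, rounds
        replies = [fabric.unicast(n, msg_id, payload) for n in sorted(pending)]
        rounds += 1
```

For example, with four targets and the first delivery to nodes 1 and 3 lost, `reliable_multicast(fabric, [0, 1, 2, 3], 7, b"data")` returns an empty pending set after two rounds, and each target has received the payload exactly once. In the first (hardware-based) scheme, the same retry loop would run in the originator's HFI rather than in software.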

First claim

Opening claim text (preview).

What is claimed is:

1. A method for reliably delivering a multicast message from an originator node to a plurality of target nodes over a non-reliable fabric to which each of the originator node and the plurality of target nodes is coupled, comprising: sending a multicast message from a Host Fabric Interface (HFI) of the originator node to a switch in the non-reliable fabric, wherein the multicast message is configured to cause the switch to generate and send a respective unicast message corresponding to the multicast message to each of the plurality of target nodes; receiving, at the HFI for the originator node, one or more reply messages from one or more of the plurality of target nodes, the one or more reply messages indicating that the target node sending the reply message has successfully received the unicast message corresponding to the multicast message sent to the target node; determining, at the HFI for the originator node, one or more reply messages have yet to be received from one or more of the target nodes within a timeout period; and in response thereto, generating and sending a unicast message corresponding to the multicast message from the HFI for the originator node to each of the one or more target nodes that did not return a reply message to the HFI of the originator node within the timeout period.

2. The method of claim 1, wherein the multicast message is originated by a network software stack operating on the originator node, and the network software stack is not involved in the reliable delivery of the multicast message to each of the target nodes.

3. The method of claim 1, wherein the HFI for the originator node receives a version of the multicast message having an original format generated by a network software stack operating on the originator node, and wherein the HFI for the originator node adds a multicast identifier (ID) to the original format of the multicast message that is to be used in the reply messages received from the plurality of target nodes.

4. The method of claim 1, wherein the HFI for the originator node receives an original multicast message generated by a network software stack operating on the originator node, and wherein the HFI for the originator node employs one or more data structures for tracking replies to a given multicast message, the one or more data structures including: d) the original multicast message; e) a timestamp corresponding to when the multicast message is sent from the HFI for the originator node; and f) a list of pending acknowledgements to be received via corresponding reply messages sent from the plurality of target nodes.

5. The method of claim 1, wherein the originator node is a first originator node and the multicast message is a first multicast message, further comprising: receiving, at the HFI for the first originator node, a message from the switch, the message corresponding to a second multicast message originating from a second originator node coupled to the non-reliable fabric; and returning, via the HFI for the first originator node, a reply message to the second originator node confirming receipt of the message corresponding to the second multicast message.

6. The method of claim 5, wherein the originator node includes a network software stack, and the method further comprises forwarding one of the message, content contained in the message, or indicia indicating the message is in a memory buffer on the HFI from the HFI of the originator node to the network software stack.

7. The method of claim 5, wherein the HFI verifies the message has been stored in persistent memory prior to returning the reply message, and the reply message contains an acknowledgement including a multicast identifier (ID) corresponding to the second multicast message.

8. The method of claim 5, further comprising: receiving, at the HFI for the first originator node, a retry message corresponding to the second multicast message from a second originator node of the second multicast message; forwarding the retry message or content contained in the retry message from the HFI of the originator node to a network software stack; generating, via the network software stack, a second reply message and sending the second reply message via the HFI for the first originator node to the second originator node.

9. An apparatus comprising: a host fabric interface (HFI) including, a transmit port, configured to send data onto a non-reliable fabric; a receive port, configured to receive data from the non-reliable fabric; wherein the HFI further is configured to, send a multicast message to a switch in the non-reliable fabric, wherein the multicast message is configured to cause the switch to generate and send a unicast message corresponding to the multicast message to each of the plurality of target nodes via the non-reliable fabric; maintain indicia identifying which target nodes the multicast message is to be delivered to; receive one or more reply messages from one or more of the plurality of target nodes via the non-reliable fabric, the one or more reply messages indicating that the target node sending the reply message has successfully received the unicast message corresponding to the multicast message sent to the target node; determine one or more reply messages have yet to be received from one or more of the target nodes within a timeout period; and in response thereto, generate and send a unicast message corresponding to the multicast message to each of the one or more target nodes that did not return a reply message within the timeout period.

10. The apparatus of claim 9, wherein the HFI is configured to be installed in or attached to a compute platform comprising an originator node, and upon operation is configured to receive a version of the multicast message having an original format originated by a network software stack operating on the originator node, and wherein the HFI is further configured to add an identifier to the original format of the multicast message that is to be used in the reply messages received from the plurality of target nodes.

11. The apparatus of claim 9, wherein the HFI is configured to be installed in or attached to a compute platform comprising an originator node, and upon operation the HFI is configured to receive an original multicast message originated by a network software stack operating on the originator node, and wherein the HFI for the originator node is configured to employ one or more data structures for tracking replies to a given multicast message, the one or more data structures including: d) the original multicast message; e) a timestamp corresponding to when the multicast message is sent from the HFI; and f) a list of pending acknowledgements to be received via corresponding reply messages sent from the plurality of target nodes.

12. The apparatus of claim 9, wherein the HFI is configured to be installed in or attached to a compute platform comprising a first originator node and the multicast message is a first multicast message, and wherein the HFI is further configured to: receive a message from the switch, the message corresponding to a second multicast message originating from a second originator node coupled to the non-reliable fabric; return a reply message to the second originator node confirming receipt of the message corresponding to the second multicast message.

13. The apparatus of claim 12, wherein the first originator node includes a network software stack, and the HFI is further configured to forward one of the message, content contained in the message, or indicia indicating the message is in a memory buffer on the HFI f…
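Claims 3, 4, and 11 describe the per-message state the originator's HFI keeps: the original message, a send timestamp, and a list of pending acknowledgements, keyed by a multicast ID that the replies echo back. A minimal Python model of that table follows. It assumes a sequential integer ID, an injectable clock for testing, and a policy of restarting the timer after each retry round; none of these details come from the claims, and `PendingMulticast` and `OriginatorTracker` are hypothetical names.

```python
import itertools
import time
from dataclasses import dataclass, field


@dataclass
class PendingMulticast:
    # The three items listed in claims 4 and 11:
    original_message: bytes                         # the original multicast message
    sent_at: float                                  # when the HFI sent it
    pending_acks: set = field(default_factory=set)  # targets yet to reply


class OriginatorTracker:
    """Hypothetical model of originator-side HFI state (not Intel's design)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.table = {}                 # multicast ID -> PendingMulticast
        self._ids = itertools.count(1)  # assumed source of multicast IDs (claim 3)

    def send(self, message, targets):
        """Assign a new multicast ID and start tracking the message."""
        mcast_id = next(self._ids)
        self.table[mcast_id] = PendingMulticast(message, self.clock(), set(targets))
        return mcast_id  # carried in the message and echoed back in replies

    def on_reply(self, mcast_id, target):
        """Record an acknowledgement; drop the entry once every target replied."""
        entry = self.table.get(mcast_id)
        if entry is None:
            return  # unknown or already completed multicast ID
        entry.pending_acks.discard(target)
        if not entry.pending_acks:
            del self.table[mcast_id]

    def expired(self):
        """Return (mcast_id, target, message) unicast retries for entries whose
        timeout has elapsed, restarting each such entry's timer (an assumption)."""
        now = self.clock()
        retries = []
        for mcast_id, entry in self.table.items():
            if now - entry.sent_at >= self.timeout_s:
                retries.extend((mcast_id, t, entry.original_message)
                               for t in sorted(entry.pending_acks))
                entry.sent_at = now
        return retries
```

With a fake clock and `timeout_s=1.0`, a message sent at t=0 to targets A, B, and C and acknowledged only by A yields no retries at t=0.5 and unicast retries for B and C at t=1.5; the entry is discarded once B and C reply.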

Assignees

Intel Corp

Classifications

  • Physical resource allocation for ACK/NACK (for physical mapping arrangements in ARQ protocols H04L1/1861)

  • H04L49/201 (primary): Multicast operation; Broadcast operation

  • using time related information in packets, e.g. by adding timestamps

  • for broadcast or conference {, e.g. multicast}

  • comprising mechanisms for improved reliability, e.g. status reports (arrangements for detecting or preventing errors by carrying supervisory signal the return channel H04L1/16)

Frequently asked questions

What does patent US10320710B2 cover?
Methods, apparatus, and systems for reliably delivering multicast messages over a non-reliable fabric. Host Fabric Interfaces (HFIs) generate acknowledgements and, after a timeout, resend unacknowledged messages as unicasts to the targets that did not reply, either primarily in HFI hardware or with originator-side software handling retries while target HFIs generate the acknowledgements.
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification H04L49/201. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Jun 11, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).