US10015056B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10015056-B2 |
| Application number | US-201615207706-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 12, 2016 |
| Priority date | Sep 24, 2014 |
| Publication date | Jul 3, 2018 |
| Grant date | Jul 3, 2018 |
Official abstract text for this publication.
System, method, and apparatus for improving the performance of collective operations in High Performance Computing (HPC). Compute nodes in a networked HPC environment form collective groups to perform collective operations. A spanning tree is formed including the compute nodes and switches and links used to interconnect the compute nodes, wherein the spanning tree is configured such that there is only a single route between any pair of nodes in the tree. The compute nodes implement processes for performing the collective operations, which includes exchanging messages between processes executing on other compute nodes, wherein the messages contain indicia identifying collective operations they belong to. Each switch is configured to implement message forwarding operations for its portion of the spanning tree. Each of the nodes in the spanning tree implements a ratcheted cyclical state machine that is used for synchronizing collective operations, along with status messages that are exchanged between nodes. Transaction IDs are also used to detect out-of-order and lost messages.
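The single-route property described in the abstract can be illustrated with a small sketch. It is not the patented procedure: the abstract does not say how the tree is computed, so breadth-first search from a root core switch is an assumption here, as are the node names (`core0`, `edge0`, `n0`, ...) and the helper names `build_spanning_tree`, `forwarding_tables`, and `route`. The redundant `edge0`-`edge1` link is included only to show that the tree leaves it unused.

```python
from collections import deque


def build_spanning_tree(links, root):
    """Return {node: parent} for a BFS spanning tree over undirected links.

    BFS is an illustrative choice; the source does not name an algorithm.
    """
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in parent:  # first visit fixes the only parent
                parent[neighbor] = node
                queue.append(neighbor)
    return parent


def forwarding_tables(parent):
    """Per-node view of the tree: its parent and its children."""
    tables = {n: {"parent": p, "children": []} for n, p in parent.items()}
    for node, p in parent.items():
        if p is not None:
            tables[p]["children"].append(node)
    for table in tables.values():
        table["children"].sort()
    return tables


def route(parent, src, dst):
    """Return the unique tree route from src to dst as a list of nodes."""
    def chain_to_root(node):
        chain = []
        while node is not None:
            chain.append(node)
            node = parent[node]
        return chain

    up = chain_to_root(src)
    down = chain_to_root(dst)
    on_down = set(down)
    # Climb from src until reaching a node that is also an ancestor of dst.
    for index, node in enumerate(up):
        if node in on_down:
            meet = node
            break
    else:
        raise ValueError("src and dst are not in the same tree")
    return up[: index + 1] + down[: down.index(meet)][::-1]


# Hypothetical topology: one core switch, two edge switches, four compute nodes.
links = [
    ("core0", "edge0"), ("core0", "edge1"), ("edge0", "edge1"),
    ("edge0", "n0"), ("edge0", "n1"), ("edge1", "n2"), ("edge1", "n3"),
]
tree = build_spanning_tree(links, root="core0")
print(route(tree, "n0", "n3"))
print(forwarding_tables(tree)["edge0"])
```

Because every node other than the root has exactly one parent, any two nodes are joined by exactly one route: up to their lowest common ancestor, then down.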
Opening claim text (preview).
What is claimed is:
1. A non-transitory computer readable medium having instructions stored thereon configured to be executed by a processor of a subnet manager in a network environment including a plurality of compute nodes interconnected via a plurality of switches and links, wherein execution of the instructions enables the subnet manager, when the subnet manager is linked in communication with a switch in the network environment, to: receive notifications from compute nodes indicating they are joining a collective group; determine a spanning tree to be used for the collective group comprising a plurality of nodes including the compute nodes providing notifications indicating they are joining the collective group and a set of switches including edge switches connected to the compute nodes and one or more levels of core switches including a core switch at the root of the spanning tree, wherein the spanning tree is configured such that each node in the spanning tree is enabled to communicate with each of the other nodes via a single respective specified route comprising at least one link segment; and provide configuration information to each of the switches in the spanning tree for implementing message forwarding operations for the portion of the spanning tree that includes links coupled to that switch.
2. The non-transitory computer readable medium of claim 1, wherein execution of the instructions further enable the subnet manager to: receive network topology information from at least a portion of the compute nodes and switches in the network environment; and determine a network topology of at least a portion of the network environment including all of the nodes in the spanning tree.
3. The non-transitory computer readable of claim 1, wherein each node in a spanning tree other than the node that is the root of the spanning includes a parent node and at least one child node, and the configuration information provided to each switch includes a collective forwarding table describing the switch's parent node and child nodes.
4. The non-transitory computer readable of claim 1, wherein execution of the instructions further enables the subnet manager to: receive notification from compute nodes that they are leaving a collective group; and notify switches in the spanning tree that the collective group has been destroyed.
5. The non-transitory computer readable of claim 1, wherein execution of the instructions further enables the subnet manager to: receive, for each of a plurality of compute nodes, a notification that the compute node is joining a collective group; and return a collective group identifier (CGID) to each of the plurality of compute nodes.
6. The non-transitory computer readable of claim 5, wherein operations in the network environment are implemented using distributed Message Passing Interface (MPI) processes including a local master process on each of the plurality of compute nodes, and wherein the notification that the compute node is joining a collective group is received from the local master process of the compute node and the CGID is returned to the local master process of the compute node.
7. The non-transitory computer readable of claim 6, wherein the notification that the compute node is joining a collective group includes registration information identifying MPI processes that are operating on the compute node.
8. At least one non-transitory computer readable medium, having instructions stored thereon comprising a distributed Message Passing Interface (MPI) application including a plurality of MPI processes and a subnet management application configured to facilitate a collective operation in a network environment including a plurality of compute nodes and a subnet manager (SM) interconnected via a plurality of switches and links, wherein distributed execution of instructions corresponding to the MPI processes on the compute nodes and the subnet management application on the SM performs operations including: sending information from compute nodes to the SM notifying the SM that the compute nodes are joining a collective group; configuring, via the SM, a spanning tree comprising a plurality of nodes including the compute nodes in the collective group and a set of switches including edge switches connected to compute nodes and one or more levels of core switches including a core switch at the root of the spanning tree, wherein the spanning tree is configured such that each node in the spanning tree is enabled to communicate with each of the other nodes via a single respective specified route comprising at least one link segment; configuring each switch in the spanning tree to be aware of specified routes involving that switch and one or more message identifiers to be included in collective operation messages used to perform the collective operations; and at each switch, identifying collective operation messages and their destinations and forwarding the collective operations messages along link segments of the specified routes connected to that switch.
9. The at least one non-transitory computer readable medium of claim 8, wherein distributed execution of the instructions enable each node in the spanning tree to: implement a state machine; exchange state machine status messages with adjacent nodes; and wherein the nodes are configured to collectively employ the state machine status messages and state machines to synchronize collective operations performed by the collective group.
10. The at least one non-transitory computer readable medium of claim 9, wherein the state machine at each node is implemented as a cyclical ratchet under which states may only advance one state at a time, the states including a first state, one or more middle states, and a last state, and wherein the state advances from the last state back to the first state.
11. The at least one non-transitory computer readable medium of claim 10, wherein the state machine states includes an idle state, a filling state, a full state, and an exiting state.
12. The at least one non-transitory computer readable medium of claim 8, wherein distributed execution of the instructions enable each node in the spanning tree to implement a transaction Identifying (ID) mechanism, and wherein the nodes collectively are enabled to employ transaction IDs to detect out-of-order or lost messages.
13. The at least one non-transitory computer readable medium of claim 8, wherein distributed execution of the instructions enable each compute node in the spanning tree is to: initiate an application including one or more MPI processes; identify a master process; and notify the SM that the compute node is joining a collective group.
14. The at least one non-transitory computer readable medium of claim 8, wherein distributed execution of the instructions further performs operations including: assigning an initial transaction ID = 0 to all members of the collective group; originating operations at the edge of the spanning tree and driving up the spanning tree as sub-trees within the spanning tree complete local operations.
15. The at least one non-transitory computer readable medium of claim 14, wherein distributed execution of the instructions further performs operations including: releasing operations beginning at a root of the spanning tree after the operations performed by the MPI processes on a given node in the spanning tree have been completed.
16. The at least one non-transitory computer readable medium of claim 15, wherein distributed execution of the instructions further performs operations including:
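Claims 10 to 12 and 14 describe a cyclical ratchet state machine (idle, filling, full, exiting) and transaction IDs that start at 0 and are used to detect out-of-order or lost messages. The following is a minimal sketch of one way those pieces could fit together. The claims do not specify when a transaction ID changes or how a mismatch is classified, so incrementing the ID each time the cycle wraps back to idle, the `"in-order"`/`"stale"`/`"gap"` labels, and the names `CollectiveNode`, `advance`, and `check_message` are all assumptions made for illustration.

```python
from enum import IntEnum


class State(IntEnum):
    """State names taken from claim 11; the numeric order is the ratchet order."""
    IDLE = 0
    FILLING = 1
    FULL = 2
    EXITING = 3


class RatchetError(Exception):
    """Raised when a transition would skip a state or move backwards."""


class CollectiveNode:
    def __init__(self):
        self.state = State.IDLE
        self.transaction_id = 0  # claim 14: initial transaction ID = 0

    def advance(self, target):
        """Advance exactly one state; the last state wraps to the first."""
        expected = State((self.state + 1) % len(State))
        if target != expected:
            raise RatchetError(
                f"cannot move from {self.state.name} to {State(target).name}"
            )
        self.state = expected
        if expected == State.IDLE:
            # Assumption: one full cycle corresponds to one transaction.
            self.transaction_id += 1

    def check_message(self, message_transaction_id):
        """Classify an incoming message by its transaction ID (labels are hypothetical)."""
        if message_transaction_id == self.transaction_id:
            return "in-order"
        if message_transaction_id < self.transaction_id:
            return "stale"  # late or duplicated message from an earlier cycle
        return "gap"  # sender is ahead, so at least one message was missed


node = CollectiveNode()
for step in (State.FILLING, State.FULL, State.EXITING, State.IDLE):
    node.advance(step)
print(node.state.name, node.transaction_id)
print(node.check_message(3))
```

Rejecting any transition other than the next state is what makes the machine a ratchet: a delayed or repeated status message cannot push a node backwards or make it skip a phase.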
comprising network management agents or mobile agents therefor · CPC title
Standardised network management protocols, e.g. simple network management protocol [SNMP] · CPC title
Routing tree calculation · CPC title
Learning-based routing, e.g. using neural networks or artificial intelligence · CPC title
Grid computing · CPC title