Graph processing optimization method based on multi-fpga accelerator interconnection
US-2021081347-A1 · Mar 18, 2021 · US
US12475078B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12475078-B2 |
| Application number | US-202217815436-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 27, 2022 |
| Priority date | Mar 7, 2022 |
| Publication date | Nov 18, 2025 |
| Grant date | Nov 18, 2025 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
The present invention relates to a post-exascale graph computing method, and corresponding system, storage medium and electronic device. The invention solves the problems of low computing performance, poor scalability and high communication overhead in the large-scale distributed environment, and improves the performance of the supercomputer when supporting large-scale graph computing.
Opening claim text (preview).
What is claimed is: 1 . A post-exascale graph computing method, comprising: a step of performing distributed, asynchronous graph processing and a step of establishing hierarchical, very-large-scale, distributed communication; in which said step of performing distributed, asynchronous graph processing comprises: with a core subgraph constructed, having at least at least one computing node select graph blocks including active graph vertices in said core subgraph selected first for processing based on a topology-aware graph processing mechanism to propagate state information of graph vertices and accelerate state convergence among said graph vertices, generating a communications message, wherein each graph vertices contain respective state values; and in which said step of establishing hierarchical, very-large-scale, distributed communication comprises: arranging a plurality of computing nodes onto floors in a tree-like structure, partitioning graph data in a community-structure-aware manner, and assigning said graph data to individual said computing nodes on a floor, partitioning the graph data in a community-structure-aware manner to form graph partitions, wherein partitioning comprises: determining partitioned core subgraphs based on the graph partitions having their graph vertices highly dependent on each other; and grouping the partitioned core subgraphs into the same group of computing nodes on a floor according to the dependency of the state values, so that frequent communication among the partitioned core subgraphs happens inside the same group of computing nodes on the same floor, when plural said computing nodes on the same floor communicate, merging communication information associated with the same graph vertex based on reduction so as to decrease communication traffic, wherein reduction is by an algorithm in which information sent to the same computing node is merged into a same queue for communications at batch; and sending the communication information to be sent to the same computing node on a higher floor in a merged and unified form; after the graph data of each said higher floor computing node and of a lower floor computing node that is subordinate to said higher floor computing node have converged, having said high floor computing node communicate, after reduction merger, and compression of communications messages, reducing communication traffic, with a next higher floor computing node that said higher floor computing node is subordinate to and with said higher floor computing nodes that are of said some floor as said higher floor computing node, so as to synchronize states of the graph vertices that are peers; and performing communication floor by floor until all said graph vertices in a cluster have their states converged. 2 . A storage device having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the step of the post-exascale graph computing method of claim 1 . 3 . An electronic device, characterized in that it comprises: a memory having a computer program stored thereon; a processor for executing the computer program in the memory to implement the step of the post-exascale graph computing method of claim 1 . 4 . A post-exascale graph computing system, comprising at least a processor, characterized in that the processor comprises at least: a first module for performing distributed, asynchronous graph processing; a second module for establishing hierarchical, very-large-scale, distributed communication; wherein the first module is configured to: with a core subgraph constructed, having at least at least one computing node select graph blocks including active graph vertices in said core subgraph selected first for processing based on a topology-aware graph processing mechanism to propagate state information of graph vertices and accelerate state convergence among said graph vertices, generating a communications message, wherein each graph vertices contain respective state values; and the second module is configured to: arranging a plurality of computing nodes onto floors in a tree-like structure, partitioning graph data in a community-structure-aware manner, and assigning said graph data to individual said computing nodes on a floor, partitioning the graph data in a community-structure-aware manner to form graph partitions, wherein partitioning comprises: determining partitioned core subgraphs based on the graph partitions having their graph vertices highly dependent on each other; and grouping the partitioned core subgraphs into the same group of computing nodes on a floor according to the dependency of the state values, so that frequent communication among the partitioned core subgraphs happens inside the same group of computing nodes on the same floor, when plural said computing nodes on the same floor communicate, merging communication information associated with the same graph vertex based on reduction so as to decrease communication traffic, wherein reduction is by an algorithm in which information sent to the same computing node is merged into a same queue for communications at batch; and sending the communication information to be sent to the same computing node on a higher floor in a merged and unified form; after the graph data of each said higher floor computing node and of a lower floor computing node that is subordinate to said higher floor computing node have converged, having said high floor computing node communicate, after reduction merger, and compression of communications messages, reducing communication traffic, with a next higher floor computing node that said higher floor computing node is subordinate to and with said higher floor computing nodes that are of said some floor as said higher floor computing node, so as to synchronize states of the graph vertices that are peers; and performing communication floor by floor until all said graph vertices in a cluster have their states converged.
in which an application is distributed across nodes in the network (software deployment G06F8/60; multiprogramming arrangements G06F9/46) · CPC title
Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mappping at compile time, see G06F8/451) · CPC title
Queue · CPC title
Message passing systems or structures, e.g. queues · CPC title
Program synchronisation; Mutual exclusion, e.g. by means of semaphores · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.