Heterogeneous accelerator for highly efficient learning systems
US-2019079886-A1 · Mar 14, 2019 · US
US11263168B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11263168-B2 |
| Application number | US-201916722082-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 20, 2019 |
| Priority date | Jan 29, 2019 |
| Publication date | Mar 1, 2022 |
| Grant date | Mar 1, 2022 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
An FPGA-based graph data processing method is provided for executing graph traversals on a graph having characteristics of a small-world network by using a first processor being a CPU and a second processor that is a FPGA and is in communicative connection with the first processor, wherein the first processor sends graph data to be traversed to the second processor, and obtains result data of the graph traversals from the second processor for result output after the second processor has completed the graph traversals of the graph data by executing level traversals, and the second processor comprises a sparsity processing module and a density processing module, the sparsity processing module operates in a beginning stage and/or an ending stage of the graph traversals, and the density processing module with a higher degree of parallelism than the sparsity processing module operates in the intermediate stage of the graph traversals.
Opening claim text (preview).
What is claimed is: 1. A field programmable gate array (FPGA)-based graph data processing method for executing graph traversals on a graph having characteristics of a small-world network, the method comprising: executing the graph traversals using a first processor and a second processor, wherein the second processor is in communicative connection with the first processor, and wherein the first processor is a CPU and the second processor is an FPGA; wherein the first processor: sends graph data to be traversed to the second processor; and obtains result data of the graph traversals from the second processor for result output after the second processor has completed the graph traversals of the graph data by executing level traversals; wherein the second processor comprises a sparsity processing module and a density processing module, which respectively use on-chip logical resources in different areas of the second processor, wherein the density processing module has a higher degree of parallelism than that of the sparsity processing module, the sparsity processing module works in at least one of the beginning stage and an ending stage of the graph traversals where parallelism is low, and the density processing module works in the intermediate stage of the graph traversals where parallelism is high; wherein before the second processor executes the graph traversals, the first processor first programs an algorithmic logic for the graph traversals to be executed into a logic circuit of the second processor, and then transmits the graph data to be traversed into a global storage of the second processor; and wherein the sparsity processing module takes a selected initial point as an origin for division to divide the graph data in the global storage into plural data blocks using a heuristic algorithm, and then selects a data block containing the initial point from the plural data blocks and transmits the data block containing the initial point to an on-chip memory of the second processor. 2. The method of claim 1 , wherein the sparsity processing module performs the steps of: determining an initial node of the traversals based on the data block containing the initial node in the on-chip memory; executing the graph traversals in the beginning stage from the initial node; and analyzing and identifying which stage the graph traversals are in while executing the graph traversals, and when the graph traversals are in the beginning stage with low-parallelism, proceeding with the graph traversals in the sparsity processing module, and when the graph traversals enter the intermediate stage with high-parallelism, stop executing the graph traversals, saving a first executive state and the data as they are at the stopping moment, and then requiring the density processing module to execute the graph traversals after the data are saved. 3. The method of claim 2 , wherein the density processing module performs the steps of: acquiring the first executive state and the data saved by the sparsity processing module and rebuilding the first executive state of the graph traversals according thereto; when the first executive state of the graph traversals has been reconstructed, executing the graph traversals based on the data saved by the sparsity processing module; and analyzing and identifying which stage the graph traversals are in while executing the graph traversals, and when the graph traversals are in the intermediate stage with high-parallelism, proceeding with the graph traversals in the density processing module, and when the graph traversals enter the ending stage with low-parallelism, stop executing the graph traversals, saving a second executive state and the data as they are at the stopping moment, and then requiring the sparsity processing module to execute the graph traversals after the data are saved. 4. The method of claim 2 , wherein the sparsity processing module further performs the steps of: acquiring the second executive state and the data saved by the density processing module and rebuilding the second executive state according thereto; when the second executive state has been reconstructed, executing the graph traversals based on the data saved by the density processing module; and analyzing and identifying whether the graph traversals have been completed while executing the graph traversals, and when the graph traversals have not been completed, proceeding with the graph traversals, and when the graph traversals have been completed, processing an acquired executive result so as to obtain result data of the graph traversals, and writing the result data into the global memory. 5. The method of claim 4 , wherein the step of analyzing and identifying whether the graph traversals have been completed comprises: when the nodes adjacent to the currently traversed node have all been marked as traversed, comparing the nodes of the graph with the nodes that have been traversed, and where all the nodes are confirmed as traversed, determining that the graph traversal is complete, and where there is still a node not traversed, determining that there is a subgraph that has not been traversed, and analyzing the data block in which there is the node that has not been traversed and transmitting the data block into the on-chip memory, so as to initiate the graph traversal in the subgraph by means of initial node sampling. 6. The method of claim 5 , wherein the method further comprises: building an executive-state-analyzing module in the second processor, where the executive-state-analyzing module performs the steps of: determining the data block required by at least one of the sparsity processing module and the density processing module in the successive graph traversal by analyzing and identifying an executive state for which the at least one of the sparsity processing module and the density processing module executes the graph traversals; and transmitting that required data block to the on-chip memory before the next traversal iteration. 7. A field programmable gate array (FPGA)-based graph traversal system for executing graph traversals on a graph having characteristics of a small world network, the system comprises a first processor and a second processor, wherein the second processor is in communicative connection with the first processor, and wherein the first processor is a CPU and the second processor is an FPGA; wherein the first processor is configured to: send graph data to be traversed to the second processor; obtain result data of the graph traversals from the second processor for result output after the second processor has completed the graph traversals of the graph data by executing level traversals; and the second processor comprises a sparsity processing module and a density processing module, which respectively use on-chip logical resources in different areas of the second processor, the density processing module has a higher degree of parallelism than the sparsity processing module, the sparsity processing module is configured to operate in a beginning stage and/or an ending stage of the graph traversals where parallelism is low, and the density processing module is configured to operate in the intermediate stage of the graph traversals where parallelism is high; wherein before the second processor executes the graph traversals, the first processor is configured to first program an algorithmic logic for the graph traversals to be executed into a logic circuit of the second processor, and then transmit the graph data to be traversed into a global storage of the second processor; and the sparsity processing module is configured to take a selected initial point as an origin for division to divide the graph data in the global storage into plural data b
Dataflow computers · CPC title
Gate array · CPC title
Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title
Reconfigurable logic implemented as a co-processor (instruction execution using a coprocessor G06F9/3877) · CPC title
Architectures of general purpose stored program computers (with program plugboard G06F15/08; multicomputers G06F15/16) · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.