What technology area does this patent fall under?

Primary CPC classification G06F16/9024. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Mar 01 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

FPGA coprocessor with sparsity and density modules for execution of low and high parallelism portions of graph traversals

US11263168B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11263168-B2
Application number	US-201916722082-A
Country	US
Kind code	B2
Filing date	Dec 20, 2019
Priority date	Jan 29, 2019
Publication date	Mar 1, 2022
Grant date	Mar 1, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An FPGA-based graph data processing method is provided for executing graph traversals on a graph having characteristics of a small-world network by using a first processor being a CPU and a second processor that is a FPGA and is in communicative connection with the first processor, wherein the first processor sends graph data to be traversed to the second processor, and obtains result data of the graph traversals from the second processor for result output after the second processor has completed the graph traversals of the graph data by executing level traversals, and the second processor comprises a sparsity processing module and a density processing module, the sparsity processing module operates in a beginning stage and/or an ending stage of the graph traversals, and the density processing module with a higher degree of parallelism than the sparsity processing module operates in the intermediate stage of the graph traversals.

First claim

Opening claim text (preview).

What is claimed is: 1. A field programmable gate array (FPGA)-based graph data processing method for executing graph traversals on a graph having characteristics of a small-world network, the method comprising: executing the graph traversals using a first processor and a second processor, wherein the second processor is in communicative connection with the first processor, and wherein the first processor is a CPU and the second processor is an FPGA; wherein the first processor: sends graph data to be traversed to the second processor; and obtains result data of the graph traversals from the second processor for result output after the second processor has completed the graph traversals of the graph data by executing level traversals; wherein the second processor comprises a sparsity processing module and a density processing module, which respectively use on-chip logical resources in different areas of the second processor, wherein the density processing module has a higher degree of parallelism than that of the sparsity processing module, the sparsity processing module works in at least one of the beginning stage and an ending stage of the graph traversals where parallelism is low, and the density processing module works in the intermediate stage of the graph traversals where parallelism is high; wherein before the second processor executes the graph traversals, the first processor first programs an algorithmic logic for the graph traversals to be executed into a logic circuit of the second processor, and then transmits the graph data to be traversed into a global storage of the second processor; and wherein the sparsity processing module takes a selected initial point as an origin for division to divide the graph data in the global storage into plural data blocks using a heuristic algorithm, and then selects a data block containing the initial point from the plural data blocks and transmits the data block containing the initial point to an on-chip memory of the second processor. 2. The method of claim 1 , wherein the sparsity processing module performs the steps of: determining an initial node of the traversals based on the data block containing the initial node in the on-chip memory; executing the graph traversals in the beginning stage from the initial node; and analyzing and identifying which stage the graph traversals are in while executing the graph traversals, and when the graph traversals are in the beginning stage with low-parallelism, proceeding with the graph traversals in the sparsity processing module, and when the graph traversals enter the intermediate stage with high-parallelism, stop executing the graph traversals, saving a first executive state and the data as they are at the stopping moment, and then requiring the density processing module to execute the graph traversals after the data are saved. 3. The method of claim 2 , wherein the density processing module performs the steps of: acquiring the first executive state and the data saved by the sparsity processing module and rebuilding the first executive state of the graph traversals according thereto; when the first executive state of the graph traversals has been reconstructed, executing the graph traversals based on the data saved by the sparsity processing module; and analyzing and identifying which stage the graph traversals are in while executing the graph traversals, and when the graph traversals are in the intermediate stage with high-parallelism, proceeding with the graph traversals in the density processing module, and when the graph traversals enter the ending stage with low-parallelism, stop executing the graph traversals, saving a second executive state and the data as they are at the stopping moment, and then requiring the sparsity processing module to execute the graph traversals after the data are saved. 4. The method of claim 2 , wherein the sparsity processing module further performs the steps of: acquiring the second executive state and the data saved by the density processing module and rebuilding the second executive state according thereto; when the second executive state has been reconstructed, executing the graph traversals based on the data saved by the density processing module; and analyzing and identifying whether the graph traversals have been completed while executing the graph traversals, and when the graph traversals have not been completed, proceeding with the graph traversals, and when the graph traversals have been completed, processing an acquired executive result so as to obtain result data of the graph traversals, and writing the result data into the global memory. 5. The method of claim 4 , wherein the step of analyzing and identifying whether the graph traversals have been completed comprises: when the nodes adjacent to the currently traversed node have all been marked as traversed, comparing the nodes of the graph with the nodes that have been traversed, and where all the nodes are confirmed as traversed, determining that the graph traversal is complete, and where there is still a node not traversed, determining that there is a subgraph that has not been traversed, and analyzing the data block in which there is the node that has not been traversed and transmitting the data block into the on-chip memory, so as to initiate the graph traversal in the subgraph by means of initial node sampling. 6. The method of claim 5 , wherein the method further comprises: building an executive-state-analyzing module in the second processor, where the executive-state-analyzing module performs the steps of: determining the data block required by at least one of the sparsity processing module and the density processing module in the successive graph traversal by analyzing and identifying an executive state for which the at least one of the sparsity processing module and the density processing module executes the graph traversals; and transmitting that required data block to the on-chip memory before the next traversal iteration. 7. A field programmable gate array (FPGA)-based graph traversal system for executing graph traversals on a graph having characteristics of a small world network, the system comprises a first processor and a second processor, wherein the second processor is in communicative connection with the first processor, and wherein the first processor is a CPU and the second processor is an FPGA; wherein the first processor is configured to: send graph data to be traversed to the second processor; obtain result data of the graph traversals from the second processor for result output after the second processor has completed the graph traversals of the graph data by executing level traversals; and the second processor comprises a sparsity processing module and a density processing module, which respectively use on-chip logical resources in different areas of the second processor, the density processing module has a higher degree of parallelism than the sparsity processing module, the sparsity processing module is configured to operate in a beginning stage and/or an ending stage of the graph traversals where parallelism is low, and the density processing module is configured to operate in the intermediate stage of the graph traversals where parallelism is high; wherein before the second processor executes the graph traversals, the first processor is configured to first program an algorithmic logic for the graph traversals to be executed into a logic circuit of the second processor, and then transmit the graph data to be traversed into a global storage of the second processor; and the sparsity processing module is configured to take a selected initial point as an origin for division to divide the graph data in the global storage into plural data b

Assignees

Univ Huazhong Science Tech

Inventors

Classifications

G06F15/825
Dataflow computers · CPC title
G06F2015/768
Gate array · CPC title
G06F16/9024Primary
Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title
G06F15/7889
Reconfigurable logic implemented as a co-processor (instruction execution using a coprocessor G06F9/3877) · CPC title
G06F15/76Primary
Architectures of general purpose stored program computers (with program plugboard G06F15/08; multicomputers G06F15/16) · CPC title

Patent family

Related publications grouped by family.

View patent family 66502892

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11263168B2 cover?: An FPGA-based graph data processing method is provided for executing graph traversals on a graph having characteristics of a small-world network by using a first processor being a CPU and a second processor that is a FPGA and is in communicative connection with the first processor, wherein the first processor sends graph data to be traversed to the second processor, and obtains result data of t…
Who is the assignee on this patent?: Univ Huazhong Science Tech
What technology area does this patent fall under?: Primary CPC classification G06F16/9024. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Mar 01 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

How to read this patent

Abstract

First claim

Assignees

Inventors

Classifications

Patent family

External sources

Related patents

Heterogeneous accelerator for highly efficient learning systems

Reconfigurable, Application-Specific Computer Accelerator

Methods and apparatus for programmable integrated circuit coprocessor sector management

Task scheduling method and apparatus on heterogeneous multi-core reconfigurable computing platform

Frequently asked questions