Graph refactorization method and graph refactorization apparatus
US-2022156324-A1 · May 19, 2022 · US
US12461969B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12461969-B2 |
| Application number | US-202318394497-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 22, 2023 |
| Priority date | Nov 15, 2021 |
| Publication date | Nov 4, 2025 |
| Grant date | Nov 4, 2025 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Embodiments of this specification disclose graph data partition computer-implemented methods, non-transitory, computer-readable media, and computer-implemented systems. A computer-implemented method includes partitioning vertices in graph data into a plurality of dataset. Edges in the graph data are partitioned into datasets that include target vertices of the edges, where the datasets are used by nodes in a distributed cluster to perform graph computation, and where computational loads of the plurality of datasets are similar. Implementations of this specification can achieve load balancing between nodes in the distributed cluster and can reduce communication overhead.
Opening claim text (preview).
What is claimed is: 1 . A computer-implemented method for graph data partitioning, comprising: performing a layer-1 partitioning on graph data to generate a first plurality of datasets, wherein each dataset in the first plurality of datasets corresponds to a respective node in a plurality of nodes in a distributed cluster, and wherein performing the layer-1 partitioning comprises: partitioning vertices in the graph data into the first plurality of datasets; partitioning edges in the graph data into the first plurality of datasets that include target vertices of the edges, performing a layer-2 partitioning on the first plurality of datasets to generate, for each node in the plurality of nodes in the distributed cluster, a second plurality of datasets, wherein for each node, a number of the second plurality of datasets is dependent on a count of threads or a count of processes of the node; and performing graph computation on the second plurality of datasets by each node in the plurality of nodes in the distributed cluster. 2 . The computer-implemented method of claim 1 , wherein partitioning the vertices in the graph data into the first plurality of datasets, comprises: determining a computational load reference value according to the graph data and a quantity of the datasets; partitioning the vertices in the graph data into the first plurality of datasets; and in a partition process, determining the computational loads of the first plurality of datasets, so that computational loads of the first plurality of datasets are similar to the computational load reference value. 3 . The computer-implemented method of claim 2 , wherein determining the computational load reference value, comprises: determining the computational load reference value according to a quantity of the vertices in the graph data, a quantity of edges of a vertex, and the quantity of the datasets; and the determining the computational loads of the first plurality of datasets comprises: determining the computational loads of the first plurality of datasets according to a quantity of vertices in the dataset and a quantity of edges of the vertex. 4 . The computer-implemented method of claim 1 , wherein partitioning the vertices in the graph data into the first plurality of datasets, comprises: obtaining a local feature of the graph data, wherein the local feature is used to represent a degree of proximity between vertices; and partitioning the vertices in the graph data into the first plurality of datasets according to the local feature. 5 . The computer-implemented method of claim 4 , wherein partitioning the vertices in the graph data into the first plurality of datasets, comprises: allocating identifiers to the vertices in the graph data according to the local feature, wherein a numbering sequence of the identifiers is used to represent the degree of proximity; and partitioning the vertices in the graph data into the first plurality of datasets according to the numbering sequence of the identifiers. 6 . The computer-implemented method of claim 1 , wherein partitioning the edges in the graph data into the first plurality of datasets that include the target vertices of the edges, comprises: building a table, wherein each row and each column of the table separately correspond to one dataset; for each edge in the graph data, determining a target row in the table according to a source vertex of the edge, determining a target column in the table according to a target vertex of the edge, and partitioning the edge into a cell limited by the target row and the target column; and partitioning an edge in a cell in each column into a dataset that corresponds to the column. 7 . The computer-implemented method of claim 1 , wherein performing the layer-2 partitioning comprises, for each node in the plurality of nodes in the distributed cluster: partitioning a dataset in the first plurality of datasets into the second plurality of datasets according to the count of threads or the count of processes of the node. 8 . A non-transitory, computer-readable medium storing one or more instructions executable by a computer system to perform one or more operations for graph data partitioning, comprising: performing a layer-1 partitioning on graph data to generate a first plurality of datasets, wherein each dataset in the first plurality of datasets corresponds to a respective node in a plurality of nodes in a distributed cluster, and wherein performing the layer-1 partitioning comprises: partitioning vertices in the graph data into the first plurality of datasets; partitioning edges in the graph data into the first plurality of datasets that include target vertices of the edges, performing a layer-2 partitioning on the first plurality of datasets to generate, for each node in the plurality of nodes in the distributed cluster, a second plurality of datasets, wherein for each node, a number of the second plurality of datasets is dependent on a count of threads or a count of processes of the node; and performing graph computation on the second plurality of datasets by each node in the plurality of nodes in the distributed cluster. 9 . The non-transitory, computer-readable medium of claim 8 , wherein partitioning the vertices in the graph data into the first plurality of datasets, comprises: determining a computational load reference value according to the graph data and a quantity of the datasets; partitioning the vertices in the graph data into the first plurality of datasets; and in a partition process, determining the computational loads of the first plurality of datasets, so that computational loads of the first plurality of datasets are similar to the computational load reference value. 10 . The non-transitory, computer-readable medium of claim 9 , wherein determining the computational load reference value, comprises: determining the computational load reference value according to a quantity of the vertices in the graph data, a quantity of edges of a vertex, and the quantity of the datasets; and the determining the computational loads of the first plurality of datasets comprises: determining the computational loads of the first plurality of datasets according to a quantity of vertices in the dataset and a quantity of edges of the vertex. 11 . The non-transitory, computer-readable medium of claim 8 , wherein partitioning the vertices in the graph data into the first plurality of datasets, comprises: obtaining a local feature of the graph data, wherein the local feature is used to represent a degree of proximity between vertices; and partitioning the vertices in the graph data into the first plurality of datasets according to the local feature. 12 . The non-transitory, computer-readable medium of claim 11 , wherein partitioning the vertices in the graph data into the first plurality of datasets, comprises: allocating identifiers to the vertices in the graph data according to the local feature, wherein a numbering sequence of the identifiers is used to represent the degree of proximity; and partitioning the vertices in the graph data into the first plurality of datasets according to the numbering sequence of the identifiers. 13 . The non-transitory, computer-readable medium of claim 8 , wherein partitioning the edges in the graph data into the first plurality of datasets that include the target vertices of the edges, comprises: building a table, wherein each row and each column of the table separately correspond to one dataset; for each edge in the graph data, determining a target row in the table according to a source vertex of the edge,
for load management (allocation of a server based on load conditions G06F9/505; load rebalancing G06F9/5083; redistributing the load in a network by a load balancer H04L67/1029) · CPC title
using metadata automatically derived from the content · CPC title
Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title
Indexing; Data structures therefor; Storage structures · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.