Graph data partitioning

US12461969B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-12461969-B2
Application numberUS-202318394497-A
CountryUS
Kind codeB2
Filing dateDec 22, 2023
Priority dateNov 15, 2021
Publication dateNov 4, 2025
Grant dateNov 4, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of this specification disclose graph data partition computer-implemented methods, non-transitory, computer-readable media, and computer-implemented systems. A computer-implemented method includes partitioning vertices in graph data into a plurality of dataset. Edges in the graph data are partitioned into datasets that include target vertices of the edges, where the datasets are used by nodes in a distributed cluster to perform graph computation, and where computational loads of the plurality of datasets are similar. Implementations of this specification can achieve load balancing between nodes in the distributed cluster and can reduce communication overhead.

First claim

Opening claim text (preview).

What is claimed is: 1 . A computer-implemented method for graph data partitioning, comprising: performing a layer-1 partitioning on graph data to generate a first plurality of datasets, wherein each dataset in the first plurality of datasets corresponds to a respective node in a plurality of nodes in a distributed cluster, and wherein performing the layer-1 partitioning comprises: partitioning vertices in the graph data into the first plurality of datasets; partitioning edges in the graph data into the first plurality of datasets that include target vertices of the edges, performing a layer-2 partitioning on the first plurality of datasets to generate, for each node in the plurality of nodes in the distributed cluster, a second plurality of datasets, wherein for each node, a number of the second plurality of datasets is dependent on a count of threads or a count of processes of the node; and performing graph computation on the second plurality of datasets by each node in the plurality of nodes in the distributed cluster. 2 . The computer-implemented method of claim 1 , wherein partitioning the vertices in the graph data into the first plurality of datasets, comprises: determining a computational load reference value according to the graph data and a quantity of the datasets; partitioning the vertices in the graph data into the first plurality of datasets; and in a partition process, determining the computational loads of the first plurality of datasets, so that computational loads of the first plurality of datasets are similar to the computational load reference value. 3 . The computer-implemented method of claim 2 , wherein determining the computational load reference value, comprises: determining the computational load reference value according to a quantity of the vertices in the graph data, a quantity of edges of a vertex, and the quantity of the datasets; and the determining the computational loads of the first plurality of datasets comprises: determining the computational loads of the first plurality of datasets according to a quantity of vertices in the dataset and a quantity of edges of the vertex. 4 . The computer-implemented method of claim 1 , wherein partitioning the vertices in the graph data into the first plurality of datasets, comprises: obtaining a local feature of the graph data, wherein the local feature is used to represent a degree of proximity between vertices; and partitioning the vertices in the graph data into the first plurality of datasets according to the local feature. 5 . The computer-implemented method of claim 4 , wherein partitioning the vertices in the graph data into the first plurality of datasets, comprises: allocating identifiers to the vertices in the graph data according to the local feature, wherein a numbering sequence of the identifiers is used to represent the degree of proximity; and partitioning the vertices in the graph data into the first plurality of datasets according to the numbering sequence of the identifiers. 6 . The computer-implemented method of claim 1 , wherein partitioning the edges in the graph data into the first plurality of datasets that include the target vertices of the edges, comprises: building a table, wherein each row and each column of the table separately correspond to one dataset; for each edge in the graph data, determining a target row in the table according to a source vertex of the edge, determining a target column in the table according to a target vertex of the edge, and partitioning the edge into a cell limited by the target row and the target column; and partitioning an edge in a cell in each column into a dataset that corresponds to the column. 7 . The computer-implemented method of claim 1 , wherein performing the layer-2 partitioning comprises, for each node in the plurality of nodes in the distributed cluster: partitioning a dataset in the first plurality of datasets into the second plurality of datasets according to the count of threads or the count of processes of the node. 8 . A non-transitory, computer-readable medium storing one or more instructions executable by a computer system to perform one or more operations for graph data partitioning, comprising: performing a layer-1 partitioning on graph data to generate a first plurality of datasets, wherein each dataset in the first plurality of datasets corresponds to a respective node in a plurality of nodes in a distributed cluster, and wherein performing the layer-1 partitioning comprises: partitioning vertices in the graph data into the first plurality of datasets; partitioning edges in the graph data into the first plurality of datasets that include target vertices of the edges, performing a layer-2 partitioning on the first plurality of datasets to generate, for each node in the plurality of nodes in the distributed cluster, a second plurality of datasets, wherein for each node, a number of the second plurality of datasets is dependent on a count of threads or a count of processes of the node; and performing graph computation on the second plurality of datasets by each node in the plurality of nodes in the distributed cluster. 9 . The non-transitory, computer-readable medium of claim 8 , wherein partitioning the vertices in the graph data into the first plurality of datasets, comprises: determining a computational load reference value according to the graph data and a quantity of the datasets; partitioning the vertices in the graph data into the first plurality of datasets; and in a partition process, determining the computational loads of the first plurality of datasets, so that computational loads of the first plurality of datasets are similar to the computational load reference value. 10 . The non-transitory, computer-readable medium of claim 9 , wherein determining the computational load reference value, comprises: determining the computational load reference value according to a quantity of the vertices in the graph data, a quantity of edges of a vertex, and the quantity of the datasets; and the determining the computational loads of the first plurality of datasets comprises: determining the computational loads of the first plurality of datasets according to a quantity of vertices in the dataset and a quantity of edges of the vertex. 11 . The non-transitory, computer-readable medium of claim 8 , wherein partitioning the vertices in the graph data into the first plurality of datasets, comprises: obtaining a local feature of the graph data, wherein the local feature is used to represent a degree of proximity between vertices; and partitioning the vertices in the graph data into the first plurality of datasets according to the local feature. 12 . The non-transitory, computer-readable medium of claim 11 , wherein partitioning the vertices in the graph data into the first plurality of datasets, comprises: allocating identifiers to the vertices in the graph data according to the local feature, wherein a numbering sequence of the identifiers is used to represent the degree of proximity; and partitioning the vertices in the graph data into the first plurality of datasets according to the numbering sequence of the identifiers. 13 . The non-transitory, computer-readable medium of claim 8 , wherein partitioning the edges in the graph data into the first plurality of datasets that include the target vertices of the edges, comprises: building a table, wherein each row and each column of the table separately correspond to one dataset; for each edge in the graph data, determining a target row in the table according to a source vertex of the edge,

Assignees

Inventors

Classifications

  • for load management (allocation of a server based on load conditions G06F9/505; load rebalancing G06F9/5083; redistributing the load in a network by a load balancer H04L67/1029) · CPC title

  • using metadata automatically derived from the content · CPC title

  • Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title

  • G06F16/51Primary

    Indexing; Data structures therefor; Storage structures · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12461969B2 cover?
Embodiments of this specification disclose graph data partition computer-implemented methods, non-transitory, computer-readable media, and computer-implemented systems. A computer-implemented method includes partitioning vertices in graph data into a plurality of dataset. Edges in the graph data are partitioned into datasets that include target vertices of the edges, where the datasets are used…
Who is the assignee on this patent?
Alipay Hangzhou Inf Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F16/9024. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Nov 04 2025 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).