Graph data processing method and apparatus, and system

US12147473B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-12147473-B2
Application numberUS-202217580605-A
CountryUS
Kind codeB2
Filing dateJan 20, 2022
Priority dateNov 30, 2016
Publication dateNov 19, 2024
Grant dateNov 19, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Graph data processing methods and system are disclosed. One example method comprises obtaining, by a master node, graph data, wherein the graph data comprises M vertexes and a plurality of directional edges, each edge connects two vertexes, a direction of each edge is from a source to a destination vertex in the two vertexes, and M is an integer greater than two. The node divides the graph data into P non-overlapping shards, where each shard comprises at least one incoming edge directed to at least one vertex in the corresponding shard. The node schedules at least two edge sets comprised in a first shard of the P shards and an associate edge set comprised in a second shard of the P shards for processing by at least two worker nodes.

First claim

Opening claim text (preview).

What is claimed is: 1. A graph data processing method, comprising: obtaining, by a master node, graph data, wherein the graph data comprises M vertexes and a plurality of directional edges, each edge connects two vertexes of the M vertexes, a direction of each edge is from a source vertex to a destination vertex in the two vertexes, and M is an integer greater than two; dividing, by the master node, the graph data, into P non-overlapping shards, wherein each of the P shards comprises at least one incoming edge directed to at least one vertex in the corresponding shard, a sum of quantities of vertexes corresponding to the P shards is equal to M, P is a positive integer greater than 1; scheduling, by the master node, at least two edge sets comprised in a first shard of the P shards and an associate edge set comprised in a second shard of the P shards for processing by at least two worker nodes, wherein the associate edge set comprises an outgoing edge directed from a vertex in the first shard; and synchronizing, by the master node, a first data block stored by a first worker node with a second data block stored by a second worker node based on setting a synchronization flag that instructs the first worker node to perform data synchronization with the second worker node after processing the first data block, wherein the first data block is scheduled last according to a scheduling sequence in the first shard, and wherein the second data block is scheduled last in data blocks corresponding to the associate edge set. 2. The method according to claim 1 , wherein the method further comprises: filling, by the master node, values corresponding to edges in the P shards into a P×P matrix according to directions of the edges, to obtain P 2 data blocks, wherein each data block comprises an edge set, each of the P shards comprises at least two data blocks, the P×P matrix is formed by using source vertexes in the M vertexes as a horizontal axis and corresponding destination vertexes in the M vertexes as a vertical axis or using the source vertexes as the vertical axis and corresponding source vertexes as the horizontal axis. 3. The method according to claim 2 , wherein the method further comprises: setting, by the master node, a block identifier for each of the P 2 data blocks; and determining, by the master node, a correspondence between block identifiers of the P 2 data blocks and a plurality of worker nodes; and wherein scheduling the at least two edge sets comprises: scheduling, based on the correspondence, at least two data blocks corresponding to the first shard and at least one data block corresponding to the associate edge set comprised in the second shard for processing by at least two worker nodes. 4. The method according to claim 3 , wherein the method further comprises: scheduling, based on the correspondence, data blocks corresponding to the second shard and at least one data block corresponding to an associate edge set comprised in the first shard for processing by at least two worker nodes. 5. The method according to claim 1 , wherein the method further comprises: determining, by the master node, a degree of each vertex in each shard of the P shards, wherein the degree indicates association strength between the vertex and other vertexes in the corresponding shard; and classifying, by the master node, incoming edges of vertexes in a shard with degrees exceeding a preset threshold into a same edge set. 6. A graph data processing method, comprising: receiving, by a first worker node, X edge sets comprised in graph data that are scheduled by a master node according to a scheduling sequence to be processed by the first worker node, wherein the graph data comprises P non-overlapping shards determined by the master node, the graph data comprises M vertexes and a plurality of edges, each edge connects two of the M vertexes, a direction of each edge is from a source vertex to a destination vertex in the two vertexes, M is an integer greater than 2, each shard comprises at least one incoming edge of at least one vertex, the incoming edge is an edge directed to the at least one vertex, a sum of quantities of vertexes corresponding to the P shards is equal to M, P is a positive integer greater than 1, and X is a positive integer; storing the X edge sets in a hard disk; loading, by the first worker node to a memory according to the scheduling sequence, Y edge sets in the X edge sets stored in the hard disk; and extracting data of Z edge sets from the memory for graph computing, wherein X≥Y≥Z, and X, Y, and Z are positive integers; and performing, by the first worker node, data synchronization with a second worker node that stores a second data block based on a synchronization flag after the first worker node processes a first data block, wherein the first data block is a last data block scheduled to the first worker node according to a scheduling sequence in a first shard of the P shards, and the second data block is a last data block scheduled to the second worker node in a second shard of the P shards, and the second data block comprises an outgoing edge of a vertex corresponding to the first shard. 7. The method according to claim 6 , wherein each edge set is a data block, the P shards comprise P 2 data blocks, the P 2 data blocks are obtained after the master node fills values corresponding to edges in the P shards into a P×P matrix based on directions of the edges. 8. A graph data processing apparatus, comprising: at least one processor; and a memory storing instructions for execution by the at least one processor to: obtain graph data, wherein the graph data comprises M vertexes and a plurality of directional edges, each edge connects two vertexes of the M vertexes, a direction of each edge is from a source vertex to a destination vertex in the two vertexes, and M is an integer greater than two; divide the graph data into P non-overlapping shards, wherein each of the P shards comprises at least one incoming edge directed to at least one vertex in the corresponding shard, a sum of quantities of vertexes corresponding to the P shards is equal to M, P is a positive integer greater than 1; schedule at least two edge sets comprised in a first shard of the P shards and an associate edge set comprised in a second shard of the P shards for processing by at least two worker nodes, wherein the associate edge set comprises an outgoing edge directed from a vertex in the first shard; and synchronize a first data block stored by a first worker node with a second data block stored by a second worker node based on setting a synchronization flag that instructs the first worker node to perform data synchronization with the second worker node after processing the first data block, wherein the first data block is scheduled last according to a scheduling sequence in the first shard, and wherein the second data block is scheduled last in data blocks corresponding to the associate edge set. 9. The apparatus according to claim 8 , wherein the instructions are for execution by the at least one processor to: fill values corresponding to edges in the P shards into a P×P matrix according to directions of the edges, to obtain P 2 data blocks, wherein each data block comprises an edge set, each of the P shards comprises at least two data blocks, the P×P matrix is formed by using source vertexes in the M vertexes as a horizontal axis and corresponding destination vertexes in the M vertexes as a vertical axis or using the source vertexes as the vertical axis and corresponding source vertexes as the horizontal axis. 10. The apparatus according to claim 9 , wherein the instructions are for execution by the at least one processor to:

Assignees

Inventors

Classifications

  • Allocation of resources, e.g. of the central processing unit [CPU] · CPC title

  • Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

  • Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title

  • Data partitioning, e.g. horizontal or vertical partitioning · CPC title

  • Asynchronous replication or reconciliation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12147473B2 cover?
Graph data processing methods and system are disclosed. One example method comprises obtaining, by a master node, graph data, wherein the graph data comprises M vertexes and a plurality of directional edges, each edge connects two vertexes, a direction of each edge is from a source to a destination vertex in the two vertexes, and M is an integer greater than two. The node divides the graph data…
Who is the assignee on this patent?
Huawei Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F16/9024. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Nov 19 2024 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).