Parallel graph events processing
US-2020210481-A1 · Jul 2, 2020 · US
US12536196B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12536196-B2 |
| Application number | US-202318544666-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 19, 2023 |
| Priority date | Oct 20, 2021 |
| Publication date | Jan 27, 2026 |
| Grant date | Jan 27, 2026 |
Official abstract text for this publication.
Embodiments of this specification provide distributed data processing methods, apparatuses, and devices. One method includes: determining an active vertex set that currently participates in data processing in target graph data; in response to determining that an external memory of a first distributed node stores an active vertex in the active vertex set, determining, from a plurality of predetermined data processing modes, a target data processing mode that matches the active vertex set; determining, based on the target data processing mode, a to-be-updated vertex according to an association relationship with the active vertex; and sending, based on first data of the active vertex in the external memory, a first update message to a target distributed node in which the to-be-updated vertex is located.
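The flow summarized in the abstract, in its push form, can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the `UpdateMessage` type, the modulo-based `owner_node` placement rule, and the in-memory dictionaries standing in for a node's external memory are all hypothetical names and assumptions that do not appear in the source.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class UpdateMessage:
    """Hypothetical message shape: vertex information plus first data."""
    source_vertex: int
    first_data: float


def owner_node(vertex: int, num_nodes: int) -> int:
    # Assumption: vertices are placed on nodes by a simple modulo rule.
    # The source only says shards come from a "predetermined data division manner".
    return vertex % num_nodes


def build_push_messages(
    active_set: Iterable[int],
    local_data: Dict[int, float],      # stand-in for this node's external memory
    out_edges: Dict[int, List[int]],   # egress edge set held by this node
    num_nodes: int,
) -> Dict[int, List[UpdateMessage]]:
    """Group one update message per (active vertex, target node)."""
    outbox: Dict[int, List[UpdateMessage]] = defaultdict(list)
    for vertex in sorted(active_set):
        if vertex not in local_data:
            continue  # this node does not store the active vertex
        # To-be-updated vertices are the egress points of the active vertex.
        target_nodes = {owner_node(t, num_nodes) for t in out_edges.get(vertex, [])}
        for node in sorted(target_nodes):
            outbox[node].append(UpdateMessage(vertex, local_data[vertex]))
    return dict(outbox)


if __name__ == "__main__":
    messages = build_push_messages(
        active_set={1, 2, 5},
        local_data={1: 0.5, 2: 1.0},
        out_edges={1: [2, 3], 2: [4]},
        num_nodes=2,
    )
    for node, batch in sorted(messages.items()):
        print(node, batch)
```

In this sketch each receiving node would resolve the concrete egress points from its own edge set, which is why the message carries only the source vertex and its data.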
Opening claim text (preview).
The invention claimed is:

1. A distributed data processing method, comprising: receiving shard data and attribute information of target graph data sent by a specified device, wherein the shard data are obtained by the specified device based on performing division processing on the target graph data in a predetermined data division manner; and storing the shard data and the attribute information in an external memory of a first distributed node; determining an active vertex set that currently participates in data processing in the target graph data, wherein the target graph data are pre-generated based on event information of a plurality of target events, the event information comprises a plurality of event elements corresponding to the target events, each vertex of the target graph data corresponds to one of the plurality of event elements, and each edge of the target graph data is connected to the vertex according to an association relationship; in response to determining that the external memory of the first distributed node stores an active vertex in the active vertex set, determining, from a plurality of predetermined data processing modes, a target data processing mode that matches the active vertex set; determining, based on the target data processing mode, a to-be-updated vertex according to the association relationship with the active vertex; and sending, based on first data of the active vertex in the external memory, a first update message to a target distributed node in which the to-be-updated vertex is located for the target distributed node to perform, based on the first update message, update processing on second data of the to-be-updated vertex in an external memory of the target distributed node.

2. The method according to claim 1, wherein the vertex comprises an ingress point and an egress point, each edge in the target graph data is determined as a directed edge, the directed edge points from the ingress point to the egress point, the directed edge is an egress edge of the ingress point, and the directed edge is an ingress edge of the egress point; wherein the shard data comprise a vertex subset, an ingress edge set corresponding to an ingress edge of each vertex in the vertex subset, an egress edge set corresponding to an egress edge of each vertex in the vertex subset, a primary backup of each vertex in the vertex subset, and an image backup of a vertex that forms the directed edge with each vertex in the vertex subset, the primary backup comprises element data of an event element corresponding to a vertex, and the image backup is used to transfer a message; and wherein the attribute information comprises a first quantity of edges of the target graph data and a second quantity of egress edges of each vertex in the target graph data.

3. The method according to claim 2, wherein the determining a target data processing mode that matches the active vertex set in a plurality of predetermined data processing modes comprises: calculating density of the active vertex set in a predetermined calculation manner; and determining, based on the density, the target data processing mode that matches the active vertex set in a predetermined push data processing mode and a predetermined pull data processing mode.

4. The method according to claim 3, wherein the calculating density of the active vertex set in a predetermined calculation manner comprises: counting a third quantity of active vertexes in the active vertex set; counting a total quantity of egress edges of each active vertex in the active vertex set based on the second quantity; determining the total quantity as a fourth quantity; and calculating the density of the active vertex set based on the third quantity and the fourth quantity in the predetermined calculation manner; and wherein the determining, based on the density, the target data processing mode that matches the active vertex set in a predetermined push data processing mode and a predetermined pull data processing mode comprises: determining contrast density based on the first quantity; determining whether the density of the active vertex set is greater than or equal to the contrast density; in response to determining that the density of the active vertex set is greater than or equal to the contrast density, determining the pull data processing mode as the target data processing mode; and in response to determining that the density of the active vertex set is less than the contrast density, determining the push data processing mode as the target data processing mode.

5. The method according to claim 3, wherein the determining, based on the target data processing mode, a to-be-updated vertex according to the association relationship with the active vertex comprises: in response to determining that the target data processing mode is the push data processing mode, determining, based on the egress edge set and the image backup that are stored in the external memory of the first distributed node, that a corresponding target egress point exists when the active vertex serves as the ingress point; and determining the target egress point as the to-be-updated vertex; and wherein the sending, based on first data of the active vertex in the external memory, a first update message to a target distributed node in which the to-be-updated vertex is located comprises: obtaining the first data of the active vertex from the external memory of the first distributed node; determining a target distributed node in which the to-be-updated vertex and an image backup of the to-be-updated vertex are located; and sending the first update message to the target distributed node based on vertex information of the active vertex and the first data.

6. The method according to claim 5, wherein the method further comprises: determining an active vertex corresponding to the vertex information in the first update message as a target active vertex if the first update message sent by the first distributed node or another distributed node is received; determining, based on the egress edge set stored in the external memory of the first distributed node, that at least one corresponding target egress point exists when the target active vertex serves as the ingress point; and performing update processing on second data of the target egress point in the external memory of the first distributed node based on the first data in the first update message.

7. The method according to claim 6, wherein the performing update processing on second data of the target egress point in the external memory of the first distributed node comprises: determining a target thread corresponding to each target egress point; and sending the first update message to the corresponding target thread for the target thread to perform, based on the first data in the first update message, update processing on the second data of the corresponding target egress point in the external memory of the first distributed node.

8. The method according to claim 7, wherein the sending the first update message to the corresponding target thread comprises: determining whether the second data of the target egress point is in a locked state; in response to determining that the second data is in the locked state, storing the first update message in a message queue of the corresponding target egress point, so that the target thread corresponding to the target egress point obtains the first update message from the message queue after performing unlocking processing on the second data, and performs update processing on the second data based on the first data in the first update message after performing locking processing on the second data; and in response t
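
The push/pull selection recited in claims 3 and 4 can be illustrated with a minimal sketch. The claims leave both the "predetermined calculation manner" and the derivation of the contrast density unspecified, so the formulas below are assumptions chosen for illustration: density is taken as the third quantity plus the fourth quantity, and contrast density as the first quantity divided by a hypothetical `divisor` of 20 (a heuristic seen in some direction-optimizing graph frameworks). The function and parameter names are likewise hypothetical.

```python
from typing import Dict, Iterable


def active_set_density(active: Iterable[int], out_degree: Dict[int, int]) -> int:
    """Density of the active vertex set (assumed formula, see lead-in)."""
    active_list = list(active)
    third_quantity = len(active_list)                          # number of active vertexes
    fourth_quantity = sum(out_degree[v] for v in active_list)  # their egress edges in total
    # Assumption: the "predetermined calculation manner" is a plain sum.
    return third_quantity + fourth_quantity


def choose_mode(
    active: Iterable[int],
    out_degree: Dict[int, int],   # second quantity, per vertex
    total_edges: int,             # first quantity
    divisor: int = 20,            # hypothetical tuning constant
) -> str:
    """Return "pull" for a dense active set and "push" for a sparse one."""
    density = active_set_density(active, out_degree)
    # Assumption: contrast density is a fixed fraction of the edge count.
    contrast_density = total_edges / divisor
    return "pull" if density >= contrast_density else "push"


if __name__ == "__main__":
    degrees = {0: 3, 1: 1, 2: 0, 3: 2}
    print(choose_mode([0], degrees, total_edges=100))
    print(choose_mode([0, 3], degrees, total_edges=100))
```

The comparison uses greater-than-or-equal, matching the claim language that a density equal to the contrast density selects the pull mode.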
Asynchronous replication or reconciliation · CPC title
the resource being a machine, e.g. CPUs, Servers, Terminals · CPC title
Querying, e.g. by the use of web search engines · CPC title
Distributed queries · CPC title
Data partitioning, e.g. horizontal or vertical partitioning · CPC title