Distributed data processing

US12536196B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12536196-B2
Application number: US-202318544666-A
Country: US
Kind code: B2
Filing date: Dec 19, 2023
Priority date: Oct 20, 2021
Publication date: Jan 27, 2026
Grant date: Jan 27, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of this specification provide distributed data processing methods, apparatuses, and devices. One method includes: determining an active vertex set that currently participates in data processing in target graph data; in response to determining that an external memory of a first distributed node stores an active vertex in the active vertex set, determining, from a plurality of predetermined data processing modes, a target data processing mode that matches the active vertex set; determining, based on the target data processing mode, a to-be-updated vertex according to the association relationship with the active vertex; and sending, based on first data of the active vertex in the external memory, a first update message to a target distributed node in which the to-be-updated vertex is located.
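The message-sending step summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the hash-based placement rule `owner_of`, the function `push_step`, and the use of in-memory dictionaries in place of a node's external memory are all assumptions made for illustration, and only the push direction is shown.

```python
from collections import defaultdict

def owner_of(vertex, num_nodes):
    """Hypothetical placement rule: vertex IDs are hash-partitioned across nodes."""
    return vertex % num_nodes

def push_step(node_id, num_nodes, active, out_edges, vertex_data):
    """Build update messages for the active vertices stored on this node.

    out_edges and vertex_data stand in for the node's external memory.
    Returns {target_node: [(active_vertex, vertex_to_update, first_data), ...]}.
    """
    outbox = defaultdict(list)
    for v in sorted(active):
        if owner_of(v, num_nodes) != node_id:
            continue  # this node does not store v, so it sends nothing for it
        for target in out_edges.get(v, []):
            # The message is addressed to the node where the to-be-updated vertex lives.
            outbox[owner_of(target, num_nodes)].append((v, target, vertex_data[v]))
    return dict(outbox)

# Example: node 0 of 2 stores the even-numbered vertices.
messages = push_step(
    node_id=0,
    num_nodes=2,
    active={0, 1, 2},
    out_edges={0: [1, 2], 2: [3]},
    vertex_data={0: 1.0, 2: 0.5},
)
```

In this example, vertex 1 is active but stored on node 1, so node 0 skips it; the receiving nodes would then apply each message to their own copy of the target vertex's data.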

First claim

Opening claim text (preview).

The invention claimed is:

1. A distributed data processing method, comprising: receiving shard data and attribute information of target graph data sent by a specified device, wherein the shard data are obtained by the specified device based on performing division processing on the target graph data in a predetermined data division manner; and storing the shard data and the attribute information in an external memory of a first distributed node; determining an active vertex set that currently participates in data processing in the target graph data, wherein the target graph data are pre-generated based on event information of a plurality of target events, the event information comprises a plurality of event elements corresponding to the target events, each vertex of the target graph data corresponds to one of the plurality of event elements, and each edge of the target graph data is connected to the vertex according to an association relationship; in response to determining that the external memory of the first distributed node stores an active vertex in the active vertex set, determining, from a plurality of predetermined data processing modes, a target data processing mode that matches the active vertex set; determining, based on the target data processing mode, a to-be-updated vertex according to the association relationship with the active vertex; and sending, based on first data of the active vertex in the external memory, a first update message to a target distributed node in which the to-be-updated vertex is located for the target distributed node to perform, based on the first update message, update processing on second data of the to-be-updated vertex in an external memory of the target distributed node.

2. The method according to claim 1, wherein the vertex comprises an ingress point and an egress point, each edge in the target graph data is determined as a directed edge, the directed edge points from the ingress point to the egress point, the directed edge is an egress edge of the ingress point, and the directed edge is an ingress edge of the egress point; wherein the shard data comprise a vertex subset, an ingress edge set corresponding to an ingress edge of each vertex in the vertex subset, an egress edge set corresponding to an egress edge of each vertex in the vertex subset, a primary backup of each vertex in the vertex subset, and an image backup of a vertex that forms the directed edge with each vertex in the vertex subset, the primary backup comprises element data of an event element corresponding to a vertex, and the image backup is used to transfer a message; and wherein the attribute information comprises a first quantity of edges of the target graph data and a second quantity of egress edges of each vertex in the target graph data.

3. The method according to claim 2, wherein the determining a target data processing mode that matches the active vertex set in a plurality of predetermined data processing modes comprises: calculating density of the active vertex set in a predetermined calculation manner; and determining, based on the density, the target data processing mode that matches the active vertex set in a predetermined push data processing mode and a predetermined pull data processing mode.

4. The method according to claim 3, wherein the calculating density of the active vertex set in a predetermined calculation manner comprises: counting a third quantity of active vertexes in the active vertex set; counting a total quantity of egress edges of each active vertex in the active vertex set based on the second quantity; determining the total quantity as a fourth quantity; and calculating the density of the active vertex set based on the third quantity and the fourth quantity in the predetermined calculation manner; and wherein the determining, based on the density, the target data processing mode that matches the active vertex set in a predetermined push data processing mode and a predetermined pull data processing mode comprises: determining contrast density based on the first quantity; determining whether the density of the active vertex set is greater than or equal to the contrast density; in response to determining that the density of the active vertex set is greater than or equal to the contrast density, determining the pull data processing mode as the target data processing mode; and in response to determining that the density of the active vertex set is less than the contrast density, determining the push data processing mode as the target data processing mode.

5. The method according to claim 3, wherein the determining, based on the target data processing mode, a to-be-updated vertex according to the association relationship with the active vertex comprises: in response to determining that the target data processing mode is the push data processing mode, determining, based on the egress edge set and the image backup that are stored in the external memory of the first distributed node, that a corresponding target egress point exists when the active vertex serves as the ingress point; and determining the target egress point as the to-be-updated vertex; and wherein the sending, based on first data of the active vertex in the external memory, a first update message to a target distributed node in which the to-be-updated vertex is located comprises: obtaining the first data of the active vertex from the external memory of the first distributed node; determining a target distributed node in which the to-be-updated vertex and an image backup of the to-be-updated vertex are located; and sending the first update message to the target distributed node based on vertex information of the active vertex and the first data.

6. The method according to claim 5, wherein the method further comprises: determining an active vertex corresponding to the vertex information in the first update message as a target active vertex if the first update message sent by the first distributed node or another distributed node is received; determining, based on the egress edge set stored in the external memory of the first distributed node, that at least one corresponding target egress point exists when the target active vertex serves as the ingress point; and performing update processing on second data of the target egress point in the external memory of the first distributed node based on the first data in the first update message.

7. The method according to claim 6, wherein the performing update processing on second data of the target egress point in the external memory of the first distributed node comprises: determining a target thread corresponding to each target egress point; and sending the first update message to the corresponding target thread for the target thread to perform, based on the first data in the first update message, update processing on the second data of the corresponding target egress point in the external memory of the first distributed node.

8. The method according to claim 7, wherein the sending the first update message to the corresponding target thread comprises: determining whether the second data of the target egress point is in a locked state; in response to determining that the second data is in the locked state, storing the first update message in a message queue of the corresponding target egress point, so that the target thread corresponding to the target egress point obtains the first update message from the message queue after performing unlocking processing on the second data, and performs update processing on the second data based on the first data in the first update message after performing locking processing on the second data; and in response t

Assignees

Inventors

Classifications

  • G06F16/273 (Primary)

    Asynchronous replication or reconciliation

  • the resource being a machine, e.g. CPUs, Servers, Terminals

  • Querying, e.g. by the use of web search engines

  • Distributed queries

  • G06F16/278 (Primary)

    Data partitioning, e.g. horizontal or vertical partitioning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12536196B2 cover?
Embodiments of this specification provide distributed data processing methods, apparatuses, and devices. One method includes: determining an active vertex set that currently participates in data processing in target graph data, in response to determining that an external memory of a first distributed node stores an active vertex in the active vertex set, determining, from a plurality of predete…
Who is the assignee on this patent?
Alipay Hangzhou Inf Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F16/273. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 27, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).