What technology area does this patent fall under?

Primary CPC classification G06F9/52. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Dec 06 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Efficient and more advanced implementation of ring-AllReduce algorithm for distributed parallel deep learning

US11520640B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11520640-B2
Application number	US-202016777711-A
Country	US
Kind code	B2
Filing date	Jan 30, 2020
Priority date	Jan 30, 2020
Publication date	Dec 6, 2022
Grant date	Dec 6, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure provides a method for syncing data of a computing task across a plurality of groups of computing nodes, each group comprising a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicatively couple a computing node A of a first group of the plurality of groups with a computing node A of a second group neighboring the first group, a computing node B of the first group with a computing node B of the second group, a computing node C of the first group with the computing node C of the second group, and a computing node D of the first group with a computing node D of the second group, the method comprising: syncing across a first dimension of computing nodes using a first set of ring connections, wherein the first set of ring connections are formed using inter-group and intra-group interconnects that communicatively couple the computing nodes along the first dimension; and broadcasting synced data across a second dimension of computing nodes using a second ring connection.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for syncing data of a computing task across a plurality of groups of computing nodes, each group comprising a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicatively couple a computing node A of a first group of the plurality of groups with a computing node A of a second group neighboring the first group and a computing node A of a third group neighboring the first group, a computing node B of the first group with a computing node B of the second group and a computing node B of the third group, a computing node C of the first group with a computing node C of the second group and a computing node C of the third group, and a computing node D of the first group with a computing node D of the second group and a computing node D of the third group, wherein the second group and the third group are aligned with the first group in different dimensions, the method comprising: syncing data across a first dimension of computing nodes of the first group and the second group using a first set of ring connections, wherein the first set of ring connections are formed using inter-group and intra-group interconnects that communicatively couple the computing nodes of the first group and the second group along the first dimension, and the syncing data across the first dimension comprises transferring, in a unit time, sub-data from a computing node along the first dimension to another computing node in a unit time via a connection on the first set of ring connections; syncing data across a second dimension of computing nodes of the first group and the third group using a second set of ring connections, wherein the second set of ring connections are formed using inter-group and intra-group interconnects that communicatively couple the computing nodes of the first group and the third group along the second dimension, and syncing data across the second dimension comprises transferring, in a unit time, sub-data from a computing node along the second dimension to another computing node in a unit time via a connection on the second set of ring connections; and broadcasting synced data across the first dimension or the second dimension of computing nodes using the first set of ring connections or the second set of ring connections. 2. The method of claim 1 , wherein the data to be synced comprises a plurality of sub-data, and each computing node comprises a different version of each sub-data. 3. The method of claim 2 , wherein: syncing data across a first dimension of computing nodes using a first set of ring connections further comprises: in a clock cycle, receiving a version of sub-data by each computing node in a row, wherein the version of the sub-data is transferred from another computing node in the row via the connection on the first set of ring connections; and continuing data transferring until each computing node along the first dimension receives all versions of a sub-data from all computing nodes in the row. 4. The method of claim 3 , wherein: syncing data across a second dimension of computing nodes using a second set of ring connections further comprises: in a clock cycle, receiving a version of sub-data by each computing node in a column, wherein the version of the sub-data is transferred from another computing node in the column via the connection on the second set of ring connections; and continuing data transferring until each computing node along the second dimension receives all versions of a sub-data from all computing nodes. 5. The method claim 3 , wherein broadcasting synced data across the first dimension or the second dimension of computing nodes using the first set of ring connections or the second set of ring connections further comprises: in a clock cycle, receiving sub-data by each computing node across the first dimension or the second dimension, wherein the sub-data transferred from another computing node across the first dimension or the second dimension via the connection on the first set of ring connections or the connection on the second set of ring connections; and continuing data transferring until all computing nodes along the first dimension or the second dimension receive sub-data from all computing nodes. 6. The method of claim 1 , wherein the set of intra-group interconnects and the set of inter-group interconnects comprise inter-chip interconnects. 7. The method of claim 1 , wherein the computing nodes are artificial intelligence (“AI”) training processors, AI training chips, neural processing units (“NPU”), or graphic processing units (“GPU”). 8. The method of claim 7 , wherein the computing task is an AI computing task involving an allreduce algorithm. 9. A system for syncing data of a computing task across a plurality of groups of computing nodes, each group comprising a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicatively couple a computing node A of a first group of the plurality of groups with a computing node A of a second group neighboring the first group, and a computing node A of a third group neighboring the first group a computing node B of the first group with a computing node B of the second group and a computing node B of the third group, a computing node C of the first group with a computing node C of the second group and a computing node C of the third group, and a computing node D of the first group with a computing node D of the second group and a computing node D of the third group, wherein the second group and the third group are aligned with the first group in different dimensions, the system comprising: a memory storing a set of instructions; and one or more processors configured to execute the set of instructions to cause the system to: sync data across a first dimension of computing nodes of the first group and the second group using a first set of ring connections, wherein the first set of ring connections are formed using inter-group and intra-group interconnects that communicatively couple the computing nodes of the first group and the second group along the first dimension, and the syncing data across the first dimension comprises transferring, in a unit time, sub-data from a computing node along the first dimension to another computing node in a unit time via a connection on the first set of ring connections; sync data across a second dimension of computing nodes of the first group and the third group using a second set of ring connections, wherein the second set of ring connections are formed using inter-group and intra-group interconnects that communicatively couple the computing nodes of the first group and the third group along the second dimension, and syncing data across the second dimension comprises transferring, in a unit time, sub-data from a computing node along the second dimension to another computing node in a unit time via a connection on the second set of ring connections; and broadcast synced data across the first dimension or the second dimension of computing nodes using the first set of ring connections or the second set of ring connections. 10. The system of claim 9 , wherein the data to be synced comprises a plurality of sub-data, and each computing node comprises a different version of each sub-data. 11. The system of claim 10 , wherein the one or more processors are further configured to execute the set of instr

Assignees

Alibaba Group Holding Ltd

Inventors

Classifications

G06F9/52Primary
Program synchronisation; Mutual exclusion, e.g. by means of semaphores · CPC title
G06F9/5038Primary
considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration (scheduling strategies G06F9/4881 and subgroups) · CPC title
G06N3/063
using electronic means · CPC title
G06F9/542
Event management; Broadcasting; Multicasting; Notifications · CPC title
G06F9/4881
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

Patent family

Related publications grouped by family.

View patent family 77025318

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11520640B2 cover?: The present disclosure provides a method for syncing data of a computing task across a plurality of groups of computing nodes, each group comprising a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicativel…
Who is the assignee on this patent?: Alibaba Group Holding Ltd
What technology area does this patent fall under?: Primary CPC classification G06F9/52. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Dec 06 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

How to read this patent

Abstract

First claim

Assignees

Inventors

Classifications

Patent family

External sources

Related patents

Distributed random binning featurization with hybrid two-level parallelism

System and method for supporting parallel asynchronous synchronization between clusters in a distributed data grid

Database Query Processing Using Horizontal Data Record Alignment of Multi-Column Range Summaries

Parallel computer system, method of controlling parallel computer system, and recording medium

Frequently asked questions