Hyper-square implementation of tree AllReduce algorithm for distributed parallel deep learning

US11620502B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11620502-B2
Application number	US-202016777731-A
Country	US
Kind code	B2
Filing date	Jan 30, 2020
Priority date	Jan 30, 2020
Publication date	Apr 4, 2023
Grant date	Apr 4, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure provides a method for syncing data of a computing task across a plurality of groups of computing nodes. Each group including a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicatively couple each of computing nodes A-D with corresponding computing nodes A-D in each of a plurality of neighboring groups. The method comprises syncing data at a computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node; and broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node.
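
The interconnect layout described in the abstract can be sketched as a small adjacency map. This is only an illustration, not the patent's implementation: the function name `build_topology`, the rectangular grid of groups, and the reading of "neighboring" as the groups directly to the right and below are all assumptions made for the example. Only the intra-group links (A-B, A-C, D-B, D-C) and the same-letter inter-group links come from the abstract.

```python
from itertools import product

# Intra-group links as stated in the abstract: A-B, A-C, D-B, D-C.
INTRA_GROUP_LINKS = [("A", "B"), ("A", "C"), ("D", "B"), ("D", "C")]


def build_topology(rows, cols):
    """Return {node: set(neighbours)} for a rows x cols grid of groups.

    A node is (group_row, group_col, letter). The grid arrangement and the
    4-neighbour notion of "neighboring groups" are assumptions for this sketch.
    """
    adj = {}

    def link(u, v):
        # Interconnects are treated as bi-directional (cf. claim 8).
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for r, c in product(range(rows), range(cols)):
        # Intra-group interconnects inside group (r, c).
        for a, b in INTRA_GROUP_LINKS:
            link((r, c, a), (r, c, b))
        # Inter-group interconnects: same letter in the adjacent groups.
        for letter in "ABCD":
            if c + 1 < cols:
                link((r, c, letter), (r, c + 1, letter))
            if r + 1 < rows:
                link((r, c, letter), (r + 1, c, letter))
    return adj


topology = build_topology(2, 2)
```

In this 2 x 2 sketch every node ends up with two intra-group and two inter-group links, which is consistent with the "four different directions" wording; the layout in the patent figures may differ.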

First claim

Opening claim text (preview).

What is claimed is:
1. A method for syncing data of a computing task across a plurality of groups of computing nodes, each group including a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicatively couple a computing node A of a first group with a computing node A of a second group neighboring the first group, a computing node B of the first group with a computing node B of the second group, a computing node C of the first group with a computing node C of the second group, and a computing node D of the first group with a computing node D of the second group, the method comprising: syncing data at a computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node, wherein the four directions for syncing data comprises two horizontal-based directions and two vertical-based directions and two directions of the four different directions involve intra-group interconnects and two other directions of the four different directions involve inter-group interconnects; and broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node, wherein the four directions for broadcasting data comprises two horizontal-based directions and two vertical-based directions and two directions of the four different directions involve intra-group interconnects and two other directions of the four different directions involve inter-group interconnects.
2. The method of claim 1, wherein syncing data at the computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects comprises: for each group of computing nodes, syncing data across a plurality of computing nodes of a group to reduce data into one computing node of the group using intra-group interconnects.
3. The method of claim 1, wherein broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects comprises: broadcasting, using inter-group interconnects, to a computing node of a first group of the plurality of groups synced data from other groups of the plurality of groups of computing nodes.
4. The method of claim 1, wherein broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects comprises: broadcasting, using inter-group interconnects, synced data to one computing node in each area of computing nodes, wherein an area of computing nodes comprises four groups of computing nodes.
5. The method of claim 1, wherein the data to be synced comprises a plurality of sub-data, and each computing node comprises a different version of each sub-data.
6. The method of claim 5, wherein syncing data at a computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node comprises: in a clock cycle, receiving a version of sub-data by a computing node along one direction of the four different directions, the version of sub-data is transferred from another computing node at another end of a connection along the direction.
7. The method of claim 5, wherein broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node comprises: in a clock cycle, receiving sub-data by a computing node along one direction of the four different directions, the sub-data is broadcasted from another computing node at another end of a connection along the direction.
8. The method of claim 1, wherein the set of intra-group interconnects and the set of inter-group interconnects comprise inter-chip interconnects that are bi-directional.
9. The method of claim 1, wherein: syncing data at a computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node comprises syncing data concurrently in connections along one or more of the four different directions; and broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node comprises broadcasting data concurrently in connections along one or more of the four different directions.
10. The method of claim 1, wherein the computing nodes are artificial intelligence (“AI”) training processors, AI training chips, neural processing units (“NPU”), or graphic processing units (“GPU”).
11. The method of claim 10, wherein the computing task is an AI computing task involving an allreduce algorithm.
12. A system for syncing data of a computing task across a plurality of groups of computing nodes, each group comprising a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicatively couple a computing node A of a first group with a computing node A of a second group neighboring the first group, a computing node B of the first group with a computing node B of the second group, a computing node C of the first group with a computing node C of the second group, and a computing node D of the first group with a computing node D of the second group, the system comprising: a memory storing a set of instructions; and one or more processors configured to execute the set of instructions to cause the system to: sync data at a computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node, wherein the four directions for syncing data comprises two horizontal-based directions and two vertical-based directions and two directions of the four different directions involve intra-group interconnects and two other directions of the four different directions involve inter-group interconnects; and broadcast synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node, wherein the four directions for broadcasting data comprises two horizontal-based directions and two vertical-based directions and two directions of the four different directions involve intra-group interconnects and two other directions of the four different directions involve inter-group interconnects.
13. The system of claim 12, wherein the one or more processors are further configured to execute the set of instructions to cause the system to: for each group of computing nodes, sync data across a plurality of computing nodes of a group to reduce data into one computing node of the group using intra-group interconnects.
14. The system of claim 12, wherein the one or more processors are further configured to execute the set of instructions to cause the system to: broadcast, using inter-group interconnects, to a computing node of a first group of the plurality of groups synced data from other groups of th
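
The sync-then-broadcast pattern in claims 1 and 2 can be illustrated with a generic tree all-reduce over a single group. This is a hedged sketch, not the patented schedule: it does not model the four directions, clock cycles, or sub-data versions from the claims. The function name `tree_all_reduce`, the choice of node A as root, the breadth-first spanning tree, and the sample values are all assumptions made for illustration.

```python
from collections import deque


def tree_all_reduce(adj, values, root):
    """Sum `values` up a BFS spanning tree of `adj`, then broadcast the sum.

    adj:    {node: set(neighbours)} describing bi-directional links.
    values: {node: number} holding each node's local contribution.
    """
    # Build a spanning tree rooted at `root` (BFS order: parents before children).
    parent = {root: None}
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in sorted(adj[node]):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)

    # Sync (reduce) phase: visit children before parents and fold each
    # node's partial sum into its parent, ending with the total at the root.
    partial = dict(values)
    for node in reversed(order):
        if parent[node] is not None:
            partial[parent[node]] += partial[node]

    # Broadcast phase: each node copies the synced value from its parent.
    result = {root: partial[root]}
    for node in order[1:]:
        result[node] = result[parent[node]]
    return result


# One group with the intra-group links from the claims: A-B, A-C, D-B, D-C.
group = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
local_values = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}  # hypothetical data
synced = tree_all_reduce(group, local_values, root="A")
```

After the call, every node in `synced` holds the same total, which is the defining property of an all-reduce.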

Assignees

Alibaba Group Holding Ltd

Inventors

Classifications

G06N3/098 · Distributed learning, e.g. federated learning
G06N3/08 (Primary) · Learning methods
G06N3/063 (Primary) · using electronic means
G06F9/542 · Event management; Broadcasting; Multicasting; Notifications
G06N3/04 · Architecture, e.g. interconnection topology

Patent family

Related publications grouped by family.

View patent family 77228134

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11620502B2 cover?: The present disclosure provides a method for syncing data of a computing task across a plurality of groups of computing nodes. Each group including a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicatively…
Who is the assignee on this patent?: Alibaba Group Holding Ltd
What technology area does this patent fall under?: Primary CPC classification: G06N3/08. Mapped technology areas include Physics.
When was this patent published?: Publication date: Apr 4, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Processing computational graphs

Topology-aware provisioning of hardware accelerator resources in a distributed environment

Methods and apparatus for identifying the shared importance of multiple nodes within a machine learning model for multiple tasks

Clustered storage system synchronization
