Hyper-square implementation of tree AllReduce algorithm for distributed parallel deep learning

US11620502B2 · US · B2

Patent metadata
Publication number: US-11620502-B2
Application number: US-202016777731-A
Country: US
Kind code: B2
Filing date: Jan 30, 2020
Priority date: Jan 30, 2020
Publication date: Apr 4, 2023
Grant date: Apr 4, 2023

Abstract

The present disclosure provides a method for syncing data of a computing task across a plurality of groups of computing nodes. Each group includes a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicatively couple each of computing nodes A-D with corresponding computing nodes A-D in each of a plurality of neighboring groups. The method comprises syncing data at a computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node; and broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node.
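The abstract fixes which nodes are linked but not how the groups are laid out. A minimal sketch follows, assuming the groups sit on a rows × cols grid, nodes A-D form a 2 × 2 square inside each group (A top-left, B top-right, C bottom-left, D bottom-right), and every node links to its same-letter counterpart in each horizontally or vertically adjacent group. The names `build_topology` and `tree_allreduce` are hypothetical. The sketch runs a generic tree AllReduce (reduce to a root, then broadcast) over a breadth-first spanning tree; it does not reproduce the patent's four-direction schedule.

```python
from collections import deque

LETTERS = "ABCD"
# Intra-group links from the abstract: A-B, A-C, D-B, D-C (a four-node ring A-B-D-C).
INTRA = [("A", "B"), ("A", "C"), ("D", "B"), ("D", "C")]


def build_topology(rows, cols):
    """Return an adjacency map keyed by (group_row, group_col, letter).

    Assumption: groups form a rows x cols grid, and each node is linked to the
    same-letter node in every horizontally or vertically adjacent group.
    """
    adj = {(r, c, x): set() for r in range(rows) for c in range(cols) for x in LETTERS}
    for r in range(rows):
        for c in range(cols):
            for u, v in INTRA:  # bidirectional intra-group links
                adj[(r, c, u)].add((r, c, v))
                adj[(r, c, v)].add((r, c, u))
            for dr, dc in ((0, 1), (1, 0)):  # right and down; links are bidirectional
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    for x in LETTERS:  # inter-group link: A-A, B-B, C-C, D-D
                        adj[(r, c, x)].add((nr, nc, x))
                        adj[(nr, nc, x)].add((r, c, x))
    return adj


def tree_allreduce(adj, values, root):
    """Sum every node's value into root along a BFS tree, then broadcast it back."""
    parent = {root: None}
    order = [root]
    queue = deque([root])
    while queue:  # build a breadth-first spanning tree rooted at `root`
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v] = u
                order.append(v)
                queue.append(v)
    # Reduce phase: reverse BFS order guarantees children are folded in before parents.
    partial = dict(values)
    for u in reversed(order):
        if parent[u] is not None:
            partial[parent[u]] += partial[u]
    # Broadcast phase: the root's total flows down the same tree in BFS order.
    result = {root: partial[root]}
    for u in order[1:]:
        result[u] = result[parent[u]]
    return result


if __name__ == "__main__":
    adj = build_topology(2, 2)
    result = tree_allreduce(adj, {node: 1.0 for node in adj}, root=(0, 0, "A"))
    print(result[(1, 1, "D")])  # 16.0: one contribution from each of the 16 nodes
```

Under this assumed layout, a node on the grid boundary has fewer inter-group links than an interior node, so the number of active directions varies by position.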

First claim

What is claimed is:

1. A method for syncing data of a computing task across a plurality of groups of computing nodes, each group including a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicatively couple a computing node A of a first group with a computing node A of a second group neighboring the first group, a computing node B of the first group with a computing node B of the second group, a computing node C of the first group with a computing node C of the second group, and a computing node D of the first group with a computing node D of the second group, the method comprising: syncing data at a computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node, wherein the four directions for syncing data comprises two horizontal-based directions and two vertical-based directions and two directions of the four different directions involve intra-group interconnects and two other directions of the four different directions involve inter-group interconnects; and broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node, wherein the four directions for broadcasting data comprises two horizontal-based directions and two vertical-based directions and two directions of the four different directions involve intra-group interconnects and two other directions of the four different directions involve inter-group interconnects.

2. The method of claim 1, wherein syncing data at the computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects comprises: for each group of computing nodes, syncing data across a plurality of computing nodes of a group to reduce data into one computing node of the group using intra-group interconnects.

3. The method of claim 1, wherein broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects comprises: broadcasting, using inter-group interconnects, to a computing node of a first group of the plurality of groups synced data from other groups of the plurality of groups of computing nodes.

4. The method of claim 1, wherein broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects comprises: broadcasting, using inter-group interconnects, synced data to one computing node in each area of computing nodes, wherein an area of computing nodes comprises four groups of computing nodes.

5. The method of claim 1, wherein the data to be synced comprises a plurality of sub-data, and each computing node comprises a different version of each sub-data.

6. The method of claim 5, wherein syncing data at a computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node comprises: in a clock cycle, receiving a version of sub-data by a computing node along one direction of the four different directions, the version of sub-data is transferred from another computing node at another end of a connection along the direction.

7. The method of claim 5, wherein broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node comprises: in a clock cycle, receiving sub-data by a computing node along one direction of the four different directions, the sub-data is broadcasted from another computing node at another end of a connection along the direction.

8. The method of claim 1, wherein the set of intra-group interconnects and the set of inter-group interconnects comprise inter-chip interconnects that are bi-directional.

9. The method of claim 1, wherein: syncing data at a computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node comprises syncing data concurrently in connections along one or more of the four different directions; and broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node comprises broadcasting data concurrently in connections along one or more of the four different directions.

10. The method of claim 1, wherein the computing nodes are artificial intelligence (“AI”) training processors, AI training chips, neural processing units (“NPU”), or graphic processing units (“GPU”).

11. The method of claim 10, wherein the computing task is an AI computing task involving an allreduce algorithm.

12. A system for syncing data of a computing task across a plurality of groups of computing nodes, each group comprising a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicatively couple a computing node A of a first group with a computing node A of a second group neighboring the first group, a computing node B of the first group with a computing node B of the second group, a computing node C of the first group with a computing node C of the second group, and a computing node D of the first group with a computing node D of the second group, the system comprising: a memory storing a set of instructions; and one or more processors configured to execute the set of instructions to cause the system to: sync data at a computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node, wherein the four directions for syncing data comprises two horizontal-based directions and two vertical-based directions and two directions of the four different directions involve intra-group interconnects and two other directions of the four different directions involve inter-group interconnects; and broadcast synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node, wherein the four directions for broadcasting data comprises two horizontal-based directions and two vertical-based directions and two directions of the four different directions involve intra-group interconnects and two other directions of the four different directions involve inter-group interconnects.

13. The system of claim 12, wherein the one or more processors are further configured to execute the set of instructions to cause the system to: for each group of computing nodes, sync data across a plurality of computing nodes of a group to reduce data into one computing node of the group using intra-group interconnects.

14. The system of claim 12, wherein the one or more processors are further configured to execute the set of instructions to cause the system to: broadcast, using inter-group interconnects, to a computing node of a first group of the plurality of groups synced data from other groups of th
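Claims 5 to 7 describe data split into sub-data, with every node holding its own version of each sub-data and, in each clock cycle, receiving one version from the node at the other end of a link along one direction. The sketch below illustrates that kind of per-cycle pipelining along a single direction only, as a chain of nodes reducing toward the last node. The function name `pipelined_chain_reduce`, the one-hop-per-cycle timing, and the single-chain setting are assumptions for illustration, not the patent's schedule.

```python
from typing import List, Tuple


def pipelined_chain_reduce(values: List[List[int]]) -> Tuple[List[int], int]:
    """Sum each sub-data index across a chain of nodes, one hop per clock cycle.

    values[i][k] is node i's version of sub-data k (hypothetical layout; at least
    one node). Node i forwards its running partial of sub-data k to node i + 1 in
    cycle k + i, so different sub-data are in flight on different links during the
    same cycle. Returns the fully reduced sub-data held by the last node and the
    number of cycles used.
    """
    n = len(values)
    s = len(values[0])
    partial = [list(v) for v in values]  # copy so the caller's data is untouched
    t = 0
    while True:
        # Gather this cycle's transfers before applying any, as if all links fire together.
        transfers = []
        for i in range(n - 1):
            k = t - i
            if 0 <= k < s:
                transfers.append((i + 1, k, partial[i][k]))
        if not transfers:
            break
        for dst, k, val in transfers:
            partial[dst][k] += val  # receiver folds the incoming version into its own
        t += 1
    return partial[-1], t


if __name__ == "__main__":
    chain = [[1, 2, 3], [10, 20, 30], [100, 200, 300], [1000, 2000, 3000]]
    reduced, cycles = pipelined_chain_reduce(chain)
    print(reduced, cycles)  # [1111, 2222, 3333] 5
```

With n nodes and S sub-data the chain finishes in n + S - 2 cycles (5 for the example). If, as assumed here, a link carries one sub-data per cycle, forwarding whole vectors hop by hop without pipelining would take (n - 1) · S = 9 cycles instead.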

Assignees

Alibaba Group Holding Ltd

Classifications

  • Distributed learning, e.g. federated learning (CPC title)
  • G06N3/08 (primary): Learning methods
  • G06N3/063 (primary): using electronic means
  • Event management; Broadcasting; Multicasting; Notifications (CPC title)
  • Architecture, e.g. interconnection topology (CPC title)

Frequently asked questions

What does patent US11620502B2 cover?
A method, and a corresponding system (claim 12), for syncing data of a computing task across groups of computing nodes A-D that are linked by intra-group and inter-group interconnects. Data is first synced at a computing node along four directions (two horizontal-based, two vertical-based; two over intra-group and two over inter-group interconnects), and the synced data is then broadcast from that node to all groups along four directions.
Who is the assignee on this patent?
Alibaba Group Holding Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Published April 4, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).