US11620502B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11620502-B2 |
| Application number | US-202016777731-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 30, 2020 |
| Priority date | Jan 30, 2020 |
| Publication date | Apr 4, 2023 |
| Grant date | Apr 4, 2023 |
Official abstract text for this publication.
The present disclosure provides a method for syncing data of a computing task across a plurality of groups of computing nodes. Each group includes a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicatively couple each of computing nodes A-D with corresponding computing nodes A-D in each of a plurality of neighboring groups. The method comprises syncing data at a computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node; and broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node.
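The interconnect layout described in the abstract can be pictured as a small graph. The following is a minimal sketch, not the patent's implementation: it assumes one "area" of four groups arranged as a 2x2 grid, and the names `build_area`, `INTRA_LINKS`, and the `(row, col, letter)` node encoding are hypothetical choices made only for illustration.

```python
from itertools import product

# Intra-group interconnects as recited in the abstract: A-B, A-C, D-B, D-C.
INTRA_LINKS = [("A", "B"), ("A", "C"), ("D", "B"), ("D", "C")]
NODES = "ABCD"


def build_area(rows=2, cols=2):
    """Return an adjacency map {node: set of neighbours}.

    Assumption: groups sit on a rows x cols grid, and a node is encoded
    as (group_row, group_col, letter). Neither detail comes from the patent.
    """
    adj = {
        (r, c, n): set()
        for r, c in product(range(rows), range(cols))
        for n in NODES
    }

    def link(u, v):
        # Interconnects are modelled as undirected (bi-directional) edges.
        adj[u].add(v)
        adj[v].add(u)

    for r, c in product(range(rows), range(cols)):
        # Intra-group interconnects inside group (r, c).
        for a, b in INTRA_LINKS:
            link((r, c, a), (r, c, b))
        # Inter-group interconnects: each node couples with the
        # same-letter node of the neighbouring group to the right and below.
        for n in NODES:
            if c + 1 < cols:
                link((r, c, n), (r, c + 1, n))
            if r + 1 < rows:
                link((r, c, n), (r + 1, c, n))
    return adj
```

Under this 2x2 assumption every node ends up with exactly four links, two intra-group and two inter-group, which is one way to read the "four different directions" language. For example, `build_area()[(0, 0, "A")]` contains nodes B and C of the same group plus node A of the two neighbouring groups.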
Opening claim text (preview).
What is claimed is:

1. A method for syncing data of a computing task across a plurality of groups of computing nodes, each group including a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicatively couple a computing node A of a first group with a computing node A of a second group neighboring the first group, a computing node B of the first group with a computing node B of the second group, a computing node C of the first group with a computing node C of the second group, and a computing node D of the first group with a computing node D of the second group, the method comprising: syncing data at a computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node, wherein the four directions for syncing data comprises two horizontal-based directions and two vertical-based directions and two directions of the four different directions involve intra-group interconnects and two other directions of the four different directions involve inter-group interconnects; and broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node, wherein the four directions for broadcasting data comprises two horizontal-based directions and two vertical-based directions and two directions of the four different directions involve intra-group interconnects and two other directions of the four different directions involve inter-group interconnects.

2. The method of claim 1, wherein syncing data at the computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects comprises: for each group of computing nodes, syncing data across a plurality of computing nodes of a group to reduce data into one computing node of the group using intra-group interconnects.

3. The method of claim 1, wherein broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects comprises: broadcasting, using inter-group interconnects, to a computing node of a first group of the plurality of groups synced data from other groups of the plurality of groups of computing nodes.

4. The method of claim 1, wherein broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects comprises: broadcasting, using inter-group interconnects, synced data to one computing node in each area of computing nodes, wherein an area of computing nodes comprises four groups of computing nodes.

5. The method of claim 1, wherein the data to be synced comprises a plurality of sub-data, and each computing node comprises a different version of each sub-data.

6. The method of claim 5, wherein syncing data at a computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node comprises: in a clock cycle, receiving a version of sub-data by a computing node along one direction of the four different directions, the version of sub-data is transferred from another computing node at another end of a connection along the direction.

7. The method of claim 5, wherein broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node comprises: in a clock cycle, receiving sub-data by a computing node along one direction of the four different directions, the sub-data is broadcasted from another computing node at another end of a connection along the direction.

8. The method of claim 1, wherein the set of intra-group interconnects and the set of inter-group interconnects comprise inter-chip interconnects that are bi-directional.

9. The method of claim 1, wherein: syncing data at a computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node comprises syncing data concurrently in connections along one or more of the four different directions; and broadcasting synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node comprises broadcasting data concurrently in connections along one or more of the four different directions.

10. The method of claim 1, wherein the computing nodes are artificial intelligence (“AI”) training processors, AI training chips, neural processing units (“NPU”), or graphic processing units (“GPU”).

11. The method of claim 10, wherein the computing task is an AI computing task involving an allreduce algorithm.

12. A system for syncing data of a computing task across a plurality of groups of computing nodes, each group comprising a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicatively couple a computing node A of a first group with a computing node A of a second group neighboring the first group, a computing node B of the first group with a computing node B of the second group, a computing node C of the first group with a computing node C of the second group, and a computing node D of the first group with a computing node D of the second group, the system comprising: a memory storing a set of instructions; and one or more processors configured to execute the set of instructions to cause the system to: sync data at a computing node of the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node, wherein the four directions for syncing data comprises two horizontal-based directions and two vertical-based directions and two directions of the four different directions involve intra-group interconnects and two other directions of the four different directions involve inter-group interconnects; and broadcast synced data from the node to the plurality of groups of computing nodes using inter-group interconnects and intra-group interconnects along four different directions relative to the node, wherein the four directions for broadcasting data comprises two horizontal-based directions and two vertical-based directions and two directions of the four different directions involve intra-group interconnects and two other directions of the four different directions involve inter-group interconnects.

13. The system of claim 12, wherein the one or more processors are further configured to execute the set of instructions to cause the system to: for each group of computing nodes, sync data across a plurality of computing nodes of a group to reduce data into one computing node of the group using intra-group interconnects.

14. The system of claim 12, wherein the one or more processors are further configured to execute the set of instructions to cause the system to: broadcast, using inter-group interconnects, to a computing node of a first group of the plurality of groups synced data from other groups of th
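The sync-then-broadcast flow recited in claims 1 to 4 can be illustrated with a toy data-flow model. This is a sketch under stated assumptions, not the claimed method itself: it assumes the reduction is a sum, that node A is the collecting node of each group, and that D reaches A through B. The function name `allreduce_sum` and the dictionary layout are hypothetical, and timing, clock cycles, and per-link scheduling are not modelled.

```python
def allreduce_sum(values):
    """Toy allreduce: every node ends up holding the global sum.

    values: {group_id: {"A": x, "B": x, "C": x, "D": x}}
    Assumptions (not from the patent): sum as the reduction operator,
    node A as the collecting node of each group.
    """
    # Phase 1, sync over intra-group interconnects:
    # D has no direct link to A, so D forwards to B over the D-B link,
    # then B and C forward their partial results to A.
    group_totals = {}
    for group_id, v in values.items():
        b_partial = v["B"] + v["D"]                      # D -> B
        group_totals[group_id] = v["A"] + b_partial + v["C"]  # B -> A, C -> A

    # Phase 2, sync over inter-group interconnects:
    # the A nodes of all groups combine their group totals at one node.
    total = sum(group_totals.values())

    # Phase 3, broadcast: the synced result travels back over the
    # inter-group links to each group, then over the intra-group links
    # to every node in the group.
    return {group_id: {n: total for n in "ABCD"} for group_id in values}
```

With four groups that each hold the values 1, 2, 3, and 4 on nodes A to D, every node in the returned mapping holds 40, the sum over all sixteen nodes.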