Distributed random binning featurization with hybrid two-level parallelism
US-11080228-B2 · Aug 3, 2021 · US
US11520640B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11520640-B2 |
| Application number | US-202016777711-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 30, 2020 |
| Priority date | Jan 30, 2020 |
| Publication date | Dec 6, 2022 |
| Grant date | Dec 6, 2022 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
The present disclosure provides a method for syncing data of a computing task across a plurality of groups of computing nodes, each group comprising a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicatively couple a computing node A of a first group of the plurality of groups with a computing node A of a second group neighboring the first group, a computing node B of the first group with a computing node B of the second group, a computing node C of the first group with the computing node C of the second group, and a computing node D of the first group with a computing node D of the second group, the method comprising: syncing across a first dimension of computing nodes using a first set of ring connections, wherein the first set of ring connections are formed using inter-group and intra-group interconnects that communicatively couple the computing nodes along the first dimension; and broadcasting synced data across a second dimension of computing nodes using a second ring connection.
Opening claim text (preview).
What is claimed is: 1. A method for syncing data of a computing task across a plurality of groups of computing nodes, each group comprising a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicatively couple a computing node A of a first group of the plurality of groups with a computing node A of a second group neighboring the first group and a computing node A of a third group neighboring the first group, a computing node B of the first group with a computing node B of the second group and a computing node B of the third group, a computing node C of the first group with a computing node C of the second group and a computing node C of the third group, and a computing node D of the first group with a computing node D of the second group and a computing node D of the third group, wherein the second group and the third group are aligned with the first group in different dimensions, the method comprising: syncing data across a first dimension of computing nodes of the first group and the second group using a first set of ring connections, wherein the first set of ring connections are formed using inter-group and intra-group interconnects that communicatively couple the computing nodes of the first group and the second group along the first dimension, and the syncing data across the first dimension comprises transferring, in a unit time, sub-data from a computing node along the first dimension to another computing node in a unit time via a connection on the first set of ring connections; syncing data across a second dimension of computing nodes of the first group and the third group using a second set of ring connections, wherein the second set of ring connections are formed using inter-group and intra-group interconnects that communicatively couple the computing nodes of the first group and the third group along the second dimension, and syncing data across the second dimension comprises transferring, in a unit time, sub-data from a computing node along the second dimension to another computing node in a unit time via a connection on the second set of ring connections; and broadcasting synced data across the first dimension or the second dimension of computing nodes using the first set of ring connections or the second set of ring connections. 2. The method of claim 1 , wherein the data to be synced comprises a plurality of sub-data, and each computing node comprises a different version of each sub-data. 3. The method of claim 2 , wherein: syncing data across a first dimension of computing nodes using a first set of ring connections further comprises: in a clock cycle, receiving a version of sub-data by each computing node in a row, wherein the version of the sub-data is transferred from another computing node in the row via the connection on the first set of ring connections; and continuing data transferring until each computing node along the first dimension receives all versions of a sub-data from all computing nodes in the row. 4. The method of claim 3 , wherein: syncing data across a second dimension of computing nodes using a second set of ring connections further comprises: in a clock cycle, receiving a version of sub-data by each computing node in a column, wherein the version of the sub-data is transferred from another computing node in the column via the connection on the second set of ring connections; and continuing data transferring until each computing node along the second dimension receives all versions of a sub-data from all computing nodes. 5. The method claim 3 , wherein broadcasting synced data across the first dimension or the second dimension of computing nodes using the first set of ring connections or the second set of ring connections further comprises: in a clock cycle, receiving sub-data by each computing node across the first dimension or the second dimension, wherein the sub-data transferred from another computing node across the first dimension or the second dimension via the connection on the first set of ring connections or the connection on the second set of ring connections; and continuing data transferring until all computing nodes along the first dimension or the second dimension receive sub-data from all computing nodes. 6. The method of claim 1 , wherein the set of intra-group interconnects and the set of inter-group interconnects comprise inter-chip interconnects. 7. The method of claim 1 , wherein the computing nodes are artificial intelligence (“AI”) training processors, AI training chips, neural processing units (“NPU”), or graphic processing units (“GPU”). 8. The method of claim 7 , wherein the computing task is an AI computing task involving an allreduce algorithm. 9. A system for syncing data of a computing task across a plurality of groups of computing nodes, each group comprising a set of computing nodes A-D, a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C and computing node D with computing nodes B and C, and a set of inter-group interconnects that communicatively couple a computing node A of a first group of the plurality of groups with a computing node A of a second group neighboring the first group, and a computing node A of a third group neighboring the first group a computing node B of the first group with a computing node B of the second group and a computing node B of the third group, a computing node C of the first group with a computing node C of the second group and a computing node C of the third group, and a computing node D of the first group with a computing node D of the second group and a computing node D of the third group, wherein the second group and the third group are aligned with the first group in different dimensions, the system comprising: a memory storing a set of instructions; and one or more processors configured to execute the set of instructions to cause the system to: sync data across a first dimension of computing nodes of the first group and the second group using a first set of ring connections, wherein the first set of ring connections are formed using inter-group and intra-group interconnects that communicatively couple the computing nodes of the first group and the second group along the first dimension, and the syncing data across the first dimension comprises transferring, in a unit time, sub-data from a computing node along the first dimension to another computing node in a unit time via a connection on the first set of ring connections; sync data across a second dimension of computing nodes of the first group and the third group using a second set of ring connections, wherein the second set of ring connections are formed using inter-group and intra-group interconnects that communicatively couple the computing nodes of the first group and the third group along the second dimension, and syncing data across the second dimension comprises transferring, in a unit time, sub-data from a computing node along the second dimension to another computing node in a unit time via a connection on the second set of ring connections; and broadcast synced data across the first dimension or the second dimension of computing nodes using the first set of ring connections or the second set of ring connections. 10. The system of claim 9 , wherein the data to be synced comprises a plurality of sub-data, and each computing node comprises a different version of each sub-data. 11. The system of claim 10 , wherein the one or more processors are further configured to execute the set of instr
Program synchronisation; Mutual exclusion, e.g. by means of semaphores · CPC title
considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration (scheduling strategies G06F9/4881 and subgroups) · CPC title
using electronic means · CPC title
Event management; Broadcasting; Multicasting; Notifications · CPC title
Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.