Distributed artificial intelligence extension modules for network switches
US-11057318-B1 · Jul 6, 2021 · US
US11580388B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11580388-B2 |
| Application number | US-202016734092-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 3, 2020 |
| Priority date | Jan 3, 2020 |
| Publication date | Feb 14, 2023 |
| Grant date | Feb 14, 2023 |
Official abstract text for this publication.
Embodiments of the present disclosure include techniques for processing neural networks. Various forms of parallelism may be implemented using topology that combines sequences of processors. In one embodiment, the present disclosure includes a computer system comprising a plurality of processor groups, the processor groups each comprising a plurality of processors. A plurality of network switches are coupled to subsets of the plurality of processor groups. A subset of the processors in the processor groups may be configurable to form sequences, and the network switches are configurable to form at least one sequence across one or more of the plurality of processor groups to perform neural network computations. Various alternative configurations for creating Hamiltonian cycles are disclosed to support data parallelism, pipeline parallelism, layer parallelism, or combinations thereof.
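The topology described in the abstract can be illustrated with a minimal sketch. It assumes each processor group is a 1-dimensional series of processors, and that the network switches are configured to connect the last (edge) processor of one group to the first (edge) processor of the next, wrapping around at the end. The names `build_groups`, `configure_ring`, and `is_hamiltonian_cycle` are hypothetical and do not come from the patent; the sketch models only the resulting connectivity, not switch hardware or the neural network computation itself.

```python
from typing import Dict, List, Tuple

# Hypothetical processor identifier: (group index, position within the group).
Processor = Tuple[int, int]


def build_groups(num_groups: int, procs_per_group: int) -> List[List[Processor]]:
    """Build groups of series-configured processors.

    The first and last processor of each group act as its edge processors.
    """
    return [[(g, p) for p in range(procs_per_group)] for g in range(num_groups)]


def configure_ring(groups: List[List[Processor]]) -> Dict[Processor, Processor]:
    """Return a next-hop map that chains every processor into one cycle.

    Hops inside a group follow the fixed series links. Hops between groups
    stand in for a switch (an assumption of this sketch) that connects the
    tail edge processor of one group to the head edge processor of the next.
    """
    next_hop: Dict[Processor, Processor] = {}
    for g, group in enumerate(groups):
        # Series links between neighbouring processors in the same group.
        for current, following in zip(group, group[1:]):
            next_hop[current] = following
        # Switch-configured link from this group's tail to the next group's head.
        next_group = groups[(g + 1) % len(groups)]
        next_hop[group[-1]] = next_group[0]
    return next_hop


def is_hamiltonian_cycle(next_hop: Dict[Processor, Processor], start: Processor) -> bool:
    """Check that following next_hop from start visits every processor once and returns."""
    seen = set()
    node = start
    while node not in seen:
        seen.add(node)
        node = next_hop[node]
    return node == start and len(seen) == len(next_hop)
```

For example, `configure_ring(build_groups(3, 4))` yields a single ring of 12 processors in which `(0, 3)` forwards to `(1, 0)` and `(2, 3)` wraps back to `(0, 0)`. In a ring like this, each processor exchanges data with exactly two neighbours, which is the kind of sequence the abstract associates with data, pipeline, or layer parallelism.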
Opening claim text (preview).
What is claimed is:
1. A computer system comprising: a plurality of processor groups, the processor groups comprising a plurality of series configured processors to process a partitioned neural network, wherein different processors in the plurality of processor groups process one or more different layers, stages, or model instances of the partitioned neural network; and a plurality of network switches coupled to subsets of the plurality of processor groups through edge processors of the series configured processors in each processor group, wherein at least a subset of the processors in the processor groups are configured to form sequences such that each processor communicates data for said layers, stages, or model instances with at least two adjacent processor in the sequence, and wherein the network switches are configurable to form at least one sequence across one or more of the plurality of processor groups to perform neural network computations.
2. The computer system of claim 1 wherein, during performance of the neural network computations, at least a first subset of processors in a first processor group and at least a second subset of processors in at least a second processor group are coupled together to form a sequence comprising a string of processors, wherein each processor communicates with an adjacent processor in the string of processors.
3. The computer system of claim 1 wherein one or more processor groups comprise processors configured in series to form a 1-dimensional processor array.
4. The computer system of claim 1 wherein one or more processor groups comprise processors configured as rows and columns to form a 2-dimensional processor array.
5. The computer system of claim 1 wherein the plurality of processors in the processor groups are configured in an N-dimensional processor array.
6. The computer system of claim 1 wherein: edge processor ports on opposite sides of a row of processors across a first plurality of processor groups are coupled to a first network switch; and edge processor ports on opposite sides of a column of processors across a second plurality of processor groups are coupled to second network switch.
7. The computer system of claim 1 wherein the processor groups are coupled to a plurality of row network switches and a plurality of column network switches across a plurality of switching planes corresponding to rows and columns of processors in each processor group.
8. The computer system of claim 7 wherein one or more of the row network switches are coupled to one or more of the column network switches.
9. The computer system of claim 1 wherein the processor groups are configured in a multi-dimensional cluster array.
10. The computer system of claim 9 wherein processor groups along a particular dimension of the cluster array have edge processor ports coupled to corresponding same dimension network switches.
11. The computer system of claim 10 wherein a plurality of network switches along a particular dimension are coupled together through a plurality of intermediate network switches.
12. The computer system of claim 9 wherein: rows of processor groups in the cluster array have edge processor ports coupled to corresponding same row network switches, and columns of processor groups in the cluster array have edge processor ports coupled to corresponding same column network switches.
13. The computer system of claim 12 wherein: row network switches are coupled together through one or more intermediate row network switches, and column network switches are coupled together through one or more intermediate column network switches.
14. The computer system of claim 12 wherein one or more intermediate row network switches are coupled to one or more intermediate column network switches.
15. The computer system of claim 1 wherein the plurality of network switches are directly coupled together.
16. The computer system of claim 1 wherein the plurality of network switches are coupled to one or more intermediate network switches.
17. The computer system of claim 16 wherein the plurality of network switches and the one or more intermediate network switches form a two-tier switching network.
18. The computer system of claim 16 wherein each of the plurality of network switches are coupled to a plurality of intermediate network switches to couple processors in the subsets of the plurality of processor groups in series.
19. The computer system of claim 16 wherein processors along a first dimension of the plurality of processor groups are coupled to the plurality of the network switches.
20. The computer system of claim 19 wherein the first dimension is a row of processors.
21. The computer system of claim 19 wherein the first dimension is a column of processors.
22. The computer system of claim 1 wherein the neural network computations are computations for training a neural network.
23. The computer system of claim 1 wherein processors configured to form the sequence are configured to process a plurality of partial layers of the partitioned neural network.
24. The computer system of claim 1 wherein processors configured to form the sequence are configured to process one or more pipeline stages of the partitioned neural network.
25. The computer system of claim 1 wherein processors configured to form the sequence are configured to adjust weights across a plurality of instances of the neural network.
26. A computer system comprising: a plurality of processor groups, the processor groups comprising a plurality of series configured processors to process a partitioned neural network, wherein different processors in the plurality of processor groups process one or more different layers, stages, or model instances of the partitioned neural network; a plurality of network switches, wherein the plurality of the network switches are coupled to subsets of the plurality of processor groups through edge processors of the series configured processors in each processor group; and a plurality of intermediate network switches coupled to subsets of the plurality of network switches, wherein at least a subset of the processors in the processor groups, one or more of the plurality of network switches, and one or more of the intermediate network switches are configured in sequences such that each processor communicates data for said layers, stages, or model instances with at least two adjacent processor in the sequence to perform Hamiltonian cycles for one or more of: data parallelism neural network computations, pipeline parallelism neural network computations, and layer parallelism neural network computations.
27. A method for processing a neural network, the method comprising: configuring a plurality of processors arranged in a plurality of processor groups to perform neural network computations on a partitioned neural network, wherein at least a subset of the processors in the processor groups are configured in series to form sequences of processors, and wherein different processors in the plurality of processor groups process one or more different layers, stages, or model instances of the partitioned neural network; configuring a plurality of network switches to coupled subsets of the plurality of processor groups together through edge processors of the series configured processors in each processor group to form at least one sequence of pr
Supervised learning · CPC title
Feedforward networks · CPC title
Distributed learning, e.g. federated learning · CPC title
using electronic means · CPC title
Three dimensional, e.g. hypercubes · CPC title