US11237880B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11237880-B1 |
| Application number | US-202117379924-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jul 19, 2021 |
| Priority date | Dec 18, 2020 |
| Publication date | Feb 1, 2022 |
| Grant date | Feb 1, 2022 |
Abstract
Roughly described, a system for data parallel training of a neural network on multiple reconfigurable units configured by a host with dataflow pipelines to perform different steps in the training. CGRA units are configured to evaluate first and second sequential sections of neural network layers based on a respective subset of training data, and to back-propagate the error through the sections to calculate parameter gradients for the respective subset. Gradient synchronization and reduction are performed by one or more units having finer grain reconfigurability, such as an FPGA. The FPGA performs synchronization and reduction of the gradients for the second section while the CGRA units perform back-propagation through the first sequential section. Intermediate results are transmitted using a P2P message passing protocol layer. Execution of dataflow segments in the different units is triggered by receipt of data, rather than by a command from any host system.
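The scheduling idea in the abstract can be modeled in a few lines. A finer-grained unit reduces the gradients of one section while the CGRA units keep back-propagating through the next, and each step starts when data arrives rather than on a host command. The Python below is a toy sketch under stated assumptions. Threads stand in for the CGRA units and the FPGA, and a `queue.Queue` stands in for the P2P message layer. The section labels S2 (output), S1, and S0 follow claims 2 to 4. The names `reducer`, `worker`, `train_step`, and `SECTIONS`, and the placeholder gradient values, are hypothetical and do not come from the patent.

```python
import queue
import threading
from typing import Dict, List, Tuple

# Hypothetical section labels; back-propagation visits the output section first.
SECTIONS = ["S2", "S1", "S0"]


def reducer(inbox: queue.Queue, results: Dict[str, List[float]],
            order: List[str], num_workers: int) -> None:
    """Stand-in for the finer-grained unit (e.g. the FPGA).

    It runs no schedule of its own: a section is reduced (averaged) only
    when the last worker's gradient for that section arrives."""
    pending: Dict[str, List[List[float]]] = {}
    while True:
        msg = inbox.get()
        if msg is None:  # shutdown sentinel posted after all workers finish
            return
        section, grad = msg
        pending.setdefault(section, []).append(grad)
        if len(pending[section]) == num_workers:  # data-triggered reduction
            grads = pending.pop(section)
            results[section] = [sum(col) / num_workers for col in zip(*grads)]
            order.append(section)


def worker(rank: int, inbox: queue.Queue) -> None:
    """Stand-in for one coarse-grained unit (e.g. a CGRA) back-propagating.

    Each section's gradient is sent as soon as it is computed, and the worker
    moves straight on to the next section without waiting for the reduction."""
    for depth, section in enumerate(SECTIONS):
        grad = [float(rank + depth), float(rank * depth)]  # placeholder values
        inbox.put((section, grad))


def train_step(num_workers: int) -> Tuple[Dict[str, List[float]], List[str]]:
    inbox: queue.Queue = queue.Queue()  # plays the role of the P2P message layer
    results: Dict[str, List[float]] = {}
    order: List[str] = []
    red = threading.Thread(target=reducer,
                           args=(inbox, results, order, num_workers))
    red.start()
    workers = [threading.Thread(target=worker, args=(r, inbox))
               for r in range(num_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    inbox.put(None)  # only used to stop the simulation, not to trigger work
    red.join()
    return results, order


if __name__ == "__main__":
    results, order = train_step(4)
    print(order)  # ['S2', 'S1', 'S0']
```

Each worker sends S2 before S1 before S0 on the same FIFO queue. The last S2 gradient therefore always arrives before the last S1 gradient, so S2 is reduced first even though S1 gradients may already be in flight. The sketch models control flow only. In CPython, these threads do not demonstrate real compute overlap or speedup.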
Claims (opening text, truncated)
What is claimed is:

1. A system for training parameters of a neural network using training data samples partitioned across a plurality of participating reconfigurable processors, each of the data samples including a plurality of input values and a set of at least one target output value, comprising: a plurality of $N$ processing nodes, $N > 1$, each k'th one of processing nodes in the plurality of $N$ processing nodes including: a respective plurality of $M_k$ first reconfigurable processors $RP_{k.0} \ldots RP_{k.M_k-1}$, each reconfigurable processor in the respective plurality of $M_k$ first reconfigurable processors $RP_{k.0} \ldots RP_{k.M_k-1}$ being reconfigurable at a first level of configuration granularity, a respective k'th reconfigurable master network interface controller $RU_k$, $k = 0, \ldots, N-1$, each master network interface controller in the respective k'th reconfigurable master network interface controller $RU_k$, $k = 0, \ldots, N-1$ being reconfigurable at a second level of configuration granularity finer than the first level of configuration granularity, and a network interconnecting master network interface controllers in the respective k'th reconfigurable master network interface controller $RU_k$, $k = 0, \ldots, N-1$; wherein each reconfigurable processor is configured by a host processor with one or more dataflow pipelines implementing a respective instance of a first dataflow segment, the first dataflow segment including, for a respective current subset of training data samples and a current set of neural network parameters, pipeline stages that: evaluate a neural network using input values of the respective current subset of training data samples and parameter values of the current set of neural network parameters, to calculate a set of at least one predicted output value, calculate a loss in dependence upon the predicted output value and a target output value in a set of at least one target output value, calculate a first intermediate result in dependence upon a parameter gradient with respect to each of at least a subset of neural network parameters in the current set of neural network parameters, and forward the first intermediate result to the respective k'th reconfigurable master network interface controller $RU_k$ without passing through the host processor; wherein each of the master network interface controllers is configured by the host processor with one or more dataflow pipelines implementing a respective instance of a second dataflow segment, the second dataflow segment including pipeline stages that: participate in an aggregation of first intermediate results from the first reconfigurable processors $RP_{k.0} \ldots RP_{k.M_k-1}$, to calculate a second intermediate result in dependence upon the first intermediate results, communicate via a peer-to-peer (P2P) protocol layer over the network and without passing through the host processor, to participate in an aggregation of second intermediate results from a second reconfigurable units $RU_k$, $k = 0, \ldots, N-1$, to calculate a third intermediate result in dependence upon the second intermediate results, the third intermediate result indicative of an update of the current set of neural network parameters, and forward the third intermediate result to the first reconfigurable processors $RP_{k.i}$, $i = 0, \ldots, M_k - 1$, without passing through the host processor, for use in a subsequent evaluation of the neural network using a subsequent subset of training data samples; and wherein each of the reconfigurable master network interface controllers $RU_k$ is configured by the host processor with a respective local gradient buffer having $N$ gradient buffer segments, wherein the pipeline stages in the second dataflow segment configured into each of the reconfigurable master network interface controllers $RU_k$, for communicating over the network to participate in the aggregation of the second intermediate results from the second reconfigurable units $RU_k$, $k = 0, \ldots, N-1$, includes pipeline stages implementing an accumulation phase of the aggregation followed by a distribution phase of the aggregation.

2. The system of claim 1, wherein the neural network includes a plurality of neural network sections including a section $S_2$ being an output neural network section having outputs for carrying the at least one predicted output value, and further including a section $S_1$ having inputs and further having outputs providing values to inputs of the section $S_2$, and wherein the first intermediate result includes values dependent upon gradients of only the neural network parameters in the section $S_2$, the third intermediate result being indicative of an update of only the neural network parameters in the section $S_2$.

3. The system of claim 2, wherein the second dataflow segment configured into each of the reconfigurable master network interface controllers $RU_k$ includes pipeline stages that, in response to receipt by a reconfigurable master network interface controller $RU_k$ of the first intermediate results from a (k,i)'th one of the RPs, sends a P2P message to the (k,i)'th RP indicating such receipt, wherein the second dataflow segment configured into each of the reconfigurable master network interface controllers $RU_k$ further includes pipeline stages that, in response to receipt by the reconfigurable master network interface controller $RU_k$ of the first intermediate results from all of $RP_{k.i}$, $i = 0, \ldots, M_k - 1$, initiate the aggregation of the second intermediate results to calculate the third intermediate result indicative of the update of the neural network parameters in the section $S_2$, and wherein the first dataflow segment configured into a (k,i)'th one of the RPs further includes pipeline stages that, in response to the P2P message from the reconfigurable master network interface controller $RU_k$ indicating receipt from the (k,i)'th one of the RPs of the first intermediate result for the section $S_2$: calculate an $S_1$ section first intermediate result in dependence upon a parameter gradient with respect to each of the parameters in the $S_1$ section of the neural network, and forward the $S_1$ section first intermediate result to the respective k'th reconfigurable master network interface controller $RU_k$ without passing through the host processor, the calculating of the $S_1$ section first intermediate result by the (k,i)'th RP at least sometimes occurring in parallel with the aggregation of the $S_2$ second intermediate results by the $N$ master network interface controllers.

4. The system of claim 3, wherein the neural network further includes a section $S_0$ having inputs and further having outputs providing values to inputs of the section $S_1$, wherein the second dataflow segment configured into each of the master network interface controllers $RU_k$ includes pipeline stages that, in response to receipt by the master network interface controller $RU_k$ of the $S_1$ section first intermediate results from a (k,i)'th one of the RPs, sends a P2P message to the (k,i)'th RP indicating such receipt, wherein the second dataflow segment configured into each of the master network interface controllers $RU_k$ further includes pipeline stages that, in response to receipt by the master network interface controller $RU_k$ of the $S_1$ section first intermediate results from all of $RP_{k.i}$, $i = 0, \ldots, M_k - 1$, initiate the aggregation of second intermediate results to calculate the third intermediate result indicative of the update of the neural network parameters in the $S_1$ section, and wherein the first dataflow segment configured into a (k,i)'th one of the RPs further includes pipeline stages that, in response to the P2P message from the reconfigurable master network interface controller $RU_k$ indicating receipt from the (k,i)'th one of the RPs of the $S_1$ section first intermediate results: ca…
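Claim 1 requires each master network interface controller $RU_k$ to hold a local gradient buffer with $N$ segments. It must aggregate across the $N$ nodes in an accumulation phase followed by a distribution phase. The claim does not name the exchange pattern. A ring reduce-scatter followed by an all-gather is one widely used scheme with exactly that two-phase shape, so the Python sketch below assumes it. It simulates the $N$ controllers as lists in one process with synchronous steps, then averages the result. The ring topology, the averaging step, and the names `segment_bounds` and `ring_all_reduce` are assumptions for illustration, not details stated in the claims.

```python
from typing import List, Tuple


def segment_bounds(length: int, n: int) -> List[Tuple[int, int]]:
    """Split a buffer of `length` values into n contiguous [lo, hi) segments."""
    cuts = [(i * length) // n for i in range(n + 1)]
    return [(cuts[i], cuts[i + 1]) for i in range(n)]


def ring_all_reduce(buffers: List[List[float]]) -> List[List[float]]:
    """Average equal-length gradient buffers held by N nodes on a logical ring.

    Accumulation phase (reduce-scatter): after N-1 steps, node k holds the
    complete sum of segment (k + 1) % N.
    Distribution phase (all-gather): after N-1 more steps, every node holds
    every complete segment.
    """
    n = len(buffers)
    bufs = [list(b) for b in buffers]  # copy so the caller's data is untouched
    segs = segment_bounds(len(bufs[0]), n)

    # Accumulation phase: each node adds the segment received from its ring
    # predecessor into its own copy of that segment.
    for step in range(n - 1):
        # Snapshot all outgoing segments first, so every send in a step
        # carries the values from the end of the previous step.
        sends = []
        for k in range(n):
            lo, hi = segs[(k - step) % n]
            sends.append(((k + 1) % n, lo, hi, bufs[k][lo:hi]))
        for dst, lo, hi, data in sends:
            for j, v in zip(range(lo, hi), data):
                bufs[dst][j] += v

    # Distribution phase: completed segments travel around the ring and
    # overwrite the partial sums still held by each receiver.
    for step in range(n - 1):
        sends = []
        for k in range(n):
            lo, hi = segs[(k + 1 - step) % n]
            sends.append(((k + 1) % n, lo, hi, bufs[k][lo:hi]))
        for dst, lo, hi, data in sends:
            bufs[dst][lo:hi] = data

    # Divide the summed gradients by N to get the data-parallel average.
    return [[v / n for v in b] for b in bufs]


if __name__ == "__main__":
    grads = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
    print(ring_all_reduce(grads)[0])  # [5.0, 6.0, 7.0, 8.0]
```

With a buffer of $L$ values, each phase has a node send $N-1$ segments of roughly $L/N$ values. Each node therefore sends about $2(N-1)L/N$ values in total, which stays below $2L$ no matter how many nodes join the ring. That bandwidth property is the usual reason this pattern is chosen for gradient aggregation.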
CPC titles:
- Combinations of networks
- Recurrent networks, e.g. Hopfield networks
- Activation functions
- Supervised learning
- Feedforward networks