Dataflow all-reduce for reconfigurable processor systems

US11237880B1 · US · B1

Patent metadata
Publication number: US-11237880-B1
Application number: US-202117379924-A
Country: US
Kind code: B1
Filing date: Jul 19, 2021
Priority date: Dec 18, 2020
Publication date: Feb 1, 2022
Grant date: Feb 1, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Roughly described, a system for data parallel training of a neural network on multiple reconfigurable units configured by a host with dataflow pipelines to perform different steps in the training. CGRA units are configured to evaluate first and second sequential sections of neural network layers based on a respective subset of training data, and to back-propagate the error through the sections to calculate parameter gradients for the respective subset. Gradient synchronization and reduction are performed by one or more units having finer grain reconfigurability, such as an FPGA. The FPGA performs synchronization and reduction of the gradients for the second section while the CGRA units perform back-propagation through the first sequential section. Intermediate results are transmitted using a P2P message passing protocol layer. Execution of dataflow segments in the different units is triggered by receipt of data, rather than by a command from any host system.
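
The overlap described in the abstract can be illustrated with a small software analogy. The sketch below is not the patented hardware design. It assumes Python threads stand in for the CGRA units and the FPGA, and a queue stands in for the P2P message layer. The names cgra_worker, fpga_reducer, and run_demo are hypothetical. Each worker emits its gradients in back-propagation order (second section S2 first, then first section S1), and the reducer starts work only when data arrives, with no start command from the main thread acting as host.

```python
import queue
import threading


def cgra_worker(worker_id, grads_by_section, outbox):
    """Hypothetical stand-in for one CGRA unit: sends gradients section by
    section in back-propagation order (S2 first, then S1)."""
    for section in ("S2", "S1"):
        outbox.put((section, worker_id, grads_by_section[section]))


def fpga_reducer(num_workers, num_sections, inbox, results):
    """Hypothetical stand-in for the FPGA: blocks until data arrives and
    reduces a section once every worker's gradients for it are in."""
    pending = {}
    while len(results) < num_sections:
        # Execution is triggered by receipt of data, not by a host command.
        section, _worker_id, grads = inbox.get()
        pending.setdefault(section, []).append(grads)
        if len(pending[section]) == num_workers:
            # Element-wise sum across workers for this section.
            results[section] = [sum(vals) for vals in zip(*pending[section])]


def run_demo():
    # Made-up gradient values for two workers and two sections.
    worker_grads = [
        {"S2": [1.0, 2.0], "S1": [0.5, 0.5]},
        {"S2": [3.0, 4.0], "S1": [1.5, 2.5]},
    ]
    inbox = queue.Queue()
    results = {}
    reducer = threading.Thread(
        target=fpga_reducer, args=(len(worker_grads), 2, inbox, results)
    )
    reducer.start()
    workers = [
        threading.Thread(target=cgra_worker, args=(i, g, inbox))
        for i, g in enumerate(worker_grads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    reducer.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

Because the reducer can finish the S2 sum as soon as the last S2 message arrives, its work may overlap with workers that are still producing S1 gradients, which is the effect the abstract describes.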

First claim

Opening claim text (preview).

What is claimed is:

1. A system for training parameters of a neural network using training data samples partitioned across a plurality of participating reconfigurable processors, each of the data samples including a plurality of input values and a set of at least one target output value, comprising: a plurality of N processing nodes, N>1, each k'th one of processing nodes in the plurality of N processing nodes including: a respective plurality of Mk first reconfigurable processors RP k.0 . . . RP k.Mk−1, each reconfigurable processor in the respective plurality of Mk first reconfigurable processors RP k.0 . . . RP k.Mk−1 being reconfigurable at a first level of configuration granularity, a respective k'th reconfigurable master network interface controller RU k, k=0, . . . , N−1, each master network interface controller in the respective k'th reconfigurable master network interface controller RU k, k=0, . . . , N−1 being reconfigurable at a second level of configuration granularity finer than the first level of configuration granularity, and a network interconnecting master network interface controllers in the respective k'th reconfigurable master network interface controller RU k, k=0, . . . , N−1; wherein each reconfigurable processor is configured by a host processor with one or more dataflow pipelines implementing a respective instance of a first dataflow segment, the first dataflow segment including, for a respective current subset of training data samples and a current set of neural network parameters, pipeline stages that: evaluate a neural network using input values of the respective current subset of training data samples and parameter values of the current set of neural network parameters, to calculate a set of at least one predicted output value, calculate a loss in dependence upon the predicted output value and a target output value in a set of at least one target output value, calculate a first intermediate result in dependence upon a parameter gradient with respect to each of at least a subset of neural network parameters in the current set of neural network parameters, and forward the first intermediate result to the respective k'th reconfigurable master network interface controller RU k without passing through the host processor; wherein each of the master network interface controllers is configured by the host processor with one or more dataflow pipelines implementing a respective instance of a second dataflow segment, the second dataflow segment including pipeline stages that: participate in an aggregation of first intermediate results from the first reconfigurable processors RP k.0 . . . RP k.Mk−1, to calculate a second intermediate result in dependence upon the first intermediate results, communicate via a peer-to-peer (P2P) protocol layer over the network and without passing through the host processor, to participate in an aggregation of second intermediate results from a second reconfigurable units RU k, k=0, . . . , N−1, to calculate a third intermediate result in dependence upon the second intermediate results, the third intermediate result indicative of an update of the current set of neural network parameters, and forward the third intermediate result to the first reconfigurable processors RP k.i, i=0, . . . , M k−1, without passing through the host processor, for use in a subsequent evaluation of the neural network using a subsequent subset of training data samples; and wherein each of the reconfigurable master network interface controllers RU k is configured by the host processor with a respective local gradient buffer having N gradient buffer segments, wherein the pipeline stages in the second dataflow segment configured into each of the reconfigurable master network interface controllers RU k, for communicating over the network to participate in the aggregation of the second intermediate results from the second reconfigurable units RU k, k=0, . . . , N−1, includes pipeline stages implementing an accumulation phase of the aggregation followed by a distribution phase of the aggregation.

2. The system of claim 1, wherein the neural network includes a plurality of neural network sections including a section S 2 being an output neural network section having outputs for carrying the at least one predicted output value, and further including a section S 1 having inputs and further having outputs providing values to inputs of the section S 2, and wherein the first intermediate result includes values dependent upon gradients of only the neural network parameters in the section S 2, the third intermediate result being indicative of an update of only the neural network parameters in the section S 2.

3. The system of claim 2, wherein the second dataflow segment configured into each of the reconfigurable master network interface controllers RU k includes pipeline stages that, in response to receipt by a reconfigurable master network interface controller RU k of the first intermediate results from a (k,i)'th one of the RPs, sends a P2P message to the (k,i)'th RP indicating such receipt, wherein the second dataflow segment configured into each of the reconfigurable master network interface controllers RU k further includes pipeline stages that, in response to receipt by the reconfigurable master network interface controller RU k of the first intermediate results from all of RP k.i, i=0, . . . , M k−1, initiate the aggregation of the second intermediate results to calculate the third intermediate result indicative of the update of the neural network parameters in the section S 2, and wherein the first dataflow segment configured into a (k,i)'th one of the RPs further includes pipeline stages that, in response to the P2P message from the reconfigurable master network interface controller RU k indicating receipt from the (k,i)'th one of the RPs of the first intermediate result for the section S 2: calculate an S 1 section first intermediate result in dependence upon a parameter gradient with respect to each of the parameters in the S 1 section of the neural network, and forward the S 1 section first intermediate result to the respective k'th reconfigurable master network interface controller RU k without passing through the host processor, the calculating of the S 1 section first intermediate result by the (k,i)'th RP at least sometimes occurring in parallel with the aggregation of the S 2 second intermediate results by the N master network interface controllers.

4. The system of claim 3, wherein the neural network further includes a section S 0 having inputs and further having outputs providing values to inputs of the section S 1, wherein the second dataflow segment configured into each of the master network interface controllers RU k includes pipeline stages that, in response to receipt by the master network interface controller RU k of the S 1 section first intermediate results from a (k,i)'th one of the RPs, sends a P2P message to the (k,i)'th RP indicating such receipt, wherein the second dataflow segment configured into each of the master network interface controllers RU k further includes pipeline stages that, in response to receipt by the master network interface controller RU k of the S 1 section first intermediate results from all of RP k.i, i=0, . . . , M k−1, initiate the aggregation of second intermediate results to calculate the third intermediate result indicative of the update of the neural network parameters in the S 1 section, and wherein the first dataflow segment configured into a (k,i)'th one of the RPs further includes pipeline stages that, in response to the P2P message from the reconfigurable master network interface controller RU k indicating receipt from the (k,i)'th one of the RPs of the S 1 section first intermediate results: ca
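
Claim 1 describes a two-level aggregation. Each RU k first combines the first intermediate results from its own RPs. The N RUs then aggregate among themselves in an accumulation phase followed by a distribution phase, using a local gradient buffer with N segments. The claim preview does not name a network topology, so the sketch below assumes a ring all-reduce (reduce-scatter, then all-gather), one common algorithm with this two-phase shape. The function names are hypothetical, and each buffer segment is simplified to a single number.

```python
def local_aggregate(rp_grads_for_node):
    """Sum the gradient vectors from the RPs attached to one RU
    (first intermediate results -> second intermediate result)."""
    return [sum(vals) for vals in zip(*rp_grads_for_node)]


def ring_all_reduce(buffers):
    """Assumed ring all-reduce over N nodes; buffers[k][s] is segment s of
    node k's gradient buffer, and each buffer has exactly N segments."""
    n = len(buffers)
    bufs = [list(b) for b in buffers]

    # Accumulation phase: after n-1 steps, node k holds the complete sum
    # for segment (k + 1) % n.
    for step in range(n - 1):
        # Snapshot what every node sends so the sends act simultaneously.
        sends = [((k - step) % n, bufs[k][(k - step) % n]) for k in range(n)]
        for k, (seg, val) in enumerate(sends):
            bufs[(k + 1) % n][seg] += val

    # Distribution phase: completed segments circulate around the ring
    # until every node holds every fully reduced segment.
    for step in range(n - 1):
        sends = [
            ((k + 1 - step) % n, bufs[k][(k + 1 - step) % n]) for k in range(n)
        ]
        for k, (seg, val) in enumerate(sends):
            bufs[(k + 1) % n][seg] = val

    return bufs


def hierarchical_all_reduce(rp_grads):
    """rp_grads[k][i] is the gradient vector (length N) from RP k.i.
    Returns, per node, the fully reduced vector that the RU would forward
    back to its RPs (the third intermediate result in this sketch)."""
    second = [local_aggregate(node) for node in rp_grads]
    return ring_all_reduce(second)


if __name__ == "__main__":
    # N = 2 nodes; node 0 has two RPs, node 1 has one (made-up values).
    print(hierarchical_all_reduce([[[1, 2], [3, 4]], [[10, 20]]]))
```

In this sketch each node sends one segment per step in each phase, so per-node traffic does not grow with N, which is a usual reason for splitting the buffer into N segments. Whether the patent relies on the same reasoning is not stated in the claim preview.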

Assignees

Inventors

Classifications

  • Combinations of networks · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Activation functions · CPC title

  • Supervised learning · CPC title

  • Feedforward networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11237880B1 cover?
Roughly described, a system for data parallel training of a neural network on multiple reconfigurable units configured by a host with dataflow pipelines to perform different steps in the training. CGRA units are configured to evaluate first and second sequential sections of neural network layers based on a respective subset of training data, and to back-propagate the error through the sections t…
Who is the assignee on this patent?
Sambanova Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06F9/544. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 1, 2022 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).