Neural network unit that performs stochastic rounding
US-2017102920-A1 · Apr 13, 2017 · US
US10346351B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10346351-B2 |
| Application number | US-201615090829-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 5, 2016 |
| Priority date | Oct 8, 2015 |
| Publication date | Jul 9, 2019 |
| Grant date | Jul 9, 2019 |
Official abstract text for this publication.
An output buffer holds N words arranged as N/J mutually exclusive output buffer word groups (OBWG) of J words each. N processing units (PU) are arranged as N/J mutually exclusive PU groups each having an associated OBWG. Each PU has an accumulator, arithmetic unit, and first and second multiplexed registers each having at least J+1 inputs. A first input receives a memory operand and the other J inputs receive the J words of the associated OBWG. Each accumulator provides its output to a respective OBWG. Each arithmetic unit performs an operation on the first and second multiplexed register outputs and accumulator output to generate a result for accumulation into the accumulator. A mask input to the output buffer controls which words, if any, of the N words retain their current value or are updated with their respective accumulator output. Each PU group operates as a recurrent neural network LSTM cell.
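The masking behavior described in the abstract can be illustrated with a minimal software sketch. It is not the patented hardware: the function names `masked_update` and `word_groups` are hypothetical, and the mask polarity (`True` means "update from the accumulator") is an assumption, since the abstract only says the mask controls which words are retained or updated.

```python
def masked_update(output_buffer, accumulators, update_mask):
    """Return a new N-word buffer.

    Assumption: update_mask[k] == True means word k takes its accumulator
    output; False means word k retains its current value.
    """
    if not (len(output_buffer) == len(accumulators) == len(update_mask)):
        raise ValueError("buffer, accumulators and mask must all hold N entries")
    return [
        acc if update else current
        for current, acc, update in zip(output_buffer, accumulators, update_mask)
    ]


def word_groups(words, j):
    """Split N words into N/J mutually exclusive groups of J words each."""
    if j <= 0 or len(words) % j != 0:
        raise ValueError("N must be a positive multiple of J")
    return [words[k:k + j] for k in range(0, len(words), j)]


# Example with N = 8 and J = 4 (J > 2 and N >= 2J, matching the claim constraints).
buffer_words = [0, 0, 0, 0, 0, 0, 0, 0]
accumulator_outputs = [1, 2, 3, 4, 5, 6, 7, 8]
# Update the first three words of each group, retain the fourth.
mask = [True, True, True, False] * 2

updated = masked_update(buffer_words, accumulator_outputs, mask)
groups = word_groups(updated, 4)
```

Here each group of four words stands in for one output buffer word group; the first three words receive new values while the fourth keeps its old one.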
Opening claim text (preview).
The invention claimed is:
1. An apparatus, comprising: an output buffer that holds N words arranged as N/J mutually exclusive output buffer word groups of J words each of the N words, J is greater than 2 and N is at least twice J; an array of N processing units (PU) arranged as N/J mutually exclusive PU groups of J PUs each of the N PUs, each PU group of the N/J PU groups has an associated output buffer word group of the N/J output buffer word groups, each PU having: first and second multiplexed registers each having: at least J+1 inputs, a first input of the J+1 inputs receives an operand from a memory and the other J inputs receive the J words of the associated output buffer word group; an output; and a control input that controls selection of the J+1 inputs for provision on the output; an accumulator having an output for provision to a respective one of the N output buffer words; and an arithmetic unit having first and second inputs to receive the output of the first and second multiplexed registers, respectively, and a third input that receives the accumulator output, the arithmetic unit performs an operation on the first, second and third inputs to generate a result for accumulation into the accumulator; the output buffer includes a mask input that controls which words, if any, of the N words retain their current value or are updated with their respective accumulator output; and each PU group of the N/J PU groups of J PUs operates as a Long Short Term Memory (LSTM) cell of a recurrent neural network, a first of the J PUs computes an input gate, a second of the J PUs computes a forget gate, and a third of the J PUs computes an output gate of the LSTM cell.
2. The apparatus of claim 1, further comprising: the mask input specifies to update first, second and third of the J words of the associated output buffer word group with the input gate, forget gate and output gates computed by the respective first, second and third of the J PUs.
3. The apparatus of claim 2, further comprising: the first, second and third of the J PUs compute the input gate, forget gate and output gates concurrently.
4. The apparatus of claim 2, further comprising: a fourth of the J PUs computes a candidate state of the LSTM cell.
5. The apparatus of claim 4, further comprising: the mask input specifies to update a fourth of the J words of the associated output buffer word group with the candidate state of the LSTM cell but to retain the current value of the first, second and third of the J words of the associated output buffer word group.
6. The apparatus of claim 4, further comprising: one of the J PUs computes the new state of the LSTM cell and an activation function thereof using the input gate, the forget gate, the candidate state of the LSTM cell, and a current state of the LSTM cell.
7. The apparatus of claim 6, further comprising: a memory from which the one of the J PUs reads the current state of the LSTM cell and to which the output buffer writes the new state of the LSTM cell.
8. The apparatus of claim 6, further comprising: one of the J PUs computes a new output of the LSTM cell using the output gate and the activation function of the new state of the LSTM cell.
9. The apparatus of claim 8, further comprising: a memory from which the J PUs read a current output of the LSTM cell and to which the output buffer writes the new output of the LSTM cell.
10. The apparatus of claim 1, further comprising: the first, second and third of the J PUs compute the input gate, forget gate and output gate, respectively, using a current output of the LSTM cell and respective weights and using a new input to the LSTM cell and respective weights.
11. The apparatus of claim 10, further comprising: the first, second and third of the J PUs read the current output from the output buffer.
12. The apparatus of claim 10, further comprising: a memory from which the first, second and third of the J PUs read the new input.
13. The apparatus of claim 10, further comprising: a memory from which the first, second and third of the J PUs read the weights.
14. A method for operating an apparatus having an output buffer that holds N words arranged as N/J mutually exclusive output buffer word groups of J words each of the N words, J is greater than 2 and N is at least twice J, an array of N processing units (PU) arranged as N/J mutually exclusive PU groups of J PUs each of the N PUs, each PU group of the N/J PU groups has an associated output buffer word group of the N/J output buffer word groups, the output buffer includes a mask input that controls which words, if any, of the N words retain their current value or are updated with their respective accumulator output, each PU has first and second multiplexed registers each having an output, an accumulator having an output for provision to a respective one of the N output buffer words, and an arithmetic unit having first and second inputs to receive the output of the first and second multiplexed registers, respectively, and a third input that receives the accumulator output, the arithmetic unit performs an operation on the first, second and third inputs to generate a result for accumulation into the accumulator, each of the first and second multiplexed registers has at least J+1 inputs, a first input of the J+1 inputs receives an operand from a memory and the other J inputs receive the J words of the associated output buffer word group, an output, and a control input that controls selection of the J+1 inputs for provision on the output, the method comprising: by each PU group of the N/J PU groups of J PUs, operating as a Long Short Term Memory (LSTM) cell of a recurrent neural network by: computing, by a first of the J PUs, an input gate of the LSTM cell; computing, by a second of the J PUs, a forget gate of the LSTM cell; and computing, by a third of the J PUs, an output gate of the LSTM cell.
15. The method of claim 14, further comprising: specifying, by the mask input, to update first, second and third of the J words of the associated output buffer word group with the input gate, forget gate and output gates computed by the respective first, second and third of the J PUs.
16. The method of claim 15, further comprising: computing, by the first, second and third of the J PUs, the input gate, forget gate and output gates concurrently.
17. The method of claim 15, further comprising: computing, by a fourth of the J PUs, a candidate state of the LSTM cell.
18. The method of claim 17, further comprising: specifying, by the mask input, to update a fourth of the J words of the associated output buffer word group with the candidate state of the LSTM cell but to retain the current value of the first, second and third of the J words of the associated output buffer word group.
19. The method of claim 17, further comprising: computing, by one of the J PUs, the new state of the LSTM cell and an activation function thereof using the input gate, the forget gate, the candidate state of the LSTM cell, and a current state of the LSTM cell.
20. The method of claim 19, further comprising: reading, by the one of the J PUs, from a memory the current state of the LSTM cell; and writing, by the output buffer, to the memory the new state of the LSTM cell.
21. The method of claim 19, further comprising: computing, by one of the J PUs, a new output of the LSTM cell using the output gate and the activation function of the new state of the LSTM cell.
22. The method of claim 21, further comprising: reading, by the J
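The gate, state, and output computations recited in the claims follow the shape of a conventional LSTM cell step, which can be sketched in scalar form as below. This is an illustrative sketch under stated assumptions, not the claimed hardware: the sigmoid and tanh activations, the omission of bias terms, the scalar (single-value) inputs, and the names `lstm_cell_step` and the weight keys are all assumptions that do not appear in the claim text.

```python
import math


def sigmoid(x):
    """Logistic function, assumed here as the gate activation."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_cell_step(x, h_prev, c_prev, w):
    """One LSTM cell step on scalar values.

    x      : new input to the cell
    h_prev : current output of the cell
    c_prev : current state of the cell
    w      : dict of hypothetical weight names (w* for input, u* for output)
    """
    # Input, forget and output gates: each uses the new input and the
    # current output with their respective weights (compare claim 10).
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev)
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev)
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev)

    # Candidate state (compare claim 4); tanh is an assumed activation.
    c_candidate = math.tanh(w["wc"] * x + w["uc"] * h_prev)

    # New state from input gate, forget gate, candidate and current state
    # (compare claim 6).
    c_new = i * c_candidate + f * c_prev

    # New output from the output gate and the activation of the new state
    # (compare claim 8).
    h_new = o * math.tanh(c_new)
    return h_new, c_new


# With all weights zero every gate evaluates to 0.5 and the candidate to 0.
zero_weights = {k: 0.0 for k in ("wi", "ui", "wf", "uf", "wo", "uo", "wc", "uc")}
h_new, c_new = lstm_cell_step(x=1.0, h_prev=0.0, c_prev=2.0, w=zero_weights)
```

In the claimed arrangement, the three gates are computed by separate PUs of a group and written into that group's output buffer words; the sketch collapses that into sequential scalar arithmetic purely to show the data dependencies.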