Hardware Implementation of Convolutional Layer of Deep Neural Network
US-2019138567-A1 · May 9, 2019 · US
US11010662B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11010662-B2 |
| Application number | US-202016808900-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 4, 2020 |
| Priority date | Mar 30, 2018 |
| Publication date | May 18, 2021 |
| Grant date | May 18, 2021 |
Official abstract text for this publication.
Massively parallel neural inference computing elements are provided. A plurality of multipliers is arranged in a plurality of equal-sized groups. Each of the plurality of multipliers is adapted to, in parallel, apply a weight to an input activation to generate an output. A plurality of adders is operatively coupled to one of the groups of multipliers. Each of the plurality of adders is adapted to, in parallel, add the outputs of the multipliers within its associated group to generate a partial sum. A plurality of function blocks is operatively coupled to one of the plurality of adders. Each of the plurality of function blocks is adapted to, in parallel, apply a function to the partial sum of its associated adder to generate an output value.
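The datapath described in the abstract can be sketched in software as a matrix-vector product followed by a per-group function. This is only a minimal sequential model, not the patented hardware: each row of `weights` stands in for one equal-sized group of multipliers, the built-in `sum` stands in for that group's adder, and `function` stands in for the function block. The names `parallel_inference_step` and `relu`, the integer data types, and the choice of ReLU as the function are assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def parallel_inference_step(
    weights: Sequence[Sequence[int]],
    activations: Sequence[int],
    function: Callable[[int], int],
) -> List[int]:
    """Model one pass through the multiplier groups, adders, and function blocks.

    Each row of `weights` models one group of multipliers (hypothetical layout).
    The hardware would evaluate all groups in parallel; this loop is sequential.
    """
    group_size = len(activations)
    if any(len(row) != group_size for row in weights):
        raise ValueError("every multiplier group must have the same size")

    outputs = []
    for row in weights:
        # Multipliers: apply each weight to its input activation.
        products = [w * a for w, a in zip(row, activations)]
        # Adder: add the outputs of the multipliers within this group.
        partial_sum = sum(products)
        # Function block: apply a function to the partial sum.
        outputs.append(function(partial_sum))
    return outputs


def relu(x: int) -> int:
    """Example function block behavior (an assumed activation function)."""
    return max(0, x)


example_outputs = parallel_inference_step(
    [[1, -1, 0], [0, 1, 1]],  # two groups of three multipliers
    [2, 3, 4],                # activations shared by both groups
    relu,
)
```

Passing the same `activations` to every row mirrors the broadcast of the activation vector to all multiplier groups.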
Opening claim text (preview).
What is claimed is:

1. A system comprising: a plurality of multipliers, the plurality of multipliers arranged in a plurality of equal-sized groups, each of the plurality of multipliers being adapted to, in parallel, apply a weight to an input activation to generate an output; a plurality of adders, each of the plurality of adders being operatively coupled to one of the groups of multipliers, each of the plurality of adders being adapted to, in parallel, add the outputs of the multipliers within its associated group to generate a partial sum; and a first plurality of function blocks, each of the first plurality of function blocks being operatively coupled to one of the plurality of adders, each of the first plurality of function blocks being adapted to, in parallel, apply a function to the partial sum of its associated adder to generate an output value, wherein the first plurality of function blocks is adapted to combine the output values with subsequently computed output values of the first plurality of function blocks.

2. The system of claim 1, adapted to receive a matrix of weights and a vector of activations.

3. The system of claim 1, wherein each of the plurality of adders comprises a tree of adders.

4. The system of claim 3, wherein the tree of adders is a binary tree.

5. The system of claim 3, wherein the tree of adders comprises a plurality of carry-save adders.

6. The system of claim 2, wherein each activation of the vector of activations is broadcast to all of the groups of multipliers.

7. The system of claim 2, further comprising a systolic pipeline operatively coupled to each of the groups of multipliers.

8. The system of claim 1, wherein the groups of multipliers are pipelined.

9. The system of claim 1, wherein the weights are balanced ternary values.

10. The system of claim 1, wherein each of the plurality of multipliers comprises a multiplexor.

11. The system of claim 2, wherein the matrix of weights is compressed, and wherein the system is adapted to decompress the compressed matrix of weights.

12. The system of claim 1, further comprising: a plurality of shifters, each shifter operatively connected to one of the first plurality of function blocks, each shifter adapted to, in parallel, shift the output value of its corresponding function block, and wherein combining the output values with subsequently computed output values comprises combining the shifted values with the subsequently computed output values.

13. The system of claim 1, wherein the function of each of the first plurality of function blocks is an activation function.

14. The system of claim 1, wherein the function of each of the first plurality of function blocks is programmable.

15. The system of claim 1, wherein the function of each of the first plurality of function blocks is addition.

16. The system of claim 1, wherein the function of each of the first plurality of function blocks is multiplication.

17. The system of claim 1, wherein the function of each of the first plurality of function blocks is an identity function.

18. The system of claim 1, further comprising a lookup table, the function of each of the first plurality of function blocks comprising a lookup from the lookup table.

19. The system of claim 18, wherein the lookup table is programmable.

20. The system of claim 1, wherein the function of each of the first plurality of function blocks is a max function.

21. The system of claim 1, wherein the function of each of the first plurality of function blocks is a min function.

22. A method comprising: applying by a plurality of equal-sized groups of multipliers, in parallel, a plurality of weights to a plurality of input activations to generate a plurality of outputs for each group of multipliers; adding by a plurality of adders, in parallel, the plurality of outputs from each group of multipliers to generate a partial sum from each group of multipliers; and applying by a first plurality of function blocks, each of the first plurality of function blocks being operatively coupled to one of the plurality of adders, in parallel, a function to the partial sum of its associated adder to generate an output value, wherein the first plurality of function blocks is adapted to combine the output values with subsequently computed output values of the first plurality of function blocks.

23. A system comprising: a plurality of multipliers, the plurality of multipliers arranged in a plurality of equal-sized groups; a plurality of adders, each of the plurality of adders being operatively coupled to one of the groups of multipliers; a first plurality of function blocks, each of the first plurality of function blocks being operatively coupled to one of the plurality of adders; a computer readable storage medium having program instructions embodied therewith, the program instructions executable to perform a method comprising: by each of the plurality of multipliers, in parallel, applying a weight to an input activation to generate an output; by each of the plurality of adders, in parallel, adding the outputs of the multipliers within its associated group to generate a partial sum; and by each of the first plurality of function blocks, in parallel, applying a function to the partial sum of its associated adder to generate an output value, wherein the first plurality of function blocks is adapted to combine the output values with subsequently computed output values of the first plurality of function blocks.
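Several of the dependent claims can be illustrated together in a short software sketch: balanced ternary weights applied through a multiplexor-style selection, a binary tree of adders, and shifters that shift an earlier output value before it is combined with a subsequently computed one. The bit-serial scheme below (unsigned activations fed one bit plane at a time, most significant bit first) is an assumption chosen to give the shift-and-combine step something concrete to do; the claims do not specify it. All function names are hypothetical, and the code models behavior sequentially rather than in parallel.

```python
from typing import List, Sequence


def mux_multiply(weight: int, activation: int) -> int:
    """Select -activation, 0, or +activation for a balanced ternary weight."""
    if weight not in (-1, 0, 1):
        raise ValueError("weight must be -1, 0, or +1")
    # Index 0, 1, or 2 plays the role of the multiplexor select input.
    return (-activation, 0, activation)[weight + 1]


def adder_tree(values: Sequence[int]) -> int:
    """Sum values pairwise, level by level, like a binary tree of adders."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad an odd level with a zero input
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def shift_and_combine(
    weights: Sequence[Sequence[int]],
    activations: Sequence[int],
    bits: int,
) -> List[int]:
    """Accumulate per-group results over bit planes (assumed bit-serial scheme).

    Each row of `weights` models one group of multipliers. Activations are
    assumed to be unsigned integers that fit in `bits` bits.
    """
    accumulators = [0] * len(weights)
    for bit in reversed(range(bits)):  # most significant bit first
        plane = [(a >> bit) & 1 for a in activations]
        for group, row in enumerate(weights):
            partial = adder_tree(
                [mux_multiply(w, p) for w, p in zip(row, plane)]
            )
            # Shifter: shift the earlier output value, then combine it with
            # the subsequently computed value using addition.
            accumulators[group] = (accumulators[group] << 1) + partial
    return accumulators


ternary_outputs = shift_and_combine([[1, -1, 0], [0, 1, 1]], [2, 3, 4], bits=3)
```

With ternary weights no true multiplier is needed, which is why a selection among the negated activation, zero, and the activation is enough in this sketch.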
Activation functions · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
Combinations of networks · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title