Accelerator for deep neural networks
US-2019205740-A1 · Jul 4, 2019 · US
US10891538B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10891538-B2 |
| Application number | US-201715659371-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 25, 2017 |
| Priority date | Aug 11, 2016 |
| Publication date | Jan 12, 2021 |
| Grant date | Jan 12, 2021 |
Official abstract text for this publication.
A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.
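The decode, sum, and linearize steps described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes three things the text above does not specify: each index-vector entry is a count of zeros preceding the next non-zero element, arrays are stored row-major, and every first coordinate set is paired with every second coordinate set. The function names and the `width` and `out_width` parameters are hypothetical.

```python
from itertools import product


def decode_index_vector(zero_runs, width):
    """Decode zero-run counts into (row, col) coordinates (assumed encoding)."""
    coords = []
    position = -1
    for run in zero_runs:
        # Skip `run` zeros and land on the next non-zero element.
        position += run + 1
        coords.append((position // width, position % width))
    return coords


def sum_coordinate_sets(first, second):
    """Add every first coordinate set to every second coordinate set."""
    return [(r1 + r2, c1 + c2) for (r1, c1), (r2, c2) in product(first, second)]


def to_linear_indices(coords, out_width):
    """Flatten (row, col) coordinates into row-major linear indices."""
    return [row * out_width + col for row, col in coords]


# A 2x2 first array with non-zeros at flat positions 0 and 3,
# and a 2x3 second array with non-zeros at flat positions 1 and 5.
first_coords = decode_index_vector([0, 2], width=2)
second_coords = decode_index_vector([1, 3], width=3)
output_coords = sum_coordinate_sets(first_coords, second_coords)
linear = to_linear_indices(output_coords, out_width=4)
print(first_coords, second_coords, output_coords, linear)
```

Two of the four output coordinates coincide in this example, so a later accumulation step would add two values into the same output position.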
Opening claim text (preview).
What is claimed is:
1. A method, comprising: receiving, by a parallel processing unit, a first instruction including a first index vector operand and a second index vector operand; decoding, by the parallel processing unit, the first index vector operand to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array; decoding, by the parallel processing unit, the second index vector operand to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array; summing, by the parallel processing unit, the first coordinate sets with the second coordinate sets to produce output coordinate sets; and converting, by the parallel processing unit, the output coordinate sets into a set of linear indices.
2. The method of claim 1, wherein the summing comprises summing each first coordinate in the first coordinate sets with a third coordinate in the second coordinate sets to produce the coordinates of the output coordinate sets.
3. The method of claim 1, wherein the first array is a three-dimensional array and the second array is a two-dimensional array.
4. The method of claim 1, wherein the first array stores weight values and the second array stores activation values.
5. The method of claim 1, wherein the first index vector encodes values that are accumulated in sequence to compute the first coordinates for each non-zero element in the first array.
6. The method of claim 1, further comprising: receiving, by the parallel processing unit, a second instruction including a first set of linear addresses operand and a second scalar values operand; and summing, by the parallel processing unit, scalar values in the scalar values operand to values in a third array, each value in the third array corresponding to a linear address in the first set of linear addresses operand.
7. The method of claim 6, wherein the first indices operand is the set of linear indices.
8. The method of claim 1, further comprising: receiving a second instruction including a first non-zero elements vector operand and a second non-zero elements vector operand; and multiplying each one of the non-zero elements in the first non-zero values vector operand by every one of the non-zero elements in the second non-zero elements vector operand to produce a vector of products.
9. The method of claim 8, wherein each of the non-zero elements in the first non-zero elements vector operand corresponds to an index in the first index vector operand and each of the non-zero elements in the second non-zero values vector operand corresponds to an index in the second index vector operand.
10. The method of claim 8, further comprising: receiving, by the parallel processing unit, a third instruction including a first set of linear addresses operand and a second scalar values operand, wherein the first set of linear addresses operand is the set of linear addresses and the second scalar values operand is the vector of products; and summing, by the parallel processing unit, scalar values in the scalar values operand with partial sums in a third array, each partial sum in the third array corresponding to a linear address in the first set of linear addresses operand.
11. The method of claim 1, further comprising, before receiving the first instruction: receiving, by the parallel processing unit, a second instruction including a first scalar values operand; generating a vector of non-zero elements including only values in the first scalar values operand that are not equal to zero; and generating a vector of indices comprising positions within the second array, wherein each index in the vector of indices is associated with a non-zero element in the vector of non-zero elements.
12. The method of claim 11, wherein the second indices operand is the vector of indices.
13. A processor, comprising: parallel processing units configured to: receive a first instruction including a first index vector operand and a second index vector operand; decode the first index vector operand to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array; decode the second index vector operand to produce second coordinate sets for a second array, each second coordinate set including at least the first coordinate and the second coordinate of a position of a non-zero element in the second array; sum the first coordinate sets with the second coordinate sets to produce output coordinate sets; and convert the output coordinate sets into a set of linear indices.
14. The processor of claim 13, wherein the parallel processing units are further configured to sum each first coordinate in the first coordinate sets with a third coordinate in the second coordinate sets to produce the coordinates of the output coordinate sets.
15. The processor of claim 13, wherein the first index vector encodes values that are accumulated in sequence to compute the first coordinates for each non-zero element in the first array.
16. The processor of claim 13, the parallel processing units are further configured to: receive a second instruction including a first set of linear addresses operand and a second scalar values operand; and sum scalar values in the scalar values operand to values in a third array, each value in the third array corresponding to a linear address in the first set of linear addresses operand.
17. The processor of claim 16, wherein the first indices operand is the set of linear indices.
18. The processor of claim 13, the parallel processing units are further configured to: receive a second instruction including a first non-zero elements vector operand and a second non-zero elements vector operand; and multiply each one of the non-zero elements in the first non-zero values vector operand by every one of the non-zero elements in the second non-zero elements vector operand to produce a vector of products.
19. The processor of claim 13, the parallel processing units are further configured to, before receiving the first instruction: receive a second instruction including a first scalar values operand; generate a vector of non-zero elements including only values in the first scalar values operand that are not equal to zero; and generate a vector of indices comprising positions within the second array, wherein each index in the vector of indices is associated with a non-zero element in the vector of non-zero elements.
20. A non-transitory, computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform steps comprising: receiving a first instruction including a first index vector operand and a second index vector operand; decoding the first index vector operand to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array; decoding the second index vector operand to produce second coordinate sets for a second array, each second coordinate set including at least the first coordinate and the second coordinate of a position of a non-zero element in the second array; summing the first coordinate sets wit
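The supporting instructions recited in claims 6, 8, 10, and 11 (compacting a dense vector, multiplying all pairs of non-zero elements, and accumulating products at linear addresses) can be sketched in plain Python. This is an illustrative sketch only: the function names are hypothetical, the indices are assumed to be plain flat positions, and the example addresses are chosen by hand rather than derived from any encoding in the patent.

```python
def compress(values):
    """Split a dense vector into its non-zero elements and their positions."""
    nonzeros = [v for v in values if v != 0]
    indices = [i for i, v in enumerate(values) if v != 0]
    return nonzeros, indices


def multiply_all_pairs(first_nonzeros, second_nonzeros):
    """Multiply each element of the first vector by every element of the second."""
    return [a * b for a in first_nonzeros for b in second_nonzeros]


def scatter_add(accumulator, linear_addresses, scalars):
    """Add each scalar to the accumulator entry at its linear address."""
    for address, scalar in zip(linear_addresses, scalars):
        accumulator[address] += scalar
    return accumulator


# Hypothetical weight and activation vectors, flattened row-major.
weights, weight_idx = compress([2, 0, 0, 3])
activations, activation_idx = compress([0, 5, 0, 0, 0, 7])

products = multiply_all_pairs(weights, activations)

# Hand-picked linear addresses, one per product; two of them collide.
addresses = [1, 6, 6, 11]
partial_sums = scatter_add([0] * 12, addresses, products)
print(products, partial_sums)
```

Because two products share address 6, their values accumulate into a single partial sum, which is why the accumulation step adds into the third array rather than overwriting it.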
Knowledge-based neural networks; Logical representations of neural networks · CPC title
Combinations of networks · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title