Sparse matrix vector multiplication
US-2018067899-A1 · Mar 8, 2018 · US
US10311127B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10311127-B2 |
| Application number | US-201715796358-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 27, 2017 |
| Priority date | Apr 16, 2014 |
| Publication date | Jun 4, 2019 |
| Grant date | Jun 4, 2019 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Systems and methods for multiplying a sparse matrix by a vector using a single instruction multiple data (SIMD) architecture are provided. An example method includes sorting rows of the sparse matrix by a number of non-zero elements in the rows to generate sorted rows. The sorted rows are split to generate groups of the sorted rows. The number of rows in each group of the sorted rows is equal to the number of rows updated in parallel. The method allows for packing the sorted rows in each of the groups to generate packed rows. Each of the packed rows within the same group has the same length. Per clock cycle, C elements of the packed rows and data for selecting elements of the vector are provided to computational units in the SIMD architecture, where C is the number of computational units.
Opening claim text (preview).
What is claimed is: 1. A system having a single instruction multiple data architecture, the system comprising: a plurality of computational units; a processor executing instructions for preparing a M×N sparse matrix; and memory storing groups of single dimensional sub-arrays representative of the M×N sparse matrix that has been prepared by the processor, the single dimensional sub-arrays including all non-zero elements in a first common dimension of the M×N sparse matrix, each group being sorted with respect to the other groups based on a number of non-zero value elements in the single dimensional sub-arrays in the each group, a number of single dimensional sub-arrays in each group not more than a number (R) of single dimensional sub-arrays processed in parallel by the plurality of computational units, the single dimensional sub-arrays of each group have a common number of matrix elements, at least one group having single dimensional sub-arrays that have been packed to remove zero-value elements so as to arrive at the common number, wherein a number (C) of the computational units are configured to multiply, in parallel, C elements of the single dimensional sub-arrays in each group with one or more elements of a vector. 2. The system of claim 1 , each computational unit includes a vector element selection unit, wherein the vector selection units are configured to select elements of the vector before multiplication. 3. The system of claim 1 , each computational unit includes a multiply accumulate unit, wherein the multiply accumulate units are configured to multiply, in parallel, the C elements of the single dimensional sub-arrays with elements of a vector. 4. The system of claim 1 , wherein the first common dimension is a horizontal dimension of the M×N sparse matrix. 5. The system of claim 1 , wherein R is not greater than C. 6. The system of claim, 1 wherein the C computational units of the processor are divided among the R single dimensional sub-arrays. 7. The system of claim 1 further comprising a plurality of registers, wherein the computational units are configured to select elements of the vector based on data in registers before multiplying elements of the vector and elements of the single dimensional sub-arrays. 8. The system of claim 1 , wherein, after multiplication, the processor is configured to access and use reordering indices of the single dimensional sub-arrays based on data stored in memory. 9. A system having a single instruction multiple data (SIMD) architecture, the system comprising: a plurality of computational units; a processor executing instructions for preparing a M×N sparse matrix; and memory storing processor-executable instructions and elements of the M×N sparse matrix represented by groups of single dimensional sub-arrays, the single dimensional sub-arrays including all non-zero elements in a first common dimension of the M×N sparse matrix, each group being sorted with respect to the other groups based on a number of non-zero value elements in the single dimensional sub-arrays in the each group, a number of single dimensional sub-arrays in each group is not more than a number (R) of single dimensional sub-arrays processed in parallel by the plurality of computational units, the single dimensional sub-arrays in each group having a common number of matrix elements, at least one group having single dimensional sub-arrays that have been packed to remove zero-value elements so as to arrive at the common number, wherein upon execution of the processor-executable instructions, a number (C) of the computational units are configured to multiply, in parallel, C elements of the single dimensional sub-arrays in each group with one or more elements of a vector. 10. The system of claim 9 , each computational unit includes a vector element selection unit, wherein the processor-executable instructions configure the vector selection units of the C computational units to select elements of the vector before multiplication. 11. The system of claim 9 , each computational unit includes a multiply accumulate unit, wherein the processor-executable instructions configure the multiply accumulate units of the C computational units to multiply the C elements of the single dimensional sub-arrays with elements of a vector. 12. The system of claim 9 further comprising a plurality of registers, wherein the processor-executable instructions configure the computational units to select elements of the vector based on data in registers before multiplying elements of the vector and elements of the single dimensional sub-arrays. 13. The system of claim 9 , wherein, after multiplication, the processor-executable instructions configure the processor to compensate for any reordering of the single dimensional sub-arrays based on data stored in memory. 14. The system of claim 9 , wherein the elements of each single dimensional sub-array are in a horizontal row of the M×N sparse matrix. 15. A method in a processor having a single instruction multiple data architecture including a plurality of computational units and memory, the method comprising: providing a number (C) of elements of single dimensional sub-arrays of a group to C computational units of the processor, the C elements of the single dimensional sub-arrays stored in the memory, the single dimensional sub-arrays including all non-zero value elements in a first common dimension of an M×N sparse matrix, each group being sorted with respect to the other groups based on a number of non-zero value elements in the single dimensional sub-arrays in the each group, a number of single dimensional sub-arrays in each group not more than a number (R) of single dimensional sub-arrays processed in parallel by the plurality of computational units, the single dimensional sub-arrays in each group having a common number of matrix elements, at least one group having single dimensional sub-arrays that have been packed to remove zero-value elements so as to arrive at the common number; and multiplying, in parallel, the C elements of the single dimensional sub-arrays and one or more elements of a vector using the C computational units. 16. The method of claim 15 further comprising selecting elements of the vector based on data accessible by the computational units before multiplying elements of the vector and elements of the single dimensional sub-arrays. 17. The method of claim 16 , after multiplying, compensating for any reordering of the single dimensional sub-arrays based on data stored in memory. 18. The method of claim 15 , wherein the first common dimension is a horizontal row of the M×N sparse matrix. 19. A method in a processor for compressing an M×N sparse matrix for processing by a single instruction multiple data (SIMD) architecture processor having a plurality of computational units, the method comprising: grouping single dimensional sub-arrays of an M×N sparse matrix based on a number of non-zero value elements in each single dimensional sub-array, each of the single dimensional sub-arrays arranged in a first common dimension, wherein a number of single dimensional sub-arrays in each group is not more than a number (R) of single dimensional sub-arrays to be processed in parallel by the plurality of computational units of the SIMD architecture processor; reducing a number of elements in the single dimensional sub-arrays of at least one group by dropping zero-value elements, without dropping non-zero value elements, wherein the single dimensional sub-arrays in each group have a common numb
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.