Processor for large graph algorithm computations and matrix operations
US-9529590-B2 · Dec 27, 2016 · US
US9830302B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9830302-B1 |
| Application number | US-201514685277-A |
| Country | US |
| Kind code | B1 |
| Filing date | Apr 13, 2015 |
| Priority date | Apr 16, 2014 |
| Publication date | Nov 28, 2017 |
| Grant date | Nov 28, 2017 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Systems and methods for multiplying a sparse matrix by a vector using a single instruction multiple data (SIMD) architecture are provided. An example method includes sorting rows of the sparse matrix by a number of non-zero elements in the rows to generate sorted rows. The sorted rows are split to generate groups of the sorted rows. The number of rows in each group of the sorted rows is equal to the number of rows updated in parallel. The method allows for packing the sorted rows in each of the groups to generate packed rows. Each of the packed rows within the same group has the same length. Per clock cycle, C elements of the packed rows and data for selecting elements of the vector are provided to computational units in the SIMD architecture, where C is the number of computational units.
Opening claim text (preview).
What is claimed is: 1. A system for multiplying a sparse matrix by a vector using a single instruction multiple data (SIMD) architecture, the system comprising: at least one processor; and a memory communicatively coupled with the at least one processor, the memory storing instructions which when executed by the at least one processor perform a method comprising: sorting rows of the sparse matrix by a number of non-zero elements in the rows to generate sorted rows; splitting the sorted rows to generate groups of the sorted rows, wherein a number of rows in each group of the sorted rows is equal to a number (R) of rows updated in parallel; packing the sorted rows in each of the groups to generate packed rows, wherein each of the packed rows within the same group has a length; and providing, per clock cycle, C elements of the packed rows to computational units in an SIMD architecture, wherein C is the number of computational units. 2. The system of claim 1 , wherein each of the computational units includes a multiply accumulate (MAC) unit. 3. The system of claim 1 , wherein the rows of the sparse matrix are sorted as one of the following: ascending order of the numbers of the non-zero elements; and descending order of the numbers of the non-zero elements. 4. The system of claim 1 , wherein R is equal to C. 5. The system of claim 1 , wherein R divides C. 6. The system of claim 1 , wherein: packing the rows includes removing zero elements in the rows. 7. The system of claim 6 , wherein packing the rows further includes causing a single row, within one of the groups, having less than the maximum number of non-zero elements for the one group, to have the same length as other of the packed rows in the one group having the maximum number of non-zero elements by zero padding. 8. The system of claim 1 , further comprising prior to sorting the rows: dividing the sparse matrix evenly into blocks of columns, wherein the sorting, splitting, packing, and providing are performed on each of the blocks of the columns separately. 9. The system of claim 1 , further comprising: providing data for selecting elements of the vector to the computational units in the SIMD architecture for allowing them to perform a multiplication of appropriate elements between the sparse matrix and the vector. 10. The system of claim 1 , further comprising, while providing elements of the packed rows: providing row identification information of the packed rows in the sparse matrix to the computational units in the SIMD architecture. 11. A method comprising: sorting rows of the sparse matrix by a number of non-zero elements in the rows to generate sorted rows; splitting the sorted rows to generate groups of the sorted rows, wherein a number of rows in each group of the sorted rows is equal to a number (R) of rows updated in parallel; packing the sorted rows in each of the groups to generate packed rows, wherein each of the packed rows within the same group has a length; and providing, per clock cycle, C elements of the packed rows to computational units in an SIMD architecture, wherein C is the number of computational units. 12. The method of claim 11 , wherein each of the computational units includes a multiply accumulate (MAC) unit. 13. The method of claim 11 , wherein the rows of the sparse matrix are sorted as one of the following: ascending order of the numbers of the non-zero elements; and descending order of the numbers of the non-zero elements. 14. The method of claim 11 , wherein one of the following conditions is satisfied: R is equal to C; and R divides C. 15. The method of claim 11 , wherein: packing the rows includes removing zero elements in the rows. 16. The method of claim 15 , wherein packing the rows further includes causing a single row, within one of the groups, having less than the maximum number of non-zero elements for the one group, to have the same length as other of the packed rows in the one group having the maximum number of non-zero elements by zero padding. 17. The method of claim 11 , further comprising prior to sorting the rows: dividing the sparse matrix evenly into blocks of columns; and wherein the sorting, splitting, packing, and providing are performed on each of the blocks of the columns separately. 18. The method of claim 11 , further comprising: providing data for selecting elements of the vector to the computational units in the SIMD architecture for allowing them to perform a multiplication of appropriate elements between the sparse matrix and the vector. 19. A non-transitory computer-readable storage medium having embodied thereon instructions, which when executed by at least one processor, perform steps of a method, the method comprising: sorting rows of the sparse matrix by a number of non-zero elements in the rows to generate sorted rows; splitting the sorted rows to generate groups of the sorted rows, wherein a number of rows in each group of the sorted rows is equal to a number (R) of rows updated in parallel; packing the sorted rows in each of the groups to generate packed rows, wherein each of the packed rows within the same group has a length; and providing, per clock cycle, C elements of the packed rows to computational units in an SIMD architecture, wherein C is the number of computational units.
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.