Sparse matrix vector multiplication

US9830302B1 · US · B1

Patent metadata
FieldValue
Publication numberUS-9830302-B1
Application numberUS-201514685277-A
CountryUS
Kind codeB1
Filing dateApr 13, 2015
Priority dateApr 16, 2014
Publication dateNov 28, 2017
Grant dateNov 28, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for multiplying a sparse matrix by a vector using a single instruction multiple data (SIMD) architecture are provided. An example method includes sorting rows of the sparse matrix by a number of non-zero elements in the rows to generate sorted rows. The sorted rows are split to generate groups of the sorted rows. The number of rows in each group of the sorted rows is equal to the number of rows updated in parallel. The method allows for packing the sorted rows in each of the groups to generate packed rows. Each of the packed rows within the same group has the same length. Per clock cycle, C elements of the packed rows and data for selecting elements of the vector are provided to computational units in the SIMD architecture, where C is the number of computational units.

First claim

Opening claim text (preview).

What is claimed is: 1. A system for multiplying a sparse matrix by a vector using a single instruction multiple data (SIMD) architecture, the system comprising: at least one processor; and a memory communicatively coupled with the at least one processor, the memory storing instructions which when executed by the at least one processor perform a method comprising: sorting rows of the sparse matrix by a number of non-zero elements in the rows to generate sorted rows; splitting the sorted rows to generate groups of the sorted rows, wherein a number of rows in each group of the sorted rows is equal to a number (R) of rows updated in parallel; packing the sorted rows in each of the groups to generate packed rows, wherein each of the packed rows within the same group has a length; and providing, per clock cycle, C elements of the packed rows to computational units in an SIMD architecture, wherein C is the number of computational units. 2. The system of claim 1 , wherein each of the computational units includes a multiply accumulate (MAC) unit. 3. The system of claim 1 , wherein the rows of the sparse matrix are sorted as one of the following: ascending order of the numbers of the non-zero elements; and descending order of the numbers of the non-zero elements. 4. The system of claim 1 , wherein R is equal to C. 5. The system of claim 1 , wherein R divides C. 6. The system of claim 1 , wherein: packing the rows includes removing zero elements in the rows. 7. The system of claim 6 , wherein packing the rows further includes causing a single row, within one of the groups, having less than the maximum number of non-zero elements for the one group, to have the same length as other of the packed rows in the one group having the maximum number of non-zero elements by zero padding. 8. The system of claim 1 , further comprising prior to sorting the rows: dividing the sparse matrix evenly into blocks of columns, wherein the sorting, splitting, packing, and providing are performed on each of the blocks of the columns separately. 9. The system of claim 1 , further comprising: providing data for selecting elements of the vector to the computational units in the SIMD architecture for allowing them to perform a multiplication of appropriate elements between the sparse matrix and the vector. 10. The system of claim 1 , further comprising, while providing elements of the packed rows: providing row identification information of the packed rows in the sparse matrix to the computational units in the SIMD architecture. 11. A method comprising: sorting rows of the sparse matrix by a number of non-zero elements in the rows to generate sorted rows; splitting the sorted rows to generate groups of the sorted rows, wherein a number of rows in each group of the sorted rows is equal to a number (R) of rows updated in parallel; packing the sorted rows in each of the groups to generate packed rows, wherein each of the packed rows within the same group has a length; and providing, per clock cycle, C elements of the packed rows to computational units in an SIMD architecture, wherein C is the number of computational units. 12. The method of claim 11 , wherein each of the computational units includes a multiply accumulate (MAC) unit. 13. The method of claim 11 , wherein the rows of the sparse matrix are sorted as one of the following: ascending order of the numbers of the non-zero elements; and descending order of the numbers of the non-zero elements. 14. The method of claim 11 , wherein one of the following conditions is satisfied: R is equal to C; and R divides C. 15. The method of claim 11 , wherein: packing the rows includes removing zero elements in the rows. 16. The method of claim 15 , wherein packing the rows further includes causing a single row, within one of the groups, having less than the maximum number of non-zero elements for the one group, to have the same length as other of the packed rows in the one group having the maximum number of non-zero elements by zero padding. 17. The method of claim 11 , further comprising prior to sorting the rows: dividing the sparse matrix evenly into blocks of columns; and wherein the sorting, splitting, packing, and providing are performed on each of the blocks of the columns separately. 18. The method of claim 11 , further comprising: providing data for selecting elements of the vector to the computational units in the SIMD architecture for allowing them to perform a multiplication of appropriate elements between the sparse matrix and the vector. 19. A non-transitory computer-readable storage medium having embodied thereon instructions, which when executed by at least one processor, perform steps of a method, the method comprising: sorting rows of the sparse matrix by a number of non-zero elements in the rows to generate sorted rows; splitting the sorted rows to generate groups of the sorted rows, wherein a number of rows in each group of the sorted rows is equal to a number (R) of rows updated in parallel; packing the sorted rows in each of the groups to generate packed rows, wherein each of the packed rows within the same group has a length; and providing, per clock cycle, C elements of the packed rows to computational units in an SIMD architecture, wherein C is the number of computational units.

Assignees

Inventors

Classifications

  • G06F17/16Primary

    Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9830302B1 cover?
Systems and methods for multiplying a sparse matrix by a vector using a single instruction multiple data (SIMD) architecture are provided. An example method includes sorting rows of the sparse matrix by a number of non-zero elements in the rows to generate sorted rows. The sorted rows are split to generate groups of the sorted rows. The number of rows in each group of the sorted rows is equal t…
Who is the assignee on this patent?
Knowles Electronics Llc
What technology area does this patent fall under?
Primary CPC classification G06F17/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Nov 28 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).