Sparse matrix vector multiplication

US10311127B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10311127-B2
Application numberUS-201715796358-A
CountryUS
Kind codeB2
Filing dateOct 27, 2017
Priority dateApr 16, 2014
Publication dateJun 4, 2019
Grant dateJun 4, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for multiplying a sparse matrix by a vector using a single instruction multiple data (SIMD) architecture are provided. An example method includes sorting rows of the sparse matrix by a number of non-zero elements in the rows to generate sorted rows. The sorted rows are split to generate groups of the sorted rows. The number of rows in each group of the sorted rows is equal to the number of rows updated in parallel. The method allows for packing the sorted rows in each of the groups to generate packed rows. Each of the packed rows within the same group has the same length. Per clock cycle, C elements of the packed rows and data for selecting elements of the vector are provided to computational units in the SIMD architecture, where C is the number of computational units.

First claim

Opening claim text (preview).

What is claimed is: 1. A system having a single instruction multiple data architecture, the system comprising: a plurality of computational units; a processor executing instructions for preparing a M×N sparse matrix; and memory storing groups of single dimensional sub-arrays representative of the M×N sparse matrix that has been prepared by the processor, the single dimensional sub-arrays including all non-zero elements in a first common dimension of the M×N sparse matrix, each group being sorted with respect to the other groups based on a number of non-zero value elements in the single dimensional sub-arrays in the each group, a number of single dimensional sub-arrays in each group not more than a number (R) of single dimensional sub-arrays processed in parallel by the plurality of computational units, the single dimensional sub-arrays of each group have a common number of matrix elements, at least one group having single dimensional sub-arrays that have been packed to remove zero-value elements so as to arrive at the common number, wherein a number (C) of the computational units are configured to multiply, in parallel, C elements of the single dimensional sub-arrays in each group with one or more elements of a vector. 2. The system of claim 1 , each computational unit includes a vector element selection unit, wherein the vector selection units are configured to select elements of the vector before multiplication. 3. The system of claim 1 , each computational unit includes a multiply accumulate unit, wherein the multiply accumulate units are configured to multiply, in parallel, the C elements of the single dimensional sub-arrays with elements of a vector. 4. The system of claim 1 , wherein the first common dimension is a horizontal dimension of the M×N sparse matrix. 5. The system of claim 1 , wherein R is not greater than C. 6. The system of claim, 1 wherein the C computational units of the processor are divided among the R single dimensional sub-arrays. 7. The system of claim 1 further comprising a plurality of registers, wherein the computational units are configured to select elements of the vector based on data in registers before multiplying elements of the vector and elements of the single dimensional sub-arrays. 8. The system of claim 1 , wherein, after multiplication, the processor is configured to access and use reordering indices of the single dimensional sub-arrays based on data stored in memory. 9. A system having a single instruction multiple data (SIMD) architecture, the system comprising: a plurality of computational units; a processor executing instructions for preparing a M×N sparse matrix; and memory storing processor-executable instructions and elements of the M×N sparse matrix represented by groups of single dimensional sub-arrays, the single dimensional sub-arrays including all non-zero elements in a first common dimension of the M×N sparse matrix, each group being sorted with respect to the other groups based on a number of non-zero value elements in the single dimensional sub-arrays in the each group, a number of single dimensional sub-arrays in each group is not more than a number (R) of single dimensional sub-arrays processed in parallel by the plurality of computational units, the single dimensional sub-arrays in each group having a common number of matrix elements, at least one group having single dimensional sub-arrays that have been packed to remove zero-value elements so as to arrive at the common number, wherein upon execution of the processor-executable instructions, a number (C) of the computational units are configured to multiply, in parallel, C elements of the single dimensional sub-arrays in each group with one or more elements of a vector. 10. The system of claim 9 , each computational unit includes a vector element selection unit, wherein the processor-executable instructions configure the vector selection units of the C computational units to select elements of the vector before multiplication. 11. The system of claim 9 , each computational unit includes a multiply accumulate unit, wherein the processor-executable instructions configure the multiply accumulate units of the C computational units to multiply the C elements of the single dimensional sub-arrays with elements of a vector. 12. The system of claim 9 further comprising a plurality of registers, wherein the processor-executable instructions configure the computational units to select elements of the vector based on data in registers before multiplying elements of the vector and elements of the single dimensional sub-arrays. 13. The system of claim 9 , wherein, after multiplication, the processor-executable instructions configure the processor to compensate for any reordering of the single dimensional sub-arrays based on data stored in memory. 14. The system of claim 9 , wherein the elements of each single dimensional sub-array are in a horizontal row of the M×N sparse matrix. 15. A method in a processor having a single instruction multiple data architecture including a plurality of computational units and memory, the method comprising: providing a number (C) of elements of single dimensional sub-arrays of a group to C computational units of the processor, the C elements of the single dimensional sub-arrays stored in the memory, the single dimensional sub-arrays including all non-zero value elements in a first common dimension of an M×N sparse matrix, each group being sorted with respect to the other groups based on a number of non-zero value elements in the single dimensional sub-arrays in the each group, a number of single dimensional sub-arrays in each group not more than a number (R) of single dimensional sub-arrays processed in parallel by the plurality of computational units, the single dimensional sub-arrays in each group having a common number of matrix elements, at least one group having single dimensional sub-arrays that have been packed to remove zero-value elements so as to arrive at the common number; and multiplying, in parallel, the C elements of the single dimensional sub-arrays and one or more elements of a vector using the C computational units. 16. The method of claim 15 further comprising selecting elements of the vector based on data accessible by the computational units before multiplying elements of the vector and elements of the single dimensional sub-arrays. 17. The method of claim 16 , after multiplying, compensating for any reordering of the single dimensional sub-arrays based on data stored in memory. 18. The method of claim 15 , wherein the first common dimension is a horizontal row of the M×N sparse matrix. 19. A method in a processor for compressing an M×N sparse matrix for processing by a single instruction multiple data (SIMD) architecture processor having a plurality of computational units, the method comprising: grouping single dimensional sub-arrays of an M×N sparse matrix based on a number of non-zero value elements in each single dimensional sub-array, each of the single dimensional sub-arrays arranged in a first common dimension, wherein a number of single dimensional sub-arrays in each group is not more than a number (R) of single dimensional sub-arrays to be processed in parallel by the plurality of computational units of the SIMD architecture processor; reducing a number of elements in the single dimensional sub-arrays of at least one group by dropping zero-value elements, without dropping non-zero value elements, wherein the single dimensional sub-arrays in each group have a common numb

Assignees

Inventors

Classifications

  • G06F17/16Primary

    Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10311127B2 cover?
Systems and methods for multiplying a sparse matrix by a vector using a single instruction multiple data (SIMD) architecture are provided. An example method includes sorting rows of the sparse matrix by a number of non-zero elements in the rows to generate sorted rows. The sorted rows are split to generate groups of the sorted rows. The number of rows in each group of the sorted rows is equal t…
Who is the assignee on this patent?
Knowles Electronics Llc
What technology area does this patent fall under?
Primary CPC classification G06F17/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jun 04 2019 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).