Sparse matrix multiplication acceleration mechanism

US12008067B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-12008067-B2
Application numberUS-202117527324-A
CountryUS
Kind codeB2
Filing dateNov 16, 2021
Priority dateSep 5, 2019
Publication dateJun 11, 2024
Grant dateJun 11, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware.

First claim

Opening claim text (preview).

What is claimed is: 1. An apparatus comprising: a systolic multiplier, including: a first set of first in first out (FIFO) buffers to store data in a first input matrix; a second set of FIFO buffers to store data in a second input matrix; a plurality of processing elements (PEs) to receive the matrix data from the first set of FIFO buffers and the second set of FIFO buffers and perform multiply-add operations on first input matrix and the second input matrix; a plurality of storage elements to locally store intermediate matrix multiplication values; and sparse matrix acceleration circuitry to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce the multiply-add operations to be performed by the matrix systolic multiplier, including swapping rows with other rows in a sub-matrix for each of a plurality of sub-matrices of the first input matrix. 2. The apparatus of claim 1 , wherein the swapping comprises swapping a first row having the predetermined threshold zero values with a second row not having the predetermined threshold zero values to enable the first row to be adjacent to a third row having the predetermined threshold zero values. 3. The apparatus of claim 1 , further comprising compression circuitry to compress the matrix data. 4. The apparatus of claim 3 , wherein compressing the matrix data comprises generating packed matrix data by removing zero values from the matrix data and generating an indicator vector to identify the location of the zero values in the packed matrix data. 5. The apparatus of claim 4 , wherein the sparse matrix acceleration circuitry receives the compressed matrix data and identifies the zero values in the compressed matrix data based on the indicator vector. 6. The apparatus of claim 1 , wherein the sparse matrix acceleration circuitry optimizing the matrix data further comprises swapping rows in each of a plurality of sub-matrices of a second of the plurality of input matrices to add to results of a multiplication operation of the first matrix and a third of the plurality of input matrices. 7. The apparatus of claim 6 , wherein the sparse matrix acceleration circuitry optimizing the matrix data further comprises performing a reverse swapping on an output matrix. 8. The apparatus of claim 7 , wherein the sparse matrix acceleration circuitry optimizing the matrix data further comprises performing row adjustments in the first matrix. 9. The apparatus of claim 8 , wherein performing the row adjustments comprises shifting all non-zero values of a row having more than the predetermined number of zero values. 10. The apparatus of claim 8 , wherein performing the row adjustments comprises combining adjacent rows having more than the predetermined number of zero values. 11. A method comprising: receiving data in a first input matrix at a systolic multiplier; receiving data in a second input matrix at the systolic multiplier; detecting, at the systolic multiplier, zero values in the first matrix data; performing, at the systolic multiplier, one or more optimizations on the matrix data to reduce multiply-add operations to be performed, including swapping rows with other rows in a sub-matrix for each of a plurality of sub-matrices of the first input matrix; and performing, at the systolic multiplier, the multiply-add operations on the optimized matrix data at matrix multiplication circuitry. 12. The method of claim 11 , wherein the swapping comprises swapping a first row having the predetermined threshold zero values with a second row not having the predetermined threshold zero values to enable the first row to be adjacent to a third row having the predetermined threshold zero values. 13. The method of claim 11 , wherein performing the one or more optimizations on the matrix data comprises swapping rows in each of a plurality of sub-matrices of a first of the plurality of input matrices to achieve a maximum number of adjacent rows having a predetermined threshold zero values. 14. The method of claim 11 , wherein performing the one or more optimizations on the matrix data further comprises performing row adjustments in the first matrix. 15. The method of claim 14 , wherein performing the one or more optimizations on the matrix data further comprises performing a reverse swapping on an output matrix resulting from the multiply-add operations. 16. A graphics processing unit (GPU) comprising: a first set of first in first out (FIFO) buffers to store data in a first input matrix; a second set of FIFO buffers to store data in a second input matrix; a plurality of processing elements (PEs) to receive the matrix data from the first set of FIFO buffers and the second set of FIFO buffers and perform multiply-add operations on first input matrix and the second input matrix; and a plurality of storage elements to locally store intermediate matrix multiplication values; and sparse matrix acceleration circuitry to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix systolic multiplier, including swapping rows with other rows in a sub-matrix for each of a plurality of sub-matrices of the first input matrix. 17. The GPU of claim 16 , wherein the swapping comprises swapping a first row having the predetermined threshold zero values with a second row not having the predetermined threshold zero values to enable the first row to be adjacent to a third row having the predetermined threshold zero values. 18. The GPU of claim 16 , wherein performing the one or more optimizations on the matrix data comprises swapping rows in each of a plurality of sub-matrices of a first of the plurality of input matrices to achieve a maximum number of adjacent rows having a predetermined threshold zero values. 19. The GPU of claim 16 , wherein performing the one or more optimizations on the matrix data further comprises performing row adjustments in the first matrix. 20. The GPU of claim 19 , wherein performing the one or more optimizations on the matrix data further comprises performing a reverse swapping on an output matrix resulting from the multiply-add operations.

Assignees

Inventors

Classifications

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • Systolic array · CPC title

  • using buffers · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12008067B2 cover?
An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the ma…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F17/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jun 11 2024 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).