What technology area does this patent fall under?

Primary CPC classification G06F17/16. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Jun 11 2024 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Sparse matrix multiplication acceleration mechanism

US12008067B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12008067-B2
Application number	US-202117527324-A
Country	US
Kind code	B2
Filing date	Nov 16, 2021
Priority date	Sep 5, 2019
Publication date	Jun 11, 2024
Grant date	Jun 11, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware.

First claim

Opening claim text (preview).

What is claimed is: 1. An apparatus comprising: a systolic multiplier, including: a first set of first in first out (FIFO) buffers to store data in a first input matrix; a second set of FIFO buffers to store data in a second input matrix; a plurality of processing elements (PEs) to receive the matrix data from the first set of FIFO buffers and the second set of FIFO buffers and perform multiply-add operations on first input matrix and the second input matrix; a plurality of storage elements to locally store intermediate matrix multiplication values; and sparse matrix acceleration circuitry to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce the multiply-add operations to be performed by the matrix systolic multiplier, including swapping rows with other rows in a sub-matrix for each of a plurality of sub-matrices of the first input matrix. 2. The apparatus of claim 1 , wherein the swapping comprises swapping a first row having the predetermined threshold zero values with a second row not having the predetermined threshold zero values to enable the first row to be adjacent to a third row having the predetermined threshold zero values. 3. The apparatus of claim 1 , further comprising compression circuitry to compress the matrix data. 4. The apparatus of claim 3 , wherein compressing the matrix data comprises generating packed matrix data by removing zero values from the matrix data and generating an indicator vector to identify the location of the zero values in the packed matrix data. 5. The apparatus of claim 4 , wherein the sparse matrix acceleration circuitry receives the compressed matrix data and identifies the zero values in the compressed matrix data based on the indicator vector. 6. The apparatus of claim 1 , wherein the sparse matrix acceleration circuitry optimizing the matrix data further comprises swapping rows in each of a plurality of sub-matrices of a second of the plurality of input matrices to add to results of a multiplication operation of the first matrix and a third of the plurality of input matrices. 7. The apparatus of claim 6 , wherein the sparse matrix acceleration circuitry optimizing the matrix data further comprises performing a reverse swapping on an output matrix. 8. The apparatus of claim 7 , wherein the sparse matrix acceleration circuitry optimizing the matrix data further comprises performing row adjustments in the first matrix. 9. The apparatus of claim 8 , wherein performing the row adjustments comprises shifting all non-zero values of a row having more than the predetermined number of zero values. 10. The apparatus of claim 8 , wherein performing the row adjustments comprises combining adjacent rows having more than the predetermined number of zero values. 11. A method comprising: receiving data in a first input matrix at a systolic multiplier; receiving data in a second input matrix at the systolic multiplier; detecting, at the systolic multiplier, zero values in the first matrix data; performing, at the systolic multiplier, one or more optimizations on the matrix data to reduce multiply-add operations to be performed, including swapping rows with other rows in a sub-matrix for each of a plurality of sub-matrices of the first input matrix; and performing, at the systolic multiplier, the multiply-add operations on the optimized matrix data at matrix multiplication circuitry. 12. The method of claim 11 , wherein the swapping comprises swapping a first row having the predetermined threshold zero values with a second row not having the predetermined threshold zero values to enable the first row to be adjacent to a third row having the predetermined threshold zero values. 13. The method of claim 11 , wherein performing the one or more optimizations on the matrix data comprises swapping rows in each of a plurality of sub-matrices of a first of the plurality of input matrices to achieve a maximum number of adjacent rows having a predetermined threshold zero values. 14. The method of claim 11 , wherein performing the one or more optimizations on the matrix data further comprises performing row adjustments in the first matrix. 15. The method of claim 14 , wherein performing the one or more optimizations on the matrix data further comprises performing a reverse swapping on an output matrix resulting from the multiply-add operations. 16. A graphics processing unit (GPU) comprising: a first set of first in first out (FIFO) buffers to store data in a first input matrix; a second set of FIFO buffers to store data in a second input matrix; a plurality of processing elements (PEs) to receive the matrix data from the first set of FIFO buffers and the second set of FIFO buffers and perform multiply-add operations on first input matrix and the second input matrix; and a plurality of storage elements to locally store intermediate matrix multiplication values; and sparse matrix acceleration circuitry to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix systolic multiplier, including swapping rows with other rows in a sub-matrix for each of a plurality of sub-matrices of the first input matrix. 17. The GPU of claim 16 , wherein the swapping comprises swapping a first row having the predetermined threshold zero values with a second row not having the predetermined threshold zero values to enable the first row to be adjacent to a third row having the predetermined threshold zero values. 18. The GPU of claim 16 , wherein performing the one or more optimizations on the matrix data comprises swapping rows in each of a plurality of sub-matrices of a first of the plurality of input matrices to achieve a maximum number of adjacent rows having a predetermined threshold zero values. 19. The GPU of claim 16 , wherein performing the one or more optimizations on the matrix data further comprises performing row adjustments in the first matrix. 20. The GPU of claim 19 , wherein performing the one or more optimizations on the matrix data further comprises performing a reverse swapping on an output matrix resulting from the multiply-add operations.

Assignees

Intel Corp

Inventors

Classifications

G06N3/0495
Quantised networks; Sparse networks; Compressed networks · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06F9/30036
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
G06F2207/3892
Systolic array · CPC title
G06F13/1673
using buffers · CPC title

Patent family

Related publications grouped by family.

View patent family 70738381

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12008067B2 cover?: An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the ma…
Who is the assignee on this patent?: Intel Corp
What technology area does this patent fall under?: Primary CPC classification G06F17/16. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Jun 11 2024 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).