Hardware accelerator for sparse accumulation in column-wise sparse general matrix-matrix multipliction algorithms

US2024411834A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2024411834-A1
Application numberUS-202418737758-A
CountryUS
Kind codeA1
Filing dateJun 7, 2024
Priority dateJun 8, 2023
Publication dateDec 12, 2024
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method of performing sparse accumulation in column-wise sparse general matrix-matrix multiplication (SpGEMM) algorithms. The method includes receiving a request to perform SpGEMM based on a first matrix and a second matrix. The method includes accumulating, in a hardware buffer, a hash key and an intermediate multiplication result of the first matrix and the second matrix. The method includes performing a probe search of a hardware cache to identify a match between the hash key and a partial sum associated with the first matrix and the second matrix. The method includes generating, by a hardware adder, a multiplication result based on the partial sum and the intermediate multiplication result from the accumulation waiting buffer.

First claim

Opening claim text (preview).

What is claimed is: 1 . A method, comprising: receiving an instruction to perform one or more operations associated with a sparse matrix-matrix multiplication (SpGEMM) of a first matrix and a second matrix; accumulating, in a hardware buffer, a hash key and an intermediate multiplication result of the first matrix and the second matrix; performing, using the hash key, a probe search of a hardware cache to identify a partial sum associated with the first matrix and the second matrix; and generating, by a hardware adder, a multiplication result based on the partial sum and the intermediate multiplication result from the hardware buffer. 2 . The method of claim 1 , further comprising: acquiring a dataset indicative of a plurality of conflict rates for a plurality of hardware cache capacities; selecting a particular hardware cache capacity from the plurality of hardware cache capacities based on a particular conflict rate of the particular hardware cache capacity and at least one of a probe search latency, an energy efficiency, or a cache associativity; and configuring the hardware cache based on the particular hardware cache capacity. 3 . The method of claim 1 , wherein performing the probe search of the hardware cache is further performed without accessing partial sums from at least one of a software data structure or a cache of a processing device to improve a probe search latency associated with performing the probe search. 4 . The method of claim 1 , further comprising: performing a plurality of probe searches of the hardware cache for a plurality of hash keys using a parallel search procedure instead of a linear search procedure. 5 . The method of claim 1 , further comprising: storing a plurality of hash keys and a plurality of partial sums in the hardware cache. 6 . The method of claim 1 , further comprising: accumulating, in the hardware buffer, a second hash key and a second intermediate multiplication result of the first matrix and the second matrix; determining an absence of the second hash key in the hardware cache; and storing the second hash key and the second intermediate multiplication result in the hardware cache responsive to determining the absence of the second hash key in the hardware cache. 7 . The method of claim 1 , further comprising: performing, using a second hash key, a second probe search of the hardware cache; detecting a conflict event responsive to performing the second probe search of the hardware cache; and evicting, from the hardware cache, a second partial sum associated with the second hash key. 8 . The method of claim 7 , further comprising: generating, using an address generator, a virtual address for one or more insertions into a FIFO queue data structure; and inserting the second partial sum into the FIFO queue data structure based on the virtual address. 9 . The method of claim 8 , wherein inserting the second partial sum into the FIFO queue data structure based on the virtual address further comprises: generating, by a tail pointer hardware register, a next insertion address of the FIFO queue data structure; and generating, using a tail boundary hardware register, a boundary address of the FIFO queue data structure. 10 . The method of claim 1 , further comprising: receiving, via a programming interface, a plurality of key-value pairs; and selecting the hash key from the plurality of key-value pairs. 11 . An accelerating sparse accumulation (ASA) system comprising: a hardware buffer to: receive an instruction to perform one or more operations associated with a sparse matrix-matrix multiplication (SpGEMM) of a first matrix and a second matrix; and accumulate a hash key and an intermediate multiplication result of the first matrix and the second matrix; a hardware cache coupled to the hardware buffer, the hardware cache to perform, using the hash key, a probe search of the hardware cache to identify a partial sum associated with the first matrix and the second matrix; and a hardware adder coupled to the hardware cache, the hardware adder to generate a multiplication result based on the partial sum and the intermediate multiplication result from the hardware buffer. 12 . The ASA system of claim 11 , wherein to perform the probe search of the hardware cache is further performed without accessing partial sums from at least one of a software data structure or a cache of a processing device to improve a probe search latency associated with performing the probe search. 13 . The ASA system of claim 11 , wherein the hardware cache is further to: perform a plurality of probe searches of the hardware cache for a plurality of hash keys using a parallel search procedure instead of a linear search procedure. 14 . The ASA system of claim 11 , wherein the hardware cache is further to: store a plurality of hash keys and a plurality of partial sums. 15 . The ASA system of claim 11 , wherein the hardware cache is further to: accumulate a second hash key and a second intermediate multiplication result of the first matrix and the second matrix; determine an absence of the second hash key in the hardware cache; and store the second hash key and the second intermediate multiplication result responsive to determining the absence of the second hash key. 16 . The ASA system of claim 11 , wherein the hardware cache is further to: perform, using a second hash key, a second probe search of the hardware cache; detect a conflict event responsive to performing the second probe search of the hardware cache; and evict, from the hardware cache, a second partial sum associated with the second hash key. 17 . The ASA system of claim 16 , further comprising: an address generator coupled to the hardware cache, the address generator to: generate a virtual address for one or more insertions into a FIFO queue data structure; and insert the second partial sum into the FIFO queue data structure based on the virtual address. 18 . The ASA system of claim 17 , further comprising: a plurality of registers coupled to the address generator, the plurality of registers to: generate a next insertion address of the FIFO queue data structure; and generate a boundary address of the FIFO queue data structure. 19 . The ASA system of claim 11 , further comprising: a programming interface to receive a plurality of key-value pairs comprising the hash key. 20 . An integrated circuit (IC) device comprising: a hardware buffer to accumulate a hash key and an intermediate multiplication result of a first matrix and a second matrix, the intermediate multiplication result generated based on a sparse matrix-matrix multiplication (SpGEMM); a hardware cache coupled to the hardware buffer, the hardware cache to: store a plurality of hash keys and a plurality of partial sums; and perform, using the hash key, a probe search of the hardware cache to identify a partial sum associated with the first matrix and the second matrix; and a hardware adder coupled to the hardware cache, the hardware adder to generate a multiplication result based on the partial sum and the intermediate multiplication result from the hardware buffer.

Assignees

Inventors

Classifications

  • G06F17/16Primary

    Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024411834A1 cover?
A system and method of performing sparse accumulation in column-wise sparse general matrix-matrix multiplication (SpGEMM) algorithms. The method includes receiving a request to perform SpGEMM based on a first matrix and a second matrix. The method includes accumulating, in a hardware buffer, a hash key and an intermediate multiplication result of the first matrix and the second matrix. The meth…
Who is the assignee on this patent?
Univ California, Univ Lehigh
What technology area does this patent fall under?
Primary CPC classification G06F17/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Dec 12 2024 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).