Utilizing structured sparsity in systolic arrays

US12405787B2 · US · B2

Patent metadata
Publication number: US-12405787-B2
Application number: US-202418621539-A
Country: US
Kind code: B2
Filing date: Mar 29, 2024
Priority date: Nov 30, 2020
Publication date: Sep 2, 2025
Grant date: Sep 2, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.
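
The data flow described in the abstract can be illustrated with a minimal Python sketch. It is a software model only, not the patented hardware. The 2:4 pattern (at most two non-zero values in each group of four), the encoding of the metadata as one position per packed element, and the names `sparse_dot_accumulate`, `GROUP`, and `KEEP` are assumptions made for illustration; none of them appear in the abstract.

```python
from typing import Sequence

GROUP = 4  # assumed group size of the structured pattern (hypothetical)
KEEP = 2   # assumed number of elements kept per group (hypothetical)


def sparse_dot_accumulate(
    acc: int,
    packed: Sequence[int],
    metadata: Sequence[int],
    unpacked: Sequence[int],
) -> int:
    """Return acc + dot(original sparse vector, unpacked).

    packed   -- structured source data with the zero elements removed
    metadata -- for each packed element, its position (0..GROUP-1)
                inside its original group
    unpacked -- dense source data with GROUP elements per group
    """
    if len(packed) != len(metadata):
        raise ValueError("one metadata entry is needed per packed element")
    if len(packed) * GROUP != len(unpacked) * KEEP:
        raise ValueError("operand lengths do not match the assumed pattern")

    total = acc
    for i, (value, pos) in enumerate(zip(packed, metadata)):
        group = i // KEEP
        # The metadata acts like a multiplexer select: it picks the one
        # unpacked element that lines up with this packed element.
        selected = unpacked[group * GROUP + pos]
        total += value * selected
    return total


# Original sparse vector [0, 3, 0, 5, 7, 0, 0, 2] in packed form:
packed = [3, 5, 7, 2]
metadata = [1, 3, 0, 3]
unpacked = [1, 2, 3, 4, 5, 6, 7, 8]
print(sparse_dot_accumulate(10, packed, metadata, unpacked))  # prints 87
```

Only four multiplications are performed instead of eight, because the elements of the unpacked operand that would have been multiplied by zero are never selected.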

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a processor comprising a systolic array to: execute an instruction for sparse systolic dot product accumulate; read at least portions of elements of a plurality of source registers referenced by the instruction, wherein the plurality of source registers comprise a first source register having metadata corresponding to structured source data, a second source register having unpacked source data, and a third source register having the structured source data packed based on sparsity as packed source data; provide a first subset of elements of the packed source data to at least one stage of the systolic array, the at least one stage comprising dot product circuitry; select, using the metadata, a second subset of elements of the unpacked source data to utilize the at least one stage of the systolic array, the second subset of elements corresponding to the first subset of elements; and perform, at the at least one stage of the systolic array, dot product accumulate operations.

2. The apparatus of claim 1, wherein the systolic array to perform the dot product accumulate operations using the first subset of elements and the second subset of elements.

3. The apparatus of claim 1, wherein the systolic array further comprises a plurality of multiplexor circuits to select the portions of the unpacked source data to multiply with the structured source data based on the metadata.

4. The apparatus of claim 1, wherein the structured source data that is packed based on the sparsity comprises elements of a broadcast register of the systolic array.

5. The apparatus of claim 1, wherein the structured source data that is packed based on the sparsity comprises elements of an index register of the systolic array.

6. The apparatus of claim 1, wherein the systolic array to execute the instruction for sparse systolic dot product accumulate in order to identify the portions of the unpacked source data to multiply with the structured source data using the metadata and to perform a dot product multiplication of the portions with a result of multiplication of the structured source data, and wherein the metadata is provided in a source register of the plurality of source registers called by the instruction.

7. The apparatus of claim 1, wherein the packed source data comprises at least one of a half-float datatype that packs two 16-bit elements into a channel, a bfloat datatype that packs two 16-bit elements into a channel, an int8 datatype that packs four 8-bit elements into a channel, an int4 datatype that packs eight 4-bit elements into a channel, or an int2 datatype that packs sixteen 2-bit elements into a channel.

8. The apparatus of claim 1, wherein the metadata indicates a position of non-zero elements in an original form of the structured source data prior to packing into the structured source data.

9. The apparatus of claim 8, wherein the original form of the structured source data is pre-processed by an external agent to pack into the structured source data by removing sparse elements from the original form, and wherein the external agent generates the metadata.

10. The apparatus of claim 9, wherein the external agent comprises at least one of a central processing unit (CPU) or an intelligent sensor.

11. The apparatus of claim 1, wherein the processor comprises a general-purpose graphics processing unit (GPGPU).

12. At least one non-transitory machine readable storage medium comprising instructions that, when executed, cause at least one processor to perform operations comprising: executing an instruction for sparse systolic dot product accumulate; reading at least portions of elements of a plurality of source registers referenced by the instruction, wherein the plurality of source registers comprise a first source register having metadata corresponding to structured source data, a second source register having unpacked source data, and a third source register having the structured source data packed based on sparsity as packed source data; providing a first subset of elements of the packed source data to at least one stage of a systolic array, the at least one stage comprising dot product circuitry; selecting, using the metadata, a second subset of elements of the unpacked source data to utilize the at least one stage of the systolic array, the second subset of elements corresponding to the first subset of elements; and performing, at the at least one stage of the systolic array, dot product accumulate operations.

13. The at least one non-transitory machine readable storage medium of claim 12, wherein the systolic array to perform the dot product accumulate operations using the first subset of elements and the second subset of elements.

14. The at least one non-transitory machine readable storage medium of claim 12, wherein the systolic array further comprises a plurality of multiplexor circuits to select the portions of the unpacked source data to multiply with the structured source data based on the metadata.

15. The at least one non-transitory machine readable storage medium of claim 12, wherein the systolic array to execute the instruction for sparse systolic dot product accumulate in order to identify the portions of the unpacked source data to multiply with the structured source data using the metadata and to perform a dot product multiplication of the portions with a result of multiplication of the structured source data, and wherein the metadata is provided in a source register of the plurality of source registers called by the instruction.

16. The at least one non-transitory machine readable storage medium of claim 12, wherein the metadata indicates a position of non-zero elements in an original form of the structured source data prior to packing into the structured source data.

17. A method comprising: executing by a systolic array of a processing device, an instruction for sparse systolic dot product accumulate; reading at least portions of elements of a plurality of source registers referenced by the instruction, wherein the plurality of source registers comprise a first source register having metadata corresponding to structured source data, a second source register having unpacked source data, and a third source register having the structured source data packed based on sparsity as packed source data; providing a first subset of elements of the packed source data to at least one stage of the systolic array, the at least one stage comprising dot product circuitry; selecting, using the metadata, a second subset of elements of the unpacked source data to utilize the at least one stage of the systolic array, the second subset of elements corresponding to the first subset of elements; and performing, at the at least one stage of the systolic array, dot product accumulate operations.

18. The method of claim 17, wherein the systolic array to perform the dot product accumulate operations using the first subset of elements and the second subset of elements.

19. The method of claim 17, wherein the systolic array further comprises a plurality of multiplexor circuits to select the portions of the unpacked source data to multiply with the structured source data based on the metadata.

20. The method of claim 17, wherein the systolic array to execute the instruction for sparse systolic dot product accumulate in order to identify the portions of the unpacked source data to multiply with the structured source data using the metadata and to perform a dot product multi
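
Claims 7 to 9 describe how the operands are prepared: an external agent removes the sparse elements, records where the kept elements were, and the kept elements are packed several to a channel. The Python sketch below models that preparation step under stated assumptions. The 2:4 pattern, the zero-padding of groups with fewer than two non-zero values, the 32-bit channel width (inferred only from the fact that every combination listed in claim 7 totals 32 bits), the placement of element 0 in the lowest bits, and the names `pack_structured` and `pack_channel` are all hypothetical and are not taken from the claims.

```python
from typing import List, Sequence, Tuple

GROUP = 4  # assumed group size (hypothetical 2:4 pattern)
KEEP = 2   # assumed number of elements kept per group


def pack_structured(dense: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Remove zeros group by group; return (packed values, metadata).

    Each metadata entry is the position (0..GROUP-1) that the matching
    packed value occupied in its original group.
    """
    if len(dense) % GROUP != 0:
        raise ValueError("length must be a multiple of the group size")
    packed: List[int] = []
    metadata: List[int] = []
    for start in range(0, len(dense), GROUP):
        group = dense[start:start + GROUP]
        positions = [i for i, v in enumerate(group) if v != 0]
        if len(positions) > KEEP:
            raise ValueError("group has too many non-zero elements")
        # Assumption: pad with zero-valued slots so that every group
        # contributes exactly KEEP packed elements.
        for i in range(GROUP):
            if len(positions) == KEEP:
                break
            if i not in positions:
                positions.append(i)
        for pos in sorted(positions):
            packed.append(group[pos])
            metadata.append(pos)
    return packed, metadata


def pack_channel(elements: Sequence[int], bits: int) -> int:
    """Pack unsigned `bits`-wide elements into one assumed 32-bit channel."""
    if bits * len(elements) != 32:
        raise ValueError("elements must fill the channel exactly")
    channel = 0
    for i, element in enumerate(elements):
        if not 0 <= element < (1 << bits):
            raise ValueError("element does not fit in the given width")
        channel |= element << (i * bits)  # element 0 in the lowest bits
    return channel


values, meta = pack_structured([0, 3, 0, 5, 7, 0, 0, 2])
print(values, meta)                   # prints [3, 5, 7, 2] [1, 3, 0, 3]
print(hex(pack_channel(values, 8)))   # prints 0x2070503
```

In this model the four kept int8 values fit in a single channel, while the unpacked operand they are multiplied against still spans eight elements.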

Assignees

Intel Corp

Inventors

Classifications

  • G06F9/3001 · Primary

    Arithmetic instructions · CPC title

  • controlled in tandem, e.g. multiplier-accumulator · CPC title

  • Systolic arrays · CPC title

  • Special purpose registers · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12405787B2 cover?
An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/3001. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 2, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).