Systems and methods to accelerate multiplication of sparse matrices
US-2020210517-A1 · Jul 2, 2020 · US
US11669489B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11669489-B2 |
| Application number | US-202117490830-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 30, 2021 |
| Priority date | Sep 30, 2021 |
| Publication date | Jun 6, 2023 |
| Grant date | Jun 6, 2023 |
Official abstract text for this publication.
A systolic array can be configured to skip distributed operands that have zero-values, resulting in improved resource efficiency. A skip module is introduced to receive operands from memory, identify whether they have a zero value or not, and, if they are nonzero, generate an operand vector including an index before sending the operand vector to a processing element.
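The skipping step described in the abstract can be pictured with a small software sketch. This is only an illustration under stated assumptions, not the patented hardware: the names `OperandVector` and `skip_zeros` are hypothetical, and the operand vector is modeled simply as an (index, value) pair, where the index is the operand's position in the incoming sequence.

```python
from typing import List, NamedTuple


class OperandVector(NamedTuple):
    """Hypothetical model of an operand vector: a nonzero operand plus its index."""
    index: int    # position of the operand within the original sequence
    value: float  # the nonzero operand itself


def skip_zeros(operands: List[float]) -> List[OperandVector]:
    """Emit one operand vector per nonzero operand; zero-value operands are skipped."""
    return [OperandVector(i, v) for i, v in enumerate(operands) if v != 0]


# Five operands arrive from memory, but only two operand vectors are produced.
vectors = skip_zeros([0.0, 3.0, 0.0, 0.0, 5.0])
print(vectors)
```

In this sketch a sequence with three zeros out of five operands yields two operand vectors, so a downstream processing element would have two operands to handle instead of five.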
Claim text.
What is claimed is:
1. A system, comprising: a first row of processing elements; a memory; and a skip module, the skip module configured to: receive a sequence of operands from the memory, the sequence including at least a first operand and a second operand; generate a first operand vector based on an identification that the first operand is a nonzero operand; skip the second operand based on an identification that the second operand is a zero-value operand; and send the first operand vector to each processing element included in the first row of processing elements.
2. The system of claim 1, wherein a first processing element included in the first row of processing elements is configured to: receive the first operand vector from the skip module; identify the first operand from the first operand vector; receive a third operand; and perform an operation using the first operand and the third operand.
3. The system of claim 2, wherein the operation includes a multiply-accumulate (MAC) operation.
4. The system of claim 3, wherein the first processing element is a 3-way MAC unit.
5. The system of claim 2, further comprising a first register, wherein the first processing element is further configured to store a result of the operation in the first register.
6. The system of claim 5, further comprising: a second register; an operand register; and a second row of processing elements, wherein a second processing element included in the second row of processing elements is configured to: receive a value from the first register; store the value in the second register; receive a second operand vector; identify a fourth operand from the second operand vector; receive a fifth operand from the operand register; and perform a second operation using the fourth operand and the fifth operand.
7. The system of claim 5, wherein the second processing element is further configured to add a second result of the second operation to the value stored in the second register.
8. The system of claim 2, wherein: the performing the operation requires a first number of cycles; the processing elements are configured to execute via a second number of threads; and the second number is greater than the first number.
9. The system of claim 1, wherein sequence of operands includes a third operand and wherein the skip module is further configured to: generate a second operand vector based on an identification that the third operand is a nonzero operand; determine a first index for the first operand vector based on a first position of the first operand within the sequence, wherein the first operand vector includes the first index; and determine a second index for the second operand vector based on a second position of the third operand within the sequence, wherein the second operand vector includes the second index.
10. The system of claim 9, wherein a processing element included in the first row of processing elements is configured to: receive the second operand vector from the skip module; identify the third operand from the second operand vector; identify the second index from the second operand vector; retrieve a fourth operand from an operand register based on the second index; and perform an operation using the third operand and the fourth operand.
11. The system of claim 1, further comprising a second row of processing elements, wherein the skip module is further configured to: identify a first number of nonzero operands for use by the first row of processing elements; identify a second number of nonzero operands for use by the second row of processing elements; and redistribute nonzero operands amongst the first row and the second row based on the first number and the second number.
12. The system of claim 1, wherein the skip module further includes a plurality of multiplexers configured to enable the skip module to select sequential nonzero operands across a range of operands.
13. A skip module apparatus, the skip module apparatus configured to: receive a sequence of operands from a memory, the sequence including at least a first operand a second operand; generate a first operand vector based on an identification that the first operand is a nonzero operand; skip the second operand based on an identification that the second operand is a zero-value operand; and send the first operand vector to each processing element included in a first row of processing elements.
14. The skip module apparatus of claim 13, further configured to: determine a first index for the first operand vector based on a first position of the first operand within the sequence, wherein the first operand vector includes the first index.
15. The skip module apparatus of claim 13, wherein the skip module is further configured to: identify a first number of nonzero operands for use by the first row of processing elements; identify a second number of nonzero operands for use by a second row of processing elements; and redistribute nonzero operands amongst the first row and the second row based on the first number and the second number.
16. A method, comprising: receiving a sequence of operands from a memory, the sequence including at least a first operand and a second operand; generating a first operand vector based on an identification that the first operand is a nonzero operand; skipping the second operand based on an identification that the second operand is a zero-value operand; and sending the first operand vector to each processing element included in a first row of processing elements.
17. The method of claim 16, further comprising: determining a first index for the first operand vector based on a first position of the first operand within the sequence, wherein the first operand vector includes the first index.
18. The method of claim 16, further comprising: identifying a first number of nonzero operands for use by the first row of processing elements; identifying a second number of nonzero operands for use by a second row of processing elements; and redistributing nonzero operands amongst the first row and the second row based on the first number and the second number.
19. The skip module apparatus of claim 13, wherein the sequence of operands includes a third operand and wherein the skip module apparatus is further configured to: generate a second operand vector based on an identification that the third operand is a nonzero operand; and send the second operand vector to each processing elements included in the first row of processing elements.
20. The method of claim 16, wherein the sequence includes a third operand and the method further comprises: generating a second operand vector based on an identification that the third operand is a nonzero operand; and sending the second operand vector to each processing elements included in the first row of processing elements.
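The way a row of processing elements might consume indexed operand vectors (the arrangement recited in claims 2, 3, 9 and 10) can be sketched in software. This is a loose illustration under assumptions, not the claimed circuit: the function name `row_mac` is hypothetical, each operand vector is modeled as an (index, value) pair, and each processing element's operand register is modeled as a plain list that the index selects from.

```python
from typing import List, Tuple


def row_mac(operand_vectors: List[Tuple[int, float]],
            operand_registers: List[List[float]]) -> List[float]:
    """Broadcast each operand vector to every processing element in one row.

    Each processing element uses the index carried in the operand vector to
    pick the matching operand from its own register, then multiply-accumulates.
    """
    accumulators = [0.0] * len(operand_registers)  # one accumulator per element
    for index, value in operand_vectors:           # one broadcast per nonzero operand
        for pe, register in enumerate(operand_registers):
            accumulators[pe] += value * register[index]  # multiply-accumulate
    return accumulators


# Sparse input [0, 3, 0, 0, 5] reduced to two operand vectors (index, value).
vectors = [(1, 3.0), (4, 5.0)]
# Two processing elements, each holding its own operands (assumed values).
registers = [[1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0]]
print(row_mac(vectors, registers))  # [31.0, 310.0]
```

The results match the dense dot products of `[0, 3, 0, 0, 5]` with each register, but only two multiply-accumulate steps per processing element are needed instead of five, because the index lets each element find the right partner operand after the zeros have been dropped.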
Adding; Subtracting (G06F7/483 - G06F7/491, G06F7/544 - G06F7/556 take precedence) · CPC title
Systolic arrays · CPC title
Arithmetic instructions · CPC title
Multiplying only · CPC title
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title