Sparse matrix multiplication in associative memory device
US-2018210862-A1 · Jul 26, 2018 · US
US10509846B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10509846-B2 |
| Application number | US-201715840552-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 13, 2017 |
| Priority date | Dec 13, 2017 |
| Publication date | Dec 17, 2019 |
| Grant date | Dec 17, 2019 |
Official abstract text for this publication.
An accelerator for increasing the processing speed of a processor. The accelerator operates in two distinct modes. In a first mode for dense layer processing, row data sets and column data sets are sent to a multiplier for multiplication. In a second mode for sparse layer processing compressed row data sets are received by a row multiplexer and compressed column data sets are received by a column multiplexer. Each multiplexer is configured to compare the indexes of data sets with one another to determine matching indexes. When indexes match, the matching data sets are selected and sent to the multiplier for multiplication. When indexes do not match, data sets are stored in memory devices for subsequent cycles.
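The two modes in the abstract can be illustrated with a small software sketch. This is not the patented hardware: the function names, the `(index, value)` pair format for compressed data sets, and the assumption that indexes arrive in ascending order are all hypothetical choices made for illustration. Where the abstract says unmatched data sets are stored in memory devices for subsequent cycles, the sketch simply holds the unmatched entry in place by not advancing its pointer.

```python
def dense_dot(row, col):
    """First mode (dense): every row value is multiplied by the
    column value at the same position and accumulated."""
    return sum(r * c for r, c in zip(row, col))


def compress(vec):
    """Hypothetical compressed format: keep only non-zero values,
    each tagged with its original position (index)."""
    return [(idx, val) for idx, val in enumerate(vec) if val != 0]


def sparse_dot(row, col):
    """Second mode (sparse): row and col are lists of (index, value)
    pairs, assumed sorted by index. Only pairs whose indexes match
    reach the multiplier."""
    acc = 0
    i = j = 0
    while i < len(row) and j < len(col):
        r_idx, r_val = row[i]
        c_idx, c_val = col[j]
        if r_idx == c_idx:
            acc += r_val * c_val  # indexes match: multiply and accumulate
            i += 1
            j += 1
        elif r_idx < c_idx:
            i += 1  # row entry has no partner; column entry is held
        else:
            j += 1  # column entry has no partner; row entry is held
    return acc


row = [0, 2, 0, 3]
col = [1, 4, 0, 5]
print(dense_dot(row, col))
print(sparse_dot(compress(row), compress(col)))
```

Both calls compute the same dot product; the sparse path skips every position where either operand is zero, which is the saving the index comparison is meant to provide.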
Opening claim text (preview).
What is claimed:
1. An apparatus for accelerating processing of one or more processors, the apparatus comprising: at least one processing element having a multiplier that receives row data sets and column data sets while operating in a first mode; at least one row multiplexer and at least one row memory device coupled to the at least one processing element for selecting a row data set received by the multiplier based on matching a row data set index with a column data set index while operating in a second mode; and at least one column multiplexer and at least one column memory device coupled to the at least one processing element for selecting a column data set received by the multiplier based on matching a column data set index with a row data set index while operating in the second mode.
2. The apparatus of claim 1, wherein the at least one row memory device receives the row data set, and after receiving the row data set, the at least one row memory device bypasses the row data set through the at least one row memory device while operating in the second mode.
3. The apparatus of claim 1, wherein the at least one processing element receives the row data set, and after receiving the row data set, the at least one processing element prevents transmission of additional row data sets to the at least one row memory device while operating in the second mode.
4. The apparatus of claim 1, wherein the at least one row memory device receives the row data set, and after receiving the row data set, the at least one row memory device stores the row data set in the at least one row memory device while operating in the second mode.
5. The apparatus of claim 1, wherein the at least one row multiplexer comprises a first row multiplexer coupled to the at least one processing element and a second row multiplexer coupled to the at least one processing element.
6. The apparatus of claim 5, wherein the at least one row memory device comprises a first row memory device coupled to the first row multiplexer and the at least one processing element, and a second row memory device coupled to the second row multiplexer and the at least one processing element.
7. The apparatus of claim 6, wherein the at least one column multiplexer comprises a first column multiplexer coupled to the at least one processing element, and a second column multiplexer coupled to the at least one processing element.
8. The apparatus of claim 7, wherein the at least one column memory device comprises a first column memory device coupled to the first column multiplexer and the at least one processing element, and a second column memory device coupled to the second column multiplexer and the at least one processing element.
9. The apparatus of claim 1, wherein prior to the multiplier receiving the selected row data set and selected column data set, the at least one processing element compares the row data set index of the selected row data set to the column data set index of the selected column data set while operating in the second mode.
10. The apparatus of claim 9, wherein when the compared row data set index and column data set index do not match, the at least one processing element discards one of the selected row data set or selected column data set while operating in the second mode.
11. The apparatus of claim 10, wherein the at least one processing element discards one of the selected row data set or selected column data set based on the row data set index or the column data set index.
12. The apparatus of claim 1, wherein the at least one processing element is one of a plurality of processing elements.
13. The apparatus of claim 1, wherein the at least one processor is a processor of a neural network.
14. The apparatus of claim 1 wherein the at least one processing element has an accumulator that is coupled to the multiplier.
15. The apparatus of claim 1 wherein the at least one row memory device is a first in and first out memory device.
16. At least one non-transitory machine readable medium including instructions for increasing processing speed, the instructions, when executed by a machine, cause the machine to perform operations comprising: operate a processing element in a first mode, wherein operating in the first mode comprises: receiving row data sets and transmitting the row data sets to a multiplier of the processing element; and receiving column data sets and transmitting the column data sets to the multiplier; and operate the processing element in a second mode, wherein operating in the second mode comprises: using at least one row multiplexer and at least one row memory coupled to the processing element to select a row data set received by the multiplier based on matching a row data set index with a column data set index; and using at least one row multiplexer and at least one row memory coupled to the processing element to select a column data set received by the multiplier based on matching a column data set index with a row data set index.
17. The at least one non-transitory machine readable medium of claim 16, wherein the processing element in the second mode further operates to: receive the row data set at a row memory device coupled to the processing element; and bypass the row data set through the row memory device.
18. The at least one non-transitory machine readable medium of claim 16, wherein the processing element in the second mode further operates to: receive the row data set at a row memory device coupled to the processing element; and prevent transmission of additional row data sets to the row memory device.
19. The at least one non-transitory machine readable medium of claim 16, wherein the processing element in the second mode further operates to: receive the row data set at a row memory device coupled to the processing element; and store the row data set in the row memory device.
20. The at least one non-transitory machine readable medium of claim 16, wherein the processing element in the second mode further operates to: compare a row data set index to a column data set index that does not match; and store the row data set in a row memory device coupled to the processing element or store the column data set in a column memory device coupled to the processing element based on the comparison of the row data set index to the column data set index.
21. The at least one non-transitory machine readable medium of claim 16, wherein while operating in the second mode, selecting the row data set and selecting the column data set are further based on a read pointer reference.
22. The at least one non-transitory machine readable medium of claim 16, wherein the processing element is one of a plurality of processing elements operated by the non-transitory, machine readable medium.
23. A method of operating a processing element of a hardware accelerator, the method comprising: operating a processing element of an apparatus in a first mode, wherein operating in the first mode comprises: receiving row data sets and transmitting the row data sets to a multiplier of the processing element; and receiving column data sets and transmitting the column data sets to the multiplier; and operating in a second mode, wherein operating in the second mode comprises: using at least one row multiplexer and at least one row memory, of the apparatus, coupled to the processing element to select a row data set received by the multiplier based on matching a row da
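
As a rough aid to reading the second-mode claims, the sketch below steps a single processing element cycle by cycle with first-in, first-out row and column memories. It is an interpretation, not the claimed design. The function name `run_sparse_pe`, the `depth` parameter, the `(index, value)` pair format, the ascending-index ordering, and the rule of discarding whichever entry has the lower index are all assumptions introduced here; the claims state only that one of the two data sets is discarded or stored based on the index comparison.

```python
from collections import deque


def run_sparse_pe(row_stream, col_stream, depth=4):
    """Hypothetical cycle-stepped model of one processing element.

    row_stream / col_stream: iterables of (index, value) pairs,
    assumed sorted by index. depth: assumed FIFO capacity.
    Returns (accumulated_sum, compare_cycles).
    """
    row_fifo, col_fifo = deque(), deque()
    row_iter, col_iter = iter(row_stream), iter(col_stream)
    acc = 0
    cycles = 0
    while True:
        # Intake: accept at most one new entry per side per cycle, and
        # only while the FIFO has room (a full FIFO blocks further input).
        if len(row_fifo) < depth:
            item = next(row_iter, None)
            if item is not None:
                row_fifo.append(item)
        if len(col_fifo) < depth:
            item = next(col_iter, None)
            if item is not None:
                col_fifo.append(item)
        # An empty FIFO after intake means that side is exhausted,
        # so no further index match is possible.
        if not row_fifo or not col_fifo:
            break
        r_idx, r_val = row_fifo[0]
        c_idx, c_val = col_fifo[0]
        if r_idx == c_idx:
            acc += r_val * c_val   # match: multiply and accumulate
            row_fifo.popleft()
            col_fifo.popleft()
        elif r_idx < c_idx:
            row_fifo.popleft()     # assumed rule: drop the lower index,
        else:                      # keep the other entry stored
            col_fifo.popleft()
        cycles += 1
    return acc, cycles


total, cycles = run_sparse_pe([(1, 2), (3, 3)], [(0, 1), (1, 4), (3, 5)])
print(total, cycles)
```

The `depth` check loosely mirrors the claims in which transmission of additional row data sets is prevented; real hardware would also need multiplexer selection between incoming and stored data, which this sketch folds into the single FIFO head comparison.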
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title
using electronic means · CPC title
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
to perform operations on data operands · CPC title
Neural networks · CPC title