Method and device for matrix multiplication optimization using vector registers
US-11366875-B2 · Jun 21, 2022 · US
US11657252B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11657252-B2 |
| Application number | US-201916434960-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 7, 2019 |
| Priority date | Jun 7, 2019 |
| Publication date | May 23, 2023 |
| Grant date | May 23, 2023 |
Abstract
A microprocessor system comprises a first processing element, a second processing element, a point-to-point connection between the first processing element and the second processing element, and a communication bus connecting together at least the first processing element and the second processing element. The first processing element includes a first matrix computing unit and the second processing element includes a second matrix computing unit. The point-to-point connection is configured to provide at least a result of the first processing element to a data joiner component of the second processing element configured to join at least the provided result of the first processing element with a result of the second matrix computing unit.
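The abstract describes a dataflow, not an implementation. The following is a minimal sketch under several assumptions. Each processing element's matrix computing unit is taken to compute a matrix-vector product over its own rows of a larger matrix. "Joining" is taken to mean placing the upstream result ahead of the local one; the claims below describe the hardware doing this with a multiplexer and an adder. `ProcessingElement`, `matvec`, and `point_to_point` are hypothetical names, and the shared input `x` only stands in for operands the communication bus would distribute.

```python
from dataclasses import dataclass
from typing import List, Optional

Matrix = List[List[int]]
Vector = List[int]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Hypothetical matrix computing unit: a plain matrix-vector product."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


@dataclass
class ProcessingElement:
    name: str
    weights: Matrix                 # rows of the overall matrix assigned to this element (assumption)
    inbox: Optional[Vector] = None  # filled by the upstream point-to-point connection

    def compute(self, x: Vector) -> Vector:
        local = matvec(self.weights, x)
        # Data joiner, modelled as concatenation: upstream result first, then the local result.
        upstream = self.inbox if self.inbox is not None else []
        return upstream + local


def point_to_point(result: Vector, downstream: ProcessingElement) -> None:
    """Hypothetical point-to-point link: hands a result directly to the next element."""
    downstream.inbox = result


x = [3, 4]  # stands in for operands distributed to both elements over the shared bus
pe1 = ProcessingElement("pe1", [[1, 0], [0, 1]])
pe2 = ProcessingElement("pe2", [[1, 1], [2, -1]])
point_to_point(pe1.compute(x), pe2)
joined = pe2.compute(x)
print(joined)  # [3, 4, 7, 2]
```

The joined vector equals what a single unit would produce for the two weight slices stacked together. That is the property the join has to preserve.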
Claims
What is claimed is:
1. A microprocessor system, comprising: a first processing element including a first matrix computing unit; a second processing element including a second matrix computing unit; a point-to-point connection between the first processing element and the second processing element, wherein the point-to-point connection is configured to provide at least a result of the first processing element to a data joiner component of the second processing element configured to join at least the provided result of the first processing element with a result of the second matrix computing unit, wherein the data joiner component includes an adder and a multiplexer and the multiplexer is configured to receive the result of the second matrix computing unit; and a communication bus connecting together at least the first processing element and the second processing element.
2. The system of claim 1, wherein the multiplexer is configured to shift the result of the second matrix computing unit by a configured result offset.
3. The system of claim 2, wherein the configured result offset is a 0-byte, 8-byte, 16-byte, or 24-byte offset.
4. The system of claim 2, wherein the configured result offset is specified by a processing element instruction.
5. The system of claim 4, wherein the processing element instruction is a convolution operation instruction.
6. The system of claim 4, wherein the second processing element is configured to receive the processing element instruction via the communication bus.
7. The system of claim 2, wherein the adder is configured to receive the result of the first processing element and the shifted result of the second matrix computing unit.
8. The system of claim 7, wherein the adder is configured to add together the result of the first processing element and the shifted result of the second matrix computing unit to output a packed result.
9. The system of claim 8, wherein the packed result is a size of a cache-line.
10. The system of claim 8, further comprising a second point-to-point connection configured to send the packed result to a third processing element, and wherein the second point-to-point connection connects the second matrix computing unit to the third processing element.
11. The system of claim 10, wherein the third processing element includes a second data joiner component and the second data joiner component is connected to the second point-to-point connection.
12. The system of claim 8, wherein the packed result includes a plurality of matrix compute results, and each matrix compute result of the plurality of matrix compute results is determined using a different processing element.
13. The system of claim 1, wherein the microprocessor system is included in an integrated circuit chip.
14. A method, comprising: determining a processing result using a first processing element, wherein the first processing element includes a first matrix computing unit; providing the processing result of the first processing element to a data joiner component of a second matrix computing unit via a first point-to-point connection; determining a result of the second matrix computing unit; providing the result of the second matrix computing unit to the data joiner component of the second matrix computing unit, wherein the data joiner component includes an adder and a multiplexer and the multiplexer is configured to receive the result of the second matrix computing unit; joining the processing result of the first processing element and the result of the second matrix computing unit to create a packed result; and sending the packed result to a third processing element via a second point-to-point connection.
15. The method of claim 14, wherein the multiplexer is configured to shift the result of the second matrix computing unit by a configured result offset.
16. The method of claim 15, wherein the configured result offset is specified by a processing element instruction.
17. The method of claim 16, wherein the processing element instruction includes a convolution operation instruction.
18. The method of claim 14, wherein the processing result of the first matrix computing unit and the result of the second matrix computing unit are byte-aligned in the packed result.
19. The method of claim 14, wherein the packed result includes a plurality of matrix compute results, and each matrix compute result of the plurality of matrix compute results is determined using a different processing element.
20. A microprocessor system, comprising: a first processing element including a first matrix computing unit and a first data joiner component; a second processing element including a second matrix computing unit and a second data joiner component; a third processing element including a third matrix computing unit and a third data joiner component; a first point-to-point connection between the first data joiner component of the first processing element and the second data joiner component of the second processing element, wherein the first point-to-point connection is configured to provide at least a first output result of the first data joiner component to the second data joiner component, and wherein the second data joiner component is configured to output a second output result by combining at least the first output result with a matrix compute result of the second matrix computing unit; a second point-to-point connection between the second data joiner component of the second processing element and the third data joiner component of the third processing element, wherein the second point-to-point connection is configured to provide at least the second output result of the second data joiner component to the third data joiner component; and a communication bus connecting together at least the first processing element, the second processing element, and the third processing element.
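Claims 2, 3, and 7 to 9 make the data joiner concrete: the multiplexer shifts the local result by a configured byte offset, and the adder sums the shifted value with the upstream result to produce a packed result. The following is a minimal sketch using Python integers as bit vectors. It assumes each matrix computing unit emits an 8-byte result and that the packed result is 32 bytes, which is exactly four slices at the offsets listed in claim 3; the claims do not state a cache-line size. `mux_shift`, `data_joiner`, and `run_chain` are hypothetical names. The four-element chain follows claims 10 to 12 and 20.

```python
SLICE_BYTES = 8                 # assumption: each matrix computing unit emits an 8-byte result
LINE_BYTES = 32                 # assumption: four 8-byte slices fill the packed result
VALID_OFFSETS = (0, 8, 16, 24)  # byte offsets listed in claim 3


def mux_shift(local_result: int, offset_bytes: int) -> int:
    """Hypothetical multiplexer: move the local result to its configured byte offset."""
    if offset_bytes not in VALID_OFFSETS:
        raise ValueError(f"unsupported result offset: {offset_bytes}")
    if not 0 <= local_result < 1 << (8 * SLICE_BYTES):
        raise ValueError("local result does not fit in one slice")
    return local_result << (8 * offset_bytes)


def data_joiner(upstream: int, local_result: int, offset_bytes: int) -> int:
    """Hypothetical data joiner: the adder sums the upstream value and the shifted local result."""
    packed = upstream + mux_shift(local_result, offset_bytes)
    if packed >= 1 << (8 * LINE_BYTES):
        raise OverflowError("packed result exceeds the line width")
    return packed


def run_chain(local_results: list[int], offsets: list[int]) -> bytes:
    """Pass the packed value from element to element, as over successive point-to-point links."""
    packed = 0  # the first element in the chain has no upstream input
    for result, offset in zip(local_results, offsets):
        packed = data_joiner(packed, result, offset)
    # Little-endian, so byte offset k in the packed value is index k of the returned bytes.
    return packed.to_bytes(LINE_BYTES, "little")


results = [int.from_bytes(bytes([b]) * SLICE_BYTES, "little") for b in (0x11, 0x22, 0x33, 0x44)]
line = run_chain(results, [0, 8, 16, 24])
print(line.hex())  # 1111111111111111222222222222222233333333333333334444444444444444
```

Each element writes a distinct, byte-aligned range (claim 18). Because of this, the addition behaves like concatenation. If two elements were configured with the same offset, their values would be summed into the same slice and corrupt the packed result, so the offsets must be distinct.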
for complex operations, e.g. multidimensional or interleaved address generators, macros · CPC title
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
in which an application is distributed across nodes in the network (software deployment G06F8/60; multiprogramming arrangements G06F9/46) · CPC title
Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mapping at compile time, see G06F8/451) · CPC title
Neural networks · CPC title