In memory matrix multiplication and its usage in neural networks
US-2017277659-A1 · Sep 28, 2017 · US
US10140252B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10140252-B2 |
| Application number | US-201715637608-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 29, 2017 |
| Priority date | Feb 28, 2017 |
| Publication date | Nov 27, 2018 |
| Grant date | Nov 27, 2018 |
Official abstract text for this publication.
Hardware and methods for neural network processing are provided. A method in a system comprising a plurality of nodes, where each node comprises a plurality of tiles, is provided. The method includes receiving an N by M matrix of coefficients configured to control a neural network model. The method includes storing a first row and a second row of the N by M matrix of coefficients in a first and a second on-chip memories incorporated within a first and a second of the plurality of tiles. The method includes processing the first row of the coefficients and a first set of input vectors using a first compute unit incorporated within the first of the plurality of tiles. The method includes processing the second row of the coefficients and a second set of input vectors using a second compute unit incorporated within the second of the plurality of tiles.
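The row-wise partitioning described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the patented hardware: the `Tile` class, the `distribute_rows` and `evaluate` helpers, and the round-robin assignment of rows to tiles are all hypothetical names and choices made for the example. The sketch also assumes every tile sees the same set of input vectors, whereas the abstract allows the first and second sets to differ.

```python
from dataclasses import dataclass, field


@dataclass
class Tile:
    """Hypothetical tile: a local (on-chip) memory plus a compute unit."""
    rows: dict = field(default_factory=dict)  # row index -> coefficient row

    def store_row(self, index, coefficients):
        # Keep the row in this tile's own memory.
        self.rows[index] = list(coefficients)

    def process(self, input_vectors):
        # Dot product of each locally stored row with each input vector.
        return {
            index: [sum(c * x for c, x in zip(row, vec)) for vec in input_vectors]
            for index, row in self.rows.items()
        }


def distribute_rows(matrix, tiles):
    # Assumption: rows are assigned round-robin, so row 0 goes to the
    # first tile, row 1 to the second tile, and so on.
    for index, row in enumerate(matrix):
        tiles[index % len(tiles)].store_row(index, row)


def evaluate(matrix, input_vectors, num_tiles):
    tiles = [Tile() for _ in range(num_tiles)]
    distribute_rows(matrix, tiles)
    results = {}
    for tile in tiles:  # in hardware the tiles would work concurrently
        results.update(tile.process(input_vectors))
    # Reassemble the per-row outputs in row order.
    return [results[i] for i in range(len(matrix))]
```

Calling `evaluate(matrix, [vector], num_tiles=2)` on an 8 by 8 matrix returns one list per matrix row, each holding that row's dot product with every input vector. Because each tile reads only the rows held in its own memory, no coefficient has to move between tiles during the computation.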
What is claimed:

1. A method for evaluating a neural network model in a system comprising a plurality of nodes interconnected via a network, wherein each node comprises a plurality of tiles, the method comprising: receiving an N by M matrix of coefficients via an ingress tree, wherein the N by M matrix of coefficients is configured to control the neural network model, wherein N is an integer equal to or greater than 8 and M is an integer equal to or greater than 8; storing a first row of the N by M matrix of coefficients in a first on-chip memory incorporated within a first of the plurality of tiles and storing a second row of the N by M matrix of coefficients in a second on-chip memory incorporated within a second of the plurality of tiles; processing the first row of the N by M matrix of coefficients and a first set of input vectors, received via the ingress tree, using a first compute unit incorporated within the first of the plurality of tiles; and processing the second row of the N by M matrix of coefficients and a second set of input vectors, received via the ingress tree, using a second compute unit incorporated within the second of the plurality of tiles.
2. The method of claim 1, wherein the processing the first row further comprises performing a first point-wise dot product operation on the first row of the N by M matrix of coefficients and the first set of input vectors.
3. The method of claim 2 further comprising outputting a first set of output values generated by the first point-wise dot product operation via an egress tree coupled to each one of the plurality of tiles.
4. The method of claim 1, wherein the processing the second row further comprises performing a second point-wise dot product operation on the second row of the N by M matrix of coefficients and the second set of input vectors.
5. The method of claim 4 further comprising outputting a second set of output values generated by the second point-wise dot product operation via an egress tree coupled to each one of the plurality of tiles.
6. The method of claim 1, wherein the N by M matrix of coefficients comprises weights corresponding to the neural network model.
7. The method of claim 1, wherein each of the first set of input vectors and the second set of input vectors comprises runtime values of input vectors and past values of input vectors.
8. A hardware node including a plurality of tiles, the hardware node comprising: an ingress tree configured to receive an N by M matrix of coefficients, wherein the N by M matrix of coefficients is configured to control a neural network model, wherein N is an integer equal to or greater than 8 and M is an integer equal to or greater than 8; a first on-chip memory incorporated within a first of the plurality of tiles configured to store a first row of the N by M matrix of coefficients; a second on-chip memory incorporated within a second of the plurality of tiles configured to store a second row of the N by M matrix of coefficients; a first compute unit incorporated within the first of the plurality of tiles configured to process the first row of N by M matrix of coefficients and a first set of input vectors received via the ingress tree; and a second compute unit incorporated within the second of the plurality of tiles configured to process the second row of the N by M matrix of coefficients and a second set of input vectors received via the ingress tree.
9. The hardware node of claim 8, wherein the first compute unit is further configured to perform a first point-wise dot product operation on the first row of the N by M matrix of coefficients and the first set of input vectors.
10. The hardware node of claim 9 further comprising an egress tree coupled to each one of the plurality of trees and further configured to output a first set of output values generated by the first point-wise dot product operation.
11. The hardware node of claim 8, wherein the second compute unit is further configured to perform a second point-wise dot product operation on the second row of the N by M matrix of coefficients and the second set of input vectors.
12. The hardware node of claim 11 further comprising an egress tree coupled to each one of the plurality of trees and further configured to output a second set of output values generated by the second point-wise dot product operation.
13. The hardware node of claim 8, wherein the N by M matrix of coefficients comprises weights corresponding to the neural network model.
14. The hardware node of claim 8, wherein each of the first set of input vectors and the second set of input vectors comprises both runtime values of input vectors and past values of input vectors.
15. A hardware node including a plurality of tiles, the hardware node comprising: an ingress tree configured to receive an N by M matrix of coefficients, wherein the N by M matrix of coefficients is configured to control a neural network model, wherein N is an integer equal to or greater than 8 and M is an integer equal to or greater than 8, and wherein the ingress tree comprises a first ingress tree register that fans out to a second ingress tree register and a third ingress tree register; a first on-chip memory incorporated within a first of the plurality of tiles configured to store a first row of the N by M matrix of coefficients; a second on-chip memory incorporated within a second of the plurality of tiles configured to store a second row of the N by M matrix of coefficients; a first compute unit incorporated within the first of the plurality of tiles configured to process the first row of N by M matrix of coefficients and a first set of input vectors received via the ingress tree; and a second compute unit incorporated within the second of the plurality of tiles configured to process the second row of the N by M matrix of coefficients and a second set of input vectors received via the ingress tree.
16. The hardware node of claim 15, wherein the first compute unit is further configured to perform a first point-wise dot product operation on the first row of the N by M matrix of coefficients and the first set of input vectors.
17. The hardware node of claim 15, wherein the second compute unit is further configured to perform a second point-wise dot product operation on the second row of the N by M matrix of coefficients and the second set of input vectors.
18. The hardware node of claim 17 further comprising an egress tree coupled to each one of the plurality of trees and further configured to output a first set of output values generated by the first point-wise dot product operation.
19. The hardware node of claim 17 further comprising an egress tree coupled to each one of the plurality of trees and further configured to output a second set of output values generated by the second point-wise dot product operation.
20. The hardware node of claim 15, wherein the N by M matrix of coefficients comprises weights corresponding to the neural network model.
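Claim 15 recites an ingress tree in which a first register fans out to a second and a third register. One way to picture such a fan-out is the small Python simulation below. It is only a sketch under stated assumptions: the `IngressRegister` class, the `build_tree`, `leaves`, and `broadcast` helpers, the uniform binary shape below the first level, and the one-level-per-step timing are all hypothetical and are not taken from the claims.

```python
class IngressRegister:
    """Hypothetical register in a fan-out tree (not the patent's design)."""

    def __init__(self, children=None):
        self.value = None
        self.children = children or []


def build_tree(depth):
    # Assumption: a full binary tree; depth 1 gives one register that
    # fans out to exactly two registers, as recited in claim 15.
    if depth == 0:
        return IngressRegister()
    return IngressRegister([build_tree(depth - 1), build_tree(depth - 1)])


def leaves(node):
    # Leaf registers stand in for the points where tiles would attach.
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def broadcast(root, value):
    """Copy `value` from the root to every leaf, one tree level per step.

    Returns the number of steps taken, which equals the tree depth.
    """
    root.value = value
    frontier = [root]
    steps = 0
    while any(node.children for node in frontier):
        next_frontier = []
        for node in frontier:
            for child in node.children:
                child.value = node.value  # register-to-register copy
                next_frontier.append(child)
        frontier = next_frontier
        steps += 1
    return steps
```

With `build_tree(3)` the tree has eight leaf registers and `broadcast` reaches all of them in three steps, which illustrates why a tree-shaped fan-out scales with the logarithm of the number of tiles rather than with the number of tiles itself.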
Combinations of networks · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
Activation functions · CPC title
using instruction pipelines · CPC title
Learning methods · CPC title