Hardware node with matrix-vector multiply tiles for neural network processing

US10140252B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10140252-B2
Application number  US-201715637608-A
Country             US
Kind code           B2
Filing date         Jun 29, 2017
Priority date       Feb 28, 2017
Publication date    Nov 27, 2018
Grant date          Nov 27, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Hardware and methods for neural network processing are provided. A method in a system comprising a plurality of nodes, where each node comprises a plurality of tiles, is provided. The method includes receiving an N by M matrix of coefficients configured to control a neural network model. The method includes storing a first row and a second row of the N by M matrix of coefficients in a first and a second on-chip memories incorporated within a first and a second of the plurality of tiles. The method includes processing the first row of the coefficients and a first set of input vectors using a first compute unit incorporated within the first of the plurality of tiles. The method includes processing the second row of the coefficients and a second set of input vectors using a second compute unit incorporated within the second of the plurality of tiles.
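The row-per-tile arrangement described in the abstract can be sketched in a few lines of Python. This is only an illustrative software model, not the patented hardware: the `Tile` class, its `store_row` and `process` methods, and the `evaluate` helper are hypothetical names, and the sketch assumes every tile receives the same set of input vectors and that "processing" means a dot product of the stored row with each vector.

```python
from dataclasses import dataclass, field


@dataclass
class Tile:
    """Hypothetical stand-in for one tile: a local row store plus a compute unit."""
    row: list = field(default_factory=list)  # models the tile's on-chip memory

    def store_row(self, row):
        # Keep a private copy of one row of the coefficient matrix.
        self.row = list(row)

    def process(self, vectors):
        # Dot product of the stored row with each input vector.
        return [sum(w * x for w, x in zip(self.row, v)) for v in vectors]


def evaluate(matrix, vectors):
    """Assign row i of the matrix to tile i, then let each tile process the vectors."""
    tiles = [Tile() for _ in matrix]
    for tile, row in zip(tiles, matrix):
        tile.store_row(row)
    # Each tile works only on its own row (assumption: same vectors for all tiles).
    return [tile.process(vectors) for tile in tiles]


# An 8 by 8 coefficient matrix with entry (i, j) = i + j, and one all-ones input vector.
N = M = 8
matrix = [[float(i + j) for j in range(M)] for i in range(N)]
ones = [1.0] * M
outputs = evaluate(matrix, [ones])
```

Stacking the per-tile outputs in tile order gives the same result as an ordinary matrix-vector product, which is why splitting the matrix by rows lets the tiles work independently.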

First claim

Opening claim text (preview).

What is claimed:

1. A method for evaluating a neural network model in a system comprising a plurality of nodes interconnected via a network, wherein each node comprises a plurality of tiles, the method comprising: receiving an N by M matrix of coefficients via an ingress tree, wherein the N by M matrix of coefficients is configured to control the neural network model, wherein N is an integer equal to or greater than 8 and M is an integer equal to or greater than 8; storing a first row of the N by M matrix of coefficients in a first on-chip memory incorporated within a first of the plurality of tiles and storing a second row of the N by M matrix of coefficients in a second on-chip memory incorporated within a second of the plurality of tiles; processing the first row of the N by M matrix of coefficients and a first set of input vectors, received via the ingress tree, using a first compute unit incorporated within the first of the plurality of tiles; and processing the second row of the N by M matrix of coefficients and a second set of input vectors, received via the ingress tree, using a second compute unit incorporated within the second of the plurality of tiles.

2. The method of claim 1, wherein the processing the first row further comprises performing a first point-wise dot product operation on the first row of the N by M matrix of coefficients and the first set of input vectors.

3. The method of claim 2 further comprising outputting a first set of output values generated by the first point-wise dot product operation via an egress tree coupled to each one of the plurality of tiles.

4. The method of claim 1, wherein the processing the second row further comprises performing a second point-wise dot product operation on the second row of the N by M matrix of coefficients and the second set of input vectors.

5. The method of claim 4 further comprising outputting a second set of output values generated by the second point-wise dot product operation via an egress tree coupled to each one of the plurality of tiles.

6. The method of claim 1, wherein the N by M matrix of coefficients comprises weights corresponding to the neural network model.

7. The method of claim 1, wherein each of the first set of input vectors and the second set of input vectors comprises runtime values of input vectors and past values of input vectors.

8. A hardware node including a plurality of tiles, the hardware node comprising: an ingress tree configured to receive an N by M matrix of coefficients, wherein the N by M matrix of coefficients is configured to control a neural network model, wherein N is an integer equal to or greater than 8 and M is an integer equal to or greater than 8; a first on-chip memory incorporated within a first of the plurality of tiles configured to store a first row of the N by M matrix of coefficients; a second on-chip memory incorporated within a second of the plurality of tiles configured to store a second row of the N by M matrix of coefficients; a first compute unit incorporated within the first of the plurality of tiles configured to process the first row of N by M matrix of coefficients and a first set of input vectors received via the ingress tree; and a second compute unit incorporated within the second of the plurality of tiles configured to process the second row of the N by M matrix of coefficients and a second set of input vectors received via the ingress tree.

9. The hardware node of claim 8, wherein the first compute unit is further configured to perform a first point-wise dot product operation on the first row of the N by M matrix of coefficients and the first set of input vectors.

10. The hardware node of claim 9 further comprising an egress tree coupled to each one of the plurality of trees and further configured to output a first set of output values generated by the first point-wise dot product operation.

11. The hardware node of claim 8, wherein the second compute unit is further configured to perform a second point-wise dot product operation on the second row of the N by M matrix of coefficients and the second set of input vectors.

12. The hardware node of claim 11 further comprising an egress tree coupled to each one of the plurality of trees and further configured to output a second set of output values generated by the second point-wise dot product operation.

13. The hardware node of claim 8, wherein the N by M matrix of coefficients comprises weights corresponding to the neural network model.

14. The hardware node of claim 8, wherein each of the first set of input vectors and the second set of input vectors comprises both runtime values of input vectors and past values of input vectors.

15. A hardware node including a plurality of tiles, the hardware node comprising: an ingress tree configured to receive an N by M matrix of coefficients, wherein the N by M matrix of coefficients is configured to control a neural network model, wherein N is an integer equal to or greater than 8 and M is an integer equal to or greater than 8, and wherein the ingress tree comprises a first ingress tree register that fans out to a second ingress tree register and a third ingress tree register; a first on-chip memory incorporated within a first of the plurality of tiles configured to store a first row of the N by M matrix of coefficients; a second on-chip memory incorporated within a second of the plurality of tiles configured to store a second row of the N by M matrix of coefficients; a first compute unit incorporated within the first of the plurality of tiles configured to process the first row of N by M matrix of coefficients and a first set of input vectors received via the ingress tree; and a second compute unit incorporated within the second of the plurality of tiles configured to process the second row of the N by M matrix of coefficients and a second set of input vectors received via the ingress tree.

16. The hardware node of claim 15, wherein the first compute unit is further configured to perform a first point-wise dot product operation on the first row of the N by M matrix of coefficients and the first set of input vectors.

17. The hardware node of claim 15, wherein the second compute unit is further configured to perform a second point-wise dot product operation on the second row of the N by M matrix of coefficients and the second set of input vectors.

18. The hardware node of claim 17 further comprising an egress tree coupled to each one of the plurality of trees and further configured to output a first set of output values generated by the first point-wise dot product operation.

19. The hardware node of claim 17 further comprising an egress tree coupled to each one of the plurality of trees and further configured to output a second set of output values generated by the second point-wise dot product operation.

20. The hardware node of claim 15, wherein the N by M matrix of coefficients comprises weights corresponding to the neural network model.
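Claim 15 recites an ingress tree in which one register fans out to two further registers. A rough way to picture this is a binary tree of registers that copies a value one level closer to the tiles on each step. The sketch below is an assumption-laden illustration, not the patent's circuit: `Register`, `build_tree`, and `broadcast` are hypothetical names, the tree is assumed to be balanced, and the one-level-per-step timing is a modeling choice that the claim text does not state.

```python
class Register:
    """Hypothetical ingress tree register holding one value and feeding its children."""

    def __init__(self, children=()):
        self.children = list(children)
        self.value = None


def build_tree(depth):
    # Depth 0 is a leaf register; each deeper level fans out to two registers.
    if depth == 0:
        return Register()
    return Register([build_tree(depth - 1), build_tree(depth - 1)])


def broadcast(root, value):
    """Copy value from the root toward the leaves, one tree level per step.

    Returns the values seen at the leaf registers and the number of steps taken.
    """
    root.value = value
    frontier = [root]
    steps = 0
    while any(reg.children for reg in frontier):
        next_level = []
        for reg in frontier:
            for child in reg.children:
                child.value = reg.value  # fan-out: each child latches the parent's value
                next_level.append(child)
        frontier = next_level
        steps += 1
    return [reg.value for reg in frontier], steps


# Depth 1 mirrors the claim wording: a first register feeding a second and a third.
leaves, steps = broadcast(build_tree(1), "row data")
```

With depth 3 the same model reaches 8 leaf registers in 3 steps, which shows how a fan-out tree keeps the delivery delay proportional to the tree depth rather than to the number of tiles.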

Assignees

  • Microsoft Technology Licensing LLC

Inventors

Classifications

  • Combinations of networks · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Activation functions · CPC title

  • using instruction pipelines · CPC title

  • Learning methods · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10140252B2 cover?
Hardware and methods for neural network processing are provided. A method in a system comprising a plurality of nodes, where each node comprises a plurality of tiles, is provided. The method includes receiving an N by M matrix of coefficients configured to control a neural network model. The method includes storing a first row and a second row of the N by M matrix of coefficients in a first and…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F17/16. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 27, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).