Distributed matrix multiplication for neural networks

US10922380B2 · US · B2

Patent metadata
Publication number: US-10922380-B2
Application number: US-201816236955-A
Country: US
Kind code: B2
Filing date: Dec 31, 2018
Priority date: Dec 30, 2016
Publication date: Feb 16, 2021
Grant date: Feb 16, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, a matrix operation associated with a plurality of input matrices may be performed. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.
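
The flow described in the abstract (partition by the number of processing elements, distribute, compute partial products while exchanging data, then assemble the result) can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `split_rows` and `ring_matmul` are hypothetical, the row-block partitioning scheme is one possible choice, and the "simultaneous" transfer is modeled by rotating block ownership after each stage of a sequential loop.

```python
def matmul(a, b):
    """Plain triple-loop product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def split_rows(m, parts):
    """Split the rows of m into `parts` contiguous, near-equal blocks."""
    base, extra = divmod(len(m), parts)
    blocks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        blocks.append(m[start:start + size])
        start += size
    return blocks


def ring_matmul(a, b, num_pes):
    """Simulate C = A x B on num_pes processing elements (PEs) in a ring.

    Assumed scheme: PE p keeps row block p of A for the whole run, while
    the row blocks of B travel around the ring, one hop per stage.
    """
    n_cols = len(b[0])
    a_blocks = split_rows(a, num_pes)
    b_blocks = split_rows(b, num_pes)
    # Column offset of A that lines up with each row block of B.
    offsets, start = [], 0
    for blk in b_blocks:
        offsets.append(start)
        start += len(blk)
    # Each PE accumulates its own row block of the result.
    c_blocks = [[[0] * n_cols for _ in rows] for rows in a_blocks]
    held = list(range(num_pes))  # held[p] = index of the B block at PE p
    for _stage in range(num_pes):
        for p in range(num_pes):
            k = held[p]
            lo, hi = offsets[k], offsets[k] + len(b_blocks[k])
            a_slice = [row[lo:hi] for row in a_blocks[p]]
            if a_slice and hi > lo:  # skip empty partitions
                partial = matmul(a_slice, b_blocks[k])
                for i, row in enumerate(partial):
                    for j, value in enumerate(row):
                        c_blocks[p][i][j] += value
        # Every PE hands its B block to the next PE in the ring.
        held = [held[(p - 1) % num_pes] for p in range(num_pes)]
    return [row for blk in c_blocks for row in blk]


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1]]
b = [[1, 2], [3, 4], [5, 6]]
result = ring_matmul(a, b, num_pes=2)
```

After `num_pes` stages every PE has seen every block of B, so concatenating the per-PE result blocks gives the full product. In hardware the transfer would overlap with the computation of the same stage; the simulation only preserves the data dependencies.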

First claim

Opening claim text (preview).

What is claimed is:

1. A matrix processor, comprising: a memory to store a plurality of input matrices; a plurality of matrix processing units (MPUs) to perform matrix multiplication arithmetic; controller circuitry to: receive an instruction to be executed by the matrix processor, wherein the instruction instructs the matrix processor to perform a matrix multiplication operation on the plurality of input matrices; partition the plurality of input matrices into a plurality of input partitions based on a number of available MPUs; distribute the plurality of input partitions among the plurality of MPUs, wherein each input partition is distributed to a particular MPU of the plurality of MPUs; perform a plurality of partial matrix multiplication calculations using the plurality of MPUs; transmit partial matrix data between the plurality of MPUs while performing the plurality of partial matrix multiplication calculations, wherein each MPU is to transmit a portion of the partial matrix data to one or more of the plurality of MPUs simultaneously while each of the plurality of partial matrix multiplication calculations is being performed; and determine a result of the matrix multiplication operation based on the plurality of partial matrix multiplication calculations.

2. The matrix processor of claim 1, wherein: the plurality of MPUs is configured in a cyclic arrangement such that each MPU is communicatively coupled to a plurality of neighbor MPUs; and the plurality of neighbor MPUs of each MPU comprises a first neighbor MPU and a second neighbor MPU.

3. The matrix processor of claim 2, wherein the controller circuitry is further to: perform the plurality of partial matrix multiplication calculations in a plurality of stages; and transmit a portion of the partial matrix data from each MPU to one or more of the plurality of neighbor MPUs while performing each stage of the plurality of partial matrix multiplication calculations.

4. The matrix processor of claim 3, wherein the controller circuitry to transmit the portion of the partial matrix data from each MPU to one or more of the plurality of neighbor MPUs while performing each stage of the plurality of partial matrix multiplication calculations is further to: transmit the portion of the partial matrix data from each MPU to the first neighbor MPU and the second neighbor MPU.

5. The matrix processor of claim 4, wherein the partial matrix data comprises a partial input matrix, wherein the partial input matrix is to be used by a first MPU in a particular stage of the plurality of partial matrix multiplication calculations, and wherein the partial input matrix is to be used by a second MPU in a subsequent stage of the plurality of partial matrix multiplication calculations.

6. The matrix processor of claim 5, wherein the matrix multiplication operation is associated with a forward propagation operation in a neural network.

7. The matrix processor of claim 5, wherein the matrix multiplication operation is associated with a weight update operation in a neural network.

8. The matrix processor of claim 3, wherein the partial matrix data comprises a partial result matrix determined by a first MPU in a particular stage of the plurality of partial matrix multiplication calculations, and wherein the partial result matrix is to be used by a second MPU in a subsequent stage of the plurality of partial matrix multiplication calculations.

9. The matrix processor of claim 8, wherein the matrix multiplication operation is associated with a backward propagation operation in a neural network.

10. At least one non-transitory machine accessible storage medium having instructions stored thereon, wherein the instructions, when executed on a matrix processor, cause the matrix processor to: receive, from a host processor, a request to perform a matrix multiplication operation on a plurality of input matrices; partition the plurality of input matrices into a plurality of input partitions based on a number of available matrix processing units (MPUs) in the matrix processor; distribute the plurality of input partitions among a plurality of MPUs in the matrix processor, wherein each input partition is distributed to a particular MPU of the plurality of MPUs; perform a plurality of partial matrix multiplication calculations using the plurality of MPUs; transmit partial matrix data between the plurality of MPUs while performing the plurality of partial matrix multiplication calculations, wherein each MPU is to transmit a portion of the partial matrix data to one or more of the plurality of MPUs simultaneously while each of the plurality of partial matrix multiplication calculations is being performed; and determine a result of the matrix multiplication operation based on the plurality of partial matrix multiplication calculations.

11. The storage medium of claim 10, wherein: the plurality of MPUs is configured in a cyclic arrangement such that each MPU is communicatively coupled to a plurality of neighbor MPUs; and the plurality of neighbor MPUs of each MPU comprises a first neighbor MPU and a second neighbor MPU.

12. The storage medium of claim 11, wherein the instructions further cause the matrix processor to: perform the plurality of partial matrix multiplication calculations in a plurality of stages; and transmit a portion of the partial matrix data from each MPU to one or more of the plurality of neighbor MPUs while performing each stage of the plurality of partial matrix multiplication calculations.

13. The storage medium of claim 12, wherein the instructions that cause the matrix processor to transmit the portion of the partial matrix data from each MPU to one or more of the plurality of neighbor MPUs while performing each stage of the plurality of partial matrix multiplication calculations further cause the matrix processor to: transmit the portion of the partial matrix data from each MPU to the first neighbor MPU and the second neighbor MPU.

14. The storage medium of claim 13, wherein the partial matrix data comprises a partial input matrix, wherein the partial input matrix is to be used by a first MPU in a particular stage of the plurality of partial matrix multiplication calculations, and wherein the partial input matrix is to be used by a second MPU in a subsequent stage of the plurality of partial matrix multiplication calculations.

15. The storage medium of claim 14, wherein the matrix multiplication operation is associated with a forward propagation operation in a neural network.

16. The storage medium of claim 14, wherein the matrix multiplication operation is associated with a weight update operation in a neural network.

17. The storage medium of claim 12, wherein the partial matrix data comprises a partial result matrix determined by a first MPU in a particular stage of the plurality of partial matrix multiplication calculations, and wherein the partial result matrix is to be used by a second MPU in a subsequent stage of the plurality of partial matrix multiplication calculations.

18. The storage medium of claim 17, wherein the matrix multiplication operation is associated with a backward propagation operation in a neural network.

19. A method of performing matrix multiplication on a matrix processor, comprising: receiving, from a host processor, a request to perform a matrix multiplication operation on a plurality of input matrices; partitioning the plurality of input matrices into a plurality of input partitions based on a number of available matrix processin
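
Claims 8 and 17 describe a variant in which the data passed between MPUs is a partial result matrix rather than a partial input matrix. A minimal sketch of that idea follows, assuming a ring in which each MPU owns one slice of the inner dimension and result blocks circulate while accumulating contributions. The names `block_ranges` and `ring_accumulate` are hypothetical, and the sketch sends data in one direction only, whereas claims 4 and 13 recite transmission to both a first and a second neighbor.

```python
def block_ranges(n, parts):
    """Return `parts` contiguous (lo, hi) index ranges that cover range(n)."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def ring_accumulate(a, b, num_mpus):
    """Simulate C = A x B with partial result blocks moving around a ring.

    Assumed scheme: MPU p owns inner-dimension slice k_ranges[p], that is,
    the matching columns of A and rows of B. Each row block of C visits
    every MPU once and picks up that MPU's contribution.
    """
    n_cols = len(b[0])
    k_ranges = block_ranges(len(b), num_mpus)  # inner slices, one per MPU
    r_ranges = block_ranges(len(a), num_mpus)  # row ranges of result blocks
    # in_flight[p] = (result block index, partial result held by MPU p)
    in_flight = [
        (p, [[0] * n_cols for _ in range(*r_ranges[p])])
        for p in range(num_mpus)
    ]
    for _stage in range(num_mpus):
        for p in range(num_mpus):
            r, partial = in_flight[p]
            k_lo, k_hi = k_ranges[p]
            r_lo, r_hi = r_ranges[r]
            # Add this MPU's share of the inner-dimension sum.
            for i in range(r_lo, r_hi):
                for j in range(n_cols):
                    partial[i - r_lo][j] += sum(
                        a[i][k] * b[k][j] for k in range(k_lo, k_hi)
                    )
        # Each MPU forwards its partial result to the next MPU in the ring.
        in_flight = [in_flight[(p - 1) % num_mpus] for p in range(num_mpus)]
    # Reassemble the result blocks in row order.
    ordered = sorted(in_flight, key=lambda item: item[0])
    return [row for _, block in ordered for row in block]


a = [[1, 2], [3, 4], [5, 6]]
b = [[1, 0, 2], [0, 1, 3]]
c = ring_accumulate(a, b, num_mpus=2)
```

A partial result produced by one MPU in a given stage is consumed by its neighbor in the next stage, which mirrors the "first MPU" and "second MPU" wording of claim 8. The inputs never move in this variant; only the accumulating result blocks do.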

Assignees

Intel Corp

Inventors

Classifications

  • Combinations of networks

  • Supervised learning

  • Convolutional networks [CNN, ConvNet]

  • using electronic means

  • Backpropagation, e.g. using gradient descent

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10922380B2 cover?
In one embodiment, a matrix operation associated with a plurality of input matrices may be performed. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wher…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F17/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 16, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).