US2017193368A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2017193368-A1 |
| Application number | US-201514984510-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 30, 2015 |
| Priority date | Dec 30, 2015 |
| Publication date | Jul 6, 2017 |
| Grant date | — |
Official abstract text for this publication.
The present disclosure is directed to parallelization of artificial neural network processing by conditionally synchronizing, among multiple computer processors, either the input or output of individual operations, and by conditionally using either rows or columns of certain matrices used in the operations. The conditional processing may depend upon the relative sizes of the input and output of the specific operations to be performed. For example, if a current layer matrix of values is larger than a next layer matrix of values to be computed, then rows of a weight matrix may be used by the computer processors to compute the next layer matrix. If the current layer matrix is smaller than the next layer matrix, then columns of the weight matrix may be used by the computer processors to compute the next layer matrix.
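The conditional choice described in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the patented implementation: layer matrices are assumed to have shape (batch, nodes), the weight matrix shape (nodes_in, nodes_out), and "larger" is assumed to mean more nodes in the current layer than in the next. The names `matmul`, `split_indices`, and `next_layer` are hypothetical, and the tie case (equal sizes) falls back to column mode purely for illustration.

```python
def matmul(a, b):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_indices(n, parts):
    """Split range(n) into up to `parts` contiguous, non-empty chunks."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        if size:
            chunks.append(list(range(start, start + size)))
        start += size
    return chunks


def next_layer(x, w, num_workers):
    """Compute x @ w the way `num_workers` simulated processors might.

    Returns (result, mode), where mode names the slices of w that were used.
    """
    n_in, n_out = len(w), len(w[0])
    if n_in > n_out:
        # Current layer is larger: each worker takes some columns of x and the
        # matching ROWS of w, producing a full-size partial contribution.
        total = [[0] * n_out for _ in x]
        for idx in split_indices(n_in, num_workers):
            x_part = [[row[i] for i in idx] for row in x]
            w_part = [w[i] for i in idx]
            partial = matmul(x_part, w_part)
            # Contributions from all workers are summed (the synchronization step).
            total = [[t + p for t, p in zip(tr, pr)] for tr, pr in zip(total, partial)]
        return total, "rows"
    # Current layer is not larger: each worker keeps all of x and takes some
    # COLUMNS of w, producing a distinct block of output columns.
    pieces = []
    for idx in split_indices(n_out, num_workers):
        w_part = [[row[j] for j in idx] for row in w]
        pieces.append(matmul(x, w_part))
    result = [sum((piece[r] for piece in pieces), []) for r in range(len(x))]
    return result, "columns"


# Usage: a wide layer feeding a narrow one, then the narrow one feeding a wide one.
x = [[1, 2, 3], [4, 5, 6]]
w1 = [[1, 0], [0, 1], [1, 1]]
hidden, mode1 = next_layer(x, w1, num_workers=2)
w2 = [[1, 2, 3], [4, 5, 6]]
output, mode2 = next_layer(hidden, w2, num_workers=2)
```

In row mode every worker produces a partial result the size of the smaller next-layer matrix, so the smaller matrix is what must be exchanged and summed; in column mode each worker produces a separate block of columns that only needs to be gathered.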
Opening claim text (preview).
What is claimed is:

1. A system comprising a plurality of processors, the system programmed by executable instructions to at least: obtain data defining an artificial neural network, the artificial neural network comprising a first layer of nodes, a second layer of nodes, and a third layer of nodes, wherein the first layer comprises more nodes than the second layer, and wherein the third layer comprises more nodes than the second layer; provide to a first processor of the plurality of processors: a first column of input data from a first data matrix, the first data matrix comprising input data for the artificial neural network; a first row of weights from a first weight matrix, the first weight matrix comprising weights for connections between nodes of the first layer and nodes of the second layer; and a first column of weights from a second weight matrix, the second weight matrix comprising weights for connections between nodes of the second layer and nodes of the third layer; provide to a second processor of the plurality of processors: a second column of input data from the first data matrix; a second row of weights from the first weight matrix; and a second column of weights from the second weight matrix; compute, using the first processor, a first subset of columns of a second data matrix of values for the second layer, wherein the first subset is computed from the first column of input data, the first row of weights, and aggregated values received from the second processor of the plurality of processors; compute, using the second processor, a second subset of columns of the second data matrix, wherein the second subset is computed from the second column of input data, the second row of weights, and aggregated values received from the first processor of the plurality of processors; store, on each of the first and second processors, the second data matrix; compute, using the first processor, a third subset of columns of a third data matrix of values for the third layer, wherein the third subset is computed from the second data matrix and the first row of weights; compute, using the second processor, a fourth subset of columns of the third data matrix, wherein the fourth subset is computed from the second data matrix and the second row of weights; and generate artificial neural network output using the third data matrix.

2. The system of claim 1, wherein the plurality of processors are configured as a peer-to-peer ring, wherein the first processor transmits data to the second processor via a third processor of the plurality of processors, wherein the second processor transmits data to the third processor via the first processor, and wherein the third processor transmits data to the first processor via the second processor.

3. The system of claim 2, wherein the executable instructions to store the second data matrix on each of the first and second processors comprise executable instructions to at least: provide, by the first processor to the third processor, data computed from the first column of input data and the second row of weights; and provide, by the third processor to the second processor, data computed by the third processor aggregated with data received from the first processor.

4. The system of claim 1, wherein the first data matrix comprises a plurality of rows, and wherein individual rows of the plurality of rows comprise training data input vectors.

5. A computer-implemented method comprising: under control of a computer system comprising a plurality of computer processors, the computer system configured to execute specific computer-executable instructions, determining that a size of a first layer matrix of values for a first layer of an artificial neural network is greater than a size of a second layer matrix of values to be computed for a second layer of the artificial neural network; computing, by the plurality of computer processors, the second layer matrix, wherein individual computer processors of the plurality of computer processors each compute a different contribution to the second layer matrix using a corresponding subset of columns of the first layer matrix and a corresponding subset of rows of a first weight matrix, and wherein the first weight matrix comprises weights for connections between nodes of the first layer and nodes of the second layer; determining that a size of a third layer matrix of values for a third layer of the artificial neural network is less than a size of a fourth layer matrix of values to be computed for a fourth layer of the artificial neural network; and computing, by the plurality of computer processors, the fourth layer matrix, wherein individual computer processors of the plurality of computer processors each compute a different subset of columns of the fourth layer matrix using the third layer matrix and a corresponding subset of columns of a second weight matrix, and wherein the second weight matrix comprises weights for connections between nodes of the third layer and nodes of the fourth layer.

6. The computer-implemented method of claim 5, wherein computing the second layer matrix comprises: providing, by a first computer processor of the plurality of computer processors to a second computer processor of the plurality of computer processors, a first contribution to the second layer matrix, wherein the first contribution is based on at least one column of values computed by the first computer processor; and adding, by the second computer processor, the first contribution and a second contribution to generate at least one column of values for the second layer matrix, wherein the second contribution is based on at least one column of values computed by the second computer processor.

7. The computer-implemented method of claim 5, further comprising providing, by a first computer processor of the plurality of computer processors to a second computer processor of the plurality of computer processors, a first column of the third layer matrix, wherein the first column is computed by the first computer processor, and wherein the second computer processor previously computed a second column of the third layer matrix different than the first column.

8. The computer-implemented method of claim 5, wherein the third layer matrix comprises the second layer matrix.

9. The computer-implemented method of claim 5, wherein the first layer matrix comprises the fourth layer matrix.

10. The computer-implemented method of claim 5, wherein a first computer processor of the plurality of computer processors transmits data to a second computer processor of the plurality of computer processors via a third computer processor of the plurality of computer processors, wherein the second computer processor transmits data to the third computer processor via the first computer processor, and wherein the third computer processor transmits data to the first computer processor via the second computer processor.

11. The computer-implemented method of claim 10, wherein the first computer processor and second computer processor communicate with each other via a first switch, and wherein the third computer processor and a fourth computer processor of the plurality of computer processors communicate with each other via a second switch separate from the first switch.

12. The computer-implemented method of claim 5, wherein the first layer matrix comprises a plurality of rows of training data input to the artificial neural network.

13. The computer-implemented method of claim 5, further comprising performing, using the artificial neural network, at least one of product recommendati
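
Claims 2, 3, 6, and 10 describe processors arranged in a ring, where a partial contribution is passed to a neighbor, added to that neighbor's own contribution, and forwarded. One well-known pattern with this shape is a ring reduce-scatter, sketched below as a single-process simulation. This is an assumption about how such aggregation could be organized, not a statement of the protocol the patent actually uses; the function name `ring_reduce_scatter` and the data layout (`partials[p][b]` is worker `p`'s partial column for output block `b`) are hypothetical.

```python
def ring_reduce_scatter(partials):
    """Simulate a ring reduce-scatter over per-worker partial contributions.

    partials[p][b] is a list of numbers: worker p's partial values for block b.
    There must be exactly as many blocks as workers. Returns a dict mapping
    each worker to (block_index, fully_summed_block).
    """
    n = len(partials)
    # Copy the input so the caller's data is left untouched.
    state = [[list(block) for block in worker] for worker in partials]
    for step in range(n - 1):
        # All workers send at the same time, so snapshot the messages first.
        outgoing = []
        for sender in range(n):
            block = (sender - step) % n
            outgoing.append((sender, block, list(state[sender][block])))
        # Each worker adds what it receives from its ring predecessor.
        for sender, block, payload in outgoing:
            receiver = (sender + 1) % n
            state[receiver][block] = [
                a + c for a, c in zip(state[receiver][block], payload)
            ]
    # After n - 1 steps, worker p holds the complete sum for block (p + 1) % n.
    return {p: ((p + 1) % n, state[p][(p + 1) % n]) for p in range(n)}


# Usage: three workers, each holding a one-element partial for three blocks.
partials = [
    [[1], [2], [3]],
    [[10], [20], [30]],
    [[100], [200], [300]],
]
owned = ring_reduce_scatter(partials)
```

Each worker only ever talks to its ring successor, and each block accumulates one more contribution per hop. To leave the full matrix on every processor, as claim 1 requires for the second data matrix, a further round of forwarding the completed blocks around the ring would be needed; that step is omitted here.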