Converting quasi-affine expressions to matrix operations
US-12175222-B1 · Dec 24, 2024 · US
US2023021204A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2023021204-A1 |
| Application number | US-202217851306-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 28, 2022 |
| Priority date | Jun 29, 2021 |
| Publication date | Jan 19, 2023 |
| Grant date | — |
Official abstract text for this publication.
A method and data processing system for implementing a neural network containing at least one matrix multiplication operation. The matrix multiplication operation is mapped to a graph of neural network operations including at least one element-wise operation. The at least one element-wise operation is implemented in fixed-function hardware of a neural network accelerator.
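As an illustration only, the following sketch shows one way a 2-D matrix multiplication could be expressed with element-wise multiplications followed by a group-wise sum, loosely following the split, multiply, concatenate, and sum steps described in the claims. It uses plain Python lists rather than accelerator hardware. The function names (`transpose`, `elementwise_matmul`) and the use of a vector of ones to stand in for a convolution with a tensor of ones are assumptions made for this example, not taken from the publication.

```python
def transpose(m):
    """Swap the two dimensions of a 2-D list-of-lists tensor."""
    return [list(col) for col in zip(*m)]


def elementwise_matmul(x, y):
    """Compute x @ y for x of shape [P, Q] and y of shape [Q, R].

    Hypothetical helper: only element-wise products and group-wise sums
    are used, mimicking the graph of operations in software.
    """
    q = len(y)
    assert all(len(row) == q for row in x), "inner dimensions must match"

    # Reconfigure Y from [Q, R] to [R, Q]; each row is one [1, Q] constituent.
    constituents = transpose(y)
    r = len(constituents)

    # Assumption: a vector of ones stands in for the kernel of a grouped
    # convolution with Q input channels and 1 output channel per group.
    ones = [1] * q

    result = []
    for x_row in x:  # the same constituents are reused for every one of the P rows
        # Element-wise multiply the row with each constituent and
        # concatenate the products into R groups of size Q.
        concatenated = []
        for y_r in constituents:
            concatenated.extend(a * b for a, b in zip(x_row, y_r))

        # Sum within each of the R groups.
        out_row = []
        for g in range(r):
            group = concatenated[g * q:(g + 1) * q]
            out_row.append(sum(w * v for w, v in zip(ones, group)))
        result.append(out_row)
    return result  # shape [P, R]


x = [[1, 2, 3], [4, 5, 6]]        # shape [2, 3]
y = [[7, 8], [9, 10], [11, 12]]   # shape [3, 2]
print(elementwise_matmul(x, y))   # [[58, 64], [139, 154]]
```

The result matches an ordinary matrix product. On real hardware the multiplications and the summation would be separate operations scheduled on the accelerator, so this sketch only demonstrates the arithmetic equivalence.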
Opening claim text (preview).
What is claimed is:
1. A method of implementing, using a neural network accelerator comprising fixed-function hardware, a neural network comprising a plurality of layers, wherein at least one of the layers comprises a matrix multiplication operation defined in two or more dimensions between a first tensor X having dimensions [..., Q, ...] and a second tensor Y having dimensions [..., R, ...], the method comprising: mapping the matrix multiplication operation to a graph of neural network operations including at least one element-wise operation; and evaluating the graph of neural network operations to thereby evaluate the matrix multiplication operation; wherein the at least one element-wise operation is evaluated in the fixed-function hardware.
2. The method of claim 1, wherein the graph of neural network operations further comprises at least one transformation, applied to the first tensor X and/or the second tensor Y.
3. The method of claim 2, wherein the at least one transformation comprises: reconfiguring the second tensor Y to form a third tensor having dimensions [..., R, Q]; and splitting the third tensor into R constituent tensors each of dimensions [..., 1, Q], wherein the at least one element-wise operation comprises an element-wise multiplication between the first tensor and each of the R constituent tensors.
4. The method of claim 3, wherein: the at least one transformation further comprises concatenating and reconfiguring the results of the element-wise multiplications to arrange them in R groups of size Q, the graph of neural network operations further comprises summing within each of the R groups over one dimension; and the at least one transformation further comprises reconfiguring the result of the summing into a tensor having dimensions [..., P, R].
5. The method of claim 4, wherein: the at least one transformation comprises concatenating and reconfiguring the results of the element-wise multiplications to arrange them in R groups of size Q over the channel dimension; and the summing comprises summing within each of the R groups over the channel dimension.
6. The method of claim 4, wherein the summing comprises at least one convolution with a tensor of ones.
7. The method of claim 4, wherein the summing comprises a grouped convolution with a tensor of ones such that the grouped convolution has R groups, each with Q input channels and 1 output channel.
8. The method of claim 1, wherein the first tensor X has dimensions [M, N, P, 1] and the second tensor Y has dimensions [M′, N′, 1, R].
9. The method of claim 1, wherein: the first tensor X has dimensions [M, N, P, 1] and the second tensor Y has dimensions [M′, N′, 1, R]; and the element-wise operation comprises an element-wise multiplication of the first tensor, or a tensor derived from it, with the second tensor, or a tensor derived from it.
10. The method of claim 9, wherein the element-wise multiplication is performed using broadcasting over two dimensions.
11. The method of claim 9, wherein the element-wise multiplication is performed using broadcasting over one dimension and repeating one of the tensors over the other dimension.
12. The method of claim 9, wherein the element-wise multiplication comprises repeating one of the tensors over one dimension and repeating the other of the tensors over the other dimension.
13. The method of claim 1, wherein the at least one transformation is performed at least in part using a memory manipulation module configured to manipulate data stored in a memory; and/or wherein the repeating of a tensor is performed at least in part by one of: the memory manipulation module; and an element-wise operations unit of the neural network accelerator.
14. The method of claim 1, further comprising, before mapping the matrix multiplication operation to the graph of neural network operations: analysing the matrix multiplication operation; and determining, based on a result of the analysing, how to implement the matrix multiplication operation, comprising determining that the matrix multiplication operation should be implemented using the at least one element-wise operation, and rejecting at least one alternative method for implementing the matrix multiplication operation.
15. The method of claim 14, wherein the determining how to implement the matrix multiplication operation is based on one or more of: a size of the first tensor in one or more dimensions; a size of the second tensor in one or more dimensions; a memory-access bandwidth required to implement the matrix multiplication operation using the selected method; a memory size required to implement the matrix multiplication operation using the selected method; a number of hardware passes through the fixed-function hardware that will be required to implement the matrix multiplication operation using the selected method; an execution time on the fixed function hardware that will be required to implement the matrix multiplication operation using the selected method; a power consumption required to implement the matrix multiplication operation using the selected method; and a capability of the fixed-function hardware.
16. A data processing system for implementing a neural network comprising a plurality of layers, wherein at least one of the layers comprises a matrix multiplication operation defined in two or more dimensions between a first tensor X having dimensions [..., Q, ...] and a second tensor Y having dimensions [..., R, ...], the data processing system comprising: a mapping unit, configured to map the matrix multiplication operation to a graph of neural network operations including at least one element-wise operation; and a neural network accelerator comprising fixed-function hardware; wherein the neural network accelerator is configured to evaluate the graph of neural network operations to thereby evaluate the matrix multiplication operation; and wherein the at least one element-wise operation is evaluated in the fixed-function hardware.
17. The data processing system of claim 16, wherein the graph of neural network operations further comprises at least one transformation, applied to the first tensor X and/or the second tensor Y; wherein the data processing system comprises a memory manipulation module for manipulating data stored in a memory; and wherein the data processing system is configured to perform the at least one transformation using memory manipulation module.
18. The data processing system of claim 17, wherein the memory manipulation module comprises: an internal buffer; a memory reading block, configured to read data from the memory and write the data to the internal buffer; a memory writing block, configured to read the data from the internal buffer and write the data to the memory; and a control channel between the memory reading block and the memory writing block, wherein the memory reading block and the memory writing block are configured to communicate via the control channel to maintain synchronisation between them when writing the data to the internal buffer and reading the data from the internal buffer, respectively.
19. The method of claim 1, wherein the layer comprising the matrix multiplication operation is a classification layer for classifying an input to the neural network into one of a number of categories.
20. A non-transitory computer readable storage medium having stored
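For the tensor shapes in claims 8 to 12, where the first tensor ends in [P, 1] and the second in [1, R], the product reduces to an outer product. The sketch below, restricted to the last two dimensions and written with plain Python lists, compares an element-wise multiplication that broadcasts over both dimensions with one that first repeats each tensor to the full [P, R] shape. All function names here are hypothetical and chosen for this example; the sketch makes no claim about how the accelerator or its memory manipulation module performs the repetition.

```python
def outer_by_broadcast(x_col, y_row):
    """Element-wise multiply [P, 1] by [1, R], broadcasting over both dimensions."""
    return [[x[0] * y for y in y_row[0]] for x in x_col]


def repeat_columns(x_col, r):
    """Repeat a [P, 1] tensor R times along its last dimension, giving [P, R]."""
    return [[x[0]] * r for x in x_col]


def repeat_rows(y_row, p):
    """Repeat a [1, R] tensor P times along its first dimension, giving [P, R]."""
    return [list(y_row[0]) for _ in range(p)]


def outer_by_repeat(x_col, y_row):
    """Repeat both operands to [P, R], then multiply element by element."""
    p, r = len(x_col), len(y_row[0])
    x_rep = repeat_columns(x_col, r)
    y_rep = repeat_rows(y_row, p)
    return [[a * b for a, b in zip(x_r, y_r)] for x_r, y_r in zip(x_rep, y_rep)]


x_col = [[1], [2], [3]]   # shape [3, 1]
y_row = [[4, 5]]          # shape [1, 2]
print(outer_by_broadcast(x_col, y_row))  # [[4, 5], [8, 10], [12, 15]]
print(outer_by_repeat(x_col, y_row))     # [[4, 5], [8, 10], [12, 15]]
```

Both routes give the same [P, R] result. Which one is preferable in practice would presumably depend on whether the element-wise hardware supports broadcasting, which is the kind of capability listed among the selection criteria in claim 15.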
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
Combinations of networks · CPC title
using electronic means · CPC title
Architecture, e.g. interconnection topology · CPC title
Convolutional networks [CNN, ConvNet] · CPC title