Transposed convolution using systolic array
US-2021097375-A1 · Apr 1, 2021 · US
US12488253B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12488253-B2 |
| Application number | US-202217568325-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 4, 2022 |
| Priority date | Jan 4, 2021 |
| Publication date | Dec 2, 2025 |
| Grant date | Dec 2, 2025 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A method and data processing system implement a neural network containing at least one matrix multiplication operation. The matrix multiplication operation is mapped to a graph of neural network operations including at least one transformation and at least one convolution. The at least one convolution is implemented in fixed-function hardware of a neural network accelerator.
Opening claim text (preview).
What is claimed is: 1 . A method of implementing, using a neural network accelerator comprising fixed-function hardware, a neural network comprising a plurality of layers, wherein at least one of the layers comprises a matrix multiplication operation defined in two or more dimensions between a first tensor X having dimensions [ . . . , P, . . . , Q, . . . ] and a second tensor Y having dimensions [ . . . , Q, . . . , R, . . . ], the method comprising: mapping the matrix multiplication operation to a graph of neural network operations including at least one transformation and at least one convolution operation; evaluating the graph of neural network operations to thereby evaluate the matrix multiplication operation, wherein the at least one convolution operation is evaluated in the fixed-function hardware; and implementing, in said fixed-function hardware, said neural network to include a matrix multiplication layer configured to perform a matrix multiplication operation in dependence on the evaluation of the graph, whereby said neural network accelerator is adapted to multiply the same set of weights simultaneously by multiple sets of input data elements in parallel at multiple processing elements. 2 . The method of claim 1 , wherein: the first tensor X or a tensor derived from it is treated as input data for the at least one convolution operation, and the second tensor Y or a tensor derived from it is treated as coefficient data for the at least one convolution operation. 3 . The method of claim 1 , wherein the at least one transformation reconfigures the second tensor Y to arrange the dimension with size R in the output channel dimension before the at least one convolution operation is evaluated. 4 . The method of claim 1 , wherein the at least one transformation reconfigures both tensors to arrange the dimension with size Q in the input channel dimension before the at least one convolution operation is evaluated. 5 . The method of claim 1 , wherein the at least one transformation reconfigures the first tensor X to arrange the dimension with size P in a dimension that is traversed by the at least one convolution operation. 6 . The method of claim 1 , wherein the hardware is configured to evaluate the at least one convolution operation by processing in parallel several sets of one or more input data elements selected along a first dimension traversed by the convolution operation, and wherein the at least one transformation reconfigures the first tensor X to arrange the dimension with size P in the first dimension. 7 . The method of claim 1 , wherein: the first tensor X has dimensions [1, 1, P, Q] and the second tensor Y has dimensions [1, 1, Q, R]; the at least one transformation reconfigures the first tensor X to form a reconfigured first tensor having dimensions [1, Q, 1, P]; the at least one transformation reconfigures the second tensor Y to form a reconfigured second tensor having dimensions [R, Q, 1, 1]; and the reconfigured first tensor and reconfigured second tensor are input to the at least one convolution. 8 . The method of claim 1 , wherein: the first tensor X has dimensions [M, N, P, Q] and the second tensor Y has dimensions [M′, N′, Q, R], where B=(max (M, M′) max (N,N′))>1; the at least one transformation splits and/or replicates, and reconfigures, the first tensor X to form B reconfigured first tensors each having dimensions [1, Q, 1, P], wherein if M′>M=1 or N′>N=1 the at least one transformation comprises replicating the first tensor in the respective dimension, and if M′=M>1 or N′=N>1 the at least one transformation comprises splitting the first tensor in the respective dimension; the at least one transformation splits and/or replicates, and reconfigures, the second tensor Y to form B reconfigured second tensors having dimensions [R, Q, 1, 1], wherein if M>M′=1 or N>N′=1 the at least one transformation comprises replicating the second tensor in the respective dimension, and if M′=M>1 or N′=N>1 the at least one transformation comprises splitting the second tensor in the respective dimension; and the at least one convolution comprises B convolutions applied to respective pairs of the first reconfigured tensors and second reconfigured tensors. 9 . The method of claim 8 , wherein if either (i) M′=1 and M>1, or (ii) N′=1 and N>1, broadcasting is performed such that the second tensor Y is reused across several convolutions. 10 . The method of claim 1 , wherein: the first tensor X has dimensions [M, N, P, Q] and the second tensor Y has dimensions [M′, N′, Q, R]; the at least one transformation reconfigures the first tensor X to form a reconfigured first tensor having dimensions [1, BQ, 1, P]; the at least one transformation reconfigures the second tensor Y to form a reconfigured second tensor having dimensions [BR, Q, 1, 1]; and the at least one convolution comprises a grouped convolution, with B groups each with Q input channels and R output channels, applied to the reconfigured first tensor and reconfigured second tensor, wherein B=(max (M, M′) max (N,N′)), and wherein: if M′>M=1 and/or N′>N=1 the reconfiguration of the first tensor comprises replicating the first tensor M′ times and/or N′ times in the respective dimensions; and if M>M′=1 and/or N>N′=1 the reconfiguration of the second tensor comprises replicating the second tensor M times and/or N times in the respective dimensions. 11 . The method of claim 1 , wherein the first tensor X has dimensions [M, N, P, 1] and the second tensor Y has dimensions [M′, N′, 1, R]. 12 . The method of claim 1 , further comprising, before mapping the matrix multiplication operation to the graph of neural network operations, analysing the matrix multiplication operation, and determining, based on a result of the analysing, how to implement the matrix multiplication operation, comprising determining that the matrix multiplication operation should be implemented using the at least one transformation and the at least one convolution operation, and rejecting at least one alternative method for implementing the matrix multiplication operation. 13 . The method of claim 12 , wherein the determining how to implement the matrix multiplication operation is based on one or more of: a size of the first tensor in one or more dimensions; a size of the second tensor in one or more dimensions; a memory-access bandwidth required to implement the matrix multiplication operation using the selected method; a memory size required to implement the matrix multiplication operation using the selected method; a number of hardware passes through the fixed-function hardware that will be required to implement the matrix multiplication operation using the selected method; an execution time on the fixed function hardware that will be required to implement the matrix multiplication operation using the selected method; a power consumption required to implement the matrix multiplication operation using the selected method; and a capability of the fixed-function hardware. 14 . The method of claim 1 , wherein the layer comprising the matrix multiplication operation is a classification layer, for classifying an input to the neural network into one of a number of categories. 15 . The method of claim 1 , wherein the neural network is configured for use in one of: a natural language processing application; and an image processing application, and/or wherein the neural network comprises an attention-based neural network. 16 . A non-transitory computer readable storage medium having stored thereon computer readable
Recurrent networks, e.g. Hopfield networks · CPC title
Combinations of networks · CPC title
using electronic means · CPC title
Architecture, e.g. interconnection topology · CPC title
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.