System and method for an optimized winograd convolution accelerator
US-2019042923-A1 · Feb 7, 2019 · US
US12547670B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12547670-B2 |
| Application number | US-202017773446-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 3, 2020 |
| Priority date | Nov 1, 2019 |
| Publication date | Feb 10, 2026 |
| Grant date | Feb 10, 2026 |
Official abstract text for this publication.
The present disclosure includes an operation apparatus configured to perform a winograd convolution operation. A control circuit of the operation apparatus is configured to send a control instruction to instruct a compute circuit to perform the winograd convolution operation. The compute circuit is configured to extract data from the storage circuit for the winograd convolution operation in response to the control instruction and to disassemble a transformation operation into multiple summation operations.
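To make the operation named in the abstract concrete, the following is a minimal sketch of one Winograd convolution tile in plain Python. It assumes the commonly published F(2x2, 3x3) matrices (`BT`, `G`, `AT`); the abstract does not fix a tile size, and the function names `winograd_conv_tile` and `direct_conv_tile` are hypothetical, chosen only for illustration. The sketch runs a forward transformation of the weights, a forward transformation of the features, an element-wise multiplication, and an inverse transformation, and can be compared against a direct sliding-window computation.

```python
# F(2x2, 3x3) Winograd matrices as commonly published; assumed here,
# not taken from the patent text.
BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]
G = [[1, 0, 0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0, 0, 1]]
AT = [[1, 1, 1, 0],
      [0, 1, -1, -1]]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Transpose a nested-list matrix."""
    return [list(row) for row in zip(*m)]


def winograd_conv_tile(feature, weight):
    """Compute a 2x2 output tile from a 4x4 feature tile and a 3x3 weight."""
    # Forward transformation of the weight data: G g G^T (4x4).
    u = matmul(matmul(G, weight), transpose(G))
    # Forward transformation of the feature data: B^T d B (4x4).
    v = matmul(matmul(BT, feature), transpose(BT))
    # Element-wise multiplication of the two transformed tiles.
    m = [[u[i][j] * v[i][j] for j in range(4)] for i in range(4)]
    # Inverse transformation: A^T m A (2x2).
    return matmul(matmul(AT, m), transpose(AT))


def direct_conv_tile(feature, weight):
    """Reference result: slide the 3x3 weight over the 4x4 feature tile."""
    return [[sum(feature[i + r][j + c] * weight[r][c]
                 for r in range(3) for c in range(3))
             for j in range(2)]
            for i in range(2)]


feature = [[1, 2, 3, 4],
           [5, 6, 7, 8],
           [9, 10, 11, 12],
           [13, 14, 15, 16]]
weight = [[1, 0, -1],
          [2, 0, -2],
          [1, 0, -1]]
print(winograd_conv_tile(feature, weight))
print(direct_conv_tile(feature, weight))
```

The two printed tiles should agree up to floating-point rounding. The sketch uses general matrix products for the transformations; it does not attempt to model the control, storage, or compute circuits.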
Opening claim text (preview).
What is claimed is:

1. An operation apparatus configured to perform a winograd convolution operation in a neural network, the operation apparatus comprising: a control circuit; a storage circuit configured to store data related to an image; and a compute circuit including a first compute circuit and a second compute circuit, wherein the control circuit is configured to extract a plurality of control instructions from the storage circuit, the plurality of control instructions includes a first instruction and a second instruction, the first instruction includes a forward transformation instruction which is executable by the first compute circuit, the second instruction, executable by the second compute circuit, includes an element-wise multiplication instruction and an inverse transformation instruction, each of the first instruction and the second instruction includes register addresses, the data includes feature data and weight data, each of the first instruction and the second instruction is to perform the winograd convolution operation associated with each convolution layer of the neural network, the storage circuit is configured to store the data for the winograd convolution operation, the first compute circuit is configured to perform for each convolution layer of the neural network: execute the forward transformation instruction on the weight data, execute the forward transformation instruction on the feature data, obtain a result of weight transformation based on the execution of the forward transformation instruction on the weight data, and store the result of the weight transformation in the storage circuit, the execution of the forward transformation instruction on the weight data is before the execution of the forward transformation instruction on the feature data, the second compute circuit is configured to perform for each convolution layer of the neural network: obtain the result of weight transformation, in response to the second instruction, from the storage circuit, perform an element-wise multiplication on the result of weight transformation and the result of feature transformation to obtain a result of multiplication operation, perform an inverse transformation of the result of multiplication operation, disassemble the inverse transformation of the result of multiplication operation into a summation operation, and complete the inverse transformation of the result of multiplication operation according to the summation operation to obtain a result of the winograd convolution operation, wherein the compute circuit is further configured to: parse the data to obtain a plurality of sub-tensors; perform a transformation operation on the plurality of sub-tensors and sum results of the transformation operation; and obtain a winograd transformation result of the data according to a result of the summation, wherein the winograd transformation result of the data is a sum of the plurality of sub-tensors, and number of the plurality of sub-tensors is same as number of non-zero elements in the data, and wherein each sub-tensor of the plurality of sub-tensors has a non-zero element, and the non-zero element in each sub-tensor of the plurality of sub-tensors is the same as a non-zero element in a corresponding position in the data.

2. The operation apparatus of claim 1, wherein the compute circuit is further configured to: obtain a winograd transformation result of a meta-tensor corresponding to each sub-tensor, wherein the meta-tensor is a tensor with the non-zero element of the sub-tensor set to 1; multiply a value of the non-zero element of the sub-tensor as a coefficient by the winograd transformation result of the corresponding meta-tensor to obtain the winograd transformation result of the sub-tensors; and sum winograd transformation results of the plurality of sub-tensors to obtain the winograd transformation result of the data.

3. The operation apparatus of claim 2, wherein the compute circuit is further configured to: multiply, for each sub-tensor, a left side of the meta-tensor corresponding to the sub-tensor by a left multiplication matrix; and multiply a right side of the meta-tensor corresponding to the sub-tensor by a right multiplication matrix to obtain the winograd transformation result of the meta-tensor, wherein the left multiplication matrix and the right multiplication matrix are both determined by a size of the sub-tensor and a winograd transformation type, and wherein the winograd transformation type includes a winograd transformation type of the forward transformation instruction and a winograd transformation type of the inverse transformation instruction.

4. The operation apparatus of claim 1, wherein the second compute circuit includes: a multiplication circuit configured to: obtain, in response to the second instruction, the result of weight transformation, perform the element-wise multiplication on the result of weight transformation and the result of feature transformation to obtain the result of the multiplication operation; and an inverse transformation circuit configured to perform an inverse transformation on the result of multiplication operation, wherein the inverse transformation circuit disassembles the transformation operation in the inverse transformation into the summation operations and completes the inverse transformation on the result of multiplication operation according to the summation operations to obtain the result of operation.

5. The operation apparatus of claim 1, wherein the compute circuit includes: an addition circuit configured to: obtain the feature data from the storage circuit in response to the first instruction, and perform a forward transformation on the feature data, wherein the addition circuit disassembles the forward transformation into the summation operations and completes the forward transformation on the feature data based on the summation operations to obtain the result of feature transformation; a multiplication circuit configured to: obtain the result of weight transformation in response to the second instruction, and perform the element-wise multiplication on the result of weight transformation and the result of feature transformation to obtain the result of multiplication operation; and the addition circuit further configured to: perform, in response to the second instruction, an inverse transformation on the result of multiplication operation, wherein the addition circuit disassembles the inverse transformation into the summation operations and completes the inverse transformation on the result of multiplication operation according to the summation operations to obtain the result of operation.

6. An operation method applied to an operation apparatus which comprises a control circuit, a storage circuit and a compute circuit, wherein in the operation apparatus: extracting, by the control circuit, a plurality of control instructions from the storage circuit, wherein the plurality of control instructions is configured to instruct the compute circuit to perform a winograd convolution operation associated with each convolution layer of a neural network, wherein the compute circuit includes a first compute circuit and a second compute circuit, the plurality of control instructions includes a first instruction and a second instruction, each of the first instruction and the second instruction includes register addresses, the first instruction includes a forward transformation instruction which is executable by the first compute circuit, the second instruction, executable by the second compute circuit, includes an element-wise multiplication instruction and an inverse transformation instruction, the storage circ
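The sub-tensor and meta-tensor language of claims 1 to 3 can be illustrated with a short sketch. It assumes the commonly published F(2x2, 3x3) feature transformation matrix `BT` as the left multiplication matrix and its transpose as the right multiplication matrix; the claims do not fix these values. The helper names `split_into_sub_tensors`, `meta_tensor_transform`, and `transform_by_summation` are hypothetical. Each non-zero element of the data yields one sub-tensor, the meta-tensor is that sub-tensor with its non-zero element set to 1, and the transformation of the data is the sum of each element value times the transformation of its meta-tensor.

```python
# Assumed F(2x2, 3x3) feature transformation matrix B^T (not from the patent).
BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Transpose a nested-list matrix."""
    return [list(row) for row in zip(*m)]


def split_into_sub_tensors(data):
    """Return one (row, col, value) triple per non-zero element.

    Each triple stands for a sub-tensor that is zero everywhere except
    for `value` at position (row, col).
    """
    return [(i, j, v)
            for i, row in enumerate(data)
            for j, v in enumerate(row) if v != 0]


def meta_tensor_transform(i, j, rows, cols, left, right):
    """Transform the meta-tensor that has a single 1 at (i, j)."""
    meta = [[1 if (r, c) == (i, j) else 0 for c in range(cols)]
            for r in range(rows)]
    return matmul(matmul(left, meta), right)


def transform_by_summation(data, left, right):
    """Build left @ data @ right by summing scaled meta-tensor results."""
    rows, cols = len(data), len(data[0])
    out = [[0] * len(right[0]) for _ in range(len(left))]
    for i, j, value in split_into_sub_tensors(data):
        # In this sketch the meta-tensor result is recomputed each time;
        # it depends only on (i, j), so it could be computed once and reused.
        t = meta_tensor_transform(i, j, rows, cols, left, right)
        for r in range(len(out)):
            for c in range(len(out[0])):
                out[r][c] += value * t[r][c]
    return out


data = [[1, 0, 3, 4],
        [0, 6, 0, 8],
        [9, 10, 0, 12],
        [13, 0, 15, 16]]
B = transpose(BT)
print(transform_by_summation(data, BT, B) == matmul(matmul(BT, data), B))
```

With the assumed `BT`, every entry of a meta-tensor result is 0, 1, or -1, so scaling by the element value and accumulating reduces to additions and subtractions of data elements. That is one way to read "disassembling" a transformation into summation operations; the patent's own circuits may organize the work differently.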
Multiplying only · CPC title
Adding; Subtracting (G06F7/483 - G06F7/491, G06F7/544 - G06F7/556 take precedence) · CPC title
Energy efficient computing, e.g. low power processors, power management or thermal management · CPC title
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
Prime factor Fourier transforms, e.g. Winograd transforms, number theoretic transforms · CPC title