Systems and methods for reducing power consumption of convolution operations for artificial neural networks
US-11120328-B1 · Sep 14, 2021 · US
US11501147B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11501147-B1 |
| Application number | US-202016777606-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jan 30, 2020 |
| Priority date | Jan 30, 2020 |
| Publication date | Nov 15, 2022 |
| Grant date | Nov 15, 2022 |
Official abstract text for this publication.
A disclosed computer-implemented method may include maintaining, within a local memory device (LMD) included in a hardware accelerator (1) a filter matrix corresponding to a filter location included in each of a set of filters of a convolutional layer of an artificial neural network (ANN), and (2) a set of activation vectors corresponding to an active region of an activation volume input into the convolutional layer. The method may also include determining that the active region of the activation volume is contiguous with a padding region associated with at least a portion of the activation volume. The method may further include directing a matrix multiplication unit (MMU) included in the hardware accelerator to execute a matrix multiplication operation (MMO) using the filter matrix and an activation matrix that may include (1) the set of activation vectors, and (2) at least one padding vector corresponding to the padding region.
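The operation summarized in the abstract can be illustrated in software. The following is a minimal sketch only, not the disclosed hardware: it assumes zero padding, a single row of the activation volume, and plain Python lists in place of the LMD registers. The names `build_activation_matrix`, `mmo`, and `pad_value` are hypothetical and do not appear in the patent.

```python
def build_activation_matrix(row, start, width, pad_value=0.0):
    """Return `width` column vectors starting at column index `start` of `row`.

    `row` is a list of activation vectors (one list of channel values per
    location). Indices outside the row lie in the padding region, so a
    padding vector is generated instead of being read from the row.
    """
    channels = len(row[0])
    columns = []
    for x in range(start, start + width):
        if 0 <= x < len(row):
            columns.append(list(row[x]))            # active region
        else:
            columns.append([pad_value] * channels)  # padding region (assumed zeros)
    return columns


def mmo(filter_matrix, activation_columns):
    """Multiply the filter matrix (filters x channels, the multiplier) by the
    activation matrix given as a list of columns (the multiplicand)."""
    return [
        [sum(w * a for w, a in zip(filter_vec, col)) for col in activation_columns]
        for filter_vec in filter_matrix
    ]


# Three locations with two channels each; three filters at one filter location.
row = [[1, 2], [3, 4], [5, 6]]
filter_matrix = [[1, 0], [0, 1], [1, 1]]

# The active region starts at the left edge, so the first column is padding.
activation_columns = build_activation_matrix(row, start=-1, width=3)
result = mmo(filter_matrix, activation_columns)
print(result)  # [[0.0, 1, 3], [0.0, 2, 4], [0.0, 3, 7]]
```

Each row of `result` holds one filter's partial outputs for the three locations. The first column is zero because it corresponds to the padding vector.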
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: maintaining, within a local memory device (LMD) included in a hardware accelerator: a filter matrix corresponding to a filter location included in each of a set of filters of a convolutional layer of an artificial neural network (ANN); and a set of activation vectors corresponding to an active region of an activation volume input into the convolutional layer; determining that the active region of the activation volume is contiguous with a padding region associated with at least a portion of the activation volume; and directing a matrix multiplication unit (MMU) included in the hardware accelerator to execute a matrix multiplication operation (MMO) using the filter matrix and an activation matrix comprising: the set of activation vectors; and at least one padding vector corresponding to the padding region.
2. The computer-implemented method of claim 1, wherein: the LMD comprises: a set of multiplier registers associated with the MMU; and a set of multiplicand registers associated with the MMU; maintaining the filter matrix within the LMD comprises loading, from a data store, the filter matrix to the set of multiplier registers; and maintaining the set of activation vectors within the LMD comprises loading, from the data store, the set of activation vectors to the set of multiplicand registers.
3. The computer-implemented method of claim 2, wherein directing the MMU to execute the MMO using the filter matrix and the activation matrix comprises directing the hardware accelerator to include a padding value in a multiplicand register included in the set of multiplicand registers corresponding to the padding region.
4. The computer-implemented method of claim 2, wherein: the hardware accelerator further comprises a set of output activation registers associated with the MMU; and directing the MMU to execute the MMO using the filter matrix and the activation matrix comprises: for each multiplicand register that includes an activation vector included in the active region of the activation volume: directing the MMU to execute a dot product operation using a filter vector included in the filter matrix and the activation vector; and storing a result of the dot product operation in the set of output activation registers; and for each multiplicand register that corresponds to the padding region, storing a padding value in the set of output activation registers.
5. The computer-implemented method of claim 1, wherein directing the MMU to execute the MMO using the filter matrix and the activation matrix comprises directing the MMU to execute the MMO using the filter matrix as a multiplier matrix and the activation matrix as a multiplicand matrix.
6. The computer-implemented method of claim 5, wherein: the filter matrix comprises a set of filter vectors corresponding to a filter location included in each of a set of filters of the convolutional layer of the artificial neural network; and each activation vector in the set of activation vectors comprises a set of channel values corresponding to a location within the activation volume; and the active region comprises at least a portion of a row of activation vectors included in the activation volume.
7. The computer-implemented method of claim 6, wherein: the multiplier matrix comprises: a multiplier matrix height dimension; and a multiplier matrix width dimension; and the multiplicand matrix comprises: a multiplicand matrix height dimension comprising the multiplier matrix width dimension; and a multiplicand matrix width dimension.
8. The computer-implemented method of claim 7, wherein: the activation matrix comprises a number of activation vectors no greater than the multiplier matrix height dimension; and each filter vector included in the set of filter vectors comprises a predetermined number of filter weight values, wherein: the predetermined number of filter weight values is at most the multiplier matrix width dimension; and each filter weight value included in the filter vector corresponds to a different channel included in a set of channels associated with each of the set of filters.
9. The computer-implemented method of claim 1, further comprising: replacing: the filter matrix with an additional filter matrix corresponding to an additional filter location; and at least one activation vector included in the set of activation vectors with an additional activation vector included in the activation volume; and directing the MMU to execute an additional MMO using the additional filter matrix and the activation matrix.
10. The computer-implemented method of claim 9, wherein: the hardware accelerator further comprises a set of output activation registers associated with the MMU; and directing the MMU to execute the MMO using the filter matrix and the activation matrix further comprises: generating a primary result matrix by directing the MMU to execute the MMO using the filter matrix as a multiplier matrix and the activation matrix as a multiplicand matrix; and storing the primary result matrix within the set of output activation registers.
11. The computer-implemented method of claim 10, wherein directing the MMU to execute the additional MMO using the additional filter matrix and the activation matrix further comprises: producing a secondary result matrix by directing the MMU to execute the additional MMO using the additional filter matrix as the multiplier matrix and the activation matrix as the multiplicand matrix; accumulating the secondary result matrix and the primary result matrix; and storing a result of accumulating the secondary result matrix and the primary result matrix within the set of output activation registers.
12. The computer-implemented method of claim 11, wherein the computer-implemented method further comprises determining, based on the result of accumulating the secondary result matrix and the primary result matrix, a set of output activation values for the convolutional layer of the ANN.
13. The computer-implemented method of claim 1, wherein directing the MMU to execute an MMO comprises directing the MMU to execute a generalized matrix multiplication (GEMM) operation.
14. The computer-implemented method of claim 1, wherein the activation volume comprises a digital image comprising: at least one row of activation values; at least one column of activation values; and at least one channel of activation values.
15. A system comprising: a hardware accelerator comprising: a matrix multiplication unit (MMU); and a local memory device (LMD); a maintaining module, stored in memory, that maintains, within the LMD: a filter matrix corresponding to a filter location included in each of a set of filters of a convolutional layer of an artificial neural network (ANN); and a set of activation vectors corresponding to an active region of an activation volume input into the convolutional layer; a determining module, stored in memory, that determines that the active region of the activation volume is contiguous with a padding region associated with at least a portion of the activation volume; and a directing module, stored in memory, that directs the MMU to execute a matrix multiplication operation (MMO) using the filter matrix and an activation matrix comprising: the set of activation vectors; and at least one padding vector corresponding to the padding region; and at least one physical processor that executes the maintaining module, the determining module, and the directing module.
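The per-filter-location accumulation recited in claims 9 through 12, together with the padding handling of claim 4, can be sketched in software as follows. This is an illustrative assumption-laden model rather than the claimed hardware: it assumes zero padding, stride 1, and a one-dimensional filter applied along a single activation row. The names `conv_row`, `acc`, and `executed` are hypothetical. With zero padding, a padding column contributes nothing to the accumulated result, so the sketch skips its dot products entirely.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conv_row(filters, row, pad=1):
    """1-D convolution of one activation row with zero padding and stride 1.

    filters[k][j] is the filter vector (one weight per channel) of filter k
    at filter location j; row[x] is the activation vector at location x.
    Returns (accumulated outputs, number of dot products executed).
    """
    num_filters, taps = len(filters), len(filters[0])
    out_width = len(row) + 2 * pad - taps + 1
    # Stand-in for the output activation registers.
    acc = [[0.0] * out_width for _ in range(num_filters)]
    executed = 0
    for j in range(taps):  # one MMO per filter location
        filter_matrix = [filters[k][j] for k in range(num_filters)]
        for x in range(out_width):  # one multiplicand column per output location
            src = x + j - pad
            if not 0 <= src < len(row):
                continue  # padding column: zero contribution, no dot product
            for k, filter_vec in enumerate(filter_matrix):
                acc[k][x] += dot(filter_vec, row[src])  # accumulate partial results
                executed += 1
    return acc, executed


row = [[1, 2], [3, 4], [5, 6]]           # 3 locations, 2 channels
filters = [
    [[1, 1], [1, 1], [1, 1]],            # filter 0: sums every value in its window
    [[1, 0], [0, 0], [0, 1]],            # filter 1
]
outputs, executed = conv_row(filters, row)
print(outputs)   # [[10.0, 21.0, 18.0], [4.0, 7.0, 3.0]]
print(executed)  # 14
```

Without the padding check, the same example would execute 18 dot products (3 filter locations × 3 output locations × 2 filters); the sketch executes 14, which is the kind of saving the padding handling is aimed at.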
using electronic means · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Distributed learning, e.g. federated learning · CPC title
Architecture, e.g. interconnection topology · CPC title
Learning methods · CPC title