Systems and methods for reducing power consumption of convolution operations of artificial neural networks

US11599181B1 · US · B1

Patent metadata
Field: Value
Publication number: US-11599181-B1
Application number: US-201916725331-A
Country: US
Kind code: B1
Filing date: Dec 23, 2019
Priority date: Dec 23, 2019
Publication date: Mar 7, 2023
Grant date: Mar 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method may include (1) maintaining (a) a filter matrix in a filter cache included in a local memory device (LMD) included in a hardware accelerator, and (b) a plurality of activation matrices corresponding to different rows of an activation volume in an activation cache included in the LMD, (2) for each activation matrix, directing a matrix multiplication unit (MMU) included in the hardware accelerator to execute a matrix multiplication operation (MMO) using the filter matrix and the activation matrix, (3) loading an additional filter matrix into the filter cache, and (4) directing the MMU to execute a plurality of additional MMOs, each additional MMO using one filter matrix included in the filter cache and one activation matrix included in the activation cache, such that the MMU reuses the filter matrix for at least one additional MMO and uses the additional filter matrix for a different additional MMO.
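
The four steps of the abstract can be pictured with a small simulation. The following is a minimal sketch under stated assumptions, not the patented implementation: `LocalMemoryDevice`, `run_schedule`, the cache keys, and the `loads` counter are hypothetical names, plain Python lists stand in for the hardware caches and the MMU, and the pairing of filter and activation matrices in step (4) is an arbitrary illustrative choice.

```python
# Illustrative sketch only: all names here are hypothetical, and plain Python
# lists stand in for the hardware caches and the matrix multiplication unit.

def matmul(multiplier, multiplicand):
    """Stand-in for one MMO: returns multiplier x multiplicand."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*multiplicand)]
            for row in multiplier]


class LocalMemoryDevice:
    """Toy LMD with a filter cache, an activation cache, and a load counter."""

    def __init__(self):
        self.filter_cache = {}
        self.activation_cache = {}
        self.loads = 0  # transfers from (assumed) external memory into the LMD

    def load_filter(self, key, matrix):
        self.filter_cache[key] = matrix
        self.loads += 1

    def load_activation(self, key, matrix):
        self.activation_cache[key] = matrix
        self.loads += 1


def run_schedule(filters, activations, additional_pairs):
    lmd = LocalMemoryDevice()
    mmos = []  # (filter key, activation key, result) for every MMO executed

    # (1) Maintain one filter matrix and several activation matrices in the LMD.
    lmd.load_filter("F0", filters["F0"])
    for key, matrix in activations.items():
        lmd.load_activation(key, matrix)

    # (2) One MMO per cached activation matrix, all using the same filter matrix.
    for key, matrix in lmd.activation_cache.items():
        mmos.append(("F0", key, matmul(lmd.filter_cache["F0"], matrix)))

    # (3) Load an additional filter matrix; F0 stays in the filter cache.
    lmd.load_filter("F1", filters["F1"])

    # (4) Additional MMOs take both operands from the caches, with no new loads.
    for f_key, a_key in additional_pairs:
        result = matmul(lmd.filter_cache[f_key], lmd.activation_cache[a_key])
        mmos.append((f_key, a_key, result))
    return lmd.loads, mmos


filters = {"F0": [[1, 0], [0, 1]], "F1": [[2, 0], [0, 2]]}
activations = {"row0": [[1, 2], [3, 4]], "row1": [[5, 6], [7, 8]]}
loads, mmos = run_schedule(filters, activations, [("F0", "row1"), ("F1", "row0")])
print(loads, len(mmos))  # prints: 4 4
```

In this toy case four loads into the local memory serve four MMOs, whereas a scheme that fetched both operands for every MMO would, under the same counting assumption, perform eight. In a real convolution the additional MMOs would typically use different selections of activation vectors rather than repeat an earlier pairing.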

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: maintaining, by a maintaining module stored in memory and executed by at least one physical processor: a filter matrix in a filter cache included in a local memory device (LMD) included in a hardware accelerator; and a plurality of activation matrices corresponding to different rows of an activation volume in an activation cache included in the LMD; for each activation matrix, directing, by a directing module stored in memory and executed by the physical processor, a matrix multiplication unit (MMU) included in the hardware accelerator to execute a matrix multiplication operation (MMO) using the filter matrix and the activation matrix; loading, by a loading module stored in memory and executed by the physical processor, an additional filter matrix into the filter cache; and directing, by a directing module stored in memory and executed by the physical processor, the MMU to execute a plurality of additional MMOs, each additional MMO using one filter matrix included in the filter cache and an activation matrix included in the activation cache, such that the MMU reuses the filter matrix for at least one additional MMO and uses the additional filter matrix for a different additional MMO.

2. The computer-implemented method of claim 1, wherein: each filter matrix comprises a set of filter vectors corresponding to a filter location included in each of a set of filters of a convolutional layer of an artificial neural network; and each activation matrix comprises a set of activation vectors, each activation vector comprising a set of channel values corresponding to a location within the activation volume.

3. The computer-implemented method of claim 2, wherein the filter matrix corresponds to a primary filter location and the additional filter matrix corresponds to a secondary filter location.

4. The computer-implemented method of claim 2, wherein executing each additional MMO in the plurality of additional MMOs comprises: selecting, from a plurality of activation vectors included in the plurality of activation matrices loaded into the activation cache, a selected set of activation vectors associated with a row of the activation volume; and directing the MMU to use the selected set of activation vectors associated with the row of the activation volume as a multiplicand matrix in the additional MMO.

5. The computer-implemented method of claim 2, further comprising replacing, by the loading module prior to directing the MMU to execute at least one additional MMO in the plurality of additional MMOs, at least one of: at least one activation vector loaded into the activation cache with an additional activation vector; or at least one filter matrix loaded into the filter cache with a supplemental filter matrix.

6. The computer-implemented method of claim 5, wherein replacing the at least one activation vector comprises replacing the at least one activation vector in accordance with at least one of: a first-in, first-out (FIFO) replacement policy; or a least-recently-used (LRU) replacement policy.

7. The computer-implemented method of claim 1, wherein: the hardware accelerator further comprises a set of output activation registers associated with the MMU; and directing the MMU to execute the MMO using the filter matrix and the activation matrix comprises, for each activation matrix in the activation cache: generating a primary result matrix corresponding to the activation matrix by directing the MMU to execute the MMO using the filter matrix as a multiplier matrix and the activation matrix as a multiplicand matrix; and storing the primary result matrix within the set of output activation registers.

8. The computer-implemented method of claim 7, wherein directing the MMU to execute the plurality of additional MMOs comprises, for each additional MMO in the plurality of additional MMOs: designating a set of activation vectors loaded into the activation cache and associated with a row of the activation volume as an intermediate activation matrix; producing a secondary result matrix by directing the MMU to execute an additional MMO using the intermediate activation matrix as a multiplicand matrix and a selected filter matrix loaded into the filter cache as a multiplier matrix; and accumulating the secondary result matrix with at least one primary result matrix included in the set of output activation registers.

9. The computer-implemented method of claim 8, further comprising determining, based on a result of accumulating the secondary result matrix and the at least one primary result matrix, a set of output activation values for a convolutional layer of an artificial neural network.

10. The computer-implemented method of claim 1, wherein the LMD comprises: a set of multiplier registers associated with the MMU; and a set of multiplicand registers associated with the MMU.

11. The computer-implemented method of claim 10, wherein directing the MMU to execute the MMO using the filter matrix and the activation matrix comprises: loading: the filter matrix from the filter cache into the set of multiplier registers; and the activation matrix from the activation cache into the set of multiplicand registers; and directing the MMU to execute the MMO using the filter matrix as a multiplier matrix and the activation matrix as a multiplicand matrix.

12. The computer-implemented method of claim 10, wherein directing the MMU to execute the plurality of additional MMOs comprises, for each additional MMO included in the plurality of additional MMOs: selecting: at least one filter matrix loaded into the filter cache as a selected filter matrix; a set of activation vectors from a plurality of activation vectors loaded into the activation cache as a selected activation matrix; and loading: the selected filter matrix from the filter cache into the set of multiplier registers; and the selected activation matrix from the activation cache into the set of multiplicand registers; and directing the MMU to execute an additional MMO using the selected filter matrix as a multiplier matrix and the selected activation matrix as a multiplicand matrix.

13. The computer-implemented method of claim 1, wherein: the activation volume comprises a digital image comprising: at least one row of activation values; at least one column of activation values; and at least one channel of activation values; and each activation matrix included in the plurality of activation matrices corresponds to at least a portion of a different row of activation values included in the digital image.

14. The computer-implemented method of claim 1, wherein directing the MMU to execute an MMO comprises directing the MMU to execute a generalized matrix multiplication (GEMM) operation.

15. A system comprising: a hardware accelerator comprising: a matrix multiplication unit (MMU); and a local memory device (LMD); a maintaining module, stored in memory, that maintains: a filter matrix in a filter cache included in the LMD; and a plurality of activation matrices corresponding to different rows of an activation volume in an activation cache included in the LMD; a directing module, stored in memory, that, for each activation matrix, directs the MMU to execute a matrix multiplication operation (MMO) using the filter matrix and the activation matrix; a loading module, stored in memory, that loads an additional filter matrix corresponding to a secondary filter location into the filter cache; an executing module, stored in memory, that directs the MMU to exe
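
Claims 2 and 7 to 9 describe building output activations from one matrix product per filter location, accumulated in output activation registers. The sketch below illustrates that decomposition and checks it against a direct convolution. It is an assumption-laden illustration rather than the claimed implementation: the function names, the `volume[row][column][channel]` and `filters[filter][r][s][channel]` layouts, stride 1, and the absence of padding are all choices made here for the example.

```python
# Illustrative sketch only: names, data layout, stride 1 and no padding are
# assumptions made for this example, not details taken from the patent.

def matmul(multiplier, multiplicand):
    """Stand-in for one MMO: returns multiplier x multiplicand."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*multiplicand)]
            for row in multiplier]


def filter_matrix(filters, r, s):
    """One filter vector per filter, all taken at filter location (r, s)."""
    return [f[r][s] for f in filters]  # shape: filters x channels


def activation_matrix(volume, y, x0, width):
    """Activation vectors of row y, columns x0..x0+width-1, as channels x width."""
    vectors = volume[y][x0:x0 + width]
    return [list(channel) for channel in zip(*vectors)]


def conv_by_location(volume, filters):
    h_out = len(volume) - len(filters[0]) + 1
    w_out = len(volume[0]) - len(filters[0][0]) + 1
    # Output activation registers: one (filters x w_out) matrix per output row.
    registers = [[[0] * w_out for _ in filters] for _ in range(h_out)]
    for r in range(len(filters[0])):
        for s in range(len(filters[0][0])):
            fm = filter_matrix(filters, r, s)  # built once, reused for every row
            for y_out in range(h_out):
                am = activation_matrix(volume, y_out + r, s, w_out)
                partial = matmul(fm, am)  # filter as multiplier, activations as multiplicand
                for m, row in enumerate(partial):
                    for x, value in enumerate(row):
                        registers[y_out][m][x] += value  # accumulate partial results
    return registers


def conv_direct(volume, filters):
    """Reference: textbook sliding-window convolution (cross-correlation form)."""
    h_out = len(volume) - len(filters[0]) + 1
    w_out = len(volume[0]) - len(filters[0][0]) + 1
    return [[[sum(f[r][s][c] * volume[y + r][x + s][c]
                  for r in range(len(f))
                  for s in range(len(f[0]))
                  for c in range(len(f[0][0])))
              for x in range(w_out)]
             for f in filters]
            for y in range(h_out)]


# 3 x 3 activation volume with 2 channels; two 2 x 2 filters with 2 channels.
volume = [[[y * 6 + x * 2 + c for c in range(2)] for x in range(3)] for y in range(3)]
filters = [[[[(m + 1) * (r - s) + c for c in range(2)] for s in range(2)]
            for r in range(2)] for m in range(2)]

result = conv_by_location(volume, filters)
print(result == conv_direct(volume, filters))  # prints: True
```

Each filter matrix is built once per filter location and then multiplied against every relevant activation row, which is the reuse pattern the claims attribute to the filter cache.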

Assignees

Meta Platforms Inc

Inventors

Classifications

  • Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • G06F1/3275 · Primary

    Power saving in memory, e.g. RAM, cache · CPC title

  • G06F1/3225 · Primary

    of memory devices · CPC title

  • with dedicated cache, e.g. instruction or stack · CPC title

  • Architecture, e.g. interconnection topology · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11599181B1 cover?
A computer-implemented method may include (1) maintaining (a) a filter matrix in a filter cache included in a local memory device (LMD) included in a hardware accelerator, and (b) a plurality of activation matrices corresponding to different rows of an activation volume in an activation cache included in the LMD, (2) for each activation matrix, directing a matrix multiplication unit (MMU) inclu…
Who is the assignee on this patent?
Meta Platforms Inc
What technology area does this patent fall under?
Primary CPC classification G06F1/3275. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 7, 2023 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).