Systems and methods for reducing power consumption of convolution operations of artificial neural networks

US11599181B1 · US · B1

Patent metadata
Field	Value
Publication number	US-11599181-B1
Application number	US-201916725331-A
Country	US
Kind code	B1
Filing date	Dec 23, 2019
Priority date	Dec 23, 2019
Publication date	Mar 7, 2023
Grant date	Mar 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method may include (1) maintaining (a) a filter matrix in a filter cache included in a local memory device (LMD) included in a hardware accelerator, and (b) a plurality of activation matrices corresponding to different rows of an activation volume in an activation cache included in the LMD, (2) for each activation matrix, directing a matrix multiplication unit (MMU) included in the hardware accelerator to execute a matrix multiplication operation (MMO) using the filter matrix and the activation matrix, (3) loading an additional filter matrix into the filter cache, and (4) directing the MMU to execute a plurality of additional MMOs, each additional MMO using one filter matrix included in the filter cache and one activation matrix included in the activation cache, such that the MMU reuses the filter matrix for at least one additional MMO and uses the additional filter matrix for a different additional MMO.
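To make the abstract more concrete, the following is a minimal sketch, not the patented implementation, of how a convolution can be split into one matrix multiplication per filter location and activation row, with the partial results accumulated. It assumes unit stride, no padding, and plain Python lists in place of accelerator caches and registers. The function names (`matmul`, `conv_by_filter_location`, `conv_direct`) and the data layout are hypothetical, and placing the activation matrix on the left of the product is an illustrative choice.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (m x n) and b (n x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def conv_by_filter_location(act, filt):
    """Convolution as accumulated matrix multiplications (illustrative only).

    act[h][w] is a vector of C channel values (one activation vector).
    filt[r][s] is a C x K matrix: the filter vectors at location (r, s)
    taken from each of K filters.
    """
    rows, cols = len(act), len(act[0])
    f_rows, f_cols = len(filt), len(filt[0])
    num_filters = len(filt[0][0][0])
    out_h, out_w = rows - f_rows + 1, cols - f_cols + 1
    out = [[[0] * num_filters for _ in range(out_w)] for _ in range(out_h)]
    for r in range(f_rows):
        for s in range(f_cols):
            filter_matrix = filt[r][s]  # fetched once per filter location
            for y in range(out_h):      # then reused for every activation row
                activation_matrix = act[y + r][s:s + out_w]  # out_w x C
                partial = matmul(activation_matrix, filter_matrix)  # out_w x K
                for x in range(out_w):
                    for k in range(num_filters):
                        out[y][x][k] += partial[x][k]  # accumulate results
    return out


def conv_direct(act, filt):
    """Reference convolution computed directly from its definition."""
    rows, cols, chans = len(act), len(act[0]), len(act[0][0])
    f_rows, f_cols = len(filt), len(filt[0])
    num_filters = len(filt[0][0][0])
    return [[[sum(act[y + r][x + s][c] * filt[r][s][c][k]
                  for r in range(f_rows)
                  for s in range(f_cols)
                  for c in range(chans))
              for k in range(num_filters)]
             for x in range(cols - f_cols + 1)]
            for y in range(rows - f_rows + 1)]
```

In this sketch each filter matrix is fetched once and then multiplied against every activation row, which is the kind of reuse the abstract describes. The cache sizes, register layout, and scheduling of the actual accelerator are not modeled here.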

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: maintaining, by a maintaining module stored in memory and executed by at least one physical processor: a filter matrix in a filter cache included in a local memory device (LMD) included in a hardware accelerator; and a plurality of activation matrices corresponding to different rows of an activation volume in an activation cache included in the LMD; for each activation matrix, directing, by a directing module stored in memory and executed by the physical processor, a matrix multiplication unit (MMU) included in the hardware accelerator to execute a matrix multiplication operation (MMO) using the filter matrix and the activation matrix; loading, by a loading module stored in memory and executed by the physical processor, an additional filter matrix into the filter cache; and directing, by a directing module stored in memory and executed by the physical processor, the MMU to execute a plurality of additional MMOs, each additional MMO using one filter matrix included in the filter cache and an activation matrix included in the activation cache, such that the MMU reuses the filter matrix for at least one additional MMO and uses the additional filter matrix for a different additional MMO.

2. The computer-implemented method of claim 1, wherein: each filter matrix comprises a set of filter vectors corresponding to a filter location included in each of a set of filters of a convolutional layer of an artificial neural network; and each activation matrix comprises a set of activation vectors, each activation vector comprising a set of channel values corresponding to a location within the activation volume.

3. The computer-implemented method of claim 2, wherein the filter matrix corresponds to a primary filter location and the additional filter matrix corresponds to a secondary filter location.

4. The computer-implemented method of claim 2, wherein executing each additional MMO in the plurality of additional MMOs comprises: selecting, from a plurality of activation vectors included in the plurality of activation matrices loaded into the activation cache, a selected set of activation vectors associated with a row of the activation volume; and directing the MMU to use the selected set of activation vectors associated with the row of the activation volume as a multiplicand matrix in the additional MMO.

5. The computer-implemented method of claim 2, further comprising replacing, by the loading module prior to directing the MMU to execute at least one additional MMO in the plurality of additional MMOs, at least one of: at least one activation vector loaded into the activation cache with an additional activation vector; or at least one filter matrix loaded into the filter cache with a supplemental filter matrix.

6. The computer-implemented method of claim 5, wherein replacing the at least one activation vector comprises replacing the at least one activation vector in accordance with at least one of: a first-in, first-out (FIFO) replacement policy; or a least-recently-used (LRU) replacement policy.

7. The computer-implemented method of claim 1, wherein: the hardware accelerator further comprises a set of output activation registers associated with the MMU; and directing the MMU to execute the MMO using the filter matrix and the activation matrix comprises, for each activation matrix in the activation cache: generating a primary result matrix corresponding to the activation matrix by directing the MMU to execute the MMO using the filter matrix as a multiplier matrix and the activation matrix as a multiplicand matrix; and storing the primary result matrix within the set of output activation registers.

8. The computer-implemented method of claim 7, wherein directing the MMU to execute the plurality of additional MMOs comprises, for each additional MMO in the plurality of additional MMOs: designating a set of activation vectors loaded into the activation cache and associated with a row of the activation volume as an intermediate activation matrix; producing a secondary result matrix by directing the MMU to execute an additional MMO using the intermediate activation matrix as a multiplicand matrix and a selected filter matrix loaded into the filter cache as a multiplier matrix; and accumulating the secondary result matrix with at least one primary result matrix included in the set of output activation registers.

9. The computer-implemented method of claim 8, further comprising determining, based on a result of accumulating the secondary result matrix and the at least one primary result matrix, a set of output activation values for a convolutional layer of an artificial neural network.

10. The computer-implemented method of claim 1, wherein the LMD comprises: a set of multiplier registers associated with the MMU; and a set of multiplicand registers associated with the MMU.

11. The computer-implemented method of claim 10, wherein directing the MMU to execute the MMO using the filter matrix and the activation matrix comprises: loading: the filter matrix from the filter cache into the set of multiplier registers; and the activation matrix from the activation cache into the set of multiplicand registers; and directing the MMU to execute the MMO using the filter matrix as a multiplier matrix and the activation matrix as a multiplicand matrix.

12. The computer-implemented method of claim 10, wherein directing the MMU to execute the plurality of additional MMOs comprises, for each additional MMO included in the plurality of additional MMOs: selecting: at least one filter matrix loaded into the filter cache as a selected filter matrix; a set of activation vectors from a plurality of activation vectors loaded into the activation cache as a selected activation matrix; and loading: the selected filter matrix from the filter cache into the set of multiplier registers; and the selected activation matrix from the activation cache into the set of multiplicand registers; and directing the MMU to execute an additional MMO using the selected filter matrix as a multiplier matrix and the selected activation matrix as a multiplicand matrix.

13. The computer-implemented method of claim 1, wherein: the activation volume comprises a digital image comprising: at least one row of activation values; at least one column of activation values; and at least one channel of activation values; and each activation matrix included in the plurality of activation matrices corresponds to at least a portion of a different row of activation values included in the digital image.

14. The computer-implemented method of claim 1, wherein directing the MMU to execute an MMO comprises directing the MMU to execute a generalized matrix multiplication (GEMM) operation.

15. A system comprising: a hardware accelerator comprising: a matrix multiplication unit (MMU); and a local memory device (LMD); a maintaining module, stored in memory, that maintains: a filter matrix in a filter cache included in the LMD; and a plurality of activation matrices corresponding to different rows of an activation volume in an activation cache included in the LMD; a directing module, stored in memory, that, for each activation matrix, directs the MMU to execute a matrix multiplication operation (MMO) using the filter matrix and the activation matrix; a loading module, stored in memory, that loads an additional filter matrix corresponding to a secondary filter location into the filter cache; an executing module, stored in memory, that directs the MMU to exe
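Claims 5 and 6 refer to replacing cached activation vectors under a first-in, first-out (FIFO) or least-recently-used (LRU) policy. The sketch below is a hypothetical software model of those two policies, not anything specified by the patent: the `RowCache` class, its capacity parameter, and the `count_loads` helper are assumptions made for illustration. It counts how many times an entry has to be fetched from slower memory for a given access sequence.

```python
from collections import OrderedDict


class RowCache:
    """Hypothetical fixed-capacity cache with FIFO or LRU replacement."""

    def __init__(self, capacity, policy="fifo"):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        if policy not in ("fifo", "lru"):
            raise ValueError("policy must be 'fifo' or 'lru'")
        self.capacity = capacity
        self.policy = policy
        self.entries = OrderedDict()  # front of the dict is evicted first
        self.loads = 0                # number of fetches from slower memory

    def get(self, key, load):
        """Return the cached value for key, calling load(key) on a miss."""
        if key in self.entries:
            if self.policy == "lru":
                # A hit refreshes the entry so it is evicted last.
                self.entries.move_to_end(key)
            return self.entries[key]
        if len(self.entries) >= self.capacity:
            # FIFO: the front is the oldest insertion.
            # LRU: the front is the least recently used entry.
            self.entries.popitem(last=False)
        self.entries[key] = load(key)
        self.loads += 1
        return self.entries[key]


def count_loads(accesses, capacity, policy):
    """Count the fetches needed to serve a sequence of accesses."""
    cache = RowCache(capacity, policy)
    for key in accesses:
        cache.get(key, load=lambda k: k)  # the "load" is a stand-in
    return cache.loads
```

For example, with a capacity of two entries and the access sequence 0, 1, 0, 2, 0, the FIFO policy in this model fetches four times while the LRU policy fetches three times, because LRU keeps the repeatedly used entry 0 resident. Fewer fetches from slower memory is the general kind of saving the patent title points to, though the patent's actual cache sizes and access patterns are not reproduced here.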

Assignees

Meta Platforms Inc

Inventors

Classifications

Classification	CPC title
G06F17/16	Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)}
G06F1/3275 (Primary)	Power saving in memory, e.g. RAM, cache
G06F1/3225 (Primary)	of memory devices
G06F12/0875	with dedicated cache, e.g. instruction or stack
G06N3/04	Architecture, e.g. interconnection topology

Patent family

Related publications grouped by family.

View patent family 85386894

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11599181B1 cover?: A computer-implemented method may include (1) maintaining (a) a filter matrix in a filter cache included in a local memory device (LMD) included in a hardware accelerator, and (b) a plurality of activation matrices corresponding to different rows of an activation volume in an activation cache included in the LMD, (2) for each activation matrix, directing a matrix multiplication unit (MMU) inclu…
Who is the assignee on this patent?: Meta Platforms Inc
What technology area does this patent fall under?: Primary CPC classification G06F1/3275. Mapped technology areas include Physics.
When was this patent published?: Published Mar 7, 2023 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Systems and methods for reducing power consumption of convolution operations for artificial neural networks

Sparsity-aware hardware accelerators

Optimized matrix multiplication using vector multiplication of interleaved matrix values