Operation Accelerator, Processing Method, and Related Device
US-2021224125-A1 · Jul 22, 2021 · US
US11334648B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11334648-B2 |
| Application number | US-202016915915-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 29, 2020 |
| Priority date | Dec 29, 2017 |
| Publication date | May 17, 2022 |
| Grant date | May 17, 2022 |
Official abstract text for this publication.
Embodiments of the present invention disclose a matrix multiplier, and relate to the field of data computing technologies, so as to divide two matrices into blocks for computation. The matrix multiplier includes: a first memory, a second memory, an operation circuit, and a controller, where the operation circuit, the first memory, and the second memory may perform data communication by using a bus; and the controller is configured to control, according to a preset program or instruction, a first matrix and a second matrix to be divided into blocks, and control the operation circuit to perform a multiplication operation on corresponding blocks in the first memory and the second memory based on block division results of the controller. The matrix multiplier may be configured to perform a multiplication operation on two matrices.
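The block-wise computation summarized in the abstract can be illustrated in software. The following is a minimal sketch, not the patented circuit: it assumes a single hypothetical square tile size `block` (the claims use separate sizes X, L, and Y), and the function name `matmul_blocked` is chosen here for illustration only.

```python
def matmul_blocked(a, b, block):
    """Multiply a (M x K) by b (K x N) one tile at a time.

    `block` is a hypothetical tile edge length used only for this sketch.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, block):          # tile row of the result
        for j0 in range(0, n, block):      # tile column of the result
            for p0 in range(0, k, block):  # matching tiles along the shared dimension
                for i in range(i0, min(i0 + block, m)):
                    for j in range(j0, min(j0 + block, n)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + block, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc      # partial sums accumulate across tiles
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_blocked(a, b, 2))
```

Each result tile is built up from products of corresponding tiles of the two inputs, which is the property the controller relies on when it divides the first and second matrices into blocks.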
Opening claim text (preview).
What is claimed is:

1. A matrix multiplier, comprising: a first memory, configured to store a first matrix, wherein the first matrix is an M*K matrix; a second memory, configured to store a second matrix, wherein the second matrix is a K*N matrix; an operation circuit connected to the first memory and the second memory, wherein the operation circuit comprises operation units of X rows and Y columns, and each operation unit comprises a vector multiplication circuit and an addition circuit, wherein the vector multiplication circuit is configured to receive row vector data sent by the first memory and column vector data sent by the second memory, and multiply the two vectors; and the addition circuit is configured to add results obtained by multiplying the two vectors, and accumulate computation results of a same operation unit, to obtain an operation result of each operation unit; and a controller connected to the operation circuit, wherein the controller is configured to perform the following actions: dividing the first matrix into blocks in a unit of a sub-block whose size is X*L, to obtain S×R sub-blocks of a same size, wherein a sub-block in a row s and a column r of the S×R sub-blocks is denoted as A_sr, s=(1, 2, 3, . . . , and S), and r=(1, 2, 3, . . . , and R); and dividing the second matrix into blocks in a unit of a sub-block whose size is L*Y, to obtain R×T sub-blocks of a same size, wherein a sub-block in a row r and a column t in the R×T sub-blocks is denoted as B_rt, r=(1, 2, 3, . . . , and R), and t=(1, 2, 3, . . . , and T); wherein the controller is further configured to perform the following action: inputting a row x in X row vectors of any sub-block A_sr and a column y in Y column vectors of a corresponding sub-block B_rt into an operation unit in a row x and a column y in the operation units of X rows and Y columns, so as to perform an operation, wherein x=(1, 2, 3, . . . , and X), y=(1, 2, 3, . . . , and Y), and r in the any sub-block A_sr and r in the corresponding sub-block B_rt have an equal value.

2. The matrix multiplier according to claim 1, wherein the controller is specifically configured to perform the following action: inputting the row x in the X row vectors of the any sub-block A_sr and the column y in the Y column vectors of the corresponding sub-block B_rt into the operation unit in a row x and a column y in the operation units of X rows and Y columns in parallel in a same clock cycle, so as to perform the operation.

3. The matrix multiplier according to claim 1, wherein the controller is further configured to control row vectors of the any sub-block A_sr to successively enter, in ascending order of x row numbers, a row x corresponding to the operation units of X rows and Y columns, wherein a difference between moments at which adjacent row vectors enter operation units in a same column and different rows is one clock cycle; and the controller is further configured to simultaneously control column vectors of the corresponding sub-block B_rt to successively enter, in ascending order of y column numbers, a column y corresponding to the operation units of X rows and Y columns, wherein a difference between moments at which adjacent column vectors enter operation units in a same row and different columns is one clock cycle.

4. The matrix multiplier according to claim 1, wherein the controller is further configured to control: values of s and r to remain unchanged and a value of t to be changed in at least two consecutive sub-block multiplication computation cycles, so that the first memory reuses a same sub-block A_sr within the at least two consecutive sub-block multiplication computation cycles, wherein the sub-block multiplication computation cycle is a time used by the operation units of X rows and Y columns to complete a matrix multiplication operation on one sub-block A_sr and a corresponding sub-block B_rt.

5. The matrix multiplier according to claim 1, wherein the matrix multiplier further comprises a third memory connected to the operation circuit; and the controller is configured to control the operation units of X rows and Y columns to store operation results of the vector multiplication circuit and the addition circuit into the third memory.

6. The matrix multiplier according to claim 5, wherein the matrix multiplier further comprises: a fourth memory connected to the first memory and the second memory, and a fifth memory connected to the third memory; and the controller is further configured to control: before performing a multiplication operation on the first matrix and the second matrix, data sources of the first matrix and the second matrix to be moved from the fourth memory to the first memory and the second memory respectively, and the computation results to be moved from the third memory to the fifth memory.

7. The matrix multiplier according to claim 1, wherein the vector multiplication circuit comprises L multipliers, and the addition circuit comprises an adder tree with an input quantity being L+1.

8. The matrix multiplier according to claim 1, wherein the first memory, the second memory, the operation circuit, and the controller are connected by using a bus interface unit.

9. The matrix multiplier according to claim 1, wherein $S = \begin{cases} M/X, & M \% X = 0 \\ \left[\frac{M}{X}\right] + 1, & M \% X \neq 0 \end{cases}$, and R = { K / L ,
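The data flow described in claims 1, 4, 7, and 9 can be modeled in software. The sketch below is illustrative only and rests on several assumptions that are not in the claim text shown here: all function names are hypothetical; the block counts R and T are computed with the same rule as S (the formula for R is truncated above and the one for T is not shown); edge sub-blocks are padded with zeros; and the result list `c` stands in for whatever storage holds partial sums between sub-block multiplications.

```python
def num_blocks(dim, unit):
    # Block count per the S formula in claim 9: dim/unit when divisible,
    # otherwise the integer part of dim/unit plus one.
    return dim // unit if dim % unit == 0 else dim // unit + 1


def operation_unit(row_vec, col_vec, acc):
    # Claim 7 model: L multipliers feed an adder with L+1 inputs
    # (the L products plus the previously accumulated value).
    products = [p * q for p, q in zip(row_vec, col_vec)]
    return sum(products + [acc])


def sub_block(mat, r0, c0, rows, cols):
    # Extract a rows x cols sub-block; zero padding at the edges is an assumption.
    return [[mat[i][j] if i < len(mat) and j < len(mat[0]) else 0
             for j in range(c0, c0 + cols)]
            for i in range(r0, r0 + rows)]


def multiply(a, b, X, L, Y):
    M, K, N = len(a), len(b), len(b[0])
    S, R, T = num_blocks(M, X), num_blocks(K, L), num_blocks(N, Y)
    c = [[0] * (T * Y) for _ in range(S * X)]  # padded result / partial sums
    for s in range(S):
        for r in range(R):
            a_sr = sub_block(a, s * X, r * L, X, L)  # fetched once per (s, r)
            for t in range(T):                        # claim 4: only t changes
                b_rt = sub_block(b, r * L, t * Y, L, Y)
                for x in range(X):                    # unit (x, y) gets row x of A_sr
                    for y in range(Y):                # and column y of B_rt
                        col = [b_rt[l][y] for l in range(L)]
                        i, j = s * X + x, t * Y + y
                        c[i][j] = operation_unit(a_sr[x], col, c[i][j])
    return [row[:N] for row in c[:M]]                 # drop the padding


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(multiply(a, b, X=2, L=2, Y=2))
```

Keeping t as the innermost of the three block loops means the same A_sr is reused across consecutive sub-block multiplications, which mirrors the reuse described in claim 4. The two innermost loops over x and y run sequentially here, whereas the claimed X-by-Y array of operation units would process them in hardware at the same time or with the one-cycle skew of claim 3.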
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title
in parallel-parallel fashion, i.e. both operands being entered in parallel (G06F7/533 takes precedence) · CPC title
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title