Convolution acceleration and computing processing method and apparatus, electronic device, and storage medium
US-2020057938-A1 · Feb 20, 2020 · US
US11803475B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11803475-B2 |
| Application number | US-201917640276-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 28, 2019 |
| Priority date | Sep 3, 2019 |
| Publication date | Oct 31, 2023 |
| Grant date | Oct 31, 2023 |
Official abstract text for this publication.
The present invention provides a method and apparatus for data caching. In the method, output matrixes are acquired one by one and written alternately into the two queue sets of a first cache unit, in the sequence in which they are acquired. The output matrixes stored line by line in the first cache unit are then written into a second cache unit one by one. In the sequence in which the output matrixes are written into the second cache unit, the valid data of each output matrix in the second cache unit is determined according to preset parameters and written into a third cache unit. The valid data stored in the third cache unit is configured to be written sequentially into a memory, in the sequence in which it was written into the third cache unit. In the present solution, the output matrixes are cached in cache units whose writing speed matches the computing speed of a processor, and the output matrixes are written completely into the memory one by one in the order in which they were generated. The present invention may therefore solve the problem that the computing speed of the processor does not match the writing speed of the memory.
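The data flow in the abstract can be pictured with a small software model. The sketch below is only an illustration under stated assumptions, not the patented hardware: the class name `ThreeLevelCache`, its method names, and the `select_valid` callback are all hypothetical, and the cache units are modelled as plain Python queues with no notion of clock speed.

```python
from collections import deque


class ThreeLevelCache:
    """Toy software model of the three cache units (names are hypothetical)."""

    def __init__(self, n, select_valid):
        self.n = n
        self.select_valid = select_valid  # assumed callback: matrix -> list of valid values
        # First cache unit: two queue sets, each with n first-level queues.
        self.queue_sets = [[deque() for _ in range(n)] for _ in range(2)]
        self.target = 0         # queue set that receives the next matrix
        self.filled = deque()   # indices of filled queue sets, oldest first
        self.second = deque()   # second cache unit: whole matrices, in order
        self.third = deque()    # third cache unit: valid data, in order

    def acquire(self, matrix):
        """Write the n rows of an n-order matrix into the free queue set."""
        if len(matrix) != self.n or any(len(row) != self.n for row in matrix):
            raise ValueError("expected an n-order (n x n) matrix")
        queues = self.queue_sets[self.target]
        if any(queues):  # the target set still holds an earlier matrix
            raise RuntimeError("both queue sets are full; drain the first cache unit")
        for queue, row in zip(queues, matrix):
            queue.extend(row)
        self.filled.append(self.target)
        self.target = 1 - self.target  # alternate between the two sets

    def drain_first(self):
        """Move the oldest matrix, line by line, into the second cache unit."""
        if not self.filled:
            return False
        rows = []
        for queue in self.queue_sets[self.filled.popleft()]:
            rows.append(list(queue))
            queue.clear()
        self.second.append(rows)
        return True

    def drain_second(self):
        """Pick the valid data of the oldest matrix and pass it to the third unit."""
        if not self.second:
            return False
        self.third.append(self.select_valid(self.second.popleft()))
        return True

    def drain_third(self, memory):
        """Write valid data to memory in the order it reached the third unit."""
        while self.third:
            memory.extend(self.third.popleft())


def flatten(matrix):
    """Stand-in for the preset parameters: treat every element as valid."""
    return [value for row in matrix for value in row]


memory = []
cache = ThreeLevelCache(2, flatten)
for m in ([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]):
    cache.acquire(m)
    cache.drain_first()
    cache.drain_second()
cache.drain_third(memory)
```

In this model, raising an error when both queue sets are occupied stands in for whatever timing guarantee the real hardware relies on; that behaviour is an assumption of the sketch, not something the abstract states.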
Opening claim text (preview).
The invention claimed is:
1. A method for data caching, comprising: acquiring an output matrix from a processor, wherein the output matrix is an N-order matrix, and N is a positive integer; respectively writing N rows of data of the output matrix into N first-level cache queues of a target queue set of a first cache unit; wherein the first cache unit is preconfigured with two queue sets, the target queue set is the queue set that is not used to store a previous output matrix of the output matrix in the two queue sets; and the writing speed of the first cache unit matches with the computing speed of the processor; after the previous output matrix of the output matrix stored in the first cache unit is written into a second cache unit, writing the data of the output matrix stored in the target queue set into the second cache unit line by line, so as to write the output matrix into the second cache unit; wherein the writing speed of the second cache unit matches with the computing speed of the processor; and after valid data of the previous output matrix of the output matrix stored in the second cache unit is written into a third cache unit, determining valid data in the output matrix according to preset parameters, and writing the valid data of the output matrix into the third cache unit; wherein the valid data of a plurality of output matrixes in the third cache unit is configured to be sequentially written into a memory in a sequence in which the output matrixes are acquired, and wherein the writing speed of the third cache unit matches with the computing speed of the processor.
2. The method according to claim 1, wherein, the output matrix is an output matrix obtained by convolution computation using a two-dimensional systolic array during the computing process of a convolutional neural network; before respectively writing N rows of data of the output matrix into N first-level cache queues of a target queue set of a first cache unit, the method further comprises: rearranging the data matrix according to a preset data storage sequence, to obtain an output matrix after rearranging; respectively writing N rows of data of the output matrix into N first-level cache queues of a target queue set of a first cache unit comprises: respectively writing N rows of data of an output matrix after rearranging into N first-level cache queues of a target queue set of a first cache unit.
3. The method according to claim 1, wherein, the method further comprises the following step before respectively writing N rows of data of the output matrix into N first-level cache queues of a target queue set of a first cache unit: deleting redundant data of the output matrix, to obtain a filtered output matrix; respectively writing N rows of data of the output matrix into N first-level cache queues of a target queue set of a first cache unit comprises: writing the filtered output matrix into a target queue set of a first cache unit, wherein M rows of data of the filtered output matrix are respectively stored in M cache queues of the target queue set, wherein M is a positive integer less than or equal to N.
4. The method according to claim 1, wherein, the output matrix is an output matrix obtained by convolution computation using a two-dimensional systolic array during the computing process of the convolutional neural network; determining the valid data in the output matrix according to preset parameters comprises: determining valid data in the output matrix according to a preset step size in the neural network.
5. The method according to claim 2, wherein, the process of performing convolution computation by using a two-dimensional systolic array to obtain an output matrix comprises: splitting input data of a convolutional layer into a plurality of input matrixes; and performing convolution computation on the input matrix using a two-dimensional systolic array aiming at each input matrix, to obtain an output matrix corresponding to the input matrix.
6. The method according to claim 4, wherein, the process of performing convolution computation by using a two-dimensional systolic array to obtain an output matrix comprises: splitting input data of a convolutional layer into a plurality of input matrixes; and performing convolution computation on the input matrix using a two-dimensional systolic array aiming at each input matrix, to obtain an output matrix corresponding to the input matrix.
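Claim 4 says only that valid data is determined from a preset step size; the claims shown here do not spell out the rule. One plausible reading, offered purely as an assumption, is that the systolic array produces a result for every position (as if the stride were 1) and that only results whose row and column indices are multiples of the stride are kept. The function name `select_valid_by_stride` is hypothetical.

```python
def select_valid_by_stride(matrix, stride):
    """Keep entries whose row and column indices are multiples of `stride`.

    Assumption: the output matrix holds stride-1 convolution results, so a
    layer with a larger stride only needs every `stride`-th row and column.
    """
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    return [row[::stride] for row in matrix[::stride]]


# A 4-order output matrix with entries 0..15, filtered for a stride of 2.
output_matrix = [[4 * r + c for c in range(4)] for r in range(4)]
valid = select_valid_by_stride(output_matrix, 2)
```

With a stride of 1 the function returns every entry, which matches the case where all computed results are needed.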
Convolutional networks [CNN, ConvNet] · CPC title
Overlapped cache accessing, e.g. pipeline (G06F12/0846 takes precedence) · CPC title
with two or more cache hierarchy levels (with multilevel cache hierarchies G06F12/0811) · CPC title
Latency reduction · CPC title
Data buffering arrangements · CPC title