US9912349B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9912349-B1 |
| Application number | US-201715628455-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jun 20, 2017 |
| Priority date | Mar 20, 2017 |
| Publication date | Mar 6, 2018 |
| Grant date | Mar 6, 2018 |
Abstract
The present disclosure provides a method and apparatus for processing a floating point number matrix, an apparatus and a computer readable storage medium. In embodiments of the present disclosure, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix are obtained according to a floating point number model matrix to be compressed, and then, compression processing is performed for the floating point number model matrix to obtain the fixed point number model matrix according to the bit width, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix. The compression processing is performed for the floating point number model matrix of the deep learning model by a fixed point method, to obtain the fixed point number model matrix and reduce the storage space and amount of operation of the deep learning model. Meanwhile, the present disclosure proposes a framework for implementing the apparatus in the deep learning network to maximize the deep learning network precision, that is, a multiplication portion of the matrix uses the apparatus, and operations of other portions such as activation function retain the floating point operation.
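The compression step in the abstract can be sketched in Python with NumPy. This is a minimal illustration, not the patent's implementation. It applies the mapping $X_{\text{int}} = 2^{K}(X - X_{\text{Min}})/(X_{\text{Max}} - X_{\text{Min}})$ given in claim 3 and supports both the whole-matrix and the per-column min/max options of claim 2. Several details are assumptions: the function names `compress` and `decompress`, round-to-nearest, and the clamp to $[0, 2^K - 1]$. The clamp is needed because $X = X_{\text{Max}}$ maps to exactly $2^K$, which is one past the largest unsigned $K$-bit value.

```python
import numpy as np


def compress(x, bit_width, per_column=False):
    """Map a float matrix onto unsigned K-bit integer codes using its min and max."""
    # Claim 2: one min/max for the whole matrix, or one min/max per column (vectors).
    if per_column:
        x_min, x_max = x.min(axis=0), x.max(axis=0)
    else:
        x_min, x_max = x.min(), x.max()
    levels = 2 ** bit_width
    # Guard against a zero range (constant matrix or constant column).
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    # Claim 3 mapping. Rounding and clamping are assumptions: without the clamp,
    # X == X_max would map to 2^K, which does not fit in K unsigned bits.
    x_int = np.clip(np.rint(levels * (x - x_min) / span), 0, levels - 1)
    return x_int.astype(np.int64), x_min, x_max


def decompress(x_int, x_min, x_max, bit_width):
    """Approximate inverse: X ~= X_min + alpha * X_int, alpha = (X_max - X_min) / 2^K."""
    alpha = (x_max - x_min) / 2 ** bit_width
    return x_min + alpha * x_int


w = np.array([[0.0, 1.0], [0.5, -1.0]])
q, lo, hi = compress(w, 8)
print(q)                         # integer codes in [0, 255]
print(decompress(q, lo, hi, 8))  # reconstruction, off by at most one step
```

For this example, `q` is `[[128, 255], [192, 0]]`. The step size is $\alpha = 2/256$, and every value is recovered to within one step. The largest error falls on `1.0`, which was clamped from 256 to 255. With `per_column=True`, each column gets its own range, and the same matrix maps to `[[0, 255], [255, 0]]`.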
Claims (excerpt)
What is claimed is:

1. A method of processing a floating point number matrix, executed by a computer, wherein the method comprises: according to a floating point number model matrix to be compressed, obtaining a minimum value of the floating point number model matrix and a maximum value of the floating point number model matrix; according to the bit width, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix, performing compression processing for the floating point number model matrix to obtain a fixed point number model matrix, to reduce the storage space and amount of operation.

2. The method according to claim 1, wherein, according to a floating point number model matrix to be compressed, obtaining a minimum value of the floating point number model matrix and a maximum value of the floating point number model matrix comprises: performing limit solution processing for all elements of the floating point number model matrix to obtain a minimum value of the floating point number model matrix and a maximum value of the floating point number model matrix; or performing limit solution processing for each column of elements of the floating point number model matrix to obtain a minimum value of the column of element and a maximum value of the column of elements; enabling the minimum value of each column of elements of the floating point number model matrix to form a minimum value vector as the minimum value of the floating point number model matrix, and enabling the maximum value of each column of elements of the floating point number model matrix to form a maximum value vector as the maximum value of the floating point number model matrix.

3. The method according to claim 1, wherein, according to the bit width, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix, performing compression processing for the floating point number model matrix to obtain a fixed point number model matrix comprises: according to the bit width, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix, performing compression processing for the floating point number model matrix using the formula $X_{\text{int}} = 2^{K} (X - X_{\text{Min}})/(X_{\text{Max}} - X_{\text{Min}})$, to obtain the fixed point number model matrix; wherein, $X_{\text{int}}$ is an element in the fixed point number model matrix; $X$ is an element in the floating point number model matrix; $K$ is the bit width; $X_{\text{Min}}$ is a minimum value of the floating point number model matrix; and $X_{\text{Max}}$ is a maximum value of the floating point number model matrix.

4. The method according to claim 1, wherein the method further comprises: according to a floating point number input matrix to be compressed, obtaining a minimum value of the floating point number input matrix and a maximum value of the floating point number input matrix; according to the bit width, the minimum value of the floating point number input matrix and the maximum value of the floating point number input matrix, performing compression processing for the floating point number input matrix to obtain a fixed point number input matrix.

5. The method according to claim 1, wherein the method further comprises: according to the fixed point number input matrix and the fixed point number model matrix, obtaining a fixed point number output matrix from multiplication of the floating point number input matrix and the floating point number model matrix; according to the fixed point number input matrix, the minimum value of the floating point number input matrix and the maximum value of the floating point number input matrix, and the fixed point number model matrix, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix, performing decompression processing for the floating point number output matrix to obtain a floating point number output matrix.

6. The method according to claim 5, wherein, according to the fixed point number input matrix, the minimum value of the floating point number input matrix and the maximum value of the floating point number input matrix, and the fixed point number model matrix, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix, performing decompression processing for the floating point number output matrix to obtain a floating point number output matrix comprises: according to the fixed point number input matrix, the minimum value of the floating point number input matrix and the maximum value of the floating point number input matrix, and the fixed point number model matrix, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix, performing decompression processing for the floating point number output matrix using the formula $X_{\text{vec}} \cdot Y_{\text{vec}} = \alpha \beta \, X_{\text{vec,int}} \cdot Y_{\text{vec,int}} + Y_{\text{Min}} \sum X_{\text{int}} + X_{\text{Min}} \sum Y_{\text{int}} + N X_{\text{Min}} Y_{\text{Min}}$, to obtain the floating point number output matrix; wherein, $X_{\text{vec}}$ is a column vector of the floating point number model matrix; $Y_{\text{vec}}$ is a row vector of the floating point number input matrix; $X_{\text{vec}} \cdot Y_{\text{vec}}$ is the floating point number output matrix; $N$ is the number of elements in the column vector of the floating point number model matrix, or the number of elements in the row vector of the floating point number input matrix; $\alpha = (X_{\text{Max}} - X_{\text{Min}})/2^{K}$, $K$ is the bit width, $X_{\text{Min}}$ is the minimum value of the floating point number model matrix, and $X_{\text{Max}}$ is the maximum value of the floating point number model matrix; $\beta = (Y_{\text{Max}} - Y_{\text{Min}})/2^{K}$, $Y_{\text{Min}}$ is the minimum value of the floating point number input matrix, and $Y_{\text{Max}}$ is the maximum value of the floating point number input matrix; $X_{\text{vec,int}}$ is a column vector of the fixed point number model matrix; $Y_{\text{vec,int}}$ is a row vector of the fixed point number input matrix; $\sum X_{\text{int}}$ is a sum of elements in the row vector of the fixed point number model matrix; and $\sum Y_{\text{int}}$ is a sum of elements in the column vector of the fixed point number input matrix.

7. An apparatus, wherein the apparatus comprises: one or more processors; a memory storing instructions, which when executed by the at least one processor, cause the at least one processor to perform operation, the operation comprising: according to a floating point number model matrix to be compressed, obtaining a minimum value of the floating point number model matrix and a maximum value of the floating point number model matrix; according to the bit width, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix, performing compression processing for the floating point number model matrix to obtain a fixed point number model matrix.

8. The apparatus according to claim 7, wherein, the operation of according to a floating point number model matrix to be compressed, obtaining a minimum value of the floating point number model matrix and a maximum value of the floating point number model matrix comprises: performing limit solution processing for all elements of the floating point number model matrix to obtain a minimum value of the floating point number model matrix and a maximum value of the floating point number model matrix; or performing limit solution processing for each column of elements of the floating point number model matrix to obtain a minimum value of the column of element and a maximum value of the column of elements; enabling the minimum value of each column of elements of the floating point number model matrix to form a minimum value vector as the minimum value of the floating point number m…
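Inverting the claim 3 mapping and expanding one output element term by term shows where each piece of the claim 6 decompression formula comes from. This is a worked derivation under the claims' own definitions, not text quoted from the patent. Rounding is ignored, hence the $\approx$. Index $i$ runs over the $N$ elements that a model-matrix column shares with an input-matrix row. In the exact expansion, the two cross terms carry the step sizes $\alpha$ and $\beta$, which the claim 6 formula writes without.

```latex
\begin{aligned}
X_i &\approx X_{\text{Min}} + \alpha\, X_{\text{int},i},
  \qquad \alpha = \frac{X_{\text{Max}} - X_{\text{Min}}}{2^{K}} \\
Y_i &\approx Y_{\text{Min}} + \beta\, Y_{\text{int},i},
  \qquad \beta = \frac{Y_{\text{Max}} - Y_{\text{Min}}}{2^{K}} \\
\sum_{i=1}^{N} X_i Y_i &\approx \alpha\beta \sum_{i=1}^{N} X_{\text{int},i}\, Y_{\text{int},i}
  + \alpha\, Y_{\text{Min}} \sum_{i=1}^{N} X_{\text{int},i}
  + \beta\, X_{\text{Min}} \sum_{i=1}^{N} Y_{\text{int},i}
  + N\, X_{\text{Min}}\, Y_{\text{Min}}
\end{aligned}
```

Only the first sum involves products of two integer vectors. The other sums are plain row or column sums, which can be computed once per row or column and reused.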
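Claims 4 to 6, together with the abstract's split between fixed-point multiplication and floating-point activation, can be sketched with NumPy as follows. This is a hedged illustration, not the patent's implementation. It assumes whole-matrix min/max for both operands, round-to-nearest with clamping, an input matrix `y` of shape (M, N), and a model matrix `w` of shape (N, P). The names `compress`, `fixed_point_matmul`, and the ReLU activation are hypothetical choices. The correction terms follow the term-by-term expansion of $(Y_{\text{Min}} + \beta Y_{\text{int}})(X_{\text{Min}} + \alpha X_{\text{int}})$, so the cross terms are scaled by $\alpha$ and $\beta$.

```python
import numpy as np


def compress(x, bit_width):
    """Whole-matrix min/max (claim 2, first option) and the claim 3 mapping."""
    x_min, x_max = float(x.min()), float(x.max())
    levels = 2 ** bit_width
    span = x_max - x_min if x_max > x_min else 1.0  # guard constant matrices
    # Assumption: round to nearest and clamp so X_max fits in K unsigned bits.
    x_int = np.clip(np.rint(levels * (x - x_min) / span), 0, levels - 1)
    return x_int.astype(np.int64), x_min, x_max


def fixed_point_matmul(y, w, bit_width=8):
    """Approximate y @ w (input M x N times model N x P) via an integer product."""
    y_int, y_min, y_max = compress(y, bit_width)   # input matrix -> Y_int
    w_int, w_min, w_max = compress(w, bit_width)   # model matrix -> X_int
    alpha = (w_max - w_min) / 2 ** bit_width       # model step size
    beta = (y_max - y_min) / 2 ** bit_width        # input step size
    n = y.shape[1]                                 # shared inner dimension N

    acc = y_int @ w_int                            # integer-only matrix multiply
    w_col_sums = w_int.sum(axis=0, keepdims=True)  # sum of X_int per model column, (1, P)
    y_row_sums = y_int.sum(axis=1, keepdims=True)  # sum of Y_int per input row, (M, 1)

    # Decompression: add back the offsets removed by shifting each matrix by its min.
    return (alpha * beta * acc
            + alpha * y_min * w_col_sums
            + beta * w_min * y_row_sums
            + n * w_min * y_min)


def relu(x):
    return np.maximum(x, 0.0)  # activation stays in floating point


rng = np.random.default_rng(0)
y = rng.standard_normal((4, 16))
w = rng.standard_normal((16, 3))
hidden = relu(fixed_point_matmul(y, w))        # fixed-point multiply, float activation
print(np.max(np.abs(hidden - relu(y @ w))))    # rounding error versus full float
```

Only `y_int @ w_int` touches every (row, column, inner index) triple. The three correction terms need just per-row and per-column sums plus a few scalars. The result equals the product of the dequantized matrices up to float rounding. It differs from `y @ w` only by the quantization error, which shrinks as `bit_width` grows.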
CPC classification titles:
- Recurrent networks, e.g. Hopfield networks
- Quantised networks; Sparse networks; Compressed networks
- Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)}
- Bit or string instructions
- from multiple instruction streams, e.g. multistreaming