Method and apparatus for processing floating point number matrix, an apparatus and computer-readable storage medium

US9912349B1 · US · B1

Patent metadata
Field               Value
Publication number  US-9912349-B1
Application number  US-201715628455-A
Country             US
Kind code           B1
Filing date         Jun 20, 2017
Priority date       Mar 20, 2017
Publication date    Mar 6, 2018
Grant date          Mar 6, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure provides a method and apparatus for processing a floating point number matrix, an apparatus and a computer readable storage medium. In embodiments of the present disclosure, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix are obtained according to a floating point number model matrix to be compressed, and then, compression processing is performed for the floating point number model matrix to obtain the fixed point number model matrix according to the bit width, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix. The compression processing is performed for the floating point number model matrix of the deep learning model by a fixed point method, to obtain the fixed point number model matrix and reduce the storage space and amount of operation of the deep learning model. Meanwhile, the present disclosure proposes a framework for implementing the apparatus in the deep learning network to maximize the deep learning network precision, that is, a multiplication portion of the matrix uses the apparatus, and operations of other portions such as activation function retain the floating point operation.
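
The compression step described in the abstract can be illustrated with a minimal Python sketch. It applies the formula from claim 3 on this page, X_int = 2^K * (X - X_Min) / (X_Max - X_Min), either with one minimum and maximum for the whole matrix or with one pair per column as in claim 2. The function names `quantize` and `dequantize`, the use of NumPy, rounding to the nearest integer, the 64-bit storage type, and the guard for a constant matrix are all assumptions made for illustration; the page does not specify them.

```python
import numpy as np


def quantize(x, bit_width, per_column=False):
    """Map a float matrix to integers: X_int = 2^K * (X - X_Min) / (X_Max - X_Min).

    Hypothetical helper. Returns the integer matrix together with the minimum
    and maximum needed to undo the mapping later.
    """
    x = np.asarray(x, dtype=np.float64)
    axis = 0 if per_column else None       # per column: one min/max per column
    x_min = x.min(axis=axis)
    x_max = x.max(axis=axis)
    span = x_max - x_min
    # Assumption: a constant matrix (or column) has span 0; avoid dividing by zero.
    safe_span = np.where(span == 0, 1.0, span)
    scaled = (2 ** bit_width) * (x - x_min) / safe_span
    # Assumption: round to nearest. X_Max maps to 2^K itself, so the result is
    # kept in a wide integer type rather than packed into exactly K bits.
    return np.rint(scaled).astype(np.int64), x_min, x_max


def dequantize(x_int, x_min, x_max, bit_width):
    """Invert the mapping: X is approximately alpha * X_int + X_Min."""
    alpha = (x_max - x_min) / (2 ** bit_width)
    return alpha * x_int + x_min


if __name__ == "__main__":
    w = np.array([[-1.0, 0.0], [0.5, 1.0]])
    w_int, w_min, w_max = quantize(w, bit_width=8)
    print(w_int)
    print(dequantize(w_int, w_min, w_max, bit_width=8))
```

With round-to-nearest, each recovered element differs from the original by at most half a quantization step, that is (X_Max - X_Min) / 2^(K+1).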

First claim

Opening claim text (preview).

What is claimed is:

1. A method of processing a floating point number matrix, executed by a computer, wherein the method comprises: according to a floating point number model matrix to be compressed, obtaining a minimum value of the floating point number model matrix and a maximum value of the floating point number model matrix; according to the bit width, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix, performing compression processing for the floating point number model matrix to obtain a fixed point number model matrix, to reduce the storage space and amount of operation.

2. The method according to claim 1, wherein, according to a floating point number model matrix to be compressed, obtaining a minimum value of the floating point number model matrix and a maximum value of the floating point number model matrix comprises: performing limit solution processing for all elements of the floating point number model matrix to obtain a minimum value of the floating point number model matrix and a maximum value of the floating point number model matrix; or performing limit solution processing for each column of elements of the floating point number model matrix to obtain a minimum value of the column of element and a maximum value of the column of elements; enabling the minimum value of each column of elements of the floating point number model matrix to form a minimum value vector as the minimum value of the floating point number model matrix, and enabling the maximum value of each column of elements of the floating point number model matrix to form a maximum value vector as the maximum value of the floating point number model matrix.

3. The method according to claim 1, wherein, according to the bit width, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix, performing compression processing for the floating point number model matrix to obtain a fixed point number model matrix comprises: according to the bit width, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix, performing compression processing for the floating point number model matrix using the formula $X_{int} = 2^{K} \cdot (X - X_{Min}) / (X_{Max} - X_{Min})$, to obtain the fixed point number model matrix; wherein, $X_{int}$ is an element in the fixed point number model matrix; $X$ is an element in the floating point number model matrix; $K$ is the bit width; $X_{Min}$ is a minimum value of the floating point number model matrix; and $X_{Max}$ is a maximum value of the floating point number model matrix.

4. The method according to claim 1, wherein the method further comprises: according to a floating point number input matrix to be compressed, obtaining a minimum value of the floating point number input matrix and a maximum value of the floating point number input matrix; according to the bit width, the minimum value of the floating point number input matrix and the maximum value of the floating point number input matrix, performing compression processing for the floating point number input matrix to obtain a fixed point number input matrix.

5. The method according to claim 1, wherein the method further comprises: according to the fixed point number input matrix and the fixed point number model matrix, obtaining a fixed point number output matrix from multiplication of the floating point number input matrix and the floating point number model matrix; according to the fixed point number input matrix, the minimum value of the floating point number input matrix and the maximum value of the floating point number input matrix, and the fixed point number model matrix, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix, performing decompression processing for the floating point number output matrix to obtain a floating point number output matrix.

6. The method according to claim 5, wherein, according to the fixed point number input matrix, the minimum value of the floating point number input matrix and the maximum value of the floating point number input matrix, and the fixed point number model matrix, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix, performing decompression processing for the floating point number output matrix to obtain a floating point number output matrix comprises: according to the fixed point number input matrix, the minimum value of the floating point number input matrix and the maximum value of the floating point number input matrix, and the fixed point number model matrix, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix, performing decompression processing for the floating point number output matrix using the formula $Xvec \cdot Yvec = \alpha \cdot \beta \cdot Xvec_{int} \cdot Yvec_{int} + Y_{Min} \cdot \sum X_{int} + X_{Min} \cdot \sum Y_{int} + N \cdot X_{Min} \cdot Y_{Min}$, to obtain the floating point number output matrix; wherein, $Xvec$ is a column vector of the floating point number model matrix; $Yvec$ is a row vector of the floating point number input matrix; $Xvec \cdot Yvec$ is the floating point number output matrix; $N$ is the number of elements in the column vector of the floating point number model matrix, or the number of elements in the row vector of the floating point number input matrix; $\alpha = (X_{Max} - X_{Min}) / 2^{K}$, $K$ is the bit width, $X_{Min}$ is the minimum value of the floating point number model matrix, and $X_{Max}$ is the maximum value of the floating point number model matrix; $\beta = (Y_{Max} - Y_{Min}) / 2^{K}$, $Y_{Min}$ is the minimum value of the floating point number input matrix, and $Y_{Max}$ is the maximum value of the floating point number input matrix; $Xvec_{int}$ is a column vector of the fixed point number model matrix; $Yvec_{int}$ is a row vector of the fixed point number input matrix; $\sum X_{int}$ is a sum of elements in the row vector of the fixed point number model matrix; and $\sum Y_{int}$ is a sum of elements in the column vector of the fixed point number input matrix.

7. An apparatus, wherein the apparatus comprises: one or more processors; a memory storing instructions, which when executed by the at least one processor, cause the at least one processor to perform operation, the operation comprising: according to a floating point number model matrix to be compressed, obtaining a minimum value of the floating point number model matrix and a maximum value of the floating point number model matrix; according to the bit width, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix, performing compression processing for the floating point number model matrix to obtain a fixed point number model matrix.

8. The apparatus according to claim 7, wherein, the operation of according to a floating point number model matrix to be compressed, obtaining a minimum value of the floating point number model matrix and a maximum value of the floating point number model matrix comprises: performing limit solution processing for all elements of the floating point number model matrix to obtain a minimum value of the floating point number model matrix and a maximum value of the floating point number model matrix; or performing limit solution processing for each column of elements of the floating point number model matrix to obtain a minimum value of the column of element and a maximum value of the column of elements; enabling the minimum value of each column of elements of the floating point number model matrix to form a minimum value vector as the minimum value of the floating point number m
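
The multiplication and decompression steps of claims 5 and 6 can be sketched as follows, assuming whole-matrix minimum and maximum values and NumPy. The sketch performs the multiply-accumulate on integers only and then adds correction terms to recover a floating point result. It is derived by expanding the product of the dequantized values (alpha * X_int + X_Min) and (beta * Y_int + Y_Min); in that expansion the two cross terms carry the factors alpha and beta, which the formula as printed in the claim preview on this page does not show, so the code follows the algebra rather than the printed text. The names `quantize` and `fixed_point_matmul`, the rounding mode, and the 64-bit integer type are hypothetical choices for illustration.

```python
import numpy as np


def quantize(m, bit_width):
    """Hypothetical helper: return (integer matrix, minimum, scale).

    Uses X_int = 2^K * (X - X_Min) / (X_Max - X_Min), rounded to nearest.
    The scale is alpha = (X_Max - X_Min) / 2^K.
    """
    m = np.asarray(m, dtype=np.float64)
    lo, hi = m.min(), m.max()
    span = hi - lo if hi > lo else 1.0     # assumption: guard a constant matrix
    q = np.rint((2 ** bit_width) * (m - lo) / span).astype(np.int64)
    return q, lo, span / (2 ** bit_width)


def fixed_point_matmul(y, x, bit_width=8):
    """Approximate y @ x (input matrix times model matrix) via integer matmul."""
    y_int, y_min, beta = quantize(y, bit_width)    # input matrix
    x_int, x_min, alpha = quantize(x, bit_width)   # model matrix
    n = x_int.shape[0]                             # length of each dot product

    acc = y_int @ x_int                            # integer-only accumulate
    row_sums = y_int.sum(axis=1, keepdims=True)    # sum of each input row
    col_sums = x_int.sum(axis=0, keepdims=True)    # sum of each model column

    # Expansion of sum_k (beta*y_int + y_min) * (alpha*x_int + x_min):
    return (alpha * beta * acc
            + alpha * y_min * col_sums
            + beta * x_min * row_sums
            + n * x_min * y_min)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.uniform(-1.0, 1.0, size=(4, 16))
    x = rng.uniform(-1.0, 1.0, size=(16, 3))
    approx = fixed_point_matmul(y, x, bit_width=8)
    print(np.abs(approx - y @ x).max())            # quantization error
```

Only the `y_int @ x_int` product scales with the full matrix dimensions; the row sums, column sums, and the constant term are comparatively cheap, which is consistent with the abstract's point that the matrix multiplication is the part run in fixed point while other operations stay in floating point.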

Assignees

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • G06F17/16 · Primary

    Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • Bit or string instructions · CPC title

  • from multiple instruction streams, e.g. multistreaming · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9912349B1 cover?
The present disclosure provides a method and apparatus for processing a floating point number matrix, an apparatus and a computer readable storage medium. In embodiments of the present disclosure, the minimum value of the floating point number model matrix and the maximum value of the floating point number model matrix are obtained according to a floating point number model matrix to be compres…
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tec
What technology area does this patent fall under?
Primary CPC classification G06F17/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 6, 2018 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).