Outlier quantization for training and inference

US11574239B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11574239-B2
Application number  US-201916357192-A
Country             US
Kind code           B2
Filing date         Mar 18, 2019
Priority date       Mar 18, 2019
Publication date    Feb 7, 2023
Grant date          Feb 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Machine learning may include training and drawing inference from artificial neural networks, processes which may include performing convolution and matrix multiplication operations. Convolution and matrix multiplication operations are performed using vectors of block floating-point (BFP) values that may include outliers. BFP format stores floating-point values using a plurality of mantissas of a fixed bit width and a shared exponent. Elements are outliers when they are too large to be represented precisely with the fixed bit width mantissa and shared exponent. Outlier values are split into two mantissas. One mantissa is stored in the vector with non-outliers, while the other mantissa is stored outside the vector. Operations, such as a dot product, may be performed on the vectors in part by combining the in-vector mantissa and exponent of an outlier value with the out-of-vector mantissa and exponent.
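The split described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the sign handling, the choice of a `width`-bit shift, and the median-based exponent rule (loosely inspired by claim 20) are all assumptions made for illustration.

```python
import math
import statistics

def median_shared_exponent(values, width):
    """Assumed rule: pick the exponent so a median-magnitude value fits in `width` bits."""
    exps = [math.frexp(v)[1] for v in values if v != 0.0]
    return statistics.median_low(exps) - width

def encode_bfp_with_outliers(values, width, shared_exp):
    """Return (mantissas, outliers) for one block sharing `shared_exp`.

    mantissas[i] keeps the low `width` bits (with sign) of value i.
    outliers maps an index to (high_mantissa, shift) for values that overflow.
    """
    mask = (1 << width) - 1
    mantissas, outliers = [], {}
    for i, v in enumerate(values):
        q = round(math.ldexp(v, -shared_exp))  # integer mantissa at the shared exponent
        sign, mag = (-1 if q < 0 else 1), abs(q)
        mantissas.append(sign * (mag & mask))  # in-vector part
        if mag > mask:                         # too large for `width` bits: outlier
            outliers[i] = (sign * (mag >> width), width)  # out-of-vector part
    return mantissas, outliers

def decode(mantissas, outliers, shared_exp):
    """Rebuild the values by combining in-vector and out-of-vector parts."""
    out = []
    for i, m in enumerate(mantissas):
        v = math.ldexp(m, shared_exp)
        if i in outliers:
            high, shift = outliers[i]
            v += math.ldexp(high, shared_exp + shift)
        out.append(v)
    return out

values = [0.5, 0.75, 1.0, 1.5, 100.5]   # 100.5 is the outlier
width = 4                                # hypothetical mantissa bit width
shared = median_shared_exponent(values, width)
mantissas, outliers = encode_bfp_with_outliers(values, width, shared)
restored = decode(mantissas, outliers, shared)
```

In this example only the last element overflows the 4-bit mantissa, so it is the only entry stored outside the vector, and decoding restores every value exactly because the inputs were chosen to be representable.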

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: performing a machine learning operation while training an artificial neural network, wherein the machine learning operation includes performing an operation using a vector of floating-point values, wherein the floating-point values are represented by a plurality of mantissas and a shared exponent, wherein at least one of the mantissas represents part of an outlier value, and wherein another part of the outlier value is represented by an additional mantissa stored outside of the vector, wherein performing the operation includes performing a sub-operation using the outlier value, and wherein performing the sub-operation using the outlier value includes using the mantissa and the additional mantissa in conjunction with a corresponding value from a second vector of floating-point values.

2. The computer-implemented method of claim 1, wherein the operation comprises a dot product applied to the vector of floating-point values and the second vector of floating-point values, and wherein the sub operation includes multiplying an element of the vector of floating-point values with a corresponding element of the second vector of floating-point values.

3. The computer-implemented method of claim 1, wherein the operation is used to perform matrix multiplication or a convolution operation.

4. The computer-implemented method of claim 3, wherein the matrix multiplication or the convolution operation is used to train the artificial neural network.

5. The computer-implemented method of claim 3, wherein the matrix multiplication or the convolution operation is used to draw an inference from the artificial neural network.

6. The computer-implemented method of claim 1, wherein the plurality of mantissas have a defined bit width.

7. The computer-implemented method of claim 6, wherein an element of the vector of floating-point values is determined to be an outlier value when it cannot be stored with a defined number of digits of precision using the defined bit width and the shared exponent.

8. The computer-implemented method of claim 1, wherein the outlier value is additionally represented with an additional exponent that is associated with the additional mantissa.

9. A computer-implemented method for performing a training or inference operation over an artificial neural network, comprising: performing a machine learning operation while training the artificial neural network, wherein performing the machine learning operation includes performing a dot product operation using a first vector of floating-point values and a second vector of floating-point values, wherein at least one of the first vector of floating-point values comprises an outlier value, and wherein the outlier value is stored by splitting a full precision mantissa of the outlier value into a first mantissa portion that is stored in the first vector and a second mantissa portion that is stored outside of the first vector, wherein performing the dot product includes: summing the products of corresponding elements in the first and second vectors; and adding, to the sum, the product of the second mantissa portion, the element in the corresponding position of the second vector, and two raised to an exponent that is derived from the split of the full precision mantissa.

10. The computer-implemented method of claim 9, wherein the first mantissa portion is associated with the second mantissa portion by storing an index of the first mantissa portion proximate to second mantissa portion.

11. The computer-implemented method of claim 9, wherein the second mantissa portion represents the most significant bits of the outlier value and the first mantissa portion represents the least significant bits of the outlier value.

12. The computer-implemented method of claim 9, wherein the first vector is associated with a shared exponent, wherein a decimal point of the full precision mantissa is shifted to use the shared exponent, and wherein the second mantissa is created based on digits to the left of a first digit to the left of the decimal point.

13. The computer-implemented method of claim 12, wherein the first mantissa is created at least based on the first digit to the left of the decimal point and digits to the right of the decimal point.

14. The computer-implemented method of claim 9, wherein a common exponent is determined for the first vector, wherein decimal points of full precision mantissas be stored in the first vector are shifted to use the common exponent, and wherein any portion of the mantissas to the left of the first digit to the left of the decimal point are stored in a corresponding second mantissa outside of the first vector.

15. The computer-implemented method of claim 14, wherein the number of bits that a full precision mantissa was shifted is stored proximate to the second mantissa.

16. The computer-implemented method of claim 15, wherein performing the dot product includes concatenating the second mantissa to the first mantissa, summing the shared exponent and the number of bits that the full precision mantissa was shifted, multiplying the concatenated second mantissa and first mantissa by two raised to the power of the sum, and multiplying the result by the corresponding value in the second vector.

17. The computer-implemented method of claim 15, wherein performing the dot product includes summing: the product of the first mantissa, two raised to the power of the shared exponent, and a corresponding value in the second vector; and the product of the second mantissa, two raised to the power of the shared exponent added to the number of bits that the full precision mantissa was shifted, and the corresponding value in the second vector.

18. A computing device, comprising: one or more processors; and at least one computer storage media having computer-executable instructions stored thereupon which, when executed by the one or more processors, will cause the computing device to: perform a machine learning operation while training an artificial neural network, wherein performing the machine learning operation includes performing an operation using a vector of floating-point values, wherein the floating-point values are represented by a plurality of mantissas and a shared exponent, wherein the plurality of mantissas are derived from a plurality of processor-native floating-point representations, wherein at least one of the mantissas represents part of an outlier value, and wherein another part of the outlier value is represented by an additional mantissa stored outside of the vector; and wherein performing the operation includes performing a sub-operation using the outlier value, and wherein performing the sub-operation using the outlier value includes using the mantissa and the additional mantissa.

19. The computing device of claim 18, wherein the plurality of mantissas are derived from the plurality of processor-native floating-point representations by shifting the decimal point of the processor-native floating-point representations to use the shared exponent, and storing, as one of the plurality of mantissas, up to a defined bit width, the first digit to the left of the decimal point and the digits to the right of the decimal point.

20. The computing device of claim 18, wherein the shared exponent is determined based in part on a median exponent value of the plurality of processor-native floating-point representations.
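The dot product recited in claims 9 and 17 can be sketched in Python as a base sum over the in-vector mantissas plus one correction term per outlier. This is an illustrative sketch only: the name `dot_split`, the dictionary layout for out-of-vector mantissas, and the use of plain Python floats for the second vector are assumptions, not details from the patent.

```python
import math

def dot_split(mantissas, outliers, shared_exp, other):
    """Dot product of a block-floating-point vector with `other`.

    mantissas: in-vector integer mantissas sharing `shared_exp`.
    outliers:  assumed layout {index: (high_mantissa, shift)} stored outside the vector.
    other:     second vector, kept as ordinary floats for simplicity.
    """
    # Sum of products over corresponding elements, using only in-vector mantissas.
    total = sum(math.ldexp(m, shared_exp) * y for m, y in zip(mantissas, other))
    # Add, per outlier: high mantissa * 2^(shared_exp + shift) * other[i].
    for i, (high, shift) in outliers.items():
        total += math.ldexp(high, shared_exp + shift) * other[i]
    return total

# Hypothetical block: values 0.5, 0.75, 1.0, 1.5, 100.5 at shared exponent -3,
# 4-bit in-vector mantissas, with element 4 split as low=4, high=50, shift=4.
mantissas = [4, 6, 8, 12, 4]
outliers = {4: (50, 4)}
shared_exp = -3
other = [2.0, 4.0, 1.0, 2.0, 0.5]

result = dot_split(mantissas, outliers, shared_exp, other)

# Reference: plain floating-point dot product over the original values.
reference = sum(x * y for x, y in zip([0.5, 0.75, 1.0, 1.5, 100.5], other))
print(result, reference)
```

Keeping the correction term separate means the main loop never has to branch on outliers; only the (typically few) out-of-vector entries add extra multiplications.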

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title

  • Correlation function computation {including computation of convolution operations (arithmetic circuits for sum of products per se, e.g. multiply-accumulators G06F7/5443; digital filters, e.g. FIR, IIR, adaptive filters H03H17/00)} · CPC title

  • from or to individual record carriers, e.g. punched card {, memory card, integrated circuit [IC] card or smart card} · CPC title

  • Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • G06N3/063 (Primary)

    using electronic means · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11574239B2 cover?
Machine learning may include training and drawing inference from artificial neural networks, processes which may include performing convolution and matrix multiplication operations. Convolution and matrix multiplication operations are performed using vectors of block floating-point (BFP) values that may include outliers. BFP format stores floating-point values using a plurality of mantissas of …
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 7, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).