FP16-S7E8 mixed precision for deep learning and other algorithms

US11093579B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11093579-B2
Application number  US-201816122030-A
Country             US
Kind code           B2
Filing date         Sep 5, 2018
Priority date       Sep 5, 2018
Publication date    Aug 17, 2021
Grant date          Aug 17, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed embodiments relate to mixed-precision vector multiply-accumulate (MPVMAC). In one example, a processor includes fetch circuitry to fetch a compress instruction having fields to specify locations of a source vector having N single-precision formatted elements, and a compressed vector having N neural half-precision (NHP) formatted elements, decode circuitry to decode the fetched compress instruction, execution circuitry to respond to the decoded compress instruction by: converting each element of the source vector into the NHP format and writing each converted element to a corresponding compressed vector element, wherein the processor is further to fetch, decode, and execute a MPVMAC instruction to multiply corresponding NHP-formatted elements using a 16-bit multiplier, and accumulate each of the products with previous contents of a corresponding destination using a 32-bit accumulator.
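The compress step described in the abstract can be illustrated with a minimal Python sketch. It assumes the NHP format is laid out as 1 sign bit, 8 exponent bits, and 7 significand bits occupying the upper 16 bits of an IEEE 754 binary32 value; the sign bit and the bit ordering are assumptions, since the text shown here states only the exponent and significand widths. It also assumes round to nearest with ties to even, which is one of the rounding modes the claims list. The function names are hypothetical and are not taken from the patent.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the binary32 bit pattern of x (x must fit in single precision)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def compress_element(x: float) -> int:
    """Convert one single-precision value to a 16-bit NHP pattern.

    Assumed layout: [sign:1][exponent:8][significand:7], i.e. the top half
    of binary32. Rounding: nearest, ties to even.
    """
    b = f32_bits(x)
    is_nan = (b & 0x7F800000) == 0x7F800000 and (b & 0x007FFFFF) != 0
    if is_nan:
        # Keep the value a NaN after truncation by forcing a significand bit.
        return ((b >> 16) | 0x0040) & 0xFFFF
    lsb = (b >> 16) & 1      # lowest bit that survives the conversion
    b += 0x7FFF + lsb        # bias so that exact ties round to even
    return (b >> 16) & 0xFFFF


def expand_element(h: int) -> float:
    """Convert a 16-bit NHP pattern back to single precision (exact)."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


def compress_vector(src):
    """Element-wise compress of N single-precision values."""
    return [compress_element(x) for x in src]


def expand_vector(src):
    """Element-wise expand of N NHP patterns."""
    return [expand_element(h) for h in src]


if __name__ == "__main__":
    packed = compress_vector([1.0, 3.14159265, -2.5])
    print([hex(h) for h in packed])
    print(expand_vector(packed))
```

Because the assumed NHP layout keeps all eight exponent bits of binary32, the expand direction is exact and only the compress direction loses precision.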

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: fetch circuitry to fetch a compress instruction having fields to specify locations of a source vector having N single-precision formatted elements, and a compressed vector having N neural half-precision (NHP) formatted elements; decode circuitry to decode the fetched compress instruction; execution circuitry to respond to the decoded compress instruction by: converting each element of the source vector into the NHP format; rounding each converted element according to a rounding mode; and writing each rounded element to a corresponding compressed vector element; wherein the NHP format comprises seven significand bits, and eight exponent bits; wherein the source and compressed vectors are each either in memory or in registers; wherein the fetch, decode, and execution circuitry are further to fetch, decode, and execute a second compress instruction specifying locations of a second source vector having N elements formatted according to the single-precision format, and a second compressed vector having N elements formatted according to the NHP format; wherein the fetch and decode circuitry is further to fetch and decode a mixed-precision vector multiply-accumulate (MPVMAC) instruction having fields to specify first and second source vectors having N NHP-formatted elements, and a destination vector having N single-precision-formatted elements; wherein the specified source vectors are the compressed vector and the second compressed vector; and wherein the execution circuitry is further to respond to the decoded MPVMAC instruction, for each of the N elements, by generating a 16-bit product of the compressed vector element and the second compressed vector element and accumulating the generated 16-bit product with previous contents of a corresponding element of the destination vector.

2. The processor of claim 1, wherein the MPVMAC instruction further has a field to specify a writemask, the specified writemask comprising N bits, each bit to identify either when the corresponding element of the destination vector is unmasked and to be written with the generated 16-bit product, or when the corresponding element of the destination vector is mapped and is either to be zeroed or merged.

3. The processor of claim 1, wherein the fetch circuitry is further to fetch an expand instruction having fields to specify locations of a destination vector having N elements formatted according to the single-precision format and the compressed vector; wherein the processor further comprises: decode circuitry to decode the fetched expand instruction; and execution circuitry to respond to the decoded expand instruction by: converting each element of the compressed vector into the single-precision format; and writing each converted element to a corresponding destination vector element.

4. The processor of claim 1, wherein the single-precision format is a binary32 format standardized by the Institute of Electrical and Electronics Engineers (IEEE) as part of the IEEE 754-2008 standard.

5. The processor of claim 4, wherein the rounding mode is specified by the IEEE 754 standard and is one of round to nearest with ties to even, round to nearest with ties away from zero, round toward zero, round toward positive infinity, and round toward negative infinity, and wherein the rounding mode is specified either on a per-instruction basis by an immediate value specified by the instruction, or on an embedded basis by a software-programmable control and status register.

6. The processor of claim 1, wherein the specified source and compressed vectors each occupy one or rows of a matrix having M rows by N columns.

7. The processor of claim 1, wherein the execution circuitry is further to perform rounding when converting, accumulating, and multiplying, according to the rounding mode.

8. The processor of claim 1, wherein the rounding mode is one of round to nearest even, round toward negative infinity, round toward positive infinity and round toward zero, and wherein the rounding mode is specified either on a per-instruction basis by an immediate value specified by the instruction, or on an embedded basis by a software-programmable control and status register.

9. The processor of claim 1, wherein the execution circuitry is further to perform saturation, as necessary, when accumulating and multiplying.

10. A method comprising: fetching, using fetch circuitry, a compress instruction having fields to specify locations of a source vector having N single-precision formatted elements, and a compressed vector having N neural half-precision (NHP) formatted elements; decoding, using decode circuitry, the fetched compress instruction; responding, using execution circuitry, to the decoded compress instruction by: converting each element of the source vector into the NHP format; rounding each converted element according to a rounding mode; writing each rounded element to a corresponding compressed vector element; wherein the NHP format comprises seven significand bits, and eight exponent bits; wherein the source and compressed vectors are each either in memory or in registers; fetching, decoding, and executing, using the fetch, decode, and execution circuitry, a second compress instruction specifying locations of a second source vector having N elements formatted according to the single-precision format, and a second compressed vector having N elements formatted according to the NHP format; fetching and decoding, using the fetch and decode circuitry, a mixed-precision vector multiply-accumulate (MPVMAC) instruction having fields to specify first and second source vectors having N NHP-formatted elements, and a destination vector having N single-precision-formatted elements, wherein the specified source vectors are the compressed vector and the second compressed vector; and responding, using the execution circuitry, to the decoded MPVMAC instruction, for each of the N elements, by generating a 16-bit product of the compressed vector element and the second compressed vector element, and accumulating the generated 16-bit product with previous contents of a corresponding element of the destination vector.

11. The method of claim 10, wherein the MPVMAC instruction further has a field to specify a writemask, the specified writemask comprising N bits, each bit to identify either when the corresponding element of the destination vector is unmasked and to be written with the generated 16-bit product, or when the corresponding element of the destination vector is mapped and is either to be zeroed or merged.

12. The method of claim 10, further comprising: fetching, using the fetch circuitry, an expand instruction having fields to specify locations of a destination vector having N elements formatted according to the single-precision format and the compressed vector; decoding, using decode circuitry, the fetched expand instruction; responding, using execution circuitry, to the decoded expand instruction by: converting each element of the compressed vector into the single-precision format; and writing each converted element to a corresponding destination vector element.

13. The method of claim 10, wherein the single-precision format is a binary32 format standardized by the Institute of Electrical and Electronics Engineers (IEEE) as part of the IEEE 754-2008 standard.

14. The method of claim 13, wherein the rounding mode is specified by the IEEE 754 standard and is one of round to nearest with ties to even, round to nearest with ties away from zero, round toward zero, round toward positive infin
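The MPVMAC behaviour recited in claims 1 and 2 can be sketched in Python as a software model. Several points are assumptions rather than statements from the patent: the NHP layout (1 sign, 8 exponent, 7 significand bits in the upper half of binary32), the reading of "16-bit product" as the exact product of two 8-bit significands, the convention that a set writemask bit means "unmasked", and the function and parameter names. The model uses Python floats and rounds the accumulated sum to single precision, so it approximates rather than reproduces the hardware rounding.

```python
import math
import struct


def nhp_to_float(h: int) -> float:
    """Interpret a 16-bit pattern as the upper half of a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


def to_f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # Finite value too large for binary32: model as signed infinity.
        return math.copysign(math.inf, x)


def mpvmac(dst, src1, src2, writemask=None, zeroing=False):
    """Model of a mixed-precision vector multiply-accumulate.

    dst        : N single-precision accumulators (Python floats)
    src1, src2 : N NHP-formatted elements each (16-bit integers)
    writemask  : N-bit integer; bit i set means element i is unmasked
    zeroing    : masked elements become 0.0 if True, else keep dst[i]
    """
    n = len(dst)
    if len(src1) != n or len(src2) != n:
        raise ValueError("all vectors must have N elements")
    if writemask is None:
        writemask = (1 << n) - 1  # default: every element unmasked

    out = []
    for i in range(n):
        if (writemask >> i) & 1:
            # Two 8-bit significands multiply exactly in double precision.
            product = nhp_to_float(src1[i]) * nhp_to_float(src2[i])
            out.append(to_f32(dst[i] + product))
        elif zeroing:
            out.append(0.0)
        else:
            out.append(dst[i])  # merge: previous contents are preserved
    return out


if __name__ == "__main__":
    a = [0x3F80, 0x4000, 0x4040, 0x4080]  # 1.0, 2.0, 3.0, 4.0 in NHP
    b = [0x4000] * 4                      # 2.0 in every lane
    print(mpvmac([10.0] * 4, a, b))
    print(mpvmac([10.0] * 4, a, b, writemask=0b0101, zeroing=True))
```

Keeping the accumulator in single precision while the inputs are 16 bits wide is the design point the claims emphasise: the narrow multiply is cheap, and the wider accumulation limits the error that builds up across many products.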

Assignees

Intel Corp

Inventors

Classifications

  • Learning methods

  • Quantised networks; Sparse networks; Compressed networks

  • Convolutional networks [CNN, ConvNet]

  • using a mask

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11093579B2 cover?
Disclosed embodiments relate to mixed-precision vector multiply-accumulate (MPVMAC). In one example, a processor includes fetch circuitry to fetch a compress instruction having fields to specify locations of a source vector having N single-precision formatted elements, and a compressed vector having N neural half-precision (NHP) formatted elements, decode circuitry to decode the fetched compress…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F17/16. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 17, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).