Apparatus and method for converting a floating-point value from half precision to single precision
US-10684854-B2 · Jun 16, 2020 · US
US11275560B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11275560-B2 |
| Application number | US-202016795097-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 19, 2020 |
| Priority date | Feb 19, 2020 |
| Publication date | Mar 15, 2022 |
| Grant date | Mar 15, 2022 |
Official abstract text for this publication.
A floating-point number in a first format representation is received. Based on an identification of a floating-point format type of the floating-point number, different components of the first format representation are identified. The different components of the first format representation are placed in corresponding components of a second format representation of the floating-point number, wherein a total number of bits of the second format representation is larger than a total number of bits of the first format representation. At least one of the components of the second format representation is padded with one or more zero bits. The floating-point number in the second format representation is stored in a register. A multiplication using the second format representation of the floating-point number is performed.
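The field placement and zero padding described in the abstract can be illustrated with a minimal Python sketch. It assumes the 19-bit layout named in the claims (one sign bit, eight exponent bits, ten mantissa bits). The names `widen`, `LAYOUTS`, `WIDE_EXP_BITS`, and `WIDE_MAN_BITS` are hypothetical and do not come from the patent. The sketch also assumes that exponent padding goes in the high-order bit positions and mantissa padding in the low-order bit positions; the text shown here does not state which positions are padded.

```python
# Hypothetical sketch: widen a 16-bit float bit pattern into a 19-bit
# sign(1) / exponent(8) / mantissa(10) container by zero padding.

# (exponent bits, mantissa bits) of each 16-bit input format
LAYOUTS = {
    "fp16": (5, 10),  # half precision binary floating-point
    "bf16": (8, 7),   # Brain Floating Point
}
WIDE_EXP_BITS = 8
WIDE_MAN_BITS = 10


def widen(bits16, fmt):
    """Return the 19-bit pattern for a 16-bit input of the given format."""
    exp_bits, man_bits = LAYOUTS[fmt]

    # Identify the components of the first format representation.
    sign = (bits16 >> 15) & 1
    exponent = (bits16 >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits16 & ((1 << man_bits) - 1)

    # Assumption: mantissa zeros are appended at the low-order end, so the
    # fraction keeps its value (only bf16 needs this, three zero bits).
    mantissa_wide = mantissa << (WIDE_MAN_BITS - man_bits)

    # Assumption: exponent zeros occupy the high-order end, so the raw
    # biased exponent keeps its integer value (only fp16 needs this).
    exponent_wide = exponent

    return (
        (sign << (WIDE_EXP_BITS + WIDE_MAN_BITS))
        | (exponent_wide << WIDE_MAN_BITS)
        | mantissa_wide
    )
```

For example, `widen(0x3F80, "bf16")` (the bfloat16 pattern for 1.0) yields `0x1FC00`. The sketch only rearranges bits; the exponent bias still differs between the two input formats and would have to be accounted for by whatever consumes the widened value.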
Opening claim text (preview).
What is claimed is:
1. A device, comprising: a multiplication unit configured to: receive a floating-point number in a first format representation; based on an identification of a floating-point format type of the floating-point number, identify different components of the first format representation; place the different components of the first format representation in corresponding components of a second format representation of the floating-point number, wherein: a total number of bits of the second format representation is larger than a total number of bits of the first format representation; and a number of bits of a component of the second format representation is equal to a number of bits of a corresponding component of the first format representation; pad at least one of the components of the second format representation with one or more zero bits; and perform a multiplication using the second format representation of the floating-point number; and a register configured to store the floating-point number in the second format representation prior to the multiplication.
2. The device of claim 1, wherein the first format representation comprises a sign bit, five exponent bits, and ten mantissa bits.
3. The device of claim 1, wherein the first format representation comprises a sign bit, eight exponent bits, and seven mantissa bits.
4. The device of claim 1, wherein the identification of the floating-point format type includes a flag that specifies the floating-point format type.
5. The device of claim 1, wherein the multiplication unit is further configured to receive a multiply operation instruction that specifies the floating-point format type.
6. The device of claim 1, wherein the different components of the first format representation include a sign bit component, an exponent bits component, and a mantissa bits component.
7. The device of claim 1, wherein the total number of bits of the first format representation is sixteen.
8. The device of claim 1, wherein the total number of bits of the second format representation is at least nineteen.
9. The device of claim 1, wherein the second format representation comprises a sign bit, eight exponent bits, and ten mantissa bits.
10. The device of claim 1, wherein the multiplication unit is configured to pad at least one of the components of the second format representation with one or more zero bits including by being configured to place one or more zeros in one or more exponent bit locations of the second format representation in response to a determination that the floating-point format type of the floating-point number is half precision binary floating-point format.
11. The device of claim 1, wherein the multiplication unit is configured to pad at least one of the components of the second format representation with one or more zero bits including by being configured to place one or more zeros in one or more mantissa bit locations of the second format representation in response to a determination that the floating-point format type of the floating-point number is Brain Floating Point format.
12. The device of claim 1, wherein the multiplication unit is configured to perform the multiplication using the second format representation of the floating-point number including by being configured to add an exponent component of the second format representation to an exponent component of a different floating-point number.
13. The device of claim 1, wherein the multiplication unit is configured to perform the multiplication using the second format representation of the floating-point number including by being configured to multiply a mantissa component of the second format representation with a mantissa component of a different floating-point number.
14. The device of claim 1, wherein the multiplication unit is further configured to provide an output of the multiplication.
15. The device of claim 14, wherein different components of the output of the multiplication are placed in corresponding components of a third format representation of an output floating-point number, wherein a total number of bits of the third format representation is larger than the total number of bits of the second format representation.
16. The device of claim 14, wherein the output of the multiplication includes a floating-point number in a single-precision floating-point format.
17. The device of claim 1, wherein the multiplication is a part of an artificial neural network operation.
18. The device of claim 1, wherein the multiplication is a part of a plurality of multiplications associated with a vector multiplication or a dot product operation.
19. A method, comprising: receiving a floating-point number in a first format representation; based on an identification of a floating-point format type of the floating-point number, identifying different components of the first format representation; placing the different components of the first format representation in corresponding components of a second format representation of the floating-point number, wherein: a total number of bits of the second format representation is larger than a total number of bits of the first format representation; and a number of bits of a component of the second format representation is equal to a number of bits of a corresponding component of the first format representation; padding at least one of the components of the second format representation with one or more zero bits; storing the floating-point number in the second format representation in a register; and performing a multiplication using the second format representation of the floating-point number.
20. A device, comprising: an arithmetic unit configured to: receive a floating-point number in a first format representation; based on an identification of a floating-point format type of the floating-point number, identify different components of the first format representation; place the different components of the first format representation in corresponding components of a second format representation of the floating-point number, wherein: a total number of bits of the second format representation is larger than a total number of bits of the first format representation; and a number of bits of a component of the second format representation is equal to a number of bits of a corresponding component of the first format representation; pad at least one of the components of the second format representation with one or more zero bits; and perform an arithmetic operation using the second format representation of the floating-point number; and a register configured to store the floating-point number in the second format representation prior to the arithmetic operation.
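The multiplication steps recited in the claims (adding exponent components and multiplying mantissa components) can be sketched in software as follows. This is an illustrative model, not the claimed hardware. The names `unpack` and `multiply` are hypothetical. The sketch assumes both operands share one format, handles normal numbers only (no zeros, subnormals, infinities, or NaNs), restores the implicit leading one of each significand, and removes each format's standard exponent bias (15 for half precision, 127 for Brain Floating Point) before adding; none of those details are spelled out in the claim text shown here.

```python
# Hypothetical sketch: multiply two 16-bit floats through widened
# sign / 8-bit exponent / 10-bit mantissa fields. Normal numbers only.

FIELDS = {"fp16": (5, 10), "bf16": (8, 7)}  # (exponent bits, mantissa bits)
BIAS = {"fp16": 15, "bf16": 127}            # standard exponent biases
WIDE_MAN_BITS = 10


def unpack(bits16, fmt):
    """Split a 16-bit pattern into (sign, exponent, 10-bit mantissa)."""
    exp_bits, man_bits = FIELDS[fmt]
    sign = (bits16 >> 15) & 1
    exponent = (bits16 >> man_bits) & ((1 << exp_bits) - 1)
    # Assumption: the mantissa is padded with low-order zeros up to 10 bits.
    mantissa = (bits16 & ((1 << man_bits) - 1)) << (WIDE_MAN_BITS - man_bits)
    return sign, exponent, mantissa


def multiply(a_bits, b_bits, fmt):
    """Return the product of two normal numbers as a Python float."""
    sign_a, exp_a, man_a = unpack(a_bits, fmt)
    sign_b, exp_b, man_b = unpack(b_bits, fmt)

    # The sign of the product is the XOR of the operand signs.
    sign = sign_a ^ sign_b

    # Add the exponent components after removing the format bias.
    exponent = (exp_a - BIAS[fmt]) + (exp_b - BIAS[fmt])

    # Multiply the mantissa components with the implicit leading one
    # restored; each 11-bit significand is scaled by 2**10.
    hidden_one = 1 << WIDE_MAN_BITS
    product = (hidden_one | man_a) * (hidden_one | man_b)

    # Undo the 2**20 scaling of the integer product.
    value = product * 2.0 ** (exponent - 2 * WIDE_MAN_BITS)
    return -value if sign else value
```

With half precision inputs `0x3E00` (1.5) and `0x4000` (2.0), `multiply(0x3E00, 0x4000, "fp16")` returns `3.0`. The same mantissa multiplier width serves both input formats because the bfloat16 mantissa is padded to ten bits before use.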
Accepting numbers of variable word length · CPC title
Multiplying · CPC title
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title
Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations {(G06F7/49, G06F7/491 take precedence)} · CPC title
Reconfigurable for different fixed word lengths · CPC title