Unified integer and floating-point compare circuitry
US-2017357506-A1 · Dec 14, 2017 · US
US10474458B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10474458-B2 |
| Application number | US-201715787129-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 18, 2017 |
| Priority date | Apr 28, 2017 |
| Publication date | Nov 12, 2019 |
| Grant date | Nov 12, 2019 |
Official abstract text for this publication.
One embodiment provides for a machine-learning hardware accelerator comprising a compute unit having an adder and a multiplier that are shared between an integer datapath and a floating-point datapath, the upper bits of input operands to the multiplier to be gated during floating-point operation.
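The gating described in the abstract can be illustrated with a small software model. The sketch below is not taken from the patent: the 32-bit integer width, the 24-bit significand width, and the function name `shared_multiply` are assumptions chosen for illustration. It models one multiplier serving both datapaths, with the upper operand bits forced to zero in floating-point mode so that only the significand bits reach the multiplier.

```python
INT_WIDTH = 32       # assumed integer operand width (hypothetical)
MANTISSA_WIDTH = 24  # assumed significand width, e.g. FP32 with implicit bit


def shared_multiply(a: int, b: int, fp_mode: bool) -> int:
    """Model a single multiplier shared by integer and floating-point paths.

    In floating-point mode the operand bits above MANTISSA_WIDTH are gated
    (forced to zero), so only the significands are multiplied.
    """
    int_mask = (1 << INT_WIDTH) - 1
    a &= int_mask
    b &= int_mask
    if fp_mode:
        mant_mask = (1 << MANTISSA_WIDTH) - 1
        a &= mant_mask  # gate the upper bits of the first operand
        b &= mant_mask  # gate the upper bits of the second operand
    return a * b


if __name__ == "__main__":
    # Same inputs, different mode: the upper byte is ignored in FP mode.
    print(hex(shared_multiply(0xFF000003, 0xFF000005, fp_mode=True)))
    print(hex(shared_multiply(0xFF000003, 0xFF000005, fp_mode=False)))
```

In real hardware, gating unused upper bits is typically a way to avoid switching activity in the part of the multiplier array that the floating-point path does not need; the model above only captures the functional effect.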
Opening claim text (preview).
What is claimed is:
1. A method comprising: fetching and decoding a single instruction to perform a combined multiply and add operation on a set of operands; issuing the single instruction for execution by a dynamically configurable compute unit; configuring the dynamically configurable compute unit to perform operations at a precision and data-type of the set of operands; and executing at least a portion of the single instruction at the dynamically configurable compute unit to generate an output based on the combined multiply and add operation, wherein to generate the output includes quantizing an intermediate value having a first precision to a second precision that is lower than the first precision, the quantizing including stochastically rounding a fractional portion of intermediate data.
2. The method as in claim 1, wherein the combined multiply and add operation is a fused multiply-add or a fused multiply-accumulate operation.
3. The method as in claim 1, additionally comprising quantizing the intermediate value via a machine learning accelerator unit.
4. The method as in claim 3, additionally comprising stochastically rounding the fractional portion of intermediate data based on output of a random number generator.
5. The method as in claim 3, additionally comprising stochastically rounding the fractional portion of intermediate data based on a probability distribution associated with intermediate data.
6. A hardware accelerator comprising: a fetch unit to fetch a single instruction to perform a combined multiply and add operation on a set of operands; a decode unit to decode the single instruction into a decoded instruction; and a dynamically configurable compute unit to perform operations at a precision and data-type of the set of operands, wherein the dynamically configurable compute unit is to receive the decoded instruction and execute at least a portion of the decoded instruction, wherein to execute at least the portion of the decoded instruction includes to generate an output based on the combined multiply and add operation, to generate the output includes to quantize an intermediate value having a first precision to a second precision that is lower than the first precision, and to quantize the intermediate value includes to stochastically round a fractional portion of intermediate data.
7. The hardware accelerator as in claim 6, wherein the combined multiply and add operation is a fused multiply-add or a fused multiply-accumulate operation.
8. The hardware accelerator as in claim 6, wherein the dynamically configurable compute unit is to quantize the intermediate value via a machine learning accelerator unit.
9. The hardware accelerator as in claim 8, wherein the dynamically configurable compute unit is to stochastically round the fractional portion of intermediate data based on output of a random number generator.
10. The hardware accelerator as in claim 8, wherein the dynamically configurable compute unit is to stochastically round the fractional portion of intermediate data based on a probability distribution associated with intermediate data.
11. A non-transitory machine-readable medium storing instructions to cause one or more processors of an electronic device to perform operations comprising: configuring a hardware accelerator to fetch and decode a single instruction to perform a combined multiply and add operation on a set of operands and issue the single instruction for execution by a dynamically configurable compute unit; configuring the dynamically configurable compute unit to perform operations at a precision and data-type of the set of operands; and executing at least a portion of the single instruction at the dynamically configurable compute unit to generate an output based on the combined multiply and add operation, wherein to generate the output includes quantizing an intermediate value having a first precision to a second precision that is lower than the first precision, the quantizing including stochastically rounding a fractional portion of intermediate data.
12. The non-transitory machine-readable medium as in claim 11, wherein the combined multiply and add operation is a fused multiply-add or a fused multiply-accumulate operation.
13. The non-transitory machine-readable medium as in claim 11, the operations additionally comprising quantizing the intermediate value via the hardware accelerator.
14. The non-transitory machine-readable medium as in claim 13, the operations additionally comprising stochastically rounding the fractional portion of intermediate data based on output of a random number generator.
15. The non-transitory machine-readable medium as in claim 13, the operations additionally comprising stochastically rounding the fractional portion of intermediate data based on a probability distribution associated with intermediate data.
16. A data processing system comprising: a memory device; and a hardware accelerator coupled with the memory device, wherein the hardware accelerator includes: a fetch unit to fetch a single instruction to perform a combined multiply and add operation on a set of operands; a decode unit to decode the single instruction into a decoded instruction; and a dynamically configurable compute unit to perform operations at a precision and data-type of the set of operands, wherein the dynamically configurable compute unit is to receive the decoded instruction and execute at least a portion of the decoded instruction, wherein to execute at least the portion of the decoded instruction includes to generate an output based on the combined multiply and add operation, to generate the output includes to quantize an intermediate value having a first precision to a second precision that is lower than the first precision, and to quantize the intermediate value includes to stochastically round a fractional portion of intermediate data.
17. The data processing system as in claim 16, wherein the combined multiply and add operation is a fused multiply-add or a fused multiply-accumulate operation.
18. The data processing system as in claim 16, wherein the dynamically configurable compute unit is to quantize the intermediate value via a machine learning accelerator unit.
19. The data processing system as in claim 18, wherein the dynamically configurable compute unit is to stochastically round the fractional portion of intermediate data based on output of a random number generator.
20. The data processing system as in claim 18, wherein the dynamically configurable compute unit is to stochastically round the fractional portion of intermediate data based on a probability distribution associated with intermediate data.
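The quantization step recited in the independent claims, with the random-number-generator variant of the dependent claims, can be sketched in software. This is an illustrative model under stated assumptions, not the claimed hardware: the function names `stochastic_round` and `fma_quantized`, the `frac_bits` parameter, and the fixed-point target format are all hypothetical. Python floats stand in for the higher-precision intermediate value, and the multiply and add are not actually fused here.

```python
import math
import random


def stochastic_round(x: float, rng: random.Random) -> int:
    """Round x down or up at random.

    The probability of rounding up equals the fractional part of x,
    so the expected value of the result equals x.
    """
    lower = math.floor(x)
    frac = x - lower
    return lower + (1 if rng.random() < frac else 0)


def fma_quantized(a: float, b: float, c: float,
                  frac_bits: int, rng: random.Random) -> float:
    """Compute a*b + c, then quantize to a grid of 2**-frac_bits.

    frac_bits is an assumed parameter describing the lower-precision
    target format; it does not come from the patent text.
    """
    intermediate = a * b + c          # higher-precision intermediate value
    scale = 1 << frac_bits
    return stochastic_round(intermediate * scale, rng) / scale


if __name__ == "__main__":
    rng = random.Random(0)  # seeded pseudo-random source for repeatability
    samples = [stochastic_round(2.25, rng) for _ in range(10_000)]
    # Each sample is 2 or 3; the average approaches 2.25.
    print(sum(samples) / len(samples))
    print(fma_quantized(1.5, 2.0, 1.0, frac_bits=4, rng=rng))
```

Because the rounding direction is chosen with probability proportional to the fractional part, the quantization error averages out over many operations, which is the usual motivation for stochastic rounding in low-precision machine-learning arithmetic.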
Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations {(G06F7/49, G06F7/491 take precedence)} · CPC title
Combinations of networks · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
using electronic means · CPC title
with variable precision · CPC title