Instructions and logic to perform floating-point and integer operations for machine learning

US10474458B2 · US · B2

Patent metadata
Publication number: US-10474458-B2
Application number: US-201715787129-A
Country: US
Kind code: B2
Filing date: Oct 18, 2017
Priority date: Apr 28, 2017
Publication date: Nov 12, 2019
Grant date: Nov 12, 2019


Abstract

Official abstract text for this publication.

One embodiment provides for a machine-learning hardware accelerator comprising a compute unit having an adder and a multiplier that are shared between integer data path and a floating-point datapath, the upper bits of input operands to the multiplier to be gated during floating-point operation.
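The shared-multiplier idea in the abstract can be illustrated in software. The Python sketch below is a simplified model, not the patented design. It assumes a hypothetical 16-bit unsigned integer multiplier (`MULTIPLIER_WIDTH`) that, in floating-point mode, multiplies only the 11-bit FP16 significands. Every operand bit above bit 10 is therefore forced to zero, which is the software analogue of gating those input lines. Subnormals, infinities, NaNs, and rounding the product back to FP16 are omitted, and all names are illustrative.

```python
import math

MULTIPLIER_WIDTH = 16       # hypothetical width of the shared integer multiplier
FP16_SIGNIFICAND_BITS = 11  # 10 stored fraction bits plus the implicit leading 1


def gate_upper_bits(operand: int, active_bits: int) -> int:
    """Zero every bit at or above `active_bits`, modelling gated upper operand lines."""
    return operand & ((1 << active_bits) - 1)


def shared_multiply(a: int, b: int, fp_mode: bool) -> int:
    """Unsigned multiply on one array; FP mode gates operands to significand width."""
    active = FP16_SIGNIFICAND_BITS if fp_mode else MULTIPLIER_WIDTH
    return gate_upper_bits(a, active) * gate_upper_bits(b, active)


def decode_fp16(bits: int) -> tuple:
    """Split a normal FP16 bit pattern into (sign, biased exponent, significand)."""
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 10) & 0x1F
    if exponent in (0, 0x1F):
        raise ValueError("sketch handles normal FP16 numbers only")
    significand = (bits & 0x3FF) | 0x400  # restore the implicit leading 1
    return sign, exponent, significand


def fp16_multiply(a_bits: int, b_bits: int) -> float:
    """Multiply two normal FP16 values using the shared integer multiplier.

    Returns the exact product as a Python float; rounding back to FP16 is omitted.
    """
    sa, ea, ma = decode_fp16(a_bits)
    sb, eb, mb = decode_fp16(b_bits)
    product = shared_multiply(ma, mb, fp_mode=True)  # up to 22-bit significand product
    # Each significand carries a 2**10 scale and each exponent a bias of 15.
    exponent = (ea - 15) + (eb - 15) - 20
    value = math.ldexp(product, exponent)
    return -value if sa ^ sb else value
```

For example, `fp16_multiply(0x3E00, 0x4000)` multiplies 1.5 by 2.0 through the same array that `shared_multiply(300, 7, fp_mode=False)` uses for a plain integer product. In silicon, gating the unused upper operand bits would typically keep that part of the multiplier array from switching. The masking here reproduces only the arithmetic effect, not any power behavior.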

Claims

Full claim text (claims 1 to 20). Claims 1, 6, 11, and 16 are independent.

What is claimed is:

1. A method comprising: fetching and decoding a single instruction to perform a combined multiply and add operation on a set of operands; issuing the single instruction for execution by a dynamically configurable compute unit; configuring the dynamically configurable compute unit to perform operations at a precision and data-type of the set of operands; and executing at least a portion of the single instruction at the dynamically configurable compute unit to generate an output based on the combined multiply and add operation, wherein to generate the output includes quantizing an intermediate value having a first precision to a second precision that is lower than the first precision, the quantizing including stochastically rounding a fractional portion of intermediate data.

2. The method as in claim 1, wherein the combined multiply and add operation is a fused multiply-add or a fused multiply-accumulate operation.

3. The method as in claim 1, additionally comprising quantizing the intermediate value via a machine learning accelerator unit.

4. The method as in claim 3, additionally comprising stochastically rounding the fractional portion of intermediate data based on output of a random number generator.

5. The method as in claim 3, additionally comprising stochastically rounding the fractional portion of intermediate data based on a probability distribution associated with intermediate data.

6. A hardware accelerator comprising: a fetch unit to fetch a single instruction to perform a combined multiply and add operation on a set of operands; a decode unit to decode the single instruction into a decoded instruction; and a dynamically configurable compute unit to perform operations at a precision and data-type of the set of operands, wherein the dynamically configurable compute unit is to receive the decoded instruction and execute at least a portion of the decoded instruction, wherein to execute at least the portion of the decoded instruction includes to generate an output based on the combined multiply and add operation, to generate the output includes to quantize an intermediate value having a first precision to a second precision that is lower than the first precision, and to quantize the intermediate value includes to stochastically round a fractional portion of intermediate data.

7. The hardware accelerator as in claim 6, wherein the combined multiply and add operation is a fused multiply-add or a fused multiply-accumulate operation.

8. The hardware accelerator as in claim 6, wherein the dynamically configurable compute unit is to quantize the intermediate value via a machine learning accelerator unit.

9. The hardware accelerator as in claim 8, wherein the dynamically configurable compute unit is to stochastically round the fractional portion of intermediate data based on output of a random number generator.

10. The hardware accelerator as in claim 8, wherein the dynamically configurable compute unit is to stochastically round the fractional portion of intermediate data based on a probability distribution associated with intermediate data.

11. A non-transitory machine-readable medium storing instructions to cause one or more processors of an electronic device to perform operations comprising: configuring a hardware accelerator to fetch and decode a single instruction to perform a combined multiply and add operation on a set of operands and issue the single instruction for execution by a dynamically configurable compute unit; configuring the dynamically configurable compute unit to perform operations at a precision and data-type of the set of operands; and executing at least a portion of the single instruction at the dynamically configurable compute unit to generate an output based on the combined multiply and add operation, wherein to generate the output includes quantizing an intermediate value having a first precision to a second precision that is lower than the first precision, the quantizing including stochastically rounding a fractional portion of intermediate data.

12. The non-transitory machine-readable medium as in claim 11, wherein the combined multiply and add operation is a fused multiply-add or a fused multiply-accumulate operation.

13. The non-transitory machine-readable medium as in claim 11, the operations additionally comprising quantizing the intermediate value via the hardware accelerator.

14. The non-transitory machine-readable medium as in claim 13, the operations additionally comprising stochastically rounding the fractional portion of intermediate data based on output of a random number generator.

15. The non-transitory machine-readable medium as in claim 13, the operations additionally comprising stochastically rounding the fractional portion of intermediate data based on a probability distribution associated with intermediate data.

16. A data processing system comprising: a memory device; and a hardware accelerator coupled with the memory device, wherein the hardware accelerator includes: a fetch unit to fetch a single instruction to perform a combined multiply and add operation on a set of operands; a decode unit to decode the single instruction into a decoded instruction; and a dynamically configurable compute unit to perform operations at a precision and data-type of the set of operands, wherein the dynamically configurable compute unit is to receive the decoded instruction and execute at least a portion of the decoded instruction, wherein to execute at least the portion of the decoded instruction includes to generate an output based on the combined multiply and add operation, to generate the output includes to quantize an intermediate value having a first precision to a second precision that is lower than the first precision, and to quantize the intermediate value includes to stochastically round a fractional portion of intermediate data.

17. The data processing system as in claim 16, wherein the combined multiply and add operation is a fused multiply-add or a fused multiply-accumulate operation.

18. The data processing system as in claim 16, wherein the dynamically configurable compute unit is to quantize the intermediate value via a machine learning accelerator unit.

19. The data processing system as in claim 18, wherein the dynamically configurable compute unit is to stochastically round the fractional portion of intermediate data based on output of a random number generator.

20. The data processing system as in claim 18, wherein the dynamically configurable compute unit is to stochastically round the fractional portion of intermediate data based on a probability distribution associated with intermediate data.
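Claims 1 through 4 describe a combined multiply-add whose wide intermediate result is quantized to a lower precision by stochastically rounding its fractional portion, driven by a random number generator. The Python sketch below models that flow under stated assumptions. The output is a hypothetical signed fixed-point format (`FixedPointFormat`, with `total_bits` and `frac_bits`). The binary64 result of `a * b + c` stands in for the wide intermediate; Python's `*` and `+` round separately, so this is not a true fused operation. `ConfigurableComputeUnit.configure()` is an illustrative stand-in for configuring the unit to the operands' precision and data type. None of these names come from the patent, and the probability-distribution variant of claim 5 is not modeled.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FixedPointFormat:
    """Hypothetical signed fixed-point output format (the lower precision)."""
    total_bits: int  # word width including the sign bit
    frac_bits: int   # bits to the right of the binary point


def stochastic_round(x: float, frac_bits: int, rng: random.Random) -> int:
    """Scale x by 2**frac_bits and round to an integer stochastically.

    The fractional remainder below the target LSB is used as the probability
    of rounding up, so the result is unbiased in expectation.
    """
    scaled = x * (1 << frac_bits)   # put the target LSB at the units place
    floor_val = math.floor(scaled)
    remainder = scaled - floor_val  # fractional portion, in [0, 1)
    return floor_val + (1 if rng.random() < remainder else 0)


class ConfigurableComputeUnit:
    """Toy model of a compute unit configured to an output precision before use."""

    def __init__(self, seed: int = 0) -> None:
        self.fmt: Optional[FixedPointFormat] = None
        self.rng = random.Random(seed)  # software stand-in for a hardware RNG

    def configure(self, fmt: FixedPointFormat) -> None:
        self.fmt = fmt

    def fma(self, a: float, b: float, c: float) -> float:
        if self.fmt is None:
            raise RuntimeError("compute unit has not been configured")
        wide = a * b + c  # intermediate held in binary64 (the higher precision)
        q = stochastic_round(wide, self.fmt.frac_bits, self.rng)
        # Saturate to the signed integer range of the output word.
        lo = -(1 << (self.fmt.total_bits - 1))
        hi = (1 << (self.fmt.total_bits - 1)) - 1
        q = max(lo, min(hi, q))
        return q / (1 << self.fmt.frac_bits)  # value on the lower-precision grid
```

With `FixedPointFormat(total_bits=8, frac_bits=2)` the output grid has a step of 0.25. Repeated calls to `fma(0.1, 1.0, 0.0)` return either 0.0 or 0.25, rounding up with probability 0.4, so their average tends toward 0.1. Round-to-nearest would map every such result to 0.0. This preservation of small contributions in expectation is the usual motivation for stochastic rounding in low-precision training.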

Assignees

Intel Corp

Inventors

Classifications

  • G06F7/57 (primary)

    Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations {(G06F7/49, G06F7/491 take precedence)}

  • Combinations of networks

  • Recurrent networks, e.g. Hopfield networks

  • using electronic means

  • with variable precision


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10474458B2 cover?
One embodiment provides for a machine-learning hardware accelerator comprising a compute unit having an adder and a multiplier that are shared between integer data path and a floating-point datapath, the upper bits of input operands to the multiplier to be gated during floating-point operation.
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F7/57. Mapped technology areas include Physics.
When was this patent published?
Published Nov 12, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).