Instructions and logic to perform floating-point and integer operations for machine learning

US10474458B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10474458-B2
Application number	US-201715787129-A
Country	US
Kind code	B2
Filing date	Oct 18, 2017
Priority date	Apr 28, 2017
Publication date	Nov 12, 2019
Grant date	Nov 12, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment provides for a machine-learning hardware accelerator comprising a compute unit having an adder and a multiplier that are shared between integer data path and a floating-point datapath, the upper bits of input operands to the multiplier to be gated during floating-point operation.
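The sharing described in the abstract can be illustrated with a small software model. This is a minimal sketch, not the patented circuit. The 16-bit integer operand width, the 11-bit half-precision significand width, and the names `shared_multiply`, `INT_WIDTH`, and `SIGNIFICAND_WIDTH` are assumptions made for illustration. It models "gating" as masking the upper operand bits to zero, so that one multiplier serves both datapaths.

```python
# Hypothetical widths, chosen for illustration only (not taken from this page).
INT_WIDTH = 16          # assumed integer operand width
SIGNIFICAND_WIDTH = 11  # assumed significand width: 10 stored bits + 1 hidden bit


def shared_multiply(a_bits: int, b_bits: int, float_mode: bool) -> int:
    """Model one multiplier that serves both integer and floating-point paths.

    In floating-point mode only the low SIGNIFICAND_WIDTH bits of each operand
    reach the multiplier; the upper bits are gated (forced to zero).
    """
    width = SIGNIFICAND_WIDTH if float_mode else INT_WIDTH
    mask = (1 << width) - 1
    return (a_bits & mask) * (b_bits & mask)


# Integer mode: the full 16-bit operands are multiplied.
int_product = shared_multiply(300, 200, float_mode=False)

# Floating-point mode: multiply the significands of 1.5 and 1.25,
# each scaled by 2**10 so that the hidden bit sits at bit position 10.
sig_product = shared_multiply(1536, 1280, float_mode=True)
print(int_product, sig_product / (1 << 20))
```

In this model the exponent and sign handling of the floating-point path are left out. Only the significand product reuses the multiplier.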

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: fetching and decoding a single instruction to perform a combined multiply and add operation on a set of operands; issuing the single instruction for execution by a dynamically configurable compute unit; configuring the dynamically configurable compute unit to perform operations at a precision and data-type of the set of operands; and executing at least a portion of the single instruction at the dynamically configurable compute unit to generate an output based on the combined multiply and add operation, wherein to generate the output includes quantizing an intermediate value having a first precision to a second precision that is lower than the first precision, the quantizing including stochastically rounding a fractional portion of intermediate data.

2. The method as in claim 1, wherein the combined multiply and add operation is a fused multiply-add or a fused multiply-accumulate operation.

3. The method as in claim 1, additionally comprising quantizing the intermediate value via a machine learning accelerator unit.

4. The method as in claim 3, additionally comprising stochastically rounding the fractional portion of intermediate data based on output of a random number generator.

5. The method as in claim 3, additionally comprising stochastically rounding the fractional portion of intermediate data based on a probability distribution associated with intermediate data.

6. A hardware accelerator comprising: a fetch unit to fetch a single instruction to perform a combined multiply and add operation on a set of operands; a decode unit to decode the single instruction into a decoded instruction; and a dynamically configurable compute unit to perform operations at a precision and data-type of the set of operands, wherein the dynamically configurable compute unit is to receive the decoded instruction and execute at least a portion of the decoded instruction, wherein to execute at least the portion of the decoded instruction includes to generate an output based on the combined multiply and add operation, to generate the output includes to quantize an intermediate value having a first precision to a second precision that is lower than the first precision, and to quantize the intermediate value includes to stochastically round a fractional portion of intermediate data.

7. The hardware accelerator as in claim 6, wherein the combined multiply and add operation is a fused multiply-add or a fused multiply-accumulate operation.

8. The hardware accelerator as in claim 6, wherein the dynamically configurable compute unit is to quantize the intermediate value via a machine learning accelerator unit.

9. The hardware accelerator as in claim 8, wherein the dynamically configurable compute unit is to stochastically round the fractional portion of intermediate data based on output of a random number generator.

10. The hardware accelerator as in claim 8, wherein the dynamically configurable compute unit is to stochastically round the fractional portion of intermediate data based on a probability distribution associated with intermediate data.

11. A non-transitory machine-readable medium storing instructions to cause one or more processors of an electronic device to perform operations comprising: configuring a hardware accelerator to fetch and decode a single instruction to perform a combined multiply and add operation on a set of operands and issue the single instruction for execution by a dynamically configurable compute unit; configuring the dynamically configurable compute unit to perform operations at a precision and data-type of the set of operands; and executing at least a portion of the single instruction at the dynamically configurable compute unit to generate an output based on the combined multiply and add operation, wherein to generate the output includes quantizing an intermediate value having a first precision to a second precision that is lower than the first precision, the quantizing including stochastically rounding a fractional portion of intermediate data.

12. The non-transitory machine-readable medium as in claim 11, wherein the combined multiply and add operation is a fused multiply-add or a fused multiply-accumulate operation.

13. The non-transitory machine-readable medium as in claim 11, the operations additionally comprising quantizing the intermediate value via the hardware accelerator.

14. The non-transitory machine-readable medium as in claim 13, the operations additionally comprising stochastically rounding the fractional portion of intermediate data based on output of a random number generator.

15. The non-transitory machine-readable medium as in claim 13, the operations additionally comprising stochastically rounding the fractional portion of intermediate data based on a probability distribution associated with intermediate data.

16. A data processing system comprising: a memory device; and a hardware accelerator coupled with the memory device, wherein the hardware accelerator includes: a fetch unit to fetch a single instruction to perform a combined multiply and add operation on a set of operands; a decode unit to decode the single instruction into a decoded instruction; and a dynamically configurable compute unit to perform operations at a precision and data-type of the set of operands, wherein the dynamically configurable compute unit is to receive the decoded instruction and execute at least a portion of the decoded instruction, wherein to execute at least the portion of the decoded instruction includes to generate an output based on the combined multiply and add operation, to generate the output includes to quantize an intermediate value having a first precision to a second precision that is lower than the first precision, and to quantize the intermediate value includes to stochastically round a fractional portion of intermediate data.

17. The data processing system as in claim 16, wherein the combined multiply and add operation is a fused multiply-add or a fused multiply-accumulate operation.

18. The data processing system as in claim 16, wherein the dynamically configurable compute unit is to quantize the intermediate value via a machine learning accelerator unit.

19. The data processing system as in claim 18, wherein the dynamically configurable compute unit is to stochastically round the fractional portion of intermediate data based on output of a random number generator.

20. The data processing system as in claim 18, wherein the dynamically configurable compute unit is to stochastically round the fractional portion of intermediate data based on a probability distribution associated with intermediate data.
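The quantization step recited in the independent claims (a multiply and add result reduced to a lower precision by stochastically rounding its fractional portion) can be sketched in software. This is an illustrative model under stated assumptions, not the claimed hardware. The function names `stochastic_round` and `multiply_add_quantized`, the fixed-point target format, and the use of Python's `random.Random` as the random number generator are all hypothetical. Python's `a * b + c` also rounds twice, so it only approximates a true fused multiply-add.

```python
import math
import random


def stochastic_round(x: float, rng: random.Random) -> int:
    """Round x to a neighboring integer.

    Rounds up with probability equal to the fractional part of x,
    otherwise rounds down, so the expected result equals x.
    """
    lower = math.floor(x)
    fraction = x - lower
    return lower + (1 if rng.random() < fraction else 0)


def multiply_add_quantized(a: float, b: float, c: float,
                           frac_bits: int, rng: random.Random) -> float:
    """Compute a * b + c, then quantize to a grid of 2**-frac_bits.

    The intermediate value is held at double precision (the "first
    precision"); the result keeps only frac_bits fractional bits (the
    lower "second precision", an assumed fixed-point format).
    """
    intermediate = a * b + c  # not a true fused operation in Python
    scale = 1 << frac_bits
    return stochastic_round(intermediate * scale, rng) / scale


rng = random.Random(0)  # seeded only to make the sketch repeatable
samples = [multiply_add_quantized(0.3, 0.7, 0.1, 2, rng) for _ in range(10000)]
print(sorted(set(samples)), sum(samples) / len(samples))
```

Each sample lands on one of the two grid points that bracket the exact result 0.31 (here 0.25 and 0.5), and the average over many samples approaches 0.31. Round-to-nearest would return 0.25 every time, which is the bias stochastic rounding avoids.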

Assignees

Intel Corp

Inventors

Classifications

G06F7/57 (Primary)
Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations {(G06F7/49, G06F7/491 take precedence)}
G06N3/045
Combinations of networks
G06N3/044
Recurrent networks, e.g. Hopfield networks
G06N3/063
using electronic means
G06F9/30014
with variable precision

Patent family

Related publications grouped by family.

Patent family ID: 61827531

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10474458B2 cover?: One embodiment provides for a machine-learning hardware accelerator comprising a compute unit having an adder and a multiplier that are shared between integer data path and a floating-point datapath, the upper bits of input operands to the multiplier to be gated during floating-point operation.
Who is the assignee on this patent?: Intel Corp
What technology area does this patent fall under?: Primary CPC classification G06F7/57. Mapped technology areas include Physics.
When was this patent published?: Publication date Nov 12, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).