Circuitry for low-precision deep learning

US11275998B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11275998-B2
Application number: US-201815994930-A
Country: US
Kind code: B2
Filing date: May 31, 2018
Priority date: May 31, 2018
Publication date: Mar 15, 2022
Grant date: Mar 15, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates generally to techniques for improving the implementation of certain operations on an integrated circuit. In particular, deep learning techniques, which may use a deep neural network (DNN) topology, may be implemented more efficiently using low-precision weights and activation values by efficiently performing down conversion of data to a lower precision and by preventing data overflow during suitable computations. Further, by more efficiently mapping multipliers to programmable logic on the integrated circuit device, the resources used by the DNN topology to perform, for example, inference tasks may be reduced, resulting in improved integrated circuit operating speeds.
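
The down conversion mentioned in the abstract can be pictured with a small sketch. It assumes, as one possible reading consistent with the shift circuitry recited in the claims, that the shift amount is simply the difference between the accumulator bit-width and a configured target bit-width. The function names `down_convert` and `fits_signed` are hypothetical and do not come from the patent.

```python
def fits_signed(value: int, width: int) -> bool:
    """Return True if value is representable as a signed integer of `width` bits."""
    return -(1 << (width - 1)) <= value < (1 << (width - 1))


def down_convert(value: int, value_width: int, target_width: int) -> int:
    """Right-shift a signed value so that it fits in target_width bits.

    Assumption (not taken from the patent): the shift amount is the
    difference between the two widths, clamped at zero.
    """
    shift = max(0, value_width - target_width)
    # Python's >> on ints is an arithmetic shift, so the sign is preserved.
    return value >> shift


# A 16-bit accumulator value reduced to an 8-bit activation.
acc = 1000
low = down_convert(acc, value_width=16, target_width=8)
print(low, fits_signed(low, 8))
```

Dropping the low-order bits loses precision but keeps the result inside the narrower range, which is one way to avoid overflow in the next layer's arithmetic.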

First claim

Opening claim text (preview).

What is claimed is:

1. An integrated circuit device, comprising: first input circuitry configured to receive a first input; second input circuitry configured to receive a first control signal; third input circuitry configured to receive a second input; fourth input circuitry configured to receive a second control signal; first combinatorial circuitry coupled to the first input circuitry and the second input circuitry, wherein the first combinatorial circuitry is configured to receive the first input from the first input circuitry and the first control signal from the second input circuitry, and wherein the first combinatorial circuitry comprises first output circuitry and is configured to generate a first output at the first output circuitry by selectively inverting the first input based at least in part on the first control signal; second combinatorial circuitry coupled to the third input circuitry and the fourth input circuitry, wherein the second combinatorial circuitry is configured to receive the second input from the third input circuitry and the second control signal from the fourth input circuitry, and wherein the second combinatorial circuitry comprises second output circuitry and is configured to generate a second output at the second output circuitry by selectively inverting the second input based at least in part on the second control signal; arithmetic compression circuitry coupled to the second input circuitry and the fourth input circuitry and configured to generate a correction factor based at least in part on a compressed sum of the first control signal and the second control signal, wherein the arithmetic compression circuitry is configured to receive the first control signal from the second input circuitry and the second control signal from the fourth input circuitry; and adder circuitry coupled to the first output circuitry, the second output circuitry, and the arithmetic compression circuitry and configured to generate a sum of the first output, the second output, and the correction factor, wherein the adder circuitry is configured to receive the first output from the first output circuitry, the second output from the second output circuitry, and the correction factor from the arithmetic compression circuitry.

2. The integrated circuit device of claim 1, wherein the sum is equivalent in value to an additional sum of the first input selectively negated based at least in part on the first control signal and the second input selectively negated based at least in part on the second control signal.

3. The integrated circuit device of claim 1, wherein the first combinatorial circuitry comprises a look up table.

4. The integrated circuit device of claim 1, wherein the first combinatorial circuitry is configured to selectively invert the first input based at least in part on an exclusive OR of the first input and the first control signal.

5. The integrated circuit device of claim 1, wherein the arithmetic compression circuitry comprises read-only memory.

6. The integrated circuit of claim 1, wherein the integrated circuit comprises a field-programmable gate array.

7. The integrated circuit device of claim 1, wherein the sum comprises a dot-product.

8. The integrated circuit device of claim 1, comprising shift circuitry configured to right-shift the sum a number of bits based at least in part on a bit-width of the sum and a configured bit-width, wherein the configured bit-width is stored on the integrated circuit.

9. The integrated circuit device of claim 8, wherein the integrated circuit is configured to implement a deep neural network, wherein the configured bit-width is generated based at least in part on a maximum number of values generated by a first layer in the deep neural network and a minimum number of values generated by a second layer in the deep neural network.

10. The integrated circuit of claim 9, wherein the configured bit-width is based at least in part on a maximum value generated in a subset of one or more values generated in the deep neural network, wherein a number of values included in the subset is based at least in part on the minimum number of values.

11. The integrated circuit of claim 1, wherein the integrated circuit is configured to implement a deep neural network, wherein the first input comprises a subset of one or more bits in an 8-bit activation of the deep neural network.

12. A tangible, non-transitory, machine-readable medium, comprising machine-readable instructions that, when executed by one or more processors, cause the processors to: receive design instructions to configure programmable logic on an integrated circuit; identify, in the design instructions, an adder structure, wherein one or more adders in the adder structure are configured to perform programmable negation on one or more respective inputs; flag the one or more adders configured to perform programmable negation; replace, in the design instructions, the flagged one or more adders with programmable inversion circuitry, wherein the programmable inversion circuitry comprises: combinatorial circuitry configured to selectively invert each of the one or more inputs based at least in part on a respective one or more control signals; and arithmetic compression circuitry configured to generate a correction factor based at least in part on the one or more control signals; and route, in the design instructions, the correction factor to an unbalanced tuple in the adder structure or add an additional adder to the adder structure and route the correction factor to the additional adder.

13. The tangible, non-transitory, machine-readable medium of claim 12, wherein the machine-readable instructions, when executed by one or more processors, cause the processors to configure the programmable logic according to the design instructions after routing the correction factor to the unbalanced tuple or to the additional adder.

14. The tangible, non-transitory, machine-readable medium of claim 12, wherein the machine-readable instructions, when executed by one or more processors, cause the processors to: in response to receiving instructions from a designer, generate the design instructions.

15. The tangible, non-transitory, machine-readable medium of claim 14, wherein the instructions comprise directions to perform programmable negation on an additional one or more inputs, wherein generating the design instructions comprises generating an additional adder structure, wherein the additional adder structure comprises the programmable inversion circuitry configured to perform the programmable negation on the additional one or more inputs.

16. A tangible, non-transitory, machine-readable medium, comprising machine-readable instructions that, when executed by one or more processors, cause the processors to: receive design instructions to configure programmable logic on an integrated circuit to compute a ternary dot-product; identify, in the design instructions, a first ternary signature in an adder structure configured to compute the ternary dot-product, wherein the first ternary signature is configured to produce a set of products of a first input ternary-multiplied by a first set of weights and a second input ternary multiplied by a second set of weights; replace, in the design instructions, the first ternary signature with a second ternary signature, wherein the second ternary signature is configured to produce a first subset of the set of products; configure, in the design instructions, programmable inversion circuitry to selectively generate a second subset of the set of products based
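
Claims 1, 2, and 4 together describe a standard two's-complement identity: negating a value equals inverting its bits and adding 1. The sketch below models that arithmetic in software. It is an illustration under stated assumptions, not the patented circuit: the 8-bit input width, the 10-bit working width, and every function name are hypothetical, and the "compressed sum" of the control signals is modelled here as a plain integer addition.

```python
IN_WIDTH = 8             # assumed signed input width (hypothetical)
WIDTH = IN_WIDTH + 2     # working width, wide enough to hold the sum
MASK = (1 << WIDTH) - 1


def to_signed(value: int) -> int:
    """Interpret the low WIDTH bits of value as a two's-complement number."""
    value &= MASK
    return value - (1 << WIDTH) if value >> (WIDTH - 1) else value


def selectively_invert(x: int, ctrl: int) -> int:
    """XOR every bit of x with ctrl: ctrl=1 inverts, ctrl=0 passes x through."""
    return (x & MASK) ^ (MASK if ctrl else 0)


def correction_factor(ctrl_a: int, ctrl_b: int) -> int:
    """Count the asserted controls; each inversion needs a +1 to become a negation."""
    return ctrl_a + ctrl_b


def programmable_sum(a: int, ctrl_a: int, b: int, ctrl_b: int) -> int:
    """Sum of the two selectively inverted inputs plus the correction factor."""
    total = (selectively_invert(a, ctrl_a)
             + selectively_invert(b, ctrl_b)
             + correction_factor(ctrl_a, ctrl_b))
    return to_signed(total)


def reference_sum(a: int, ctrl_a: int, b: int, ctrl_b: int) -> int:
    """Direct form from claim 2: each input negated when its control is set."""
    return (-a if ctrl_a else a) + (-b if ctrl_b else b)


print(programmable_sum(5, 1, 3, 0), reference_sum(5, 1, 3, 0))
```

The point of the rearrangement is that the per-input "+1" terms are pulled out of the individual negations and merged into a single small value, so only one extra operand reaches the adder instead of one carry-in per negated input.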

Assignees

Intel Corp

Inventors

Classifications

  • Activation functions · CPC title

  • Combinations of networks · CPC title

  • Architecture, e.g. interconnection topology · CPC title

  • G06N3/063 (Primary)

    using electronic means · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11275998B2 cover?
The present disclosure relates generally to techniques for improving the implementation of certain operations on an integrated circuit. In particular, deep learning techniques, which may use a deep neural network (DNN) topology, may be implemented more efficiently using low-precision weights and activation values by efficiently performing down conversion of data to a lower precision and by prev…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 15, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).