Reduced dot product computation circuit
US-10740434-B1 · Aug 11, 2020 · US
US11275998B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11275998-B2 |
| Application number | US-201815994930-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 31, 2018 |
| Priority date | May 31, 2018 |
| Publication date | Mar 15, 2022 |
| Grant date | Mar 15, 2022 |
Official abstract text for this publication.
The present disclosure relates generally to techniques for improving the implementation of certain operations on an integrated circuit. In particular, deep learning techniques, which may use a deep neural network (DNN) topology, may be implemented more efficiently using low-precision weights and activation values by efficiently performing down conversion of data to a lower precision and by preventing data overflow during suitable computations. Further, by more efficiently mapping multipliers to programmable logic on the integrated circuit device, the resources used by the DNN topology to perform, for example, inference tasks may be reduced, resulting in improved integrated circuit operating speeds.
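The down conversion mentioned in the abstract, and the right-shift recited in claim 8, can be illustrated with a minimal sketch. The rule used here (shift by the difference between the sum's bit-width and the configured bit-width) is an assumption chosen for illustration; the claim only says the shift amount is "based at least in part on" those two widths. The function name `downconvert` and its parameters are hypothetical.

```python
def downconvert(value: int, value_width: int, configured_width: int) -> int:
    """Right-shift `value` so it fits a narrower configured bit-width.

    Assumption (not from the patent text): the shift amount is simply the
    difference between the two widths, clamped at zero.
    """
    shift = max(0, value_width - configured_width)
    # Python's >> on ints is an arithmetic shift, so the sign is preserved.
    return value >> shift


if __name__ == "__main__":
    # A 16-bit accumulator value reduced to an 8-bit configured width.
    print(downconvert(1000, 16, 8))
    # No shift is applied when the value already fits the configured width.
    print(downconvert(5, 8, 8))
```

Dropping the low-order bits trades precision for a narrower datapath, which is the kind of trade-off the abstract describes for low-precision weights and activations.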
Opening claim text (preview).
What is claimed is:
1. An integrated circuit device, comprising: first input circuitry configured to receive a first input; second input circuitry configured to receive a first control signal; third input circuitry configured to receive a second input; fourth input circuitry configured to receive a second control signal; first combinatorial circuitry coupled to the first input circuitry and the second input circuitry, wherein the first combinatorial circuitry is configured to receive the first input from the first input circuitry and the first control signal from the second input circuitry, and wherein the first combinatorial circuitry comprises first output circuitry and is configured to generate a first output at the first output circuitry by selectively inverting the first input based at least in part on the first control signal; second combinatorial circuitry coupled to the third input circuitry and the fourth input circuitry, wherein the second combinatorial circuitry is configured to receive the second input from the third input circuitry and the second control signal from the fourth input circuitry, and wherein the second combinatorial circuitry comprises second output circuitry and is configured to generate a second output at the second output circuitry by selectively inverting the second input based at least in part on the second control signal; arithmetic compression circuitry coupled to the second input circuitry and the fourth input circuitry and configured to generate a correction factor based at least in part on a compressed sum of the first control signal and the second control signal, wherein the arithmetic compression circuitry is configured to receive the first control signal from the second input circuitry and the second control signal from the fourth input circuitry; and adder circuitry coupled to the first output circuitry, the second output circuitry, and the arithmetic compression circuitry and configured to generate a sum of the first output, the second output, and the correction factor, wherein the adder circuitry is configured to receive the first output from the first output circuitry, the second output from the second output circuitry, and the correction factor from the arithmetic compression circuitry.
2. The integrated circuit device of claim 1, wherein the sum is equivalent in value to an additional sum of the first input selectively negated based at least in part on the first control signal and the second input selectively negated based at least in part on the second control signal.
3. The integrated circuit device of claim 1, wherein the first combinatorial circuitry comprises a look up table.
4. The integrated circuit device of claim 1, wherein the first combinatorial circuitry is configured to selectively invert the first input based at least in part on an exclusive OR of the first input and the first control signal.
5. The integrated circuit device of claim 1, wherein the arithmetic compression circuitry comprises read-only memory.
6. The integrated circuit of claim 1, wherein the integrated circuit comprises a field-programmable gate array.
7. The integrated circuit device of claim 1, wherein the sum comprises a dot-product.
8. The integrated circuit device of claim 1, comprising shift circuitry configured to right-shift the sum a number of bits based at least in part on a bit-width of the sum and a configured bit-width, wherein the configured bit-width is stored on the integrated circuit.
9. The integrated circuit device of claim 8, wherein the integrated circuit is configured to implement a deep neural network, wherein the configured bit-width is generated based at least in part on a maximum number of values generated by a first layer in the deep neural network and a minimum number of values generated by a second layer in the deep neural network.
10. The integrated circuit of claim 9, wherein the configured bit-width is based at least in part on a maximum value generated in a subset of one or more values generated in the deep neural network, wherein a number of values included in the subset is based at least in part on the minimum number of values.
11. The integrated circuit of claim 1, wherein the integrated circuit is configured to implement a deep neural network, wherein the first input comprises a subset of one or more bits in an 8-bit activation of the deep neural network.
12. A tangible, non-transitory, machine-readable medium, comprising machine-readable instructions that, when executed by one or more processors, cause the processors to: receive design instructions to configure programmable logic on an integrated circuit; identify, in the design instructions, an adder structure, wherein one or more adders in the adder structure are configured to perform programmable negation on one or more respective inputs; flag the one or more adders configured to perform programmable negation; replace, in the design instructions, the flagged one or more adders with programmable inversion circuitry, wherein the programmable inversion circuitry comprises: combinatorial circuitry configured to selectively invert each of the one or more inputs based at least in part on a respective one or more control signals; and arithmetic compression circuitry configured to generate a correction factor based at least in part on the one or more control signals; and route, in the design instructions, the correction factor to an unbalanced tuple in the adder structure or add an additional adder to the adder structure and route the correction factor to the additional adder.
13. The tangible, non-transitory, machine-readable medium of claim 12, wherein the machine-readable instructions, when executed by one or more processors, cause the processors to configure the programmable logic according to the design instructions after routing the correction factor to the unbalanced tuple or to the additional adder.
14. The tangible, non-transitory, machine-readable medium of claim 12, wherein the machine-readable instructions, when executed by one or more processors, cause the processors to: in response to receiving instructions from a designer, generate the design instructions.
15. The tangible, non-transitory, machine-readable medium of claim 14, wherein the instructions comprise directions to perform programmable negation on an additional one or more inputs, wherein generating the design instructions comprises generating an additional adder structure, wherein the additional adder structure comprises the programmable inversion circuitry configured to perform the programmable negation on the additional one or more inputs.
16. A tangible, non-transitory, machine-readable medium, comprising machine-readable instructions that, when executed by one or more processors, cause the processors to: receive design instructions to configure programmable logic on an integrated circuit to compute a ternary dot-product; identify, in the design instructions, a first ternary signature in an adder structure configured to compute the ternary dot-product, wherein the first ternary signature is configured to produce a set of products of a first input ternary-multiplied by a first set of weights and a second input ternary multiplied by a second set of weights; replace, in the design instructions, the first ternary signature with a second ternary signature, wherein the second ternary signature is configured to produce a first subset of the set of products; configure, in the design instructions, programmable inversion circuitry to selectively generate a second subset of the set of products based
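The arithmetic recited in claims 1, 2, and 4 can be modelled in software as a rough sketch. It relies on the two's-complement identity that negating a value equals inverting its bits and adding one, so the "+1" terms for all negated inputs can be gathered into a single correction factor equal to the count of asserted control signals. This is an illustrative model only, not the patented circuit: the function names, the fixed `width` parameter, and the use of a plain count as the "compressed sum" are assumptions.

```python
def _mask(width: int) -> int:
    """All-ones mask for a `width`-bit word."""
    return (1 << width) - 1


def selective_invert(x: int, ctrl: int, width: int) -> int:
    """XOR every bit of x with the control bit (the exclusive OR of claim 4)."""
    return (x ^ (_mask(width) if ctrl else 0)) & _mask(width)


def to_signed(v: int, width: int) -> int:
    """Interpret a `width`-bit word as a two's-complement integer."""
    v &= _mask(width)
    return v - (1 << width) if v >> (width - 1) else v


def reduced_signed_sum(inputs, controls, width: int = 8) -> int:
    """Sum of inputs, each negated when its control bit is set.

    Negation is never computed per input: each input is only inverted,
    and a single correction factor supplies all of the missing +1 terms.
    Assumes the true result fits in `width` bits.
    """
    m = _mask(width)
    inverted = [selective_invert(x & m, c, width) for x, c in zip(inputs, controls)]
    correction = sum(1 for c in controls if c)  # assumed form of the compressed sum
    return to_signed((sum(inverted) + correction) & m, width)


def reference_sum(inputs, controls) -> int:
    """Direct computation with explicit negation, for comparison (claim 2)."""
    return sum(-x if c else x for x, c in zip(inputs, controls))


if __name__ == "__main__":
    xs, cs = [5, 3], [1, 0]
    print(reduced_signed_sum(xs, cs), reference_sum(xs, cs))
```

With control bits restricted to 0 and 1, each term is multiplied by +1 or -1, which is how a sum of this form can stand in for a low-precision dot-product as in claim 7.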
Activation functions · CPC title
Combinations of networks · CPC title
Architecture, e.g. interconnection topology · CPC title
using electronic means · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title