Multiply-accumulate “0” data gating
US-10853035-B2 · Dec 1, 2020 · US
US11656846B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11656846-B2 |
| Application number | US-202017103179-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 24, 2020 |
| Priority date | Apr 28, 2017 |
| Publication date | May 23, 2023 |
| Grant date | May 23, 2023 |
Official abstract text for this publication.
In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
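The gating idea in the abstract can be illustrated in software. The following is a minimal sketch, not the patented hardware: it assumes integer operands, and the names `gated_mac` and `gated_dot` and the `stats` counters are hypothetical, introduced only for illustration. When either input is zero, the product would be zero, so both the multiply and the accumulate step are skipped.

```python
def gated_mac(acc, a, b, stats):
    """Multiply-accumulate that skips the work when either input is zero.

    Hypothetical software model: in hardware the "skip" would correspond to
    gating the multiply and/or accumulate unit.
    """
    if a == 0 or b == 0:
        # The product would be zero, so the accumulator is left unchanged.
        stats["gated"] += 1
        return acc
    stats["multiplies"] += 1
    return acc + a * b


def gated_dot(xs, ws):
    """Dot product built from gated_mac; returns (result, stats)."""
    stats = {"gated": 0, "multiplies": 0}
    acc = 0
    for x, w in zip(xs, ws):
        acc = gated_mac(acc, x, w, stats)
    return acc, stats


if __name__ == "__main__":
    result, stats = gated_dot([1, 0, 3, 0], [2, 5, 0, 7])
    print(result, stats)
```

With sparse activations or weights, as is common in neural network inference, most operand pairs would take the gated path in this model; the counters only make that visible and do not measure real power savings.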
Opening claim text (preview).
The invention claimed is:

1. An apparatus comprising: an instruction cache to receive graphics processing instructions; a general-purpose graphics processing compute block comprising a plurality of graphics processing cores to perform operations to execute the graphics processing instructions; and processing circuitry to: receive, into at least one of a multiply unit or an accumulate unit, a first operand and a second operand from a layer of a convolutional neural network, wherein the first operand and the second operand comprise at least one of inference weights or activations of the convolutional neural network; and responsive to at least one of the first operand or the second operand having a value of negative one, cause the second operand to be negative and bypass the multiply unit, such that the multiply unit performs no operations on the first operand or the second operand.
2. The apparatus of claim 1, further comprising processing circuitry to: bypass the multiply unit when at least one of the first operand or the second operand is a value of a one, such that the multiply unit performs no operations on the first operand or the second operand.
3. The apparatus of claim 1, further comprising processing circuitry to: produce no output from the multiply unit when at least one of the first operand or the second operand is a value of zero.
4. The apparatus of claim 1, wherein the multiply unit comprises processing circuitry to: receive the first operand and a first indicator of a first number of valid bits in the first operand; receive the second operand, and a second indicator of a second number of valid bits in the second operand; and multiply only the valid bits in the first operand and valid bits in the second operand.
5. The apparatus of claim 1, further comprising processing circuitry to: produce no output when at least one of the first operand or the second operand is within a threshold value of zero.
6. The apparatus of claim 1, further comprising processing circuitry to: bypass the multiply unit when at least one of the first operand or the second operand is within a threshold value of a one, such that the multiply unit performs no operations on the first operand or the second operand.
7. The apparatus of claim 1, further comprising processing circuitry to: negate the second operand and bypass the multiply unit when at least one of the first operand or the second operand is within a threshold value of a power of two; and treat at least one of the first operand or the second operand as a power of two when a value of the first operand or the second operand is within a threshold of two.
8. The apparatus of claim 1, further comprising processing circuitry to: bypass the multiply unit and use a shift register when at least one of the first operand or the second operand is a power of two.
9. The apparatus of claim 1, further comprising a thread scheduler comprising logic, at least partially including hardware logic, to: break an input vector into a plurality of segments; and perform a dot product using the plurality of segments.
10. The apparatus of claim 1, wherein the processing circuitry are on a single integrated circuit.
11. A method, comprising: receiving graphics processing instructions in an instruction cache of a general purpose graphics processor; performing operations to execute the graphics processing instructions; receiving, into at least one of a multiply unit or an accumulate unit of the general purpose graphics processor, a first operand and a second operand from a layer of a convolutional neural network, wherein the first operand and the second operand comprise at least one of inference weights or activations of the convolutional neural network; and responsive to at least one of the first operand or the second operand having a value of negative one, causing the second operand to be negative and bypassing the multiply unit, such that the multiply unit performs no operations on the first operand or the second operand.
12. The method of claim 11, further comprising: bypassing the multiply unit when at least one of the first operand or the second operand is a value of a one, such that the multiply unit performs no operations on the first operand or the second operand.
13. The method of claim 11, further comprising processing: producing no output from the multiply unit when at least one of the first operand or the second operand is a value of zero.
14. The method of claim 11, further comprising: receiving the first operand and a first indicator of a first number of valid bits in the first operand; receiving the second operand, and a second indicator of a second number of valid bits in the second operand; and multiplying the valid bits in the first operand and valid bits in the second operand.
15. The method of claim 11, further comprising: producing no output when at least one of the first operand or the second operand is within a threshold value of zero.
16. The method of claim 13, further comprising: bypassing the multiply unit when at least one of the first operand or the second operand is within a threshold value of a one, such that the multiply unit performs no operations on the first operand or the second operand.
17. The method of claim 12, further comprising: negating the second operand and bypass the multiply unit when at least one of the first operand or the second operand is within a threshold value of a power of two.
18. The method of claim 11, further comprising: bypassing the multiply unit and using a shift register when at least one of the first operand or the second operand is a power of two.
19. The method of claim 11, further comprising processing circuitry to: treat at least one of the first operand or the second operand as a power of two when a value of the first operand or the second operand is within a threshold of a power of two.
20. The method of claim 12, further comprising: breaking an input vector into a plurality of segments; and performing a dot product using the plurality of segments.
21. A system comprising: a memory; an instruction cache to receive graphics processing instructions; a general-purpose graphics processing compute block comprising a plurality of graphics processing cores to perform operations to execute the graphics processing instructions; and processing circuitry to: receive, into at least one of a multiply unit or an accumulate unit, a first operand and a second operand from a layer of a convolutional neural network, wherein the first operand and the second operand comprise at least one of inference weights or activations of the convolutional neural network; and responsive to at least one of the first operand or the second operand having a value of negative one, cause the second operand to be negative and bypass the multiply unit, such that the multiply unit performs no operations on the first operand or the second operand.
22. The system of claim 21, wherein the processing circuitry is further to bypass the multiply unit when at least one of the first operand or the second operand is a value of a one, such that the multiply unit performs no operations on the first operand or the second operand.
23. The system of claim 21, wherein the processing circuitry is further to produce no output from the multiply unit when at least one of the first operand or the second operand is a value of zero.
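The operand special cases recited in the claims (zero, one, negative one, power of two) and the segmented dot product can be sketched in software as follows. This is an illustrative model under stated assumptions, not the claimed circuitry: operands are plain integers, the "negative one" case is modeled by negating the other operand, the threshold-based variants are not modeled, and all names (`multiply_with_bypass`, `segmented_dot`, the path labels, `segment_len`) are hypothetical.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (positive integers with a single set bit)."""
    return n > 0 and (n & (n - 1)) == 0


def multiply_with_bypass(a, b):
    """Return (product, path), where path names the modeled datapath.

    Only the final "multiply" path would use the multiplier in this model.
    """
    if a == 0 or b == 0:
        return 0, "gated"                      # zero operand: no multiply needed
    if a == 1 or b == 1:
        return (b if a == 1 else a), "bypass"  # pass the other operand through
    if a == -1 or b == -1:
        return (-b if a == -1 else -a), "negate"
    for x, y in ((a, b), (b, a)):
        if is_power_of_two(x):
            # x == 2**k with k >= 1 here, so shift the other operand left by k.
            return y << (x.bit_length() - 1), "shift"
    return a * b, "multiply"


def segmented_dot(xs, ws, segment_len=4):
    """Break the inputs into segments and sum the per-segment partial dot products."""
    total = 0
    for start in range(0, len(xs), segment_len):
        partial = 0
        for x, w in zip(xs[start:start + segment_len],
                        ws[start:start + segment_len]):
            product, _path = multiply_with_bypass(x, w)
            partial += product
        total += partial
    return total


if __name__ == "__main__":
    for pair in [(0, 9), (1, 7), (-1, 7), (8, -3), (3, 5)]:
        print(pair, multiply_with_bypass(*pair))
    print(segmented_dot([1, 0, -1, 4, 3, 2], [5, 6, 7, 2, 3, 0]))
```

The checks are ordered so that the cheapest path wins when several apply (for example, an operand pair of one and zero is gated rather than bypassed); that ordering is a design choice of this sketch and is not stated in the claims.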
Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mapping at compile time, see G06F8/451) · CPC title
Machine learning · CPC title
Processor architectures; Processor configuration, e.g. pipelining · CPC title
Backpropagation, e.g. using gradient descent · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title