US9360920B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9360920-B2 |
| Application number | US-201113993370-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 21, 2011 |
| Priority date | Nov 21, 2011 |
| Publication date | Jun 7, 2016 |
| Grant date | Jun 7, 2016 |
Official abstract text for this publication.
In one embodiment, the present invention includes a processor having a fused multiply-add (FMA) unit to perform FMA instructions and add-like instructions. This unit can include an adder with multiple segments each independently controlled by a logic. The logic can clock gate at least one segment during execution of an add-like instruction in another segment of the adder when the add-like instruction has a width less than a width of the FMA unit. Other embodiments are described and claimed.
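The gating decision described in the abstract can be illustrated with a minimal sketch. It is not the patented circuit: the `Segment` type, the `clock_enables` function, the four equal 128-bit segments, and the rule that an operation occupies segments in order from the first are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One independently clocked slice of the adder (hypothetical model)."""
    name: str
    width: int  # width in bits


def clock_enables(segments, op_width):
    """Return a per-segment clock enable for an add-like op of op_width bits.

    Assumption: the operation fills segments in list order until its width
    is covered; every remaining segment is clock gated (enable = False).
    """
    total = sum(seg.width for seg in segments)
    if not 0 < op_width <= total:
        raise ValueError("operation width must fit within the adder")
    enables = {}
    covered = 0
    for seg in segments:
        # A segment is clocked only if some operand bits still need a home.
        enables[seg.name] = covered < op_width
        covered += seg.width
    return enables


# Hypothetical 512-bit adder split into four equal segments.
ADDER = [Segment(f"seg{i}", 128) for i in range(4)]
half_width = clock_enables(ADDER, 256)
```

Under these assumptions, a 256-bit add-like operation leaves `seg2` and `seg3` gated, while a full-width operation clocks all four segments.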
Opening claim text (preview).
What is claimed is:
1. A processor comprising: a core including a plurality of execution units to execute instructions, the plurality of execution units including a fused multiply-add (FMA) unit to perform FMA instructions and add-like instructions, the FMA unit including a multiplier and an adder coupled to an output of the multiplier, the adder of a width having a plurality of segments each independently controllable to be powered on or off, and a tracker coupled to the adder to cause all segments of the adder to be powered on during execution of a first instruction in the FMA unit following a FMA instruction, wherein the first instruction is not to use the all segments of the adder, and otherwise to cause a corresponding segment to be powered on only if the segment is to be used during execution of an instruction.
2. The processor of claim 1, wherein the tracker includes a plurality of tracker segments each associated with one of the plurality of adder segments.
3. The processor of claim 2, wherein a first tracker segment is to enable a first adder segment to perform a first add-like instruction and a second tracker segment is to enable a second adder segment to perform the first add-like instruction concurrently.
4. The processor of claim 3, wherein a width of the first and second adder segments is at least equal to a width of the first add-like instruction.
5. The processor of claim 1, wherein the FMA unit is of N-bit width, and the adder is formed of four segments, at least two of the segments each having a bit width greater than N/4 and at least one of the segments having a bit width less than N/4.
6. The processor of claim 5, wherein the two segments having the bit width greater than N/4 are to execute a dual precision add-like instruction, and the other two segments are to be powered off.
7. The processor of claim 1, wherein a first adder segment is to execute a first single precision add-like instruction and a second adder segment is to execute a second single precision add-like instruction concurrently, and a third adder segment and a fourth adder segment are to be clock gated.
8. The processor of claim 1, wherein the tracker is to cause the all segments to be powered on during the first instruction execution to clear the all segments.
9. A method comprising: powering a first segment of an adder of a fused multiply-add (FMA) unit of a processor during execution of a first instruction in the FMA unit after execution of a FMA instruction in the FMA unit although the first instruction is not to use the first segment of the adder; and powering off the first segment of the adder during execution of a next instruction following the first instruction if the next instruction is not to use the first segment of the adder.
10. The method of claim 9, further comprising powering off the first segment of the adder during the next instruction execution while a second segment of the adder is powered on, wherein the next instruction is to use the second segment of the adder.
11. The method of claim 9, wherein the first instruction and the next instruction comprise add-like instructions.
12. The method of claim 9, further comprising powering the first segment of the adder and a third segment of the adder during concurrent execution of a first add-like instruction and a second add-like instruction in the FMA unit, wherein at least a second segment of the adder is powered off during the concurrent execution.
13. The method of claim 9, further comprising: receiving the first instruction in a tracker associated with the first segment of the adder; and generating an enable signal to enable a clock signal to be provided to the first segment during execution of the first instruction, wherein the first instruction does not use the first segment.
14. The method of claim 13, further comprising: receiving the next instruction in the tracker; and not generating the enable signal to prevent the clock signal from being provided to the first segment during execution of the next instruction.
15. A system comprising: a processor including a fused multiply-add (FMA) unit to perform FMA instructions and add-like instructions, wherein an adder of the FMA unit has a width including a plurality of segments each independently controlled by a logic, wherein the logic is to clock gate at least one segment of the adder during execution of an add-like instruction in another segment of the adder, the add-like instruction having a width less than a width of the FMA unit, after the at least one segment was powered on during execution of at least one add-like instruction following a FMA instruction although the first add-like instruction did not use the at least one segment.
16. The system of claim 15, wherein the adder includes four segments, two of the segments each having a bit width greater than N/4 and at least one other segment has a bit width less than N/4, the two segments having the bit width greater than N/4 to execute a dual precision add-like instruction while the other two segments are to be powered off.
17. The system of claim 16, wherein after execution of the dual precision add-like instruction each of the segments is to be powered on only if an instruction is to use the corresponding segment, until execution of a next FMA instruction.
18. The system of claim 15, wherein the adder comprises N bits and power consumption in the adder for execution of an add-like instruction of N/2 bits is no greater than power consumption of an adder having N/2 bits for execution of an add-like instruction of N/2 bits.
19. The system of claim 15, wherein the logic is to power the at least one segment during the add-like instruction execution to clear the at least one segment.
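The tracker behavior recited in claims 1, 8, and 9 can be sketched as a small state machine. This is an illustrative model only: the `AdderTracker` class, its `power_mask` method, the segment indices, and the assumption that an FMA instruction itself uses every segment are hypothetical and do not come from the patent text.

```python
class AdderTracker:
    """Toy model of a tracker that decides which adder segments are powered.

    Rule modeled from the claims: the first instruction that follows an FMA
    instruction runs with every segment powered, even segments it does not
    use (claim 8 ties this to clearing them). After that, a segment is
    powered only if the current instruction uses it.
    """

    def __init__(self, num_segments=4):
        self.num_segments = num_segments
        self._prev_was_fma = False

    def power_mask(self, is_fma, used_segments):
        """Return a list of booleans, one per segment, for this instruction."""
        all_on = [True] * self.num_segments
        if is_fma:
            # Assumption: an FMA instruction occupies the full adder width.
            mask = all_on
        elif self._prev_was_fma:
            # First instruction after an FMA: power everything once.
            mask = all_on
        else:
            mask = [i in used_segments for i in range(self.num_segments)]
        self._prev_was_fma = is_fma
        return mask


tracker = AdderTracker(num_segments=4)
after_fma = [
    tracker.power_mask(True, set()),    # FMA instruction
    tracker.power_mask(False, {0}),     # first add-like op after the FMA
    tracker.power_mask(False, {0}),     # next add-like op, same segment
    tracker.power_mask(False, {0, 2}),  # two narrow ops running concurrently
]
```

In this model, the second entry keeps all four segments powered even though only segment 0 is used, and the third entry powers segment 0 alone, which mirrors the sequence in claim 9.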
Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)} · CPC title
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title
for multiple operands, e.g. digital integrators · CPC title
Power saving in microcontroller unit · CPC title
Monitoring of events, devices or parameters that trigger a change in power modality · CPC title