Reducing power consumption in a fused multiply-add (FMA) unit of a processor

US9778911B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9778911-B2
Application number: US-201615144926-A
Country: US
Kind code: B2
Filing date: May 3, 2016
Priority date: Nov 21, 2011
Publication date: Oct 3, 2017
Grant date: Oct 3, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, the present invention includes a processor having a fused multiply-add (FMA) unit to perform FMA instructions and add-like instructions. This unit can include an adder with multiple segments each independently controlled by a logic. The logic can clock gate at least one segment during execution of an add-like instruction in another segment of the adder when the add-like instruction has a width less than a width of the FMA unit. Other embodiments are described and claimed.
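
As a rough illustration of the gating idea in the abstract, the sketch below models an adder as a list of segments and works out which segments would receive a clock for an operation of a given width. The function name `clock_enables`, the four equal 32-bit segments, and the policy of filling segments from index 0 upward are all assumptions made for illustration; the abstract does not specify any of them.

```python
from typing import List


def clock_enables(segment_widths: List[int], op_width: int) -> List[bool]:
    """Return one clock-enable flag per adder segment (hypothetical model).

    Segments are allocated from index 0 upward until op_width bits are
    covered; any remaining segments are clock gated (enable is False).
    """
    total = sum(segment_widths)
    if not 0 < op_width <= total:
        raise ValueError("op_width must be between 1 and the adder width")
    enables = []
    covered = 0
    for width in segment_widths:
        # A segment is needed only if earlier segments do not already
        # cover the whole operation.
        enables.append(covered < op_width)
        covered += width
    return enables


# Assumed example: a 128-bit adder split into four equal 32-bit segments.
SEGMENTS = [32, 32, 32, 32]
full_width = clock_enables(SEGMENTS, 128)  # every segment clocked
half_width = clock_enables(SEGMENTS, 64)   # upper two segments gated
print(full_width, half_width)
```

In this toy model, an add-like instruction narrower than the unit leaves at least one segment without a clock, which is the behavior the abstract describes.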

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: a plurality of execution units to execute instructions, the plurality of execution units including at least one arithmetic logic unit and a fused multiply-add (FMA) unit to perform FMA instructions, the FMA unit including a multiplier and an adder coupled to an output of the multiplier, the adder having a plurality of segments independently controllable to be powered on or off; a tracker coupled to the adder to cause all of the plurality of segments of the adder to be powered on during execution of a first instruction in the FMA unit following a FMA instruction to clear potential dirty data, regardless of whether the first instruction is to use the all segments of the adder, and thereafter to cause a segment of the adder to be powered on if the segment is to be used during execution of an instruction; and a power controller to perform power management.

2. The processor of claim 1, wherein the tracker includes a plurality of tracker segments each associated with one of the plurality of segments of the adder.

3. The processor of claim 2, wherein a first tracker segment is to enable a first segment of the adder to perform a first add-like instruction and a second tracker segment is to enable a second segment of the adder to perform the first add-like instruction concurrently.

4. The processor of claim 3, wherein the first add-like instruction comprises a subtraction instruction.

5. The processor of claim 3, wherein the first add-like instruction comprises a conversion instruction.

6. The processor of claim 3, wherein a width of the first and second segments of the adder is at least equal to a width of the first add-like instruction.

7. The processor of claim 1, wherein the plurality of segments are to be independently clock gated.

8. The processor of claim 7, wherein the tracker is to clock gate a first segment of the adder during an instruction that is not to be executed in the first segment of the adder.

9. The processor of claim 1, wherein the processor is to set a first flag responsive to the FMA instruction and reset the flag responsive to receipt of a non-FMA instruction.

10. The processor of claim 9, wherein the processor is to reset a second flag responsive to execution of the non-FMA instruction.

11. The processor of claim 1, wherein the FMA unit is of N-bit width, and the adder is formed of four segments, at least two of the segments each having a bit width greater than N/4 and at least one of the segments having a bit width less than N/4, wherein the two segments having the bit width greater than N/4 are to execute a dual precision add-like instruction while the other two segments are to be powered off.

12. The processor of claim 1, wherein a first segment of the adder is to execute a first single precision add-like instruction and a second segment of the adder is to execute a second single precision add-like instruction concurrently, and a third segment of the adder and a fourth segment of the adder are to be clock gated.

13. The processor of claim 1, further comprising a rounder coupled to an output of the adder, wherein the rounder is to be clock gated during execution of the FMA instruction in one or more of the multiplier and the adder.

14. A non-transitory machine-readable medium having stored thereon instructions, which if performed by a machine cause the machine to perform a method comprising: powering a first segment of an adder of a fused multiply-add (FMA) unit of a processor during execution of a first instruction in the FMA unit after execution of a FMA instruction in the FMA unit although the first instruction is not to use the first segment of the adder; and powering off the first segment of the adder during execution of a next instruction following the first instruction if the next instruction is not to use the first segment of the adder.

15. The non-transitory machine-readable medium of claim 14, wherein the method further comprises powering off the first segment of the adder during the next instruction execution while a second segment of the adder is powered on, wherein the next instruction is to use the second segment of the adder.

16. The non-transitory machine-readable medium of claim 14, wherein the method further comprises powering the first segment of the adder and a third segment of the adder during concurrent execution of a first add-like instruction and a second add-like instruction, wherein at least a second segment of the adder is powered off during the concurrent execution.

17. The non-transitory machine-readable medium of claim 14, wherein the method further comprises: receiving the first instruction in a tracker associated with the first segment of the adder; and generating an enable signal to enable a clock signal to be provided to the first segment during execution of the first instruction.

18. A system comprising: a processor having a fused multiply-add (FMA) unit to perform FMA instructions and add instructions, wherein an adder of the FMA unit includes a plurality of segments to be independently controlled, and a logic to clock gate at least one segment of the adder during execution of an add instruction in another segment of the adder, the add instruction having a width less than a width of the FMA unit, after the at least one segment was powered on but not used during execution of at least one add instruction following a FMA instruction; and a dynamic random access memory (DRAM) coupled to the processor.

19. The system of claim 18, wherein the adder includes four segments, two of the segments having a bit width greater than N/4 and at least one other segment having a bit width less than N/4, the two segments having the bit width greater than N/4 to execute a dual precision add instruction while the other two segments are to be powered off.

20. The system of claim 18, wherein the adder comprises N bits and power consumption in the adder for execution of an add instruction of N/2 bits is no greater than power consumption of an adder having N/2 bits for execution of an add instruction of N/2 bits.
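
The tracker behavior recited in claims 1 and 14 can be pictured with a small state machine. The following is a minimal sketch under stated assumptions, not the patented circuit: the class name `AdderTracker`, the single `dirty` flag, the four-segment default, and the choice to treat an FMA instruction as using every segment are all hypothetical and are not taken from the claim text.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class AdderTracker:
    """Toy model of the power-on policy in claims 1 and 14 (hypothetical)."""

    num_segments: int = 4
    # True while segments may still hold stale ("dirty") data from an FMA.
    dirty: bool = False

    def issue(self, is_fma: bool, used_segments: Set[int]) -> List[bool]:
        """Return the per-segment power enables for one instruction."""
        if is_fma:
            # Assumption: an FMA occupies the whole adder and leaves every
            # segment potentially dirty.
            self.dirty = True
            return [True] * self.num_segments
        if self.dirty:
            # First instruction after an FMA: power every segment, whether
            # or not the instruction uses it, to clear potential dirty data.
            self.dirty = False
            return [True] * self.num_segments
        # Afterwards: power a segment only if the instruction uses it.
        return [i in used_segments for i in range(self.num_segments)]


tracker = AdderTracker()
after_fma = tracker.issue(is_fma=True, used_segments={0, 1, 2, 3})
first_add = tracker.issue(is_fma=False, used_segments={0})
second_add = tracker.issue(is_fma=False, used_segments={0})
print(after_fma, first_add, second_add)
```

In this model the first narrow add after an FMA still powers all four segments, and only the second narrow add powers segment 0 alone, mirroring the "first instruction" and "next instruction" steps of claim 14.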

Assignees

Intel Corp

Inventors

Classifications

  • G06F7/483 · Primary

    Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)} · CPC title

  • with variable precision · CPC title

  • Adding; Subtracting {(G06F7/4833, G06F7/4836 take precedence)} · CPC title

  • Cross-Sectional Technologies · mapped topic

  • Power saving characterised by the action undertaken · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F7/483. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 3, 2017 (B2). Legal status and post-grant events are not shown on this page.