Selectable and hierarchical power management
US-2024385668-A1 · Nov 21, 2024 · US
US10133577B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10133577-B2 |
| Application number | US-201213997791-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 19, 2012 |
| Priority date | Dec 19, 2012 |
| Publication date | Nov 20, 2018 |
| Grant date | Nov 20, 2018 |
Official abstract text for this publication.
A processor includes an instruction schedule and dispatch (schedule/dispatch) unit to receive a single instruction multiple data (SIMD) instruction to perform an operation on multiple data elements stored in a storage location indicated by a first source operand. The instruction schedule/dispatch unit is to determine a first of the data elements that will not be operated to generate a result written to a destination operand based on a second source operand. The processor further includes multiple processing elements coupled to the instruction schedule/dispatch unit to process the data elements of the SIMD instruction in a vector manner, and a power management unit coupled to the instruction schedule/dispatch unit to reduce power consumption of a first of the processing elements configured to process the first data element.
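The abstract's core idea, skipping work for data elements that will not produce a written result and powering down the lanes that would have handled them, can be illustrated with a small software model. This is only a sketch under stated assumptions: the function names `inactive_lanes` and `masked_vector_add` are hypothetical, the mask is modeled as an integer with one bit per lane (bit 0 for lane 0), and "gating" is represented simply by reporting which lanes were skipped. The patent describes hardware circuits, not this code.

```python
from typing import List, Tuple


def inactive_lanes(mask: int, num_lanes: int) -> List[int]:
    """Return the lane indices whose mask bit is 0 (element not operated on)."""
    return [lane for lane in range(num_lanes) if not (mask >> lane) & 1]


def masked_vector_add(
    src1: List[int], src2: List[int], dest: List[int], mask: int
) -> Tuple[List[int], List[int]]:
    """Lane-wise add under a mask (hypothetical model, not the patented circuit).

    Lanes with a 0 mask bit do no arithmetic and keep the old destination
    value; they are returned as the lanes a power manager could gate.
    """
    if not (len(src1) == len(src2) == len(dest)):
        raise ValueError("all vectors must have the same number of lanes")
    gated = set(inactive_lanes(mask, len(dest)))
    result = [
        dest[i] if i in gated else src1[i] + src2[i]
        for i in range(len(dest))
    ]
    return result, sorted(gated)
```

With sources `[1, 2, 3, 4]` and `[10, 20, 30, 40]`, a destination of all zeros, and mask `0b0101`, only lanes 0 and 2 compute a sum; lanes 1 and 3 are reported as candidates for reduced power.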
Opening claim text (preview).
What is claimed is:
1. A processor, comprising: a fetch circuit to fetch a single instruction multiple data (SIMD) instruction comprising an opcode, first source, second source, and destination operands to specify first source, second source, and destination vectors, respectively, and a mask operand including a plurality of bits each indicating whether a corresponding data element is to be operated on; an inactive element detection circuit to analyze the mask operand during each of a fetch stage, a decode stage, an allocate stage, a rename stage, and a schedule stage of a pipeline, and to generate and store an inactive element indicator for each inactive element; a zero element prediction circuit to analyze the opcode and the first and second specified sources during the schedule stage to generate and store a zero element indicator for each destination element predicted to be zero; a plurality of processing elements, including an arithmetic logic unit (ALU), a register file, a data bus dispatch path, and a write-back path in a retirement unit, to process a plurality of data elements of the SIMD instruction in a vector manner; and a power management unit, including clock gating logic to reduce a clock frequency of a clock signal to processing elements identified by one or both of the stored zero element indicator and the stored inactive element indicator.
2. The processor of claim 1, wherein the clock gating logic is to shut off the clock signal to the processing elements identified by at least one of the stored zero element indicator and the stored inactive element indicator.
3. The processor of claim 1, wherein the power management unit is to reduce the clock frequency to an arithmetic logic unit (ALU) of the processing elements identified by at least one of the stored zero element indicator and the stored inactive element indicator.
4. The processor of claim 1, wherein the power management unit is to reduce the power consumption of a register file associated with the processing elements identified by at least one of the stored zero element indicator and the stored inactive element indicator.
5. The processor of claim 1, wherein the power management unit is to reduce power consumption to a write-back path of a retirement unit associated with the processing elements identified by at least one of the stored zero element indicator and the stored inactive element indicator.
6. The processor of claim 1, wherein the power management unit is to reduce power consumption to a data bus dispatch path to a memory location associated with the processing elements identified by at least one of the stored zero element indicator and the stored inactive element indicator.
7. The processor of claim 1, wherein the power management unit is to reduce a clock frequency of a clock signal to a vector lane corresponding to the processing elements identified by at least one of the stored zero element indicator and the stored inactive element indicator, the vector lane including an ALU, a register file, a data bus dispatch path, and a writeback path in a retirement unit.
8. The processor of claim 1, further comprising a pipeline including a fetch stage, a decode stage, an allocate stage, a register renaming stage, and a schedule stage, and wherein the inactive element detection circuit is to analyze the mask operand and generate and store the inactive element indicator during all of the fetch, decode, allocate, register renaming, and schedule stages.
9. The processor of claim 8, wherein the zero element detection circuit is to analyze the opcode and the data elements identified by the indicated first and second source operands during the schedule stage in order to identify the zero result data element.
10. The processor of claim 1, wherein the instruction is further to specify whether to apply zeroing or merging in response to the mask.
11. A method, comprising: fetching, by a fetch circuit, a single instruction multiple data (SIMD) instruction comprising an opcode, first source, second source, and destination operands to specify first source, second source, and destination vectors, respectively, and a mask operand including a plurality of bits each indicating whether a corresponding data element is to be operated on; analyzing, by a zero element prediction circuit during a schedule stage of a pipeline, the opcode and the first and second indicated sources to generate and store a zero element indicator for each indicated destination element predicted to be zero; analyzing the mask operand by an inactive element detection circuit during each of a fetch stage, a decode stage, an allocate stage, a rename stage, and a schedule stage of a pipeline, and to generate and store an inactive element indicator for each inactive element; processing, by a plurality of processing elements, including an arithmetic logic unit (ALU), a register file, a data bus dispatch path, and a write-back path in a retirement unit, a plurality of data elements of the SIMD instruction in a vector manner; and reducing, by a clock gating circuit of a power management unit, a clock frequency of a clock signal to the processing elements identified by one or both of the stored zero element indicator and the stored inactive element indicator.
12. The method of claim 11, wherein the clock gating logic is to shut off the clock signal to the processing elements identified by one or both of the stored zero element indicator and the stored inactive element indicator.
13. The method of claim 11, wherein the power management unit is to reduce the clock frequency to an arithmetic logic unit (ALU) of the at processing elements identified by one or both of the stored zero element indicator and the stored inactive element indicator.
14. The method of claim 11, wherein the power management unit is to reduce the power consumption of a register file associated with the processing elements identified by one or both of the stored zero element indicator and the stored inactive element indicator.
15. The method of claim 11, wherein the power management unit is to reduce power consumption to a write-back path of a retirement unit associated with the processing elements identified by one or both of the stored zero element indicator and the stored inactive element indicator.
16. The method of claim 11, wherein the power management unit is to reduce a clock frequency of a clock signal to a vector lane corresponding to the processing elements identified by one or both of the stored zero element indicator and the stored inactive element indicator, the vector lane including an ALU, a register file, a data bus dispatch path, and a writeback path in a retirement unit.
17. A system comprising: an interconnect; a dynamic random access memory (DRAM) coupled to the interconnect; a processor coupled to the interconnect, the processor comprising: a fetch circuit to fetch a single instruction multiple data (SIMD) instruction comprising an opcode, first source, second source, and destination operands to specify first source, second source, and destination vectors, respectively, and a mask operand including a plurality of bits each indicating whether a corresponding data element is to be operated on; an inactive element detection circuit to store, based on analyzing the mask operand during any of a fetch stage, a decode stage, an allocate stage, a rename stage, and a schedule stage of a pipeline, an inactive element indicator for each inactive element; a zero element prediction circuit to store, based on analyzing the opcode and the first and second specified s
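The claims combine two per-element indicators: an inactive element indicator derived from the mask operand, and a zero element indicator predicted from the opcode and the two sources. A rough software analogue of how those indicators might be merged into a set of lanes to gate is sketched below. The prediction rules here (a product or bitwise AND is zero if either input is zero; a sum or bitwise OR is zero if both inputs are zero) are illustrative assumptions, as are the opcode strings and the function names `predict_zero_lanes` and `lanes_to_gate`; the claim text does not spell out the prediction rules.

```python
from typing import List, Set


def predict_zero_lanes(opcode: str, src1: List[int], src2: List[int]) -> Set[int]:
    """Predict which destination lanes will be zero (assumed rules, integer data)."""
    pairs = list(enumerate(zip(src1, src2)))
    if opcode in ("mul", "and"):
        # Either input being zero forces a zero result.
        return {i for i, (a, b) in pairs if a == 0 or b == 0}
    if opcode in ("add", "or"):
        # Both inputs being zero forces a zero result.
        return {i for i, (a, b) in pairs if a == 0 and b == 0}
    # Unknown opcode: make no prediction rather than guess.
    return set()


def lanes_to_gate(opcode: str, src1: List[int], src2: List[int], mask: int) -> List[int]:
    """Union of mask-inactive lanes and predicted-zero lanes ("one or both")."""
    if len(src1) != len(src2):
        raise ValueError("source vectors must have the same number of lanes")
    inactive = {i for i in range(len(src1)) if not (mask >> i) & 1}
    return sorted(inactive | predict_zero_lanes(opcode, src1, src2))
```

For example, a `"mul"` on sources `[0, 2, 3, 4]` and `[5, 0, 7, 8]` with mask `0b0111` marks lane 3 as inactive and lanes 0 and 1 as predicted zero, so only lane 2 would need its full datapath clocked.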
Cross-Sectional Technologies · mapped topic
Power saving in microcontroller unit · CPC title
by task scheduling · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution · CPC title