SECTIONS 1-7 By Astha Chawla

1 SECTIONS 1-7 By Astha Chawla
CHAPTER 4 Optimizing Capacitance and Switching Activity to Reduce Dynamic Power SECTIONS 1-7 By Astha Chawla

2 Introduction
C and A are intertwined: $P = V^2 \times f \times C_{\text{effective}}$.
ILP combined with frequency increases leads to a power problem.
Factors affecting A: the complexity of the processor, the exploitation of parallelism, the bit-width of its structures, etc. A is optimized at the architectural and microarchitectural level and can be changed by run-time optimizations.
Factors affecting C: the size of a processor's structures and their organization to exploit locality. C is manipulated at the circuit and process-technology level and is fixed at design time.
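The power relation on this slide can be tried out numerically. The sketch below assumes the common decomposition of effective capacitance into an activity factor times physical capacitance (C_effective = A x C); the function name and the operating point are hypothetical and are not taken from the slides.

```python
def dynamic_power(voltage, frequency, capacitance, activity):
    """Dynamic power P = V^2 * f * C_effective.

    Assumption: C_effective = activity * capacitance, where activity is the
    fraction of the capacitance that switches each cycle.
    """
    c_effective = activity * capacitance
    return voltage ** 2 * frequency * c_effective


# Hypothetical operating point: 1.0 V, 2 GHz, 1 nF of switchable capacitance.
base = dynamic_power(1.0, 2e9, 1e-9, activity=0.5)
# Halving the activity factor halves dynamic power at the same V and f.
reduced = dynamic_power(1.0, 2e9, 1e-9, activity=0.25)
print(base, reduced)
```

Because V enters quadratically while A and C enter linearly, the techniques in this chapter (which target A and C) give linear savings at a fixed voltage and frequency.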

3 Excess Switching Activity
Idle-unit switching activity: triggered by clock transitions in unused portions of the hardware.
Idle-width switching activity: caused by a mismatch between the implemented and the actual width of processor structures.
Idle-capacity switching activity: occurs when a program does not use the provided hardware in its entirety.
Parallel switching activity: activity expended in parallel for performance.
Cacheable switching activity: repetitive switching activity; computing activity can be converted into cache lookups.
Speculative switching activity: speculatively executing incorrect instructions is wasted activity.
Value-dependent switching activity: the power consumed depends on the actual data values.

4 Capacitance
Does not change dynamically.
Total capacitance = capacitance of transistors + capacitance of wires.
Burd and Brodersen: $C_L = C_W + C_{\text{fixed}}$.
Low-power architectural techniques require partitioning: wire partitioning and bit-line segmentation.

5 IDLE-UNIT SWITCHING ACTIVITY
Static logic: to eliminate switching, it is enough to prevent the inputs from changing.
Dynamic logic: power can be consumed even if the inputs to the circuit do not change.
This switching has no effect on computation.
Technique: clock gating.
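The effect of clock gating on idle-unit switching can be illustrated with a toy count. This is a minimal sketch, assuming a unit whose clock toggles once per cycle and a hypothetical busy/idle trace; it counts clocked cycles only and is not a circuit model.

```python
def clocked_cycles(busy_per_cycle, gated):
    """Count the cycles in which the unit receives a clock edge.

    busy_per_cycle: list of booleans, True when the unit does useful work.
    gated: if True, the clock is delivered only in busy cycles.
    """
    if gated:
        return sum(1 for busy in busy_per_cycle if busy)
    return len(busy_per_cycle)


# Hypothetical trace: the unit is busy in 3 of 8 cycles.
trace = [True, False, False, True, False, False, False, True]
ungated = clocked_cycles(trace, gated=False)
gated = clocked_cycles(trace, gated=True)
print(ungated, gated)
```

With gating, the unit is clocked only in the cycles where it does work, so the idle cycles contribute no clock-induced switching in this model.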

6 Guarded evaluation: aims to shut down part of the original circuit.
Precomputation: aims to derive a precomputation circuit for a logic block.
Multiplexed precomputation architecture: F(x=0), F(x=1).
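The F(x=0) and F(x=1) terms read like the two cofactors of a Shannon expansion about the input x, with x selecting which cofactor block is evaluated. The sketch below illustrates that reading in software; the example function f and its inputs are hypothetical and chosen only so the cofactors are easy to see.

```python
from itertools import product


def f(x, a, b):
    """Hypothetical 3-input logic function on 1-bit values."""
    return (x & a) | ((1 - x) & (a ^ b))


def f_x0(a, b):
    """Cofactor F(x=0): f with x tied to 0 simplifies to a XOR b."""
    return a ^ b


def f_x1(a, b):
    """Cofactor F(x=1): f with x tied to 1 simplifies to a."""
    return a


def multiplexed(x, a, b):
    """x selects one cofactor; the other block is not evaluated at all."""
    return f_x1(a, b) if x else f_x0(a, b)


# The multiplexed form matches the original function on every input.
equivalent = all(
    multiplexed(x, a, b) == f(x, a, b) for x, a, b in product((0, 1), repeat=3)
)
print(equivalent)
```

In hardware terms, the block that is not selected can keep its inputs frozen, which is where the switching savings would come from.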

7 Deterministic clock gating
Gates the clock to processor structures when they are known to be idle.
Saves power and improves EDP (energy-delay product) without performance loss.
Clock gating examples:
IBM's POWER5: reduction in switching power of more than 25%; implements fine-grain gating domains.
Intel's XScale processors: implement three power-saving modes (Idle, Standby, Sleep); cut power consumption by 30%.
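The claim that power savings without performance loss improve EDP follows directly from the definition of the energy-delay product. The numbers below are hypothetical and are not measurements from the POWER5 or XScale; they only show the arithmetic.

```python
def energy_delay_product(power_watts, delay_seconds):
    """EDP = energy * delay, with energy = power * delay."""
    energy = power_watts * delay_seconds
    return energy * delay_seconds


# Hypothetical run: gating cuts average power from 10 W to 8 W
# while the execution time stays at 2 s (no performance loss).
baseline = energy_delay_product(10.0, 2.0)
gated = energy_delay_product(8.0, 2.0)
improvement = 1 - gated / baseline
print(baseline, gated, improvement)
```

When delay is unchanged, the relative EDP improvement equals the relative power saving; a technique that also slowed execution would be penalized twice, once through energy and once through delay.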

8 Idle-width switching activity: Core
Arises from a mismatch between the designed bit-width of a processor and the actual bit-width needed in frequently occurring operations.
Narrow-width operands (16 bits wide or less) are detected dynamically; they are abundant in integer and multimedia applications.
Approaches:
Value gating: disables the unused width. Switching in the unused parts of the ALU is disabled if both operands are narrow. Significant power savings.
Operation packing: packs more than one narrow-width operation into the full width of the hardware. Improves performance without significant power overhead. Speculative operation packing.
Significance compression: compresses non-significant bits. Byte-serial pipeline.
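Narrow-width detection and operation packing can be sketched in software. The example below assumes unsigned operands, a 32-bit datapath, and the 16-bit narrow threshold from the slide; it uses a standard lane-wise addition trick as a stand-in for hardware that breaks the carry chain between lanes. All names are hypothetical.

```python
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1
HIGH_BITS = 0x80008000  # top bit of each 16-bit lane in a 32-bit word


def is_narrow(value):
    """True if an unsigned value fits in 16 bits (upper bits are all zero)."""
    return 0 <= value <= LANE_MASK


def packed_add(a0, b0, a1, b1):
    """Compute a0+b0 and a1+b1 (each modulo 2^16) with one 32-bit addition."""
    if not all(is_narrow(v) for v in (a0, b0, a1, b1)):
        raise ValueError("operation packing needs narrow operands")
    packed_a = (a1 << LANE_BITS) | a0
    packed_b = (b1 << LANE_BITS) | b0
    # Add with each lane's top bit cleared so no carry crosses into the
    # neighbouring lane, then restore the top bits with an XOR.
    low = (packed_a & ~HIGH_BITS) + (packed_b & ~HIGH_BITS)
    total = low ^ ((packed_a ^ packed_b) & HIGH_BITS)
    return total & LANE_MASK, (total >> LANE_BITS) & LANE_MASK


print(packed_add(0xFFFF, 1, 3, 4))
```

The same `is_narrow` check is what a value-gating scheme would use to decide that the upper half of the ALU can stay idle for an operation.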

9 Idle-width switching activity: Caches
Dynamic zero compression: accesses only significant bits. Compresses only zero bytes, each marked by a zero indicator bit.
Frequent value compression: a dictionary is loaded with the frequent values of a program. Simple; most efficient compression mechanism.
Frequent value cache: a cache line contains compressed and uncompressed words. The first array holds the 8 low-order bits; the second array holds the remaining 24 high-order bits.
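The zero indicator bit idea can be shown with a small round trip. This is a simplified sketch that assumes one indicator bit per byte and models only which bytes would have to be accessed; the function names are hypothetical and the actual cache circuitry is not modeled.

```python
def zero_compress(word_bytes):
    """Split bytes into per-byte zero indicator bits and the nonzero payload."""
    indicators = [1 if b == 0 else 0 for b in word_bytes]
    payload = [b for b in word_bytes if b != 0]
    return indicators, payload


def zero_decompress(indicators, payload):
    """Rebuild the original bytes: zeros come from the indicator bits alone."""
    remaining = iter(payload)
    return [0 if is_zero else next(remaining) for is_zero in indicators]


word = [0x00, 0x00, 0x00, 0x2A]  # a small value stored in a 32-bit word
indicators, payload = zero_compress(word)
print(indicators, payload)
print(zero_decompress(indicators, payload) == word)
```

For this word only one of the four bytes needs a full access; the other three are represented by their indicator bits.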

10 Packing compressed cache lines
Without packing, the space freed by compression remains empty.
Packing increases cache utilization: indirect power savings.
Packing techniques:
Variable packing: packs a variable number of cache lines into cache frames. Expensive.
Fixed packing: a preset number of cache lines are packed. Reduced opportunities for compression.
Compression cache: uses frequent value compression. Does not attempt to pack cache lines into frames; a frame holds either two compressed lines or one uncompressed line.
Significance compression cache: lines are compressed using sign compression.
Instruction compression.
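The "two compressed or one uncompressed line per frame" rule can be sketched with a simple frequent-value encoding. The encoding below is an assumption made for illustration (32-bit words, one flag bit per word, a dictionary index for frequent values, the raw word otherwise, and a line counting as compressed only if it fits in half a frame); the dictionary contents are hypothetical.

```python
WORD_BITS = 32  # assumed word size


def compressed_line_bits(words, dictionary):
    """Size in bits of a line under the assumed frequent-value encoding."""
    index_bits = max(1, (len(dictionary) - 1).bit_length())
    frequent = set(dictionary)
    # Each word costs 1 flag bit plus either an index or the raw word.
    return sum(1 + (index_bits if w in frequent else WORD_BITS) for w in words)


def lines_per_frame(words, dictionary):
    """2 if the compressed line fits in half a frame, otherwise 1."""
    frame_bits = len(words) * WORD_BITS
    fits_half = compressed_line_bits(words, dictionary) <= frame_bits // 2
    return 2 if fits_half else 1


dictionary = [0, 1, 2, 4, 8, 0xFFFFFFFF, 0xFF, 0x100]  # hypothetical
mostly_frequent = [0, 0, 1, 0xFFFFFFFF, 8, 0, 0xDEADBEEF, 0x12345678]
mostly_unique = [0, 1, 0xA1, 0xB2, 0xC3, 0xD4, 2, 4]
print(lines_per_frame(mostly_frequent, dictionary))
print(lines_per_frame(mostly_unique, dictionary))
```

A fixed two-or-one rule like this avoids the bookkeeping of variable packing, at the cost of wasting space whenever a line compresses well but a partner line does not.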

11 IDLE-CAPACITY SWITCHING ACTIVITY
Wasted activity related to out-of-order execution.
Processor resources are over-provisioned to support high instruction throughput.
Power inefficiency of out-of-order processors: energy per instruction grows with issue width (IW) as $E_i \sim (IW)^{\gamma}$.
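The growth law can be made concrete with a ratio between two issue widths. The slide does not give a value for the exponent, so the value of gamma used below is purely hypothetical; the function name is also an assumption.

```python
def energy_per_instruction_ratio(iw_from, iw_to, gamma):
    """Ratio E_i(iw_to) / E_i(iw_from) under the model E_i ~ IW**gamma."""
    return (iw_to / iw_from) ** gamma


# Hypothetical exponent; the slide does not state a value for gamma.
GAMMA = 1.5
ratio = energy_per_instruction_ratio(4, 8, GAMMA)
print(ratio)
```

For any exponent above zero, widening the machine makes every instruction more expensive, which is the motivation for resizing structures when the extra width is not being used.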

12 Resource partitioning
Cannot afford the latency of very long wires.
Partitioned by placing buffers. Aimed at a size vs. speed trade-off.
Wire partitioning:
Wire delay is proportional to R × C.
Breaking a wire into k segments reduces the RC delay of each segment by a factor of $k^2$, since both R and C scale with segment length.
Total energy increases exponentially with k.
Replacing buffers with tristate devices.
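The effect of segmenting a wire can be checked with a simple model. The sketch assumes that delay is proportional to R x C and that both R and C grow linearly with length, so delay is quadratic in length; buffer delay is a hypothetical parameter that defaults to zero, and all units are arbitrary.

```python
def wire_delay(r_per_unit, c_per_unit, length):
    """Delay of an unbroken wire: (r * length) * (c * length)."""
    return r_per_unit * c_per_unit * length ** 2


def segmented_delay(r_per_unit, c_per_unit, length, k, buffer_delay=0.0):
    """End-to-end delay of the same wire split into k buffered segments."""
    per_segment = wire_delay(r_per_unit, c_per_unit, length / k)
    return k * (per_segment + buffer_delay)


full = wire_delay(1.0, 1.0, 8.0)
one_segment = wire_delay(1.0, 1.0, 8.0 / 4)
end_to_end = segmented_delay(1.0, 1.0, 8.0, k=4)
print(full / one_segment)  # per-segment improvement
print(full / end_to_end)   # end-to-end improvement, ignoring buffer delay
```

Under these assumptions each segment is k squared times faster than the whole wire, while the end-to-end delay improves by a factor of k before buffer delay is added back.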

13 IDLE-CAPACITY SWITCHING ACTIVITY: INSTRUCTION QUEUE
Resizable IQ: a mix of CAM and SRAM.
Readiness feedback control: adjusts the IQ size based on the activity of its entries. The decision-making scheme has a safety mechanism.
Occupancy feedback control: applies to the IQ, LSQ, and ROB. The occupancy of a structure is the appropriate feedback control metric.
Logical resizing without partitioning: the IQ is organized as a circular FIFO buffer. Its size is limited logically by limiting the part that can be allocated to new entries.
ILP-contribution feedback control.
Instruction queue collapsing.
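Occupancy feedback control can be sketched as a periodic resize decision. The thresholds, step size, and bounds below are hypothetical; the slides only state that occupancy is the feedback metric, so this is one plausible policy rather than the scheme from any specific design.

```python
def resize_queue(active_size, avg_occupancy, step, min_size, max_size,
                 low=0.5, high=0.9):
    """Return the queue size to use for the next interval.

    avg_occupancy: average number of valid entries over the last interval.
    low / high: hypothetical utilization thresholds for shrinking / growing.
    """
    utilization = avg_occupancy / active_size
    if utilization > high:
        return min(max_size, active_size + step)
    if utilization < low:
        return max(min_size, active_size - step)
    return active_size


# Hypothetical 64-entry queue currently limited to 32 entries, resized by 8.
print(resize_queue(32, 10, step=8, min_size=8, max_size=64))  # lightly used
print(resize_queue(32, 31, step=8, min_size=8, max_size=64))  # nearly full
print(resize_queue(32, 20, step=8, min_size=8, max_size=64))  # in between
```

Growing on high occupancy acts as the safety valve here: a queue that was shrunk too far fills up quickly and is enlarged again at the next decision point.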

14 IDLE-CAPACITY SWITCHING ACTIVITY: CORE
Dynamically changes the width of an 8-issue processor to 6-issue or 4-issue.
6-issue processor: half of a cluster is disabled.
4-issue processor: one whole cluster is disabled.
The appropriate functional units are clock gated.
Decisions are made at the end of the sampling window.
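The end-of-window decision can be sketched as a function from measured throughput to the next issue width. The slides do not say which metric or thresholds drive the decision, so the use of committed IPC and the threshold values below are assumptions made for illustration.

```python
def next_issue_width(instructions_committed, window_cycles,
                     wide_threshold=4.0, medium_threshold=2.5):
    """Pick 8-, 6-, or 4-issue mode for the next sampling window.

    Assumption: the decision is based on committed IPC in the window that
    just ended; both thresholds are hypothetical.
    """
    ipc = instructions_committed / window_cycles
    if ipc > wide_threshold:
        return 8
    if ipc > medium_threshold:
        return 6  # half of a cluster disabled
    return 4      # one whole cluster disabled


# Hypothetical 256-cycle sampling windows.
print(next_issue_width(1200, 256))
print(next_issue_width(800, 256))
print(next_issue_width(300, 256))
```

Deciding only at window boundaries keeps the control logic simple and avoids toggling clusters on and off in response to short bursts.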

15 THANK YOU!

