ELEC 5270/6270 Spring 2011, Low-Power Design of Electronic Circuits, Lecture 14: Power Aware Microprocessors. Copyright Agrawal, 2007.


Power Aware Microprocessors
ELEC 5270/6270 Spring 2011, Low-Power Design of Electronic Circuits, Lecture 14
Vishwani D. Agrawal, James J. Danaher Professor
Dept. of Electrical and Computer Engineering, Auburn University, Auburn, AL
Copyright Agrawal, 2007

SIA Roadmap for Processors (1999)
- Year
- Feature size (nm)
- Logic transistors/cm²: 6.2M, 18M, 39M, 84M, 180M, 390M
- Clock (GHz)
- Chip size (mm²)
- Power supply (V)
- High-perf. power (W)
Source: Untrue predictions.

Power Reduction in Processors
- Hardware methods:
  - Voltage reduction for dynamic power
  - Dual-threshold devices for leakage reduction
  - Clock gating, frequency reduction
  - Sleep mode
- Architecture:
  - Instruction set
  - Hardware organization
- Software methods

Performance Criteria
- Throughput: computations per unit time.
- Performance is the inverse of time; increasing CPU time indicates lower performance.
- Power: computations per watt.
- Energy efficiency: performance/joule.

SPEC CPU2006 Benchmarks
- Standard Performance Evaluation Corporation (SPEC)
- Twelve integer and 17 floating point programs, CINT2006 and CFP2006.
- Each program's run time is normalized to obtain a SPEC ratio with respect to the run time on a Sun Ultra Enterprise 2 system with a 296 MHz UltraSPARC II processor.
- It takes about 12 days to run all benchmarks on the reference system.
- CINT2006 and CFP2006 metrics are the geometric means of SPEC ratios:
  - Peak metric: each program is individually optimized (aggressive compilation).
  - Base metric: common optimization for all programs.
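
The normalization and averaging described on this slide can be sketched in a few lines. This is a minimal illustration, not SPEC's tooling: the function names and the run times are hypothetical, and the only things taken from the slide are that a SPEC ratio compares a machine against the reference machine and that the metric is a geometric mean of the ratios.

```python
import math

def spec_ratio(reference_time_s, measured_time_s):
    """SPEC ratio: reference run time divided by measured run time."""
    return reference_time_s / measured_time_s

def geometric_mean(values):
    """Geometric mean computed through logarithms to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical run times in seconds (not real benchmark data).
reference_times = [1000.0, 2000.0, 4000.0]
measured_times = [100.0, 100.0, 100.0]

ratios = [spec_ratio(r, m) for r, m in zip(reference_times, measured_times)]
metric = geometric_mean(ratios)
print(ratios, round(metric, 3))
```

A geometric mean keeps the ranking of two machines independent of which machine is chosen as the reference, which is one reason it suits normalized ratios.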

SPEC CINT2006 Results
- Dell Inc., PowerEdge R610
  - CPU: Intel Xeon X5670, 2.93 GHz
  - Number of chips 2, cores 12, threads/core 2
  - Performance metric 36.6 base, 39.4 peak
- Dell Inc., PowerEdge M905
  - CPU: AMD Opteron 8381 HE, 2.50 GHz
  - Number of chips 4, cores 16, threads/core 1
  - Performance metric 15.8 base, 19.1 peak

SPEC CFP2006 Results
- Dell Inc., PowerEdge R610
  - CPU: Intel Xeon X5670, 2.93 GHz
  - Number of chips 2, cores 12, threads/core 2
  - Performance metric 42.5 base, 45.8 peak
- Dell Inc., PowerEdge M905
  - CPU: AMD Opteron 8381 HE, 2.50 GHz
  - Number of chips 4, cores 16, threads/core 1
  - Performance metric 17.4 base, 21.5 peak

Other Benchmarks
- LINPACK is a numerically intensive floating point linear system (Ax = b) program used for benchmarking supercomputers.
- SPECpower_ssj2008 measures power and performance of a computer system.
- The initial benchmark addresses the performance of server-side Java; additional workloads are planned.

Second Quarter 2010 SPECpower_ssj2008 Results
- Apr 7, 2010: Hewlett-Packard ProLiant DL385 G7
- CPU: AMD Opteron 6174, 2.2 GHz
- Number of chips 2, cores 12, threads/core 2
- Total memory 16 GB
- ssj operations at 100% load: 888,819
- Average power at 100% load: 271 W
- Average power at active idle: 101 W
- Overall ssj operations per watt: 2,355

Second Quarter 2010 SPECpower_ssj2008 Results
- May 19, 2010: Dell Inc., PowerEdge R610
- CPU: Intel Xeon X5670, 2.93 GHz
- Number of chips 2, cores 12, threads/core 2
- Total memory 12 GB
- ssj operations at 100% load: 914,076
- Average power at 100% load: 244 W
- Average power at active idle: 62.3 W
- Overall ssj operations per watt: 2,938
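
As a rough cross-check of the two results, the full-load figures can be turned into operations per watt. The sketch below is only arithmetic on the numbers quoted in these slides; the dictionary layout and the function name are illustrative assumptions. The published overall metric is lower than the full-load ratio because it also accounts for partial-load levels and active idle, which are not listed here.

```python
# Full-load throughput and average power as quoted in the two result slides.
servers = {
    "HP ProLiant DL385 G7": {"ssj_ops": 888_819, "watts": 271.0, "overall": 2355},
    "Dell PowerEdge R610": {"ssj_ops": 914_076, "watts": 244.0, "overall": 2938},
}

def ops_per_watt(ssj_ops, watts):
    """Operations per watt at a single load level."""
    return ssj_ops / watts

full_load = {name: ops_per_watt(s["ssj_ops"], s["watts"]) for name, s in servers.items()}

for name, value in full_load.items():
    print(f"{name}: {value:.0f} ssj_ops/W at 100% load, "
          f"{servers[name]['overall']} overall")
```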

Energy SPEC Benchmarks
- Energy efficiency mode: Besides the execution time, energy efficiency of SPEC benchmark programs is also measured. Energy efficiency of a benchmark program is given by:

$$\text{Energy efficiency} = \frac{1/(\text{Execution time})}{\text{joules consumed}}$$

D. A. Patterson and J. L. Hennessy, Computer Organization & Design: The Hardware/Software Interface, 4th Edition, Morgan Kaufmann Publishers (Elsevier), 2009.

Energy Efficiency
- Efficiency averaged on n benchmark programs:

$$\text{Efficiency} = \left( \prod_{i=1}^{n} \text{Efficiency}_i \right)^{1/n}$$

  where $\text{Efficiency}_i$ is the efficiency for program i.
- Relative efficiency:

$$\text{Relative efficiency} = \frac{\text{Efficiency of a computer}}{\text{Efficiency of reference computer}}$$
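
The per-program efficiency, its geometric mean, and the relative efficiency can be combined in a short sketch. The function names and all measurements below are hypothetical and serve only to show how the formulas on this slide and the previous one fit together.

```python
import math

def energy_efficiency(execution_time_s, energy_j):
    """Efficiency of one program: (1 / execution time) / joules consumed."""
    return (1.0 / execution_time_s) / energy_j

def mean_efficiency(efficiencies):
    """Geometric mean of the per-program efficiencies."""
    return math.prod(efficiencies) ** (1.0 / len(efficiencies))

def relative_efficiency(machine_runs, reference_runs):
    """Ratio of a machine's mean efficiency to that of the reference machine."""
    machine = mean_efficiency([energy_efficiency(t, e) for t, e in machine_runs])
    reference = mean_efficiency([energy_efficiency(t, e) for t, e in reference_runs])
    return machine / reference

# Hypothetical (execution time in s, energy in J) pairs for two programs.
machine_runs = [(2.0, 50.0), (4.0, 25.0)]
reference_runs = [(4.0, 100.0), (8.0, 50.0)]

rel = relative_efficiency(machine_runs, reference_runs)
print(round(rel, 3))
```

In this made-up case the machine is twice as fast and uses half the energy on each program, so its relative efficiency comes out as four.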

SPEC2000 Relative Energy Efficiency
[Figure: relative energy efficiency for three operating modes: always max. clock; laptop adaptive clock; min. power, min. clock]

Voltage Scaling
- Dynamic: Reduce voltage and frequency during idle or low activity periods.
- Static: Clustered voltage scaling
  - Logic on non-critical paths given lower voltage.
  - 47% power reduction with 10% area increase reported.
M. Igarashi et al., “Clustered Voltage Scaling Techniques for Low-Power Design,” Proc. IEEE Symp. Low Power Design, 1997.

Processor Utilization
Throughput = Operations / second
[Figure: throughput versus time; labels: compute-intensive processes, system idle, low throughput (background) processes, maximum throughput]

Examples of Processes
- Compute-intensive: spreadsheet, spelling check, video decoding, scientific computing.
- Low throughput: data entry, screen updates, low bandwidth I/O data transfer.
- Idle: no computation, no expected output.

Effects of Voltage Reduction
- Voltage reduction increases delay, decreases throughput:
  - Slow reduction in throughput at first
  - Rapid reduction in throughput for $V_{DD} \le V_{th}$
  - Time per operation (TPO) increases
- Voltage reduction continues to reduce power consumption:
  - Energy per operation (EPO) = Power × TPO

Energy per Operation (EPO)
[Figure: power, TPO, and EPO plotted against $V_{DD} / V_{th}$]

Dynamic Voltage and Clock Throughput

| | Time in fast mode | Time in slow mode | Time in idle mode | Battery life |
|---|---|---|---|---|
| Always full speed | 10% | 0% | 90% | 1 hr |
| Sometimes full speed | 1% | 90% | 9% | 5.3 hrs |
| Rarely full speed | 0.1% | 99% | 0.9% | 9.2 hrs |

T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Springer, 2002.

Example: Find Minimum Energy Mode
- Processor data (rated operation):
  - 2 GHz clock
  - 1.5 volt supply voltage
  - 0.5 volt threshold voltage
- Power consumption:
  - 50 watts dynamic power
  - 50 watts static power
- Maximum clock frequency for V volt supply:
  $f \propto (V - V_{TH})/V$

Example Cont.
- Dynamic power:
  $P_d = C V^2 f = C (1.5)^2 \times 2 \times 10^9 = 50$ W
  $C = 11.11$ nF, capacitance switching/cycle
  $P_d = 11.11\, V^2 f$ (in watts, with f in GHz)
- Dynamic energy per cycle:
  $E_d = P_d / f = 11.11\, V^2$ (in nJ)

Example Cont.
- Clock frequency:
  $f = k (V - V_{TH})/V = k (1.5 - 0.5)/1.5 = 2$ GHz
  $k = 3$ GHz, a proportionality constant
  $f = 3 (V - 0.5)/V$ GHz

Example Cont.
- Static power:
  $P_s = k' V^2 = k' (1.5)^2 = 50$ W
  $k' = 22.22$ mho, total leakage conductance
  $P_s = 22.22\, V^2$ (in watts)
- Static energy per cycle:
  $E_s = P_s / f = 22.22\, V^3 / [3 (V - 0.5)] = 7.41\, V^3 / (V - 0.5)$ (in nJ)

Example Cont.
- Total energy per cycle:
  $E = E_d + E_s = 11.11\, V^2 + 7.41\, V^3 / (V - 0.5)$ (in nJ)
- To minimize E, $\partial E / \partial V = 0$, or
  $5 V^2 - 4.5 V + 0.75 = 0$
- Solutions of quadratic equation:
  V = 0.679 volt, 0.221 volt
- Discard second solution, which is lower than the threshold voltage of 0.5 volt.
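
The step from the energy expression to the quadratic is not spelled out on the slide. A sketch of the intermediate algebra, using the rated values of this example (2 GHz, 1.5 V supply, 0.5 V threshold, 50 W dynamic and 50 W static power) and the rounded coefficients 11.11 and 7.41 that follow from them:

$$
\begin{aligned}
E &= 11.11\,V^2 + \frac{7.41\,V^3}{V - 0.5} \\
\frac{\partial E}{\partial V} &= 22.22\,V + 7.41\,\frac{3V^2 (V - 0.5) - V^3}{(V - 0.5)^2}
 = 22.22\,V + 7.41\,\frac{2V^3 - 1.5V^2}{(V - 0.5)^2} = 0
\end{aligned}
$$

Multiplying by $(V - 0.5)^2 / (7.41\,V)$ and using $22.22 / 7.41 \approx 3$:

$$
3 (V - 0.5)^2 + 2V - 1.5 = 0
\;\Longrightarrow\;
5V^2 - 4.5V + 0.75 = 0
\;\Longrightarrow\;
V = \frac{4.5 \pm \sqrt{4.5^2 - 4 \cdot 5 \cdot 0.75}}{10} \approx 0.679,\ 0.221
$$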

Example: Result

| | Rated mode | Low energy mode | Reduction (%) |
|---|---|---|---|
| Voltage | 1.5 V | 0.679 V | 54.7% |
| Clock frequency | 2 GHz | 791 MHz | 60% |
| Dynamic energy/cycle | 25.00 nJ | 5.12 nJ | 79.52% |
| Static energy/cycle | 25.00 nJ | 12.96 nJ | 48.16% |
| Total energy/cycle | 50.0 nJ | 18.08 nJ | 63.84% |
| Dynamic power | 50.0 W | 4.05 W | 91.90% |
| Static power | 50.0 W | 10.25 W | 79.50% |
| Total power | 100.0 W | 14.30 W | 85.70% |
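
The minimum-energy operating point can also be located numerically, which gives a check on the closed-form solution. The sketch below rebuilds the model from the rated data of the example; the constant names, the function names, and the choice of a golden-section search are assumptions made for illustration, not part of the lecture.

```python
import math

# Rated operating point from the example.
V_RATED, F_RATED_GHZ, V_TH = 1.5, 2.0, 0.5
P_DYN_RATED_W, P_STAT_RATED_W = 50.0, 50.0

# Model constants derived from the rated point.
C_NF = P_DYN_RATED_W / (V_RATED ** 2 * F_RATED_GHZ)  # switched capacitance, nF
K_GHZ = F_RATED_GHZ * V_RATED / (V_RATED - V_TH)      # frequency constant, GHz
G_MHO = P_STAT_RATED_W / V_RATED ** 2                 # leakage conductance, mho

def frequency_ghz(v):
    """Maximum clock frequency at supply voltage v."""
    return K_GHZ * (v - V_TH) / v

def total_energy_nj(v):
    """Dynamic plus static energy per cycle (W / GHz = nJ)."""
    dynamic = C_NF * v ** 2
    static = G_MHO * v ** 2 / frequency_ghz(v)
    return dynamic + static

def golden_section_min(func, lo, hi, tol=1e-9):
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        x1 = hi - inv_phi * (hi - lo)
        x2 = lo + inv_phi * (hi - lo)
        if func(x1) < func(x2):
            hi = x2
        else:
            lo = x1
    return (lo + hi) / 2.0

# Search just above the threshold voltage, where the clock frequency is positive.
v_opt = golden_section_min(total_energy_nj, V_TH + 1e-6, V_RATED)
e_opt = total_energy_nj(v_opt)
p_opt = e_opt * frequency_ghz(v_opt)  # nJ * GHz = W
print(f"V = {v_opt:.3f} V, f = {frequency_ghz(v_opt) * 1000:.0f} MHz, "
      f"E = {e_opt:.2f} nJ/cycle, P = {p_opt:.2f} W")
```

Under this model the search settles near 0.679 V and 791 MHz, with roughly 18.08 nJ per cycle.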

Problem of Process Variation in Nanometer Technologies
[Figure: number of chips versus threshold voltage (lower $V_{th}$, $V_{th}$, higher $V_{th}$), bounded by the power specification and the clock specification. Labels: yield loss due to high leakage; yield loss due to slow speed; higher voltage operation; lower voltage operation; nominal voltage.]
From a presentation: Power Reduction using LongRun2 in Transmeta’s Efficeon Processor, by D. Ditzel, May 17, 2006.

Pipeline Gating
- A pipeline processor uses speculative execution.
- Incorrect branch prediction results in pipeline stalls and wasted energy.
- Idea: Stop fetching instructions if a branch hazard is expected:
  - If the count (M) of incorrect predictions exceeds a pre-specified number (N), then suspend fetching instructions for some k cycles.
Ref.: S. Manne, A. Klauser and D. Grunwald, “Pipeline Gating: Speculation Control for Energy Reduction,” Proc. 25th Annual International Symp. Computer Architecture, June 1998.
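
The gating rule can be illustrated with a toy cycle-by-cycle model. This is a sketch under stated assumptions, not the mechanism from the cited paper: the function name `gate_fetch`, the per-cycle misprediction flags, and the policy of resetting the count M after each gating period are all hypothetical choices made to keep the example small.

```python
def gate_fetch(mispredict_flags, n_threshold, k_stall):
    """Return one boolean per cycle: True when instruction fetch is enabled.

    mispredict_flags: per-cycle flags, True when an incorrect prediction
                      is observed in that cycle (hypothetical input).
    n_threshold:      N, the count that M must exceed to trigger gating.
    k_stall:          k, the number of cycles fetch is suspended.
    """
    fetch_enabled = []
    m_count = 0      # M: incorrect predictions seen since the last gating
    stall_left = 0   # remaining cycles of the current gating period
    for mispredicted in mispredict_flags:
        if stall_left > 0:
            # Fetch is gated, so no new branches enter the pipeline.
            fetch_enabled.append(False)
            stall_left -= 1
            continue
        fetch_enabled.append(True)
        if mispredicted:
            m_count += 1
        if m_count > n_threshold:
            stall_left = k_stall
            m_count = 0  # assumed reset policy
    return fetch_enabled

flags = [True, True, False, True, False, False, False]
schedule = gate_fetch(flags, n_threshold=1, k_stall=2)
print(schedule)
```

With N = 1 and k = 2, the second incorrect prediction pushes M above N, so fetch is suspended for the following two cycles and then resumes.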

Slack Scheduling
- Application: Superscalar, out-of-order execution:
  - An instruction is executed as soon as the required data and resources become available.
  - A commit unit reorders the results.
- Delay the completion of instructions whose result is not immediately needed.
- Example of RISC instructions:
    add r0, r1, r2     ; (A)
    sub r3, r4, r5     ; (B)
    and r9, r1, r9     ; (C)
    or  r5, r9, r10    ; (D)
    xor r2, r10, r11   ; (E)
J. Casmira and D. Grunwald, “Dynamic Instruction Scheduling Slack,” Proc. ACM Kool Chips Workshop, Dec

Slack Scheduling Example
[Figure: execution timelines of instructions A through E under standard scheduling and under slack scheduling]

Slack Scheduling
[Figure: block diagram with scheduling logic, re-order buffer, slack bit, and low-power execution units]

Clock Distribution
[Figure: H-tree clock network]
- Fanout, $\lambda = 4$
- Tree depth, $s = \log_{\lambda} N$
- No. of flip-flops = N

Clock Power

$$P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \cdots + \frac{C_L V_{DD}^2 f}{\lambda^{\text{stages}-1}} = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n}$$

where
- $C_L$ = total load capacitance of N flip-flops
- $\lambda$ = constant fanout at each stage in the distribution network

Clock consumes about 40% of total processor power, because
(1) Clock is always active
(2) Clock makes two transitions per cycle ($\alpha = 2$)
Clock gating is useful: inhibit the clock to unused blocks.
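
The clock power sum is a finite geometric series and is easy to evaluate directly. The following sketch assumes the summation runs over n = 0 to stages - 1 as in the formula; the function name and the numeric values are hypothetical.

```python
def clock_power_w(load_capacitance_f, vdd_v, frequency_hz, fanout, stages):
    """Clock network power: C_L * V_DD^2 * f * sum of 1/fanout^n, n = 0..stages-1."""
    flip_flop_power = load_capacitance_f * vdd_v ** 2 * frequency_hz
    series = sum(1.0 / fanout ** n for n in range(stages))
    return flip_flop_power * series

# Hypothetical values: 1 nF of flip-flop load, 1 V supply, 1 GHz clock.
p_one_stage = clock_power_w(1e-9, 1.0, 1e9, fanout=4, stages=1)
p_three_stages = clock_power_w(1e-9, 1.0, 1e9, fanout=4, stages=3)
print(p_one_stage, p_three_stages)
```

Because the series converges to λ/(λ - 1), the buffers of a fanout-4 tree add at most about one third to the power delivered to the flip-flops, however deep the tree is.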

Properties of H-Tree
- Balanced clock skew.
- Small delay and power consumption.
- Requires fine-tuning for complex layout.

Clock Power and Delay
- Unit size buffer or inverter delay = d
- Total dynamic power supplied to N flip-flops, $P = C_L V_{DD}^2 f$
- Total power consumption of clock network:

| Flip-flops, N | Clock power per flip-flop | Clock delay |
|---|---|---|
| 1 | P | d |
| 4 | P | 4d |
| | P | 8d |
| | P | 12d |
| | P | 16d |

Clock Network Examples (three Alpha processors)
- Technology: 0.75μ CMOS; 0.5μ CMOS; 0.35μ CMOS
- Frequency (MHz)
- Total capacitance: 12.5nF; clock gating used.
- Total power (W)
- Clock load: 3.25nF; 3.75nF
- Clock power: 40%; 40% (20W)
- Max. clock skew: 200ps (<10%); 90ps
D. W. Bailey and B. J. Benschneider, “Clocking Design and Analysis for a 600-MHz Alpha Microprocessor,” IEEE J. Solid-State Circuits, vol. 33, no. 11, Nov

Power Reduction Example
- Alpha 21064: 3.45V, power dissipation = 26W
- Reduce voltage to 1.5V, power (5.3x) = 4.9W
- Eliminate FP, power (3x) = 1.6W
- Scale 0.75μ → 0.35μ, power (2x) = 0.8W
- Reduce clock load, power (1.3x) = 0.6W
- Reduce frequency 200 → 160MHz, power (1.25x) = 0.5W
J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, Nov
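
The chain of reductions is plain division by each factor in turn, which a few lines can reproduce. The step labels and factors are taken from this slide; the list structure and variable names are illustrative only. The first factor also agrees with the quadratic dependence of dynamic power on supply voltage, since (3.45/1.5)² is about 5.3.

```python
# (description, reduction factor) pairs as listed on the slide.
steps = [
    ("Reduce voltage 3.45 V -> 1.5 V", 5.3),
    ("Eliminate FP", 3.0),
    ("Scale 0.75 um -> 0.35 um", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 -> 160 MHz", 1.25),
]

power_w = 26.0  # starting power dissipation of the Alpha 21064
history = []
for description, factor in steps:
    power_w /= factor
    history.append(round(power_w, 1))
    print(f"{description}: {power_w:.2f} W")

# Dynamic power scales with the square of the supply voltage.
voltage_factor = (3.45 / 1.5) ** 2
print(f"Voltage factor from V^2 scaling: {voltage_factor:.2f}")
```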

For More on Microprocessors
- T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Springer, 2002.
- R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002.