Published by Nathan Dalton. Modified over 8 years ago.
1 Design and Implementation of the POWER5 Microprocessor
  J. Clabes¹, J. Friedrich¹, M. Sweet¹, J. DiLullo¹, S. Chu¹, D. Plass², J. Dawson², P. Muench², L. Powell¹, M. Floyd¹, B. Sinharoy², M. Lee¹, M. Goulet¹, J. Wagoner¹, N. Schwarz¹, S. Runyon¹, G. Gorman¹, P. Restle³, R. Kalla¹, J. McGill¹, S. Dodson¹
  ¹ IBM System Group, Austin, TX
  ² IBM System Group, Poughkeepsie, NY
  ³ IBM Research, Yorktown Heights, NY
2 Outline
  Project Objective
  Microarchitecture Changes
  Implementation Overview
  Design Enablers
  Integration Challenges
  Timing and Hardware Performance
  Power Efficiency
  Summary
3 POWER5™ Chip Objectives
  Build on POWER4™ base
  Maintain binary and structural compatibility
  Deliver superior performance
  Enhance and extend SMP scalability
  Provide additional server flexibility
  Enhance reliability, availability, serviceability (RAS) attributes
  Deliver power efficient design
Project…
4 Simultaneous Multithreading in POWER5 Chip
  Each chip appears as a 4-way SMP to software
  Processor resources optimized for enhanced SMT performance
  Software controlled thread priority
  Dynamic feedback of runtime behavior to adjust priority
  Dynamic switching between single and multithreaded mode
  [Figure: execution unit usage (FX0, FX1, FP0, FP1, LS0, LS1, BRX, CRL) in single threaded operation; thread 0 active]
Microarchitecture…
5 Simultaneous Multithreading in POWER5 Chip
  Each chip appears as a 4-way SMP to software
  Processor resources optimized for enhanced SMT performance
  Software controlled thread priority
  Dynamic feedback of runtime behavior to adjust priority
  Dynamic switching between single and multithreaded mode
  [Figure: execution unit usage (FX0, FX1, FP0, FP1, LS0, LS1, BRX, CRL) in simultaneous multi-threading; thread 0 active, thread 1 active]
Microarchitecture…
6 Modifications to POWER4 System Structure
  [Figure: system structure block diagram; labels: P, P, L2, L3, L3 Cntrl, Mem Ctl, Fab Ctl, Memory]
  Reduced L3 latency
  Faster access to memory
  Larger SMPs
  Number of chips cut in half
7 POWER5 Chip Overview
  Technology: 130 nm lithography, SOI, Cu wiring
  276M transistors
  389 mm² die size
  Two 8-way superscalar SMT cores
  Memory subsystem with 1.9 MB L2-Cache, L3 directory and memory controller on chip
  Extensive RAS support
  High-speed elastic bus interface
Implementation…
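Editor's sketch: the figures on this slide and on the SMT slides can be cross-checked with simple arithmetic. The snippet below is a minimal illustration, not part of the original deck. It assumes two hardware threads per core (the SMT slides show thread 0 and thread 1), and all constant and function names are hypothetical.

```python
# Figures taken from the slides; names are illustrative only.
CORES_PER_CHIP = 2          # "Two 8-way superscalar SMT cores"
THREADS_PER_CORE = 2        # assumption: thread 0 and thread 1 per core
TRANSISTORS = 276e6         # "276M transistors"
DIE_AREA_MM2 = 389.0        # "389 mm² die size"


def logical_processors(cores: int, threads_per_core: int) -> int:
    """Number of processors that software sees on one chip."""
    return cores * threads_per_core


def transistor_density(transistors: float, area_mm2: float) -> float:
    """Average transistors per square millimetre of die."""
    return transistors / area_mm2


if __name__ == "__main__":
    # Matches "Each chip appears as a 4-way SMP to software".
    print(logical_processors(CORES_PER_CHIP, THREADS_PER_CORE))
    # Roughly 0.71 million transistors per mm².
    print(round(transistor_density(TRANSISTORS, DIE_AREA_MM2) / 1e6, 2))
```

Under that assumption, two cores with two threads each give the 4-way SMP view the SMT slides describe.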
8 ERAT and D-Cache Array Design Changes
  System performance vs. area trade-off
  ERAT: Fully associative, implemented as Sum-Address CAM
  D-cache: 4-way associativity
  Result: 2-3% performance gain with improved wireability at 5% area cost
Design…
9 L2 and I-Cache Array Design Changes
  SMT drives thread level parallelism
  Improved associativity on L2-Cache (10-way) and I-Cache (2-way)
  L2 access shifted by ½ cycle avoiding extensive array redesign
  High speed latch with compare on I-Cache access path
Design…
10 2nd Generation Elastic Interface Design
  EI-II performance improvements
    Runs over 2 GHz in laboratory: head-room on IO frequencies
      Allows bus frequencies to continue scaling with processor frequency
    Optimizes Vref at T0 by level forwarding
    Maintains guardband via periodic self calibration
Design…
11 Implementation of Engineered Buses and IO Wires
  Pre-planned and custom routed buses
    ~50K engineered wires at chip level
    ~2X of POWER4 chip
  Custom buffer insertion process
    ~250K buffer/inverters
    2.5X of POWER4 chip
  Wire and bus characterization
    Noise tolerance
    Impact of coupling on delay
    Inductance analysis
Integration…
12 Implementation of Engineered Buses and IO Wires
  Pre-planned and custom routed buses
    ~50K engineered wires at chip level
    ~2X of POWER4 chip
  Custom buffer insertion process
    ~250K buffer/inverters
    2.5X of POWER4 chip
  Wire and bus characterization
    Noise tolerance
    Impact of coupling on delay
    Inductance analysis
  IO performance driven routing
    5 Ω resistance limit on chip
    Fully shielded (single ended design)
Integration…
13 Dual Clock Distribution
  Parameter                   First table                Second table
  total nominal skew          18 ps                      18 ps
  local skew                  9 ps                       9 ps
  slew rate from 30 - 70%     52 - 71 ps                 52 - 71 ps
  latency PLL to LCB          777 ps                     777 ps
  duty cycle control          ±25 ps                     ±25 ps
  switching power             10.5 W @ 1.08 V, 2 GHz     9.5 W @ 1.08 V, 1.8 GHz
  Memory Clock Domain (4 Buffers)
    1 central chip buffer
    3 sector buffers
    asynchronous to main mesh
  Main Clock Grid (91 Buffers)
    1 full chip buffer
    1 central chip buffer
    3 half chip buffers
    6 quadrant buffers
    80 sector buffers
Integration…
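Editor's sketch: the buffer counts and the two switching-power figures on this slide are internally consistent, which a few lines of arithmetic can confirm. This is a rough check, not part of the original deck. It assumes the usual dynamic-power relation P = C·V²·f with capacitance and voltage held fixed, so that power scales linearly with frequency; the slide itself does not state how the two power figures are related. All names are hypothetical.

```python
# Buffer counts per level, as listed on the slide.
MAIN_CLOCK_GRID = {
    "full chip": 1,
    "central chip": 1,
    "half chip": 3,
    "quadrant": 6,
    "sector": 80,
}
MEMORY_CLOCK_DOMAIN = {"central chip": 1, "sector": 3}


def total_buffers(level_counts: dict) -> int:
    """Sum the buffers over all levels of a clock tree."""
    return sum(level_counts.values())


def scale_switching_power(power_w: float, f_from_ghz: float, f_to_ghz: float) -> float:
    """Assumption: P = C * V^2 * f with C and V unchanged, so P is linear in f."""
    return power_w * f_to_ghz / f_from_ghz


if __name__ == "__main__":
    print(total_buffers(MAIN_CLOCK_GRID))      # slide header says 91 buffers
    print(total_buffers(MEMORY_CLOCK_DOMAIN))  # slide header says 4 buffers
    # Scale the 2 GHz figure down to 1.8 GHz at the same 1.08 V;
    # the result is close to the 9.5 W quoted on the slide.
    print(round(scale_switching_power(10.5, 2.0, 1.8), 2))
```

Linear frequency scaling of the 10.5 W figure lands within rounding of the 9.5 W figure, which suggests (but does not prove) that the two numbers describe the same clock network at two frequencies.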
14 Chip Timing and Shmoo Plot
  Timing Closure
    Sort mode (functional/scan/lbist)
    Early mode (functional/scan)
  Timing Model Analysis
    690K scannable M/S latches
    180K non-scan mid-cycle latches
    6.75M timing checks
    TAT 19 hours
  [Figure: Shmoo plot, frequency (GHz) vs. voltage (V) at 25°C, with pass and fail regions]
Timing…
15 Power Efficient Design Implementation
  DC power mitigation
    Leverage triple Vt technology
      Decrease low Vt usage by 90%
      Increase high Vt usage by 30%
    Leverage triple Tox technology
      Thick Tox usage for decoupling capacitors
  AC power mitigation
    Minimal usage of dynamic circuits
    Reduce loading on clock mesh
    Incorporation of dynamic clock gating
Power…
16 Dynamic Clock Gating Implementation
  [Figure: clock gating circuit diagrams; labels: scan-only latches, C2 latches, gating logic, global disable, local disable, mesh clock, gated c1 clock, dynamic stop enable, MS latch]
  Cycle-to-cycle clock control (~1/2 cycle path)
  Cycle-predict clock control (~full cycle path)
  Approach allows aggressive use of clock gating to conserve power
Power…
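Editor's sketch: the slide names the signals involved in clock gating but not the exact logic, so the toy model below is only one plausible reading, not the POWER5 circuit. It assumes that "global disable" and "local disable" are overrides that switch gating off (the clock then runs freely) and that a "dynamic stop" request suppresses the gated c1 clock for that cycle. Function and parameter names are hypothetical.

```python
def gated_c1(mesh_clock: bool, dynamic_stop: bool,
             global_disable: bool = False, local_disable: bool = False) -> bool:
    """Toy model of a gated clock for one cycle.

    Assumption: either disable signal turns clock gating off, so the mesh
    clock passes through regardless of the dynamic stop request.
    """
    gating_active = not (global_disable or local_disable)
    stop_this_cycle = dynamic_stop and gating_active
    return mesh_clock and not stop_this_cycle


def delivered_pulses(stop_requests, global_disable: bool = False) -> int:
    """Count clock pulses reaching the latches over a trace of cycles."""
    return sum(
        gated_c1(True, stop, global_disable=global_disable)
        for stop in stop_requests
    )


if __name__ == "__main__":
    trace = [False, True, True, False]   # stop requested in cycles 1 and 2
    print(delivered_pulses(trace))                       # gating on: 2 pulses
    print(delivered_pulses(trace, global_disable=True))  # gating off: 4 pulses
```

In this model every suppressed pulse is a cycle in which the downstream latches do not toggle their clock load, which is where the AC power saving would come from.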
17 Improved Power Efficiency
  AC power reduction by ≥ 25%
  DC power reduction by ≥ 50%
  Total power reduction by > 33% for numerical intensive workload
Power…
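Editor's sketch: the three percentages can be related by a weighted average if total power is taken to be the sum of AC and DC power. That decomposition, and the use of the slide's lower bounds as exact values, are assumptions made for illustration; the slide does not give the AC/DC split. Names are hypothetical.

```python
def total_reduction(dc_share: float, ac_reduction: float = 0.25,
                    dc_reduction: float = 0.50) -> float:
    """Fractional total power reduction, assuming total = AC + DC.

    dc_share is the fraction of the original total power that is DC.
    """
    ac_share = 1.0 - dc_share
    return ac_share * ac_reduction + dc_share * dc_reduction


def min_dc_share(target: float, ac_reduction: float = 0.25,
                 dc_reduction: float = 0.50) -> float:
    """DC share at which the weighted reduction equals the target."""
    return (target - ac_reduction) / (dc_reduction - ac_reduction)


if __name__ == "__main__":
    # With the lower-bound reductions, a 33% total needs about a 32% DC share.
    print(round(min_dc_share(0.33), 2))
    print(round(total_reduction(0.32), 2))
```

Read this way, the quoted total reduction is reachable with the minimum AC and DC reductions once DC power makes up roughly a third of the original budget; larger individual reductions would lower that threshold.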
18 Thermal Protection
  [Figure: thermal protection plot; labels: over-temperature, recovery-temperature]
Power…
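Editor's sketch: a pair of thresholds labelled over-temperature and recovery-temperature suggests a hysteresis scheme, in which protection starts at the higher threshold and ends only after the chip cools to the lower one. The slide does not describe the mechanism or give values, so the class below is a generic hysteresis controller under that assumption; the threshold numbers and all names are hypothetical.

```python
class ThermalGuard:
    """Generic two-threshold (hysteresis) thermal protection state machine."""

    def __init__(self, over_temp_c: float, recovery_temp_c: float):
        if recovery_temp_c >= over_temp_c:
            raise ValueError("recovery temperature must be below over-temperature")
        self.over_temp_c = over_temp_c
        self.recovery_temp_c = recovery_temp_c
        self.protecting = False  # True while the protective action is engaged

    def update(self, temp_c: float) -> bool:
        """Feed one temperature sample and return the protection state."""
        if not self.protecting and temp_c >= self.over_temp_c:
            self.protecting = True
        elif self.protecting and temp_c <= self.recovery_temp_c:
            self.protecting = False
        return self.protecting


if __name__ == "__main__":
    guard = ThermalGuard(over_temp_c=85.0, recovery_temp_c=75.0)  # made-up values
    samples = [70, 84, 86, 80, 76, 74, 80]
    print([guard.update(t) for t in samples])
    # [False, False, True, True, True, False, False]
```

The gap between the two thresholds keeps the protection from toggling on and off when the temperature hovers near a single limit.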
19 Summary
  First dual core SMT microprocessor
  Extended SMP to 64-way
  Operating in laboratory
  Power dynamically managed with no performance penalty
  Implementation permits future technology scalability from circuit and power perspective
  Innovative approach leveraging technology with system focus for high performance in a power efficient design
Summary…