Published by Gervais Hood. Modified over 9 years ago.
1
1 Power/Temperature analysis of register file architecture for superscalar processor Hardware/Software co-design term-end project R92922128 水沼 仁志 2004/06/08
2
2 Contents
1. Introduction
2. Motivation & background
3. Register File architecture study
4. Simulation methodology
5. HotSpot introduction
6. Experimental result
7. Conclusion
3
3 1. Introduction
“Temperature” has become a serious headache for modern microprocessor designers.
Quoted from “Temperature-aware microarchitecture presentation slide”
4
4 1. Introduction
[Figure: heat density (W/cm²), axis ticks at 1, 10, 100, and 1000, plotted against year (1990, 2000, 2010), with reference levels labelled Hotplate (2003), Nuclear plant, Rocket nozzle, and Solar surface. Quoted from Intel developer’s forum.]
If we don’t take action right now, heat density will reach an unbearable level within a decade.
5
5 2. Motivation & background
The register file has become a hot spot in modern processors.
[Figure with the label “Hot Spots”. Quoted from “Thermal Modeling and Measurement of Large High Power Silicon Devices with Asymmetric Power Distribution”, Jeffrey Deeney.]
6
6 2. Motivation & background
The register file has become a hot spot in modern processors.
[Figure with the label “Hot Spots”. Quoted from “Temperature-aware micro-architecture”, Kevin Skadron et al., 2003.]
7
7 2. Motivation & background
The register file also becomes a critical path that determines cycle time as issue width increases.
[Figure: a 4-reg RF connected through MUXes to the IF/ID stages of Way0 and Way1.]
8
8 2. Motivation & background
Two major schemes have been proposed to reduce RF delay:
1. RF duplicating
2. RF banking
9
9 2. Motivation & background
“RF duplicating” reduces port density by duplicating the RF (from 8 ports to 4 ports).
[Figure: (Original) one 8-reg RF serving the IF/ID stages of Way0 to Way3 through MUXes; (RF duplicating) two 8-reg RF copies, one in Cluster0 and one in Cluster1, serving Way0 to Way3.]
10
10 2. Motivation & background
“RF banking” reduces port density by splitting the RF into a multiple-bank structure (from 8 ports to 2 ports).
[Figure: (Original) compared with (RF banking), where several 2-reg RF banks are connected through MUXes to the IF/ID stages of Way0 to Way3.]
11
11 2. Motivation & background
These two schemes also save power, because the total power needed to drive each line/port is reduced.
PROs by scheme:
RF duplicating:
- Clock-up thanks to decreased port density
- Power reduction thanks to decreased port density
RF banking: (not filled in on this slide)
12
12 2. Motivation & background
RF duplicating scheme’s drawback 1: additional power is required to keep the contents of the two RFs synchronized.
[Figure: timeline of RF reads and RF writes for Way0 to Way3, comparing (Original) with (RF duplicating) across Cluster0 and Cluster1.]
13
13 2. Motivation & background
RF duplicating scheme’s drawback 2: the inter-cluster bypass path becomes a performance bottleneck.
[Figure: pipeline with Instruction Fetch, Instruction Decode, Renaming, Window, two 8-reg RFs with MUXes (Cluster0 and Cluster1), and Data cache.]
14
14 2. Motivation & background
RF banking scheme’s drawback: performance is lost to bank conflicts when too many global ports try to access the same local port/bank.
[Figure: pipeline with Instruction Fetch, Instruction Decode, Renaming, Window, 2-reg RF banks with MUXes, Arbiters and Decoders between the global ports and the local ports, and Data cache.]
15
15 3. Research goal
The pros and cons of the two schemes are as follows.
RF duplicating
PROs:
- Clock-up thanks to decreased port density
- Power reduction thanks to decreased port density
CONs:
- Power increase due to RF synchronization overhead
- CPI degradation due to longer bypass overhead in case of inter-cluster communication
RF banking
PROs: (not filled in on this slide)
CONs:
- CPI degradation due to instruction stalls in case of port/bank conflicts
16
16 3. Research goal
From the power/temperature viewpoint, we need to analyze and quantify the power overhead caused by these schemes.
RF duplicating
PROs:
- Clock-up thanks to decreased port density
- Power reduction thanks to decreased port density
CONs:
- Power increase due to RF synchronization overhead
- CPI degradation due to longer bypass overhead in case of inter-cluster communication
RF banking
PROs:
- Clock-up thanks to decreased port density
- Power reduction thanks to decreased port density
CONs:
- CPI degradation due to instruction stalls in case of port/bank conflicts
17
17 3. Research goal
My research goal is to use power and temperature as metrics to evaluate two clock-up schemes for the register file, “RF duplicating” and “RF banking”.
[Figure labels: Performance, Power, Temperature; Architectural simulation, Power simulation, Temperature simulation.]
18
18 3. Research goal
My experimental procedure is as follows:
1. Modify the architectural simulator (SimpleScalar) and the power simulator (Wattch) to imitate the “RF duplicating” and “RF banking” schemes.
2. Study the temperature simulator (HotSpot) and combine it with the architectural/power simulators.
3. Evaluate the power/temperature impact of both clock-up schemes.
19
19 4. Simulation Methodology
[Figure: simulation flow. Inputs: Alpha 21364 configuration and SPEC 2000 benchmark. Blocks: SimpleScalar, Wattch, HotSpot, and net performance calculation. Data exchanged: FU (Functional Unit) access pattern, CPI, active power per FU, and functional unit temperature.]
20
20 4. Simulation methodology
Simulated processor configuration:
- Frequency: 600 MHz
- Pipeline width: 4
- # of RUUs: 16
- Load/Store queue depth: 8
- # of Integer ALU (Mult/Div): 4 (1)
- # of FP ALU (Mult/Div): 1 (1)
- TLB I/D: 64 entries, 4way, 30 cycles / 64 entries, 4way, 30 cycles
- Instruction length: 32 bit
- L2 (64B Blocks): 32KB, 4way, 6 cycles (Unified)
- L1 I$/D$ (32KB Blocks): 16KB, direct, LRU / 16KB, 4way, LRU
- Branch prediction: Bimod (BTB size: 2048)
- Mis-prediction latency: 3 cycles
- Memory latency: First 18 cycles, Next 2 cycles
- Program execution parameter: GCC, Fastfwd: 100M cycles, Duration: 100M cycles
21
21 4. Simulation methodology
The RF configuration was changed as follows:
- Original: 32 registers
- RF duplicating: 32 registers
- RF banking: 16 registers
22
22 4. Simulation methodology (RF duplicating)
I modified ruu_dispatch() in SimpleScalar to emulate the read event for the “RF duplicating” scheme.
[Flowchart: set dispatch width = 0; fetch an instruction from the buffer; test “Dispatch width > 1?” to dispatch the instruction to Cluster 0 or Cluster 1; if the instruction needs an RF read, increment Regfile0_access or Regfile1_access for that cluster; increment dispatch width; repeat while there are more instructions in the buffer, then exit.]
23
23 4. Simulation methodology (RF duplicating)
I modified ruu_commit() in SimpleScalar to emulate the write event for the “RF duplicating” scheme.
[Flowchart: when an instruction is committed, test whether it needs an RF write; if not, exit; if so, whether the instruction belongs to Cluster 0 or Cluster 1, increment both Regfile0_access and Regfile1_access, then exit.]
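The read and write accounting for “RF duplicating” can be sketched as follows. This is Python for illustration only, not the modified SimpleScalar source. The `Inst` class, the function names, and the rule that dispatch slots 0 and 1 go to Cluster 0 while later slots go to Cluster 1 are assumptions; the Yes/No direction of the “Dispatch width > 1?” test is not recoverable from the flowchart text.

```python
from dataclasses import dataclass


@dataclass
class Inst:
    """Hypothetical instruction record used only for this sketch."""
    needs_rf_read: bool
    needs_rf_write: bool
    cluster: int = 0  # assigned at dispatch


def dispatch_reads(buffer, access):
    """Count RF reads: an instruction reads only its own cluster's RF copy."""
    for slot, inst in enumerate(buffer):  # slot plays the role of "dispatch width"
        # ASSUMPTION: slots 0-1 map to Cluster 0, slots 2+ map to Cluster 1.
        inst.cluster = 1 if slot > 1 else 0
        if inst.needs_rf_read:
            access[inst.cluster] += 1


def commit_write(inst, access):
    """Count RF writes: a write updates both copies to keep them synchronized."""
    if inst.needs_rf_write:
        access[0] += 1
        access[1] += 1


# Example: four instructions dispatched in one cycle, then one write committed.
access = [0, 0]  # [Regfile0_access, Regfile1_access]
buffer = [Inst(True, True), Inst(True, False), Inst(False, True), Inst(True, False)]
dispatch_reads(buffer, access)
commit_write(buffer[0], access)
```

The write path is where the synchronization overhead shows up: every committed write costs one access in each copy, regardless of which cluster executed the instruction.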
24
24 4. Simulation methodology (RF banking)
I modified ruu_dispatch() in SimpleScalar to emulate the read event for the “RF banking” scheme.
[Flowchart: set dispatch width = 0; fetch an instruction from the buffer; if it needs an RF read, test “Source reg # > 15?” for each source register to access Bank0 or Bank1, incrementing Bank0_access or Bank1_access; repeat while there are more source registers; increment dispatch width; repeat while there are more instructions in the buffer, then exit.]
25
25 4. Simulation methodology (RF banking)
I modified ruu_commit() in SimpleScalar to emulate the write event for the “RF banking” scheme.
[Flowchart: when an instruction is committed, test whether it needs an RF write; if not, exit; if so, test “Destination reg # > 15?”: No selects Bank 0 (Bank0_access++), Yes selects Bank 1 (Bank1_access++); then exit.]
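The bank selection for “RF banking” reduces to a comparison on the register number. A minimal Python sketch, for illustration only and not the modified SimpleScalar source: the function names are hypothetical, and mapping registers 0 to 15 to Bank 0 for reads is an assumption carried over from the write flowchart.

```python
def bank_of(reg):
    """Registers 0-15 live in Bank 0, registers above 15 in Bank 1."""
    return 1 if reg > 15 else 0


def count_reads(src_regs, access):
    """Count one bank access per source register that is read."""
    for reg in src_regs:
        access[bank_of(reg)] += 1


def count_write(dest_reg, access):
    """Count one bank access for the destination register, if there is one."""
    if dest_reg is not None:
        access[bank_of(dest_reg)] += 1


# Example: an instruction reading r3 and r20, then writing r15.
access = [0, 0]  # [Bank0_access, Bank1_access]
count_reads([3, 20], access)
count_write(15, access)
```

Unlike the duplicating scheme, a write touches only one bank, so there is no synchronization access to count.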
26
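26 5. HotSpot introduction
A simplistic dynamic compact thermal model (a.k.a. RC model) was used. This model uses the electrical-thermal duality below:
- V (voltage) corresponds to temperature ($T$)
- I (current) corresponds to power ($P$)
- R corresponds to thermal resistance ($R_{th}$)
- C corresponds to thermal capacitance ($C_{th}$)
- RC is the time constant
For a block of thickness $t$ and area $A$:
$R_{th} = t / (k \cdot A)$
$C_{th} = c \cdot t \cdot A$
where $k$ is the thermal conductivity of the material (W/(m·K)) and $c$ is its thermal capacitance per unit volume (J/(m³·K)).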
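A minimal Python sketch of the two formulas for $R_{th}$ and $C_{th}$. The function name, the block dimensions, and the silicon constants (k = 100 W/(m·K), c = 1.75e6 J/(m³·K)) are illustrative assumptions, not values taken from the slides.

```python
def thermal_rc(thickness_m, area_m2, k, c):
    """Return (R_th in K/W, C_th in J/K) for a uniform block.

    R_th = t / (k * A) and C_th = c * t * A, as in the RC model.
    """
    r_th = thickness_m / (k * area_m2)
    c_th = c * thickness_m * area_m2
    return r_th, c_th


# ASSUMED material constants and a hypothetical 1 mm x 1 mm block, 0.5 mm thick.
K_SI = 100.0   # thermal conductivity, W/(m*K)
C_SI = 1.75e6  # thermal capacitance per unit volume, J/(m^3*K)

r_th, c_th = thermal_rc(0.5e-3, 1e-3 * 1e-3, K_SI, C_SI)
tau = r_th * c_th  # time constant in seconds
print(f"R_th = {r_th:.3g} K/W, C_th = {c_th:.3g} J/K, RC = {tau:.3g} s")
```

Since $R_{th} C_{th} = c \, t^2 / k$, the time constant of a single block depends on its thickness but not on its area.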
27
27 5. HotSpot introduction
- $P$: total wattage generated inside the block
- $C_{th} \cdot dT/dt$: wattage consumed to heat up the block
- $T / R_{th}$: wattage passing through the block
From the power balance among the three, $P = C_{th} \cdot dT/dt + T / R_{th}$. Hence, $dT/dt = (R_{th} P - T) / (R_{th} C_{th})$.
[Figure: a block at temperature $T$ (℃) with input power $P$ and outgoing heat flow $T / R_{th}$; one unit of time later its temperature is $T + dT/dt$ (℃).]
28
28 5. HotSpot introduction
The differential equation above is solved using a fourth-order Runge-Kutta method.
1. Solve $dy/dx = f(x, y)$ with the initial state $y(x_0) = y_0$.
2. Partition the interval into $n$ steps of size $dx$.
3. For $x_0$, $x_1 = x_0 + dx$, $x_2 = x_0 + 2\,dx$, ..., $x_n = x_0 + n\,dx$, the approximate increment is calculated from $k_1$, $k_2$, $k_3$, and $k_4$ as $k = \frac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4)$, where
$k_1 = f(x_0, y_0)\,dx$
$k_2 = f(x_0 + dx/2,\ y_0 + k_1/2)\,dx$
$k_3 = f(x_0 + dx/2,\ y_0 + k_2/2)\,dx$
$k_4 = f(x_0 + dx,\ y_0 + k_3)\,dx$
4. $y_1$ is calculated as $y_1 = y_0 + (k_1 + 2k_2 + 2k_3 + k_4)/6$.
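The Runge-Kutta steps on this slide translate directly into code. The sketch below is a generic Python version with hypothetical function names, not HotSpot's solver; it is checked against $dy/dx = -y$, whose exact solution $e^{-x}$ is known.

```python
import math


def rk4_step(f, x, y, dx):
    """Advance y by one fourth-order Runge-Kutta step of size dx."""
    k1 = f(x, y) * dx
    k2 = f(x + dx / 2, y + k1 / 2) * dx
    k3 = f(x + dx / 2, y + k2 / 2) * dx
    k4 = f(x + dx, y + k3) * dx
    # The two midpoint slopes k2 and k3 are weighted twice.
    return y + (k1 + 2 * k2 + 2 * k3 + k4) / 6


def rk4_solve(f, x0, y0, dx, n):
    """Apply n RK4 steps starting from y(x0) = y0 and return y(x0 + n*dx)."""
    x, y = x0, y0
    for _ in range(n):
        y = rk4_step(f, x, y, dx)
        x += dx
    return y


# Check on dy/dx = -y with y(0) = 1: ten steps of 0.1 should approximate exp(-1).
approx = rk4_solve(lambda x, y: -y, 0.0, 1.0, 0.1, 10)
error = abs(approx - math.exp(-1.0))
print(f"RK4: {approx:.8f}, exact: {math.exp(-1.0):.8f}, error: {error:.2e}")
```

For the thermal model, `f` would be `lambda t, T: (R * P - T) / (R * C)` with the block's own values of P, R, and C.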
29
29 5. HotSpot introduction
I simulated the register file temperature over time with the following values:
- $P$ = 10.0 W
- $R_{th}$ = 1.25 K/W
- $C_{th}$ = 0.005 J/K
The values of $P$, $R_{th}$, and $C_{th}$ are based on “Reducing Power Density through Activity Migration”.
[Figure: register file on the silicon die.]
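With constant power, the equation $dT/dt = (R_{th} P - T) / (R_{th} C_{th})$ has the standard closed-form solution $T(t) = R_{th} P + (T_0 - R_{th} P)\, e^{-t / (R_{th} C_{th})}$, which gives a quick feel for these numbers. The Python sketch below assumes that $T$ is the temperature rise above the surroundings and that the block starts at zero rise; both are assumptions of this sketch, and the function name is hypothetical.

```python
import math

P = 10.0      # W, from the slide
R_TH = 1.25   # K/W, from the slide
C_TH = 0.005  # J/K, from the slide


def temp_rise(t, p=P, r=R_TH, c=C_TH, rise0=0.0):
    """Closed-form temperature rise (K) at time t (s) for constant power p."""
    steady = r * p
    return steady + (rise0 - steady) * math.exp(-t / (r * c))


tau = R_TH * C_TH        # time constant, seconds
steady_rise = R_TH * P   # rise reached as t grows large, kelvin

print(f"time constant RC = {tau * 1e3:.2f} ms")
print(f"steady-state rise R*P = {steady_rise:.1f} K")
for multiple in (1, 3, 5):
    print(f"rise after {multiple} RC: {temp_rise(multiple * tau):.2f} K")
```

With these values the block settles to a 12.5 K rise with a time constant of 6.25 ms, so it responds to power changes within a few tens of milliseconds.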
30
30 5. HotSpot introduction
The Alpha 21364 floor-plan was used after slight modification. I assume that the area of the Integer Register file functional unit remains the same after implementing the “RF duplicating” and “RF banking” schemes.
31
31 5. HotSpot introduction
The heat sink and heat spreader dimensions remain the same as in the HotSpot default settings.
32
32 5. HotSpot introduction
Simulated thermal environment factors:
- Die thickness: 50 um
- Die area: 16 mm x 16 mm
- Convection capacitance: 140.4 J/K
- Convection resistance: 0.2 K/W
- Heat-sink side: 60 mm
- Heat-sink thickness: 6.9 mm
- Spreader side: 30 mm
- Spreader thickness: 1.0 mm
- Interface material thickness: 0.075 mm
- Ambient temperature: 40 C
- Sampling interval: 10K cycles (= 1.667 msec on the slide; 10K cycles at the 600 MHz clock of the simulated processor works out to about 16.7 µs)
- Activity factor (for Wattch): static activity factors (power value does NOT depend on FU access status)
- Clock gating method (for Wattch): ideal, aggressive (zero power consumed when powered off)
33
33 6. Experimental result Temperature simulation result
34
34 6. Experimental result
Power/Temperature simulation results:
- Original: RF power average 0.286 W; peak temperature 59.4 C
- RF duplicating: RF power average 0.218 W (-23.8%); peak temperature 57.6 C (-3.1%)
- RF banking: RF power average 0.104 W (-63.6%); peak temperature 54.9 C (-7.6%)
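The percentages on this slide can be recomputed from the rounded values it reports. The Python sketch below does that, and also expresses each peak temperature as a rise above the 40 C ambient listed among the thermal environment factors. The second view is an alternative reading added here, not a figure from the slides, and the helper name is hypothetical.

```python
def pct_change(new, old):
    """Relative change of new versus old, in percent."""
    return (new - old) / old * 100.0


AMBIENT_C = 40.0  # ambient temperature from the thermal environment slide

# (RF power average in W, peak temperature in C), as reported on the slide
results = {
    "Original": (0.286, 59.4),
    "RF duplicating": (0.218, 57.6),
    "RF banking": (0.104, 54.9),
}

base_power, base_temp = results["Original"]
for scheme, (power, temp) in results.items():
    if scheme == "Original":
        continue
    d_power = pct_change(power, base_power)
    d_temp = pct_change(temp, base_temp)
    d_rise = pct_change(temp - AMBIENT_C, base_temp - AMBIENT_C)
    print(f"{scheme}: power {d_power:.1f}%, peak temperature {d_temp:.1f}%, "
          f"rise above ambient {d_rise:.1f}%")
```

Recomputing from the rounded table entries can differ from the slide in the last digit, presumably because the slide's percentages were derived from unrounded data.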
35
35 7. Conclusion
1. The “RF duplicating” and “RF banking” clock-up schemes also have a positive effect on power saving.
2. “RF banking” reduces RF power by 63.6%, while “RF duplicating” reduces it by 23.8%.
3. The peak temperature remains almost the same, despite the large power saving above.
4. Other temperature-reduction schemes must be invented to tackle the hot-spot problem.