Download presentation
Presentation is loading. Please wait.
Published bySarah Grant Modified over 9 years ago
1
Tortola: Addressing Tomorrow’s Computing Challenges through Hardware/Software Symbiosis Kim Hazelwood September 29, 2006
2
2 of 26 Modern Computing Challenges Performance Power –Energy consumption, max instantaneous power, di/dt Temperature –Total heat output, “hot spots” Reliability –Neutron strikes, alpha particles, MTBF, design flaws Approaches: Circuit, microarchitecture, compiler Constraint: Fixed HW-SW interface (e.g., x86)
3
3 of 26 Typical Approaches Optimize using SW or HW techniques in isolation Performance –SW: Compile-time optimizations –HW: Architectural improvements, VLSI technology Reliability: Code/data duplication (HW or SW) Power & Temperature –HW control mechanisms –Profile + recompile cycle HW SW
4
4 of 26 Modern Design Constraints Compilers – “Compile once, run anywhere” –Cannot ship “MS Office for 1Q05 batch of Pentium-4 3GHz, > 1GB RAM, BrandX power supply, located in high altitudes…” Microarchitecture – Limited window of application knowledge (past must predict the future) VLSI – Guaranteed correctness, reliability We currently must optimize for the common case (but must design for the worst case)
5
5 of 26 The Power of Virtualization A HW-SW interface layer HW SW Applications Binary Modifier Initially x86 Eventually SWI HWI
6
6 of 26 Dynamic Binary Modification Creates a modified code image at run time EXE Transform Code Cache Execute Profile Examples: Dynamo (HP) DAISY/BOA (IBM) CMS (Transmeta) Mojo (Microsoft) Strata (UVa) Pin (Intel)
7
7 of 26 Dynamic Instrumentation Demo Pin –Four architectures – IA32, EM64T, IPF, XScale –Four OSes – Linux, FreeBSD, MacOS, Windows –http://rogue.colorado.edu/pin/
8
8 of 26 Dynamic Optimization Demo DynamoRIO –Windows and Linux for IA32 –http://www.cag.lcs.mit.edu/dynamorio/
9
9 of 26 Dynamic Binary Modification Creates a modified code image at run time Always triggered by software events … until now EXE Transform Code Cache Execute Profile Examples: Dynamo (HP) DAISY/BOA (IBM) CMS (Transmeta) Mojo (Microsoft) Strata (UVa) Pin (Intel)
10
10 of 26 Tortola: Symbiotic Optimization Enable HW/SW Communication HW SW Applications Binary Modifier
11
11 of 26 Simulation Methodology SimpleScalar 4.0 for x86 Wattch 1.02 power extensions Pin dynamic instrumentation system (x86/Linux version) HW SW Application Binary Modifier Benchmarks Pin Wattch & Simplescalar/x86
12
12 of 26 Tortola Applications Combine global program information with run- time feedback –System-specific power usage –Application-specific heat anomalies –Workload/input specific performance optimization Reduce hardware complexity –No more backwards compatibility warts –Fix bugs after shipment –Reduce time to market for new architectures One such application: The di/dt problem
13
13 of 26 The Di/dt Problem Voltage stability is important for reliability, performance Low-power techniques have a negative side effect: current variation Dips (undershoots) in supply voltage – can cause incorrect values to be calculated or stored Spikes (overshoots) in supply voltage – can cause reliability problems
14
14 of 26 The Di/dt Problem ITRS cites noise management as a Grand Challenge for 5-10 year time frame Several trends are aggravating the issue: –Voltage is scaling down with technology –Current draw is increasing –Package impedance is not scaling as quickly –Aggressive clock gating causes large swings in processor current draw (di/dt)
15
15 of 26 Di/dt Solutions Software MicroArch Circuit-Level Compiler Optimizations Sensor/Actuator Mechanisms Decoupling capacitors More Vdd Gnd pins on package Co-Designed MicroArch & SW Binary Modifier
16
16 of 26 Sensor-Actuator Mechanisms On-chip voltage sensors detect abnormally high/low voltage levels On-chip actuator then attempts to quickly raise/lower the processor’s current draw –Phantom firing increases current (at the expense of power) –Resource throttling reduces current (at the expense of performance)
17
17 of 26 Detecting Imminent Emergencies 1.05V 1.03V 1V 0.97V 0.95V Operating Voltage Range Hard Emergency Soft Emergency Control Threshold
18
18 of 26 Targeting Mid-Frequency Di/dt Problematic: wide current spike Worst case: pulse at the resonant frequency Supply Voltage (V) Processor Current (A) Time (Cycles) Minimum Voltage 20 cycles Supply Voltage (V) Processor Current (A) Time (Cycles) Minimum Voltage 60 cycles Maximum Voltage *From: Joseph et al. HPCA-9
19
19 of 26 A Di/dt Stressmark BEGIN_LOOP: … ldt$f1, ($4) divt$f1, $f2, $f3 divt$f3, $f2, $f3 stt$f3, 8($4) ldq$7, 8($4) cmovne$31, $7, $3 stq$3, $(4) … stq$3, $(4) … JUMP BEGIN_LOOP Sequential Low Current Parallel High Current But…Actuator engages every loop iteration degrading performance Why not correct the problem in the code?
20
20 of 26 Proposed Solution Leverage our additional software layer to supplement existing solutions Microarchitecture provides feedback to our software- based virtual layer Microprocessor Sensor+Actuator Ext AlteredExecutable SW HW Binary Modifier Executable VL
21
21 of 26 Required Investigations Characterizing emergencies –How often do we see di/dt emergency loops? Communication between the microarchitecture and the virtual layer –What information should be passed to virtual layer during an emergency? Fixing di/dt via binary modification –Will existing techniques help? –New algorithms?
22
22 of 26 Static vs. Dynamic Instances Data suggests modifying a few code sequences will eliminate many voltage emergencies
23
23 of 26 Possible Compiler Optimizations Our goal is to –Smooth out current profile, or –Knock pulses off of the resonant frequency Some existing options –Software pipelining, code motion, instruction padding Microprocessor Sensor+Actuator Ext’ns AlteredExecutable Binary Modifier ApplyOptimizations Executable
24
24 of 26 Loop Unrolling & SW Pipelining AABBAABB Iteration=1 Iteration=2 Iteration=3 Current AABBAABB AABBAABB AABBAABB Software pipelining smoothes profile Current Problematic loop: AAAABBBBAAAABBBB Current Loop unrolling disrupts resonance pulse Unrolled loop:
25
25 of 26 Unrolling the Di/dt Stressmark H L H1 H2 L1 L2
26
26 of 26 Summary Symbiotic program optimization is a powerful approach The di/dt problem – well suited for a symbiotic solution The Tortola design can also target power reduction, temperature reduction, reliability, etc. http://www.tortolaproject.com/
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.