© 2012 IBM Corporation | Barcelona Supercomputing Center | MICRO 2012, Tuesday, December 4, 2012

Systematic Energy Characterization of CMP/SMT Processor Systems via Automated Micro-Benchmarks

R. Bertran*+, A. Buyuktosunoglu*, M. Gupta*, M. Gonzalez+, P. Bose*
*IBM T.J. Watson Research Center
+Barcelona Supercomputing Center
Why do we need micro-benchmarks?
What is the maximum power consumption? Are there any performance bugs? Any reliability issues? … Micro-benchmarks answer these questions, but writing them by hand is:
– Time-consuming and tedious: an error-prone task
– A trial-and-error process: several micro-benchmarks are required
– Reliant on deep expertise limited to a few designers: detailed knowledge of the underlying architecture is required
AUTOMATED SOLUTION NEEDED!
MicroProbe: a micro-benchmark generation framework
MicroProbe Workflow
– Inputs (from the user): a micro-benchmark generation policy and the architecture definition files. Example policies: an endless loop with 50% INT and 50% FP instructions; an endless loop for each instruction of the ISA; a max-power stressmark.
– MicroProbe Framework: generates the micro-benchmarks that implement the policy.
– Outputs: micro-benchmarks, which are evaluated with external tools: real platforms, simulators and models.
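To make the notion of a generation policy concrete, here is a minimal sketch of the "endless loop, 50% INT / 50% FP" policy as a plain Python function. It is not MicroProbe's actual API: `TOY_ISA` and `endless_loop_policy` are hypothetical stand-ins for the architecture definition files and a user-defined policy script, and operands are left out.

```python
import random

# Hypothetical toy ISA definition: mnemonic -> instruction class.
# Real architecture definition files describe far more (operands, encodings, units).
TOY_ISA = {
    "add": "INT", "subf": "INT", "mullw": "INT",
    "fadd": "FP", "fmul": "FP", "fmadd": "FP",
}

def endless_loop_policy(isa, mix, loop_size, seed=0):
    """Build an endless loop whose body follows the class mix, e.g.
    {"INT": 0.5, "FP": 0.5}. Operands and register allocation are omitted."""
    rng = random.Random(seed)
    by_class = {}
    for mnemonic, cls in isa.items():
        by_class.setdefault(cls, []).append(mnemonic)
    body = []
    for cls, fraction in mix.items():
        count = round(fraction * loop_size)
        body.extend(rng.choice(by_class[cls]) for _ in range(count))
    rng.shuffle(body)  # interleave classes instead of emitting them in blocks
    lines = ["loop:"] + ["    " + m for m in body] + ["    b loop"]
    return "\n".join(lines)

print(endless_loop_policy(TOY_ISA, {"INT": 0.5, "FP": 0.5}, loop_size=8))
```

Keeping the policy separate from the ISA table mirrors the split on the slide: the same policy can be re-run against a different architecture definition.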
MicroProbe: Distinguishing Features
Features provided by MicroProbe, with the support found in previous works in parentheses:
– ISA queries: instruction type; operand length, binary encoding, etc. (previous works: manual)
– Micro-architecture queries: functional unit, latency, throughput, energy per instruction, average instruction power, etc. (previous works: manual)
– Micro-architecture models: set-associative cache model (previous works: no)
– Code generation: skeleton and instruction definition passes, memory modeling pass, branch modeling pass, ILP definition pass; configurable passes (previous works: no)
– Design space exploration: integrated (previous works: no); GA-based search; exhaustive search (previous works: manual); customizable search (previous works: manual)
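A set-associative cache model is what lets a generator reason about where a load will hit. The sketch below is a generic LRU set-associative cache, not MicroProbe's model; the class name, the LRU replacement policy and the example geometry (32 KB, 128-byte lines, 8 ways) are assumptions chosen for illustration.

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Generic LRU set-associative cache model (illustrative sketch)."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One ordered dict per set: keys are tags, order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, address):
        """Simulate one access; return True on a hit, False on a miss."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        cache_set = self.sets[index]
        if tag in cache_set:
            cache_set.move_to_end(tag)  # mark as most recently used
            return True
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict the least recently used line
        cache_set[tag] = None
        return False
```

With `SetAssociativeCache(32 * 1024, 128, 8)` there are 32 sets, so addresses 4096 bytes apart map to the same set: cycling over 8 of them hits after warm-up, while cycling over 9 misses on every access under LRU. Combining such patterns is one way a memory modeling pass could aim for a target hit ratio at each cache level.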
MicroProbe Usage and Design Overview
– Research idea → micro-benchmark generation policies (user-defined scripts), for example:
   – a loop stressing the floating-point unit
   – a sequence of loads hitting 50% in L1 and 50% in L2
   – a stressmark for each functional unit of the architecture
   – a search for the sequence of 2 loads and 2 integer operations with maximum IPC
– MicroProbe Framework (Python API):
   – Architecture module: ISA definitions, micro-architecture definitions, micro-architecture analytical models, properties, and an automatic bootstrap process (using external tools)
   – Code generation module: micro-benchmark synthesizer and passes
   – Design space exploration module: search drivers
– Output: micro-benchmarks, evaluated with external tools
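A policy such as searching for the ordering of 2 loads and 2 integer operations with maximum IPC amounts to an exhaustive search driver. In the sketch below, `exhaustive_search` enumerates the distinct orderings. `toy_ipc` is a hypothetical stand-in for running each candidate on hardware or a simulator, and its penalty for back-to-back loads is invented purely for illustration.

```python
from itertools import permutations

def exhaustive_search(instructions, evaluate):
    """Evaluate every distinct ordering of `instructions`;
    return (best_order, best_score). Ties keep the first order found."""
    best_order, best_score = None, float("-inf")
    for order in sorted(set(permutations(instructions))):
        score = evaluate(order)
        if score > best_score:
            best_order, best_score = order, score
    return best_order, best_score

def toy_ipc(order):
    """Hypothetical IPC stand-in: each pair of back-to-back loads
    (including the loop wrap-around) costs 0.5 IPC."""
    n = len(order)
    conflicts = sum(order[i] == "ld" and order[(i + 1) % n] == "ld" for i in range(n))
    return 2.0 - 0.5 * conflicts
```

For example, `exhaustive_search(["ld", "ld", "add", "add"], toy_ipc)` evaluates the 6 distinct orderings and returns `("add", "ld", "add", "ld")` with a score of 2.0 under this toy model. Exhaustive search only scales to small instruction sets, which is why a GA-based driver is listed as an alternative.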
Max-power Stressmark Generation
Using MicroProbe to generate a max-power stressmark:
1. Characterize the energy per instruction (EPI) and IPC of each instruction (Architecture Module).
2. Select the N instructions with the highest IPC × EPI.
3. Form a basic endless loop (e.g., 4K) using the selected instructions (Code Generation Module).
4. Generate micro-benchmarks with different orderings of the selected N instructions.
5. Evaluate them using the Design Space Exploration Module.
6. Pick the highest-power micro-benchmark.
Example candidate loops:
   Loop: … mulldo lxvw4x xvnmsubmdp … mulldo xvnmsubmdp lxvw4x
   Loop: … mulldo lxvw4x mulldo xvnmsubmdp lxvw4x xvnmsubmdp …
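The following is a minimal sketch of the max-power flow, assuming the per-instruction IPC and EPI have already been characterized. `select_instructions`, `build_loop` and `max_power_stressmark` are hypothetical names. The `measure_power` callback is a hypothetical stand-in for the Design Space Exploration Module running a candidate on a platform. For simplicity, the sketch only tries permutations of one copy of each selected instruction, repeated to fill the loop. The example loops also interleave repeated instructions, which this sketch does not explore.

```python
from itertools import permutations

def select_instructions(profile, n):
    """profile maps mnemonic -> (ipc, epi); return the n mnemonics
    with the largest IPC * EPI, highest first."""
    ranked = sorted(profile, key=lambda m: profile[m][0] * profile[m][1], reverse=True)
    return ranked[:n]

def build_loop(order, loop_size):
    """Repeat `order` cyclically to fill a loop body of loop_size instructions."""
    return [order[i % len(order)] for i in range(loop_size)]

def max_power_stressmark(profile, n, loop_size, measure_power):
    """Try every ordering of the selected instructions and keep the
    loop body that `measure_power` rates highest."""
    selected = select_instructions(profile, n)
    best_loop, best_power = None, float("-inf")
    for order in permutations(selected):
        loop = build_loop(order, loop_size)
        power = measure_power(loop)
        if power > best_power:
            best_loop, best_power = loop, power
    return best_loop, best_power
```

Ranking by IPC × EPI favors instructions that are both expensive and issue often. The ordering search then checks how they interact when they share the pipeline.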
CASE STUDIES
Experimental Methodology
Platform:
– Processor: 3 GHz, 8 cores, 4-way SMT; 32 KB L1, 256 KB L2 and 4 MB L3 per core
– Memory: 32 GB DDR3 at 800 MHz
– OS: RHEL Linux
– EnergyScale architecture:
   – power measurements in milliwatts
   – sampling rate up to one sample per 1 ms
   – in-house software collects power and performance counter traces [C. Lefurgy et al., IBM]
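Power is reported as discrete samples, so the energy spent over a run comes from integrating the trace. The minimal sketch below uses the rectangle rule and assumes evenly spaced samples in milliwatts, taken every 1 ms by default. The function names are hypothetical.

```python
def energy_joules(power_mw_samples, sample_period_s=1e-3):
    """Integrate evenly spaced power samples (milliwatts) into energy (joules)."""
    watts = [p / 1000.0 for p in power_mw_samples]  # mW -> W
    return sum(watts) * sample_period_s  # rectangle rule: sum(P_i * dt)

def average_power_w(power_mw_samples):
    """Mean power in watts over the trace."""
    return sum(power_mw_samples) / len(power_mw_samples) / 1000.0
```

As an illustration, 1,000 samples of 100,000 mW (100 W) taken 1 ms apart integrate to 100 J over one second.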
Case Study 1: EPI Characterization
– Large differences in EPI across instructions that stress different micro-architecture components
– Large differences in EPI even across instructions that stress the same micro-architecture component at the same rate (IPC)
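EPI ties measured power to instruction throughput. A core that retires IPC × f instructions per second at power P spends P / (IPC × f) joules per instruction. The sketch below encodes that relation. Subtracting an idle or baseline power (`idle_power_w`) is an assumption made here to isolate the instruction-dependent energy, and is not necessarily the procedure used in this work. The function name is hypothetical.

```python
def energy_per_instruction(power_w, ipc, freq_hz, idle_power_w=0.0):
    """Energy per instruction in joules.

    Instructions per second = IPC * frequency, so EPI = power / (IPC * f).
    `idle_power_w` optionally removes a baseline that is not attributable
    to the executed instructions (an assumption of this sketch).
    """
    if ipc <= 0 or freq_hz <= 0:
        raise ValueError("ipc and freq_hz must be positive")
    return (power_w - idle_power_w) / (ipc * freq_hz)
```

With illustrative numbers, 10 W above idle at IPC 2 on a 3 GHz core gives 10 / (2 × 3×10^9) ≈ 1.67 nJ per instruction. The formula also shows why the slide compares instructions at the same IPC: at equal throughput, any EPI difference comes directly from a difference in power.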
Case Study 2: Max-power Stressmark Generation
How can a max-power stressmark be built? Candidate approaches:
– Use a computationally intensive kernel: DAXPY
– Expert manual: use complex instructions that access different functional units with high IPC (selected instructions: mullw, xvmaddadp, lxvd2x)
– Expert DSE: generate all possible combinations of complex instructions stressing different units
– MicroProbe: heuristic Max(EPI * IPC) (selected instructions: mulldo, xvnmsubmdp, lxvw4x)
Example loops built from mullw, xvmaddadp and lxvd2x:
   Loop: … mullw xvmaddadp lxvd2x …
   Loop: … mullw lxvd2x mullw xvmaddadp lxvd2x …
   Loop: … mullw lxvd2x mullw xvmaddadp lxvd2x xvmaddadp …
Max-power Stressmark Generation
Case Study 3: Counter-based Processor Power Model
Bottom-up power modeling method, in three steps:
1. Dynamic power = f(PMCs), with the SMT1 intercept; trained with functional-unit micro-benchmarks (CMP1–SMT1) and random micro-benchmarks (CMP1–SMT1)
2. SMT effect: derived from the SMT1 and SMT2-4 intercepts; trained with random micro-benchmarks (CMP1–SMT2/4)
3. CMP effect and uncore power: linear regression f(CMP) on random micro-benchmarks (CMP1/8–SMT2/4)
Model: Power = dynamic power f(PMCs) + SMT effect (when SMT is enabled) + CMP effect (as a function of the number of cores) + uncore power
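The modeling steps rely on linear regression. Below is a minimal sketch using NumPy least squares. It assumes one plausible composition of the terms:
– per-core dynamic power as a weighted sum of PMC rates
– a per-core SMT term when SMT is enabled
– a per-core CMP term
– a constant uncore term
`fit_linear` and `predict_chip_power` are hypothetical names, and the exact form of the published model may differ.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares for y ~ X @ w + b. Returns (w, b)."""
    X = np.asarray(X, dtype=float)
    A = np.hstack([X, np.ones((X.shape[0], 1))])  # extra column for the intercept
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef[:-1], coef[-1]

def predict_chip_power(core_pmcs, pmc_weights, smt_effect, smt_enabled,
                       cmp_effect, uncore_power):
    """Assumed bottom-up composition (hypothetical):
    sum of per-core dynamic power from PMC rates
    + per-core SMT term if SMT is enabled
    + per-core CMP term + constant uncore power."""
    core_pmcs = np.atleast_2d(np.asarray(core_pmcs, dtype=float))  # (n_cores, n_pmcs)
    n_cores = core_pmcs.shape[0]
    dynamic = float((core_pmcs @ np.asarray(pmc_weights, dtype=float)).sum())
    smt = n_cores * smt_effect if smt_enabled else 0.0
    return dynamic + smt + n_cores * cmp_effect + uncore_power
```

Under this assumed form, fitting measured power against the number of active cores with `fit_linear` gives a slope and an intercept. These play the roles of the CMP effect and the uncore power in step 3.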
Counter-based Processor Power Model Validation
Errors within acceptable margins: < 4% on average
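The slide does not name the error metric. Assuming it is the mean absolute percentage error between measured and predicted power, it can be computed as follows. The function name is hypothetical.

```python
def mean_abs_percentage_error(measured, predicted):
    """Average of |predicted - measured| / measured, in percent."""
    pairs = list(zip(measured, predicted))
    if not pairs:
        raise ValueError("need at least one measurement")
    return 100.0 * sum(abs(p - m) / m for m, p in pairs) / len(pairs)
```

For instance, predictions of 104 W and 190 W against measurements of 100 W and 200 W give errors of 4% and 5%, for an average of 4.5%.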
Counter-based Processor Power Model Validation on Corner Cases
– Models trained with training sets that are not micro-architecture-aware show high errors and high variability
– Models trained with the micro-architecture-aware training set stay within acceptable error margins: < 5% on average
Conclusions
MicroProbe is a productive micro-benchmark generation framework:
– Adaptive and flexible
– Includes micro-architecture semantics
– Integrates design space exploration
Three case studies were presented:
– Instruction-based EPI characterization
– Automated max-power stressmark generation
– CMP/SMT-aware, bottom-up, counter-based processor power model
QUESTIONS?