ACMSE’04, AL
Department of Electrical and Computer Engineering - UAH
Execution Characteristics of SPEC CPU2000 Benchmarks: Intel C++ vs. Microsoft VC++

Swathi Tanjore Gurumani, Aleksandar Milenkovic
Electrical and Computer Engineering Department
University of Alabama in Huntsville

Outline
- Objective
- Background
- Problem Overview
- Performance Evaluation - Overview
- Experimental Setup
- Results
- Conclusion and Future Research

Problem Objective
- Prove and stress the importance of designing architecture-aware compilers

Background - Application Performance
- Advancement in processor technology
  - Deep pipelining
  - Multi-level cache hierarchy
  - Improved branch predictors
  - Out-of-order execution engine
  - Advanced floating point, multimedia units
- Compilers
  - Optimization levels and switches
- Compilers should keep up with processor technology

Architecture-aware Compilers
- Compiler/hardware interaction can maximize application performance by
  - Exploiting advances in processor technology
  - Generating target-specific optimal code
    - Path length reduction
    - Efficient instruction selection
    - Pipeline scheduling
    - Instruction-level parallelism
    - Memory penalty minimization

Performance Evaluation
- A systematic process of data collection and analysis used to evaluate a system
- [Diagram: benchmarks are compiled into executables, which are run to collect performance metrics]
- Benchmark: a program that performs a strictly defined set of operations (a workload) and returns some form of result (a metric) describing how the tested computer performed

Performance Evaluation - Previous Works
- Study underlying architecture and characterize workloads
  - Evaluation of Pentium Pro using SPEC 2000
  - Evaluation of Pentium II using multimedia applications
- Processor-centric optimization
  - Xeon vs. Pentium III
  - Pentium III vs. Pentium IV
- Compilers and optimization
  - Branch optimizations by different compilers

Problem Overview
- Objective
  - Prove and stress the importance of architecture-aware compilers
- How?
  - Compile the benchmarks using different compilers
  - Use the same optimization switches
  - Execute the binaries under a performance analyzer
  - Analyze and compare the performance metrics collected
- The OS and hardware are the same, so any difference in the metrics is due only to the compiler used

Experimental Setup
- [Diagram: SPEC CPU2000 is compiled with IC++ and with VC++; each executable is run under VTune to collect performance metrics]
- Processor: Pentium IV
- Operating system: Windows 2000
- Optimization level: /O2
- Input: reference set from SPEC

SPEC CPU2000
- Portrays real user applications and is computation intensive
- Can measure the performance of the processor, memory, and compiler
- Does not stress I/O devices, networking, or the OS
- Benchmarks used, from CINT2000 and CFP2000:

Name          Type  Description
164.gzip      INT   Data compression, written in C
176.gcc       INT   C programming language compiler
177.mesa      FP    3-D graphics library, written in C
181.mcf       INT   Combinatorial optimization, written in C
186.crafty    INT   Chess (game playing), written in C
197.parser    INT   Word processing, written in C
252.eon       INT   Computer visualization, written in C++
253.perlbmk   INT   PERL programming language, written in C
254.gap       INT   Group theory, interpreter, written in C
255.vortex    INT   Object-oriented database, written in C

VTune Performance Analyzer
- Simultaneous sampling of multiple events and real-time display using counter monitors
- Supports time-based and event-based sampling (EBS)
  - Used here to take advantage of the Pentium IV's EBS feature
- Has low intrusion
  - The samples collected give a closer representation of the application's actual performance
- Events collected
  - Clockticks, instructions retired, loads retired, stores retired, branches retired, first-level cache misses, and mispredicted branches
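The event totals listed above are typically turned into ratios before runs are compared. The following is a minimal sketch of that step, not the authors' tooling: the field names, the `derived_metrics` helper, and the sample counts are all hypothetical, and treating "other instructions" as whatever remains after loads, stores, and branches is an assumption suggested by the slide titles later in the deck.

```python
from dataclasses import dataclass


@dataclass
class EventCounts:
    """Event totals for one benchmark run (field names are illustrative)."""
    clockticks: int
    instructions: int
    loads: int
    stores: int
    branches: int
    l1_misses: int
    mispredicted_branches: int


def derived_metrics(c: EventCounts) -> dict:
    """Turn raw event totals into ratios that can be compared across runs."""
    # Assumption: every retired instruction that is not a load, store,
    # or branch is counted as "other".
    other = c.instructions - c.loads - c.stores - c.branches
    return {
        "cpi": c.clockticks / c.instructions,
        "load_share": c.loads / c.instructions,
        "store_share": c.stores / c.instructions,
        "branch_share": c.branches / c.instructions,
        "other_share": other / c.instructions,
        "misprediction_rate": c.mispredicted_branches / c.branches,
        "l1_misses_per_1k_instr": 1000 * c.l1_misses / c.instructions,
    }


# Made-up counts for illustration only; these are not measurements from the study.
sample = EventCounts(
    clockticks=2_000_000,
    instructions=1_000_000,
    loads=300_000,
    stores=100_000,
    branches=150_000,
    l1_misses=5_000,
    mispredicted_branches=7_500,
)
metrics = derived_metrics(sample)
```

Comparing shares rather than absolute counts makes it possible to tell a change in the instruction mix apart from a uniform reduction in the number of instructions executed.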

Compiler Optimizations
- Both compilers were used with the /O2 option
  - It invokes the same switches and serves the same function in both

Option  Effect
/Od     Disable optimization
/O1     Minimize size
/O2     Maximize speed

- Microsoft VC++ has special switches to target Pentium (/G5) and Pentium Pro (/G6)
- The Intel C++ compiler optimizes performance for applications running on Intel architecture-based computers
- Performance gains from using IC++ are the result of
  - profile-guided optimization
  - pre-fetch instruction
  - support for Streaming SIMD Extensions (SSE)
  - data prefetching
  - inter-procedural optimization

Comparison of Clock Ticks
- On average, a 10% performance gain with IC++
- The gain is more pronounced for the 3D graphics library (177.mesa) and the computer visualization application (252.eon)
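One way to express a gain like this is as the relative reduction in clockticks against the VC++ build. The slide does not say how the average was formed, so the arithmetic mean below is an assumption, and the clocktick pairs are invented placeholders rather than the measured values.

```python
def percent_gain(baseline_ticks: float, improved_ticks: float) -> float:
    """Reduction in clockticks relative to the baseline, in percent."""
    return 100.0 * (baseline_ticks - improved_ticks) / baseline_ticks


def mean_percent_gain(pairs) -> float:
    """Arithmetic mean of the per-benchmark gains (an assumed averaging rule)."""
    gains = [percent_gain(baseline, improved) for baseline, improved in pairs]
    return sum(gains) / len(gains)


# Hypothetical (VC++ clockticks, IC++ clockticks) pairs, not data from the study.
pairs = [(100.0, 95.0), (200.0, 170.0)]
average_gain = mean_percent_gain(pairs)
```

Averaging per-benchmark percentages weights every benchmark equally; summing the clockticks first would instead let the longest-running benchmarks dominate.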

Comparison of Binaries

Code size (in bytes)
Benchmark         MSVC++         IC++
164.gzip          69,632       77,824
176.gcc        1,089,536    1,314,816
177.mesa         442,368      610,304
181.mcf           49,152       53,248
186.crafty       241,664      258,048
197.parser       118,784      131,072
252.eon          405,504      413,696
253.perlbmk      516,096      651,264
254.gap          356,352      413,696
255.vortex       417,792      454,656

- VC++ produced smaller binaries for every benchmark
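The size difference in the table can be put in relative terms with a few lines of arithmetic. This sketch uses the byte counts from the table; the variable and function names are illustrative.

```python
# (MSVC++ bytes, IC++ bytes) per benchmark, copied from the table above.
CODE_SIZE_BYTES = {
    "164.gzip": (69_632, 77_824),
    "176.gcc": (1_089_536, 1_314_816),
    "177.mesa": (442_368, 610_304),
    "181.mcf": (49_152, 53_248),
    "186.crafty": (241_664, 258_048),
    "197.parser": (118_784, 131_072),
    "252.eon": (405_504, 413_696),
    "253.perlbmk": (516_096, 651_264),
    "254.gap": (356_352, 413_696),
    "255.vortex": (417_792, 454_656),
}


def size_increase_percent(vc_bytes: int, ic_bytes: int) -> float:
    """How much larger the IC++ binary is than the VC++ binary, in percent."""
    return 100.0 * (ic_bytes - vc_bytes) / vc_bytes


increase = {
    name: size_increase_percent(vc, ic)
    for name, (vc, ic) in CODE_SIZE_BYTES.items()
}
largest = max(increase, key=increase.get)
smallest = min(increase, key=increase.get)

for name, pct in increase.items():
    print(f"{name:<12} {pct:5.1f}%")
```

Here `largest` and `smallest` name the benchmarks with the biggest and smallest relative growth in binary size under IC++.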

Comparison of Instruction Count
- The 3D graphics and computer visualization applications (177.mesa, 252.eon) show a much larger reduction in instruction count than the other benchmarks

Comparison of Loads

Comparison of Stores

Comparison of Branches

Comparison of Other Instructions

Comparison of Cache Misses

Conclusion & Future Research
- Execution characteristics of the CPU2000 benchmarks were presented for VC++ and IC++
- IC++ performed better than VC++ for all applications considered, with the gain more pronounced for graphics applications
- The distribution of loads, stores, and branches was the same for both compilers; only the absolute numbers differed
- No difference in branch prediction or memory references
- Use: identifying the strengths and weaknesses of compilers
- Future directions
  - Different optimization switches
  - Use of microbenchmarks for better control

