On the Integration and Use of OpenMP Performance Tools in the SPEC OMP2001 Benchmarks

Rudi Eigenmann
Department of Electrical and Computer Engineering, Purdue University
eigenman@ecn.purdue.edu

Allen Malony
Department of Computer and Information Science, University of Oregon
malony@cs.uoregon.edu

Bernd Mohr
Forschungszentrum Jülich, John von Neumann-Institut für Computing
b.mohr@fz-juelich.de
Outline
- SPEC OMP2001 benchmark suite
- Motivation: integrated performance tools in benchmarking suites
- Approach for OMPM2001
  - POMP OpenMP performance monitoring interface
  - Automatic OpenMP instrumentation (OPARI)
  - Performance analysis tools (EXPERT and TAU)
- Experiments
- Concluding remarks
SPEC OMP2001 Benchmark Suite
- 11 application programs used in scientific computing
  - CFD: APPLU, APSI, GALGEL, MGRID, SWIM
  - Molecular dynamics: AMMP
  - Crash simulation: FMA3D
  - Neural network: ART
  - Genetic algorithm: GAFORT
  - Earthquake modeling: EQUAKE
  - Quantum chromodynamics: WUPWISE
- Fortran and C source code with OpenMP parallelization
- Medium and large data sets
- Goals of portability and relative ease of use
OMPM2001 Performance Measurement Studies
- OMP2001 measures and reports total execution time only
  - Scalability results for different processor numbers
- "Performance Characteristics of the SPEC OMP2001 Benchmarks," Aslot and Eigenmann, EWOMP 2001
  - Studies performance characteristics in detail:
    - Timing profiles (scalability) across parallel sections
    - Memory system and cache (hardware counter) profiles
  - Uses high-resolution timers and hardware counters
  - Quantitative and qualitative explanations
  - Custom instrumentation and measurement libraries
  - Required hand-instrumentation of OpenMP constructs
Performance Tools and Benchmark Suites
- Detailed performance measurement and analysis reveal interesting runtime characteristics in application codes
  - Important for performance diagnosis and tuning
  - Help to understand the effects of a new parallel API (OpenMP)
- Benchmark suites typically do not have integrated tools
  - Portability of performance tools is poor
  - Hard to configure tools for a benchmarking methodology
  - Tools often require manual application and operation
- BUT: automatic and portable performance tools could allow more in-depth, cross-platform performance analysis
- Goal: integrated performance tools for OMP2001
Approach for OMPM2001
- Leverage state-of-the-art performance instrumentation, measurement, and analysis technology
  - POMP OpenMP performance monitoring interface
  - OPARI automatic OpenMP source instrumentation
  - Performance profile and trace measurement libraries
  - EXPERT automatic event trace analyzer
  - TAU performance analysis system
- Configure performance tools as integrated and automated components in the OMPM2001 benchmarking methodology
- Conduct performance experiments on the OMPM2001 codes
- Evaluate with respect to portability, ease of use, and results
POMP OpenMP Performance Monitoring Interface
- OpenMP instrumentation
  - OpenMP directive/pragma instrumentation
  - OpenMP runtime library routine instrumentation
- POMP directive/pragma extensions (see the sketch below)
  - Runtime library control ( !$POMP INIT, FINALIZE, ON, OFF )
  - (Manual) user code instrumentation:
        !$POMP BEGIN(myname)
        ... structured block ...
        !$POMP END(myname)
  - Conditional compilation ( #ifdef _POMP )
  - Conditional / selective transformations ( !$POMP [NO]INSTRUMENT )
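As a concrete illustration, here is a minimal Fortran sketch of these directives in use (the region name compute_phase and the loop body are invented for the example). Since !$POMP lines are plain comments to an ordinary Fortran compiler, the program also builds and runs unchanged when no POMP tooling is applied:

    program pomp_demo
      implicit none
      integer :: i
      real    :: s
    !$POMP INIT                   ! start the POMP runtime library
      s = 0.0
    !$POMP BEGIN(compute_phase)   ! user-defined region begins
      do i = 1, 1000
         s = s + real(i)
      end do
    !$POMP END(compute_phase)     ! user-defined region ends
    !$POMP FINALIZE               ! shut monitoring down
      print *, s
    end program pomp_demo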
Example: !$OMP PARALLEL DO Instrumentation

Original construct:

    !$OMP PARALLEL DO clauses...
          do loop
    !$OMP END PARALLEL DO

Transformed, instrumented construct (the work-sharing DO gets a NOWAIT, and an explicit barrier is inserted in its place so that barrier waiting time becomes measurable):

          call pomp_parallel_fork(d)
    !$OMP PARALLEL other-clauses...
          call pomp_parallel_begin(d)
          call pomp_do_enter(d)
    !$OMP DO schedule-clauses, ordered-clauses, lastprivate-clauses
          do loop
    !$OMP END DO NOWAIT
          call pomp_barrier_enter(d)
    !$OMP BARRIER
          call pomp_barrier_exit(d)
          call pomp_do_exit(d)
          call pomp_parallel_end(d)
    !$OMP END PARALLEL
          call pomp_parallel_join(d)
OpenMP Runtime Library Routine Instrumentation
- Transform:
  - omp_###_lock()       ->  pomp_###_lock()
  - omp_###_nest_lock()  ->  pomp_###_nest_lock()
  - [ ### = init | destroy | set | unset | test ]
- POMP version
  - Calls the omp version internally
  - Can do extra work before and after the call (see the sketch below)
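A hedged sketch of what such a wrapper might look like in Fortran; pomp_event is a hypothetical stand-in for whatever event-recording hook a concrete POMP library provides, not part of any specified interface:

    module pomp_lock_sketch
      use omp_lib
      implicit none
    contains
      ! Hypothetical event hook; a real POMP library would record a
      ! timestamped event here instead of printing.
      subroutine pomp_event(what)
        character(len=*), intent(in) :: what
        print *, 'POMP event: ', trim(what)
      end subroutine pomp_event

      ! POMP version of omp_set_lock: extra work before and after,
      ! calling the omp version internally.
      subroutine pomp_set_lock(svar)
        integer(kind=omp_lock_kind), intent(inout) :: svar
        call pomp_event('set_lock enter')
        call omp_set_lock(svar)          ! the real OpenMP routine
        call pomp_event('set_lock exit')
      end subroutine pomp_set_lock
    end module pomp_lock_sketch

Instrumented code then calls pomp_set_lock wherever the original called omp_set_lock, so the monitor sees every lock operation without changes to the application source.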
OPARI: OpenMP Pragma And Region Instrumentor
- Source-to-source translator that inserts POMP calls around OpenMP constructs and API functions
- Done:
  - Supports Fortran77 and Fortran90 (OpenMP 2.0), C and C++ (OpenMP 1.0)
  - POMP extensions
  - EPILOG and TAU POMP monitoring library implementations
  - Preserves source code information ( #line line file )
- http://www.fz-juelich.de/zam/kojak/opari/
History and Future of POMP
- POMP OpenMP performance monitoring interface
  - Forschungszentrum Jülich, University of Oregon
  - Presented at EWOMP'01, LACSI'01, and SC'01
  - Published in The Journal of Supercomputing, 23, 2002
- European IST project INTONE
  - Development of OpenMP tools (incl. a monitoring interface)
  - Pallas, CEPBA, Royal Institute of Technology, Technical University of Dresden
  - http://www.cepba.upc.es/intone
- KSL-POMP
  - Development of an OpenMP monitoring interface inside ASCI
  - Based on POMP, but further developed in other directions
- Work in progress:
  - Investigating a joint proposal
  - Investigating standardization through the OpenMP Forum
EXPERT: Automatic Analysis of OpenMP + MPI Programs
- EXPERT: EXtensible PERformance Tool
- Programmable, extensible, flexible performance property specification
  - Based on event patterns
- Analyzes along three hierarchical dimensions:
  - Performance properties (general to specific)
  - Dynamic call-tree position
  - Location (machine, node, process, thread)
- For each property, a severity matrix is computed (formalized below):
  - Time losses due to the performance property
  - Per location and call-tree node
(Diagram: severity cube with axes Property, Call site, Location.)
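One way to formalize a severity entry (a hedged reading; the slide itself only says "time losses per location and call-tree node", and the normalization by total time is an assumption typical of EXPERT-style tools):

$$ \mathrm{severity}(p, c, l) \;=\; \frac{T_{\mathrm{lost}}(p, c, l)}{T_{\mathrm{total}}} $$

where p ranges over performance properties, c over call-tree nodes, and l over locations (threads).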
(Screenshot: the EXPERT result display)
- Class of behavior: which kind of behavior caused the problem?
- Call graph: where in the source code is the problem? In which context?
- Location: how is the problem distributed across the machine?
- Color coding: shows the severity of the problem
TAU Performance System Framework
- TAU: Tuning and Analysis Utilities
- Performance system framework for scalable parallel and distributed high-performance computing
- Targets a general complex system computation model: nodes / contexts / threads
- Multi-level: system / software / parallelism
- Measurement and analysis abstraction
- Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
- Portable performance profiling/tracing facility
- Open software approach
- http://www.cs.uoregon.edu/paracomp/tau
- http://www.cs.uoregon.edu/paracomp/pdtoolkit
TAU Performance System Architecture
(Architecture diagram; surviving labels: EPILOG, Paraver.)
Instrumentation
- User functions
  - EXPERT:
    - Compiler instrumentation (Linux PGI, Hitachi SR-8000)
    - Manual instrumentation via !$POMP directives
  - TAU:
    - Source instrumentation based on PDT (Program Database Toolkit), built on commercial parsers from EDG and Mutek
    - Dynamic instrumentation via dyninst or DPCL
    - Manual instrumentation via the TAU API (see the sketch below)
- OpenMP: source instrumentation via OPARI
- MPI: wrapper library using the "standard" PMPI monitoring interface
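For the manual route, a minimal Fortran sketch of TAU API usage (hedged: the calls follow TAU's documented Fortran bindings, but the 2001-era interface may have differed in detail, and the timer name is illustrative):

    program tau_demo
      implicit none
      integer, save :: profiler(2) = 0      ! handle for one TAU timer
      integer :: i
      real    :: s
      call TAU_PROFILE_INIT()               ! initialize the TAU runtime
      call TAU_PROFILE_SET_NODE(0)          ! single-node (non-MPI) run
      call TAU_PROFILE_TIMER(profiler, 'compute loop')
      call TAU_PROFILE_START(profiler)      ! timed region begins
      s = 0.0
      do i = 1, 1000
         s = s + real(i)
      end do
      call TAU_PROFILE_STOP(profiler)       ! timed region ends
      print *, s
    end program tau_demo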
Measurement and Analysis
- EXPERT:
  - EPILOG tracing library
    - Automatic trace analysis through EXPERT
    - Manual analysis through the EPILOG-to-VTF3 converter + Vampir
- TAU:
  - TAU tracing library: manual analysis through the TAU-to-VTF converter + Vampir
  - TAU profiling library: manual analysis through RACY/jRacy
  - TAU EPILOG tracing library: automatic trace analysis through EXPERT
Integration with the SPEC runspec Tool
- Development of OPARI and OPARI/TAU compile and link scripts
  - Take the "regular" compile/link command as argument
  - Perform all necessary instrumentation, compilation, and linking
- Example usage in a SPEC configuration file:

    default:default:opari:default
    FC  = opari-comp
    CC  = opari-comp
    FLD = opari-link
    CLD = opari-link

- Invocation through runspec:

    runspec ... --extension=opari ...
Experimental Setup: ZAMpano
- ZAMpano: ZAM PArallel NOdes
- 9-node Linux cluster; each node:
  - 4 x Intel Pentium III Xeon, 550 MHz, 512 KByte L2 cache
  - 2 GByte ECC-RAM
  - SuSE 7.2, Linux 2.4.4-4GB-SMP kernel
  - PGI F77, F90, C, C++ compilers V3.3-2
- Advantages:
  - Exclusive reservation for extended measurement periods
  - Simultaneous multiple measurements (on different nodes)
  - Root access
  - Full tool support
Wishful Thinking Meets Reality ;-)
- Problems with building/compiling OMPM2001
  - 1 GByte program+data size limit if dynamic linking is used
  - PGI could not compile AMMP
  - GALGEL, EQUAKE, GAFORT, and ART dump core midway
  - WUPWISE runs but produces result output differences
  - SWIM, MGRID, APPLU, APSI, and FMA3D run
- Problems with applying EXPERT
  - Traces of SWIM and APPLU for the "ref" data set
  - Traces of SWIM, MGRID, APPLU, and FMA3D for the "test" set
- Problems with applying TAU
  - PDT instrumentation failed due to non-ANSI Fortran
  - Instrumentor bug when an OpenMP loop is the first executable statement (a sketch of the triggering pattern follows)
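A hedged illustration of the code shape that triggered the instrumentor bug (routine and variable names are invented): the OpenMP loop is the routine's first executable statement, so there is presumably no preceding statement after which the instrumentor can place its timer-start call.

    subroutine zero_fill(a, n)
      implicit none
      integer :: n, i
      real    :: a(n)
    !$OMP PARALLEL DO          ! the parallel loop is the first
      do i = 1, n              ! executable statement of the routine:
         a(i) = 0.0            ! the pattern that tripped the
      end do                   ! TAU/PDT instrumentor
    !$OMP END PARALLEL DO
    end subroutine zero_fill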
Results: Event Statistics ("test" data set)

Full tracing:

    Benchmark   Time [s]   Time POMP [s]   Trace size [# events]   Event rate [events/s]   Overhead
    SWIM           2.547        3.949              920                      243                55%
    MGRID         43.780       46.481          108,400                    2,372                 6%
    APPLU          0.282        0.719           53,930                  181,960               155%
    FMA3D          0.046        0.195           14,960                  491,092               324%

Restricted user event tracing:

    Benchmark   Time [s]   Time POMP [s]   Trace size [# events]   Event rate [events/s]   Overhead
    SWIM           2.547        2.529              920                      383                 0%
    MGRID         43.780       44.475          103,836                    2,375               1.6%
    APPLU          0.282        0.496           21,898                   74,079                76%
    FMA3D          0.046        0.148            9,728                  372,125               221%
Results: Event Statistics ("ref" data set)

Restricted user event tracing:

    Benchmark   Time [s]   Time POMP [s]   Trace size [# events]   Event rate [events/s]   Overhead
    SWIM          16,656       17,068           32,054                       81               2.5%
    APPLU          9,593        9,534           88,298                      201                 0%

Full tracing:

    Benchmark   Time [s]   Time POMP [s]   Trace size [# events]   Event rate [events/s]   Overhead
    SWIM          16,656       16,679           32,054                       81               0.1%
    APPLU          9,593       10,666         ~147.5 M                  ~15,200                11%
Results: EXPERT Analysis of SWIM (screenshot)
Results: EXPERT Analysis of APPLU (screenshot)
Results: Vampir, SWIM "ref" data set (screenshot)
Future Work
- Get more benchmarks running (other compilers?)
- Fix instrumentation, measurement, and analysis problems
  - Fix TAU f90 instrumentor problems and get TAU profile data
  - Get more / better EPILOG traces
  - EXPERT profile library (to avoid huge traces)
- Extend analysis to other platforms: Sun, SGI, Hitachi, IBM, NEC, ...
- Investigate runtime trace compression techniques
- Other tools? Guide/VGV, Paraver, INTONE
Conclusions
- More portable SPEC OMP benchmarks needed
  - ANSI Fortran
  - Dynamic data allocation
  - A "small" data set
  - Add the optional !$OMP END [PARALLEL] DO directives
- Integrated POMP instrumentation ( !$POMP BEGIN/END(myname) ) for important user functions and execution phases
  - Document and specify additional measurement events
  - Would also solve the instrumentation problems
- Integrated generic and portable SPEC POMP measurement library
  - Could then easily be replaced by third-party / user POMP libraries
- An OpenMP ARB POMP standard would be a big win (until then: OPARI)
Additional Issues
- Level of measurement detail
  - What is necessary and appropriate?
  - Could use a base level and allow user-configured levels
  - Full program execution vs. portion of program execution
- Distribution complexity
  - Tool packages should be added to the benchmark distribution
  - Packages need to be easily obtained and configured
  - Must be public domain or licensed through SPEC
- Publishing of detailed performance results
  - Part of the official SPEC benchmark report?
- ...