Published by Frank Butler. Modified over 9 years ago.
1
Advanced Computing Technology Center © 2005 IBM Corporation
The IBM High Performance Computing Toolkit
Guojing Cong
2
IBM High Performance Computing Toolkit (HPCT)
One consolidated package
Components:
– Hardware Performance Monitor (HPM)
– Simulation Guided Memory Analyzer (SiGMA)
– MPI Profiler (MP_profiler)
– OpenMP Profiler (PompProf)
– Modular I/O Performance Tool (MIO)
– Xprofiler
– GUI integration tool with source code traceback (PeekPerf)
– Watson Sparse Matrix Package (WSMP) included
3
Our Vision
A toolkit that spans various aspects of high performance computing
– CPU profiling, memory behavior analysis, communication profiling, I/O analysis and optimization
Integrated performance monitoring and profiling environment
– A single consistent interface for all components
– Enhanced functionality
Binary instrumentation (without source code modification)
Dynamic instrumentation
Available on IBM platforms
– AIX, LoP (Linux on POWER), and BlueGene
4
Support Matrix
Tools: HPMCount & HPMlib; MP_profiler & MP_tracer; Xprofiler; SHMEM & SHMEM profiler; MIO; PompProf; SiGMA; PeekPerf; Watson Sparse Matrix Package
AIX Power: today (AIX 5L 5.1, 5.3); today (AIX 4.3.3+); today (AIX 5L 5.1); today (AIX 4.3.3+); today (AIX 5L 5.1)
Linux Power: Aug/05 (Linux 2.4 & 2.6); May/05 (Linux 2.6); Aug-Sep/05 (Linux 2.6); N/A; TBT (Linux 2.6); N/A; Aug-Sep/05 (Linux 2.6); TBT; TBT (Linux 2.6)
Linux JS20: Aug/05 (Linux 2.4 & 2.6); May/05 (Linux 2.6); Aug-Sep/05 (Linux 2.6); N/A; TBT (Linux 2.6); N/A; Aug-Sep/05 (Linux 2.6); TBT; TBT (Linux 2.6)
Linux BG/L: Aug/05; today; Aug/05; N/A; TBT; N/A; today; N/A
5
Outline
– Xprofiler
– HPM
– MP Profiler
– OpenMP Profiler
– MIO
6
Xprofiler
CPU profiling tool similar to gprof
Can be used to profile both serial and parallel applications
Uses procedure-profiling information to construct a graphical display of the functions within an application
Provides quick access to the profiled data and helps users identify the most CPU-intensive functions
Based on sampling (supported by both the compiler and the kernel)
Charges execution time to source lines and shows disassembled code
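Xprofiler builds its profile from periodic samples rather than exact timing. The bookkeeping behind a sampled flat profile can be sketched in C as follows; this is a simplified model, not Xprofiler's implementation. It assumes a symbol table already sorted into non-overlapping address ranges, and the names `func_range`, `find_func`, `charge_samples` and `ticks_to_seconds` are hypothetical. Each sampled program counter is charged to the function whose range contains it, and ticks are converted to seconds using the 0.01 s tick length quoted for the source code window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Seconds represented by one sampling tick (0.01 s, as quoted for Xprofiler). */
#define TICK_SECONDS 0.01

/* Hypothetical symbol-table entry: a function and its address range [start, end). */
struct func_range {
    const char *name;
    uintptr_t start;
    uintptr_t end;
    unsigned long ticks; /* samples charged to this function */
};

/* Binary search for the function containing pc; funcs must be sorted by start.
   Returns NULL if pc lies outside every known range. */
static struct func_range *find_func(struct func_range *funcs, size_t n, uintptr_t pc) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pc < funcs[mid].start)
            hi = mid;
        else if (pc >= funcs[mid].end)
            lo = mid + 1;
        else
            return &funcs[mid];
    }
    return NULL;
}

/* Charge each sampled program counter to its enclosing function. */
static void charge_samples(struct func_range *funcs, size_t n,
                           const uintptr_t *pcs, size_t nsamples) {
    assert(funcs != NULL || n == 0);
    for (size_t i = 0; i < nsamples; i++) {
        struct func_range *f = find_func(funcs, n, pcs[i]);
        if (f != NULL)
            f->ticks++;
    }
}

/* Convert a tick count into seconds of CPU time. */
static double ticks_to_seconds(unsigned long ticks) {
    return ticks * TICK_SECONDS;
}
```

Samples that fall outside every known range (for example, in a library without symbols) are simply dropped here; a real profiler would usually report them separately.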
7
Xprofiler: Main Display
Width of a bar: time including called routines
Height of a bar: time excluding called routines
Call arrows labeled with the number of calls
Overview window for easy navigation (View > Overview)
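The two bar dimensions correspond to inclusive and exclusive (self) time. A minimal sketch, assuming a simple call tree and using the hypothetical names `call_node` and `inclusive_seconds`, shows how a bar's width follows from the heights of the bars below it:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical call-tree node. */
struct call_node {
    const char *name;
    double self_seconds;          /* time in this routine only (bar height) */
    struct call_node **children;  /* routines called from here */
    size_t nchildren;
};

/* Inclusive time (bar width): own time plus the inclusive time of every callee.
   Assumes a tree: no recursion and no callee shared between callers. */
static double inclusive_seconds(const struct call_node *node) {
    assert(node != NULL);
    double total = node->self_seconds;
    for (size_t i = 0; i < node->nchildren; i++)
        total += inclusive_seconds(node->children[i]);
    return total;
}
```

Xprofiler, like gprof, works on a call graph rather than a tree, so time spent in a routine with several callers has to be apportioned among them; gprof-style tools do this by call counts. The tree assumption keeps the sketch short.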
8
Xprofiler: Source Code Window
The source code window displays source code with a time profile (in ticks; 1 tick = 0.01 s)
Access
– Select a function in the main display, then use the context menu
– Select a function in the flat profile, then Code Display > Show Source Code
9
Xprofiler: Disassembler Code
10
HPM provides comprehensive reports of hardware events that are critical to performance
– Accurate and low overhead
– Comprehensive, e.g., number of floating-point instructions executed, cache misses, TLB misses
Derived metrics
– Correlate the behavior of the application with one or more of the hardware components
Thread-level support
Includes
– hpmcount, libhpm, hpmstat
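As an illustration of derived metrics, the sketch below turns raw counter totals into a flop rate, instructions per cycle, and an L1 load miss ratio. The struct fields, the function name `derive_metrics`, and the formulas are assumptions chosen for illustration; the events actually available, and the derived metrics HPM itself reports, depend on the processor and the counter group selected.

```c
#include <assert.h>

/* Hypothetical raw totals for one measured region. */
struct hw_counts {
    double wall_seconds;
    double cycles;
    double instructions;
    double fp_ops;          /* floating-point operations */
    double loads;
    double l1_load_misses;
};

/* Hypothetical derived metrics. */
struct derived {
    double mflops;          /* millions of FP operations per wall-clock second */
    double ipc;             /* instructions per cycle */
    double l1_miss_ratio;   /* fraction of loads that miss in L1 */
};

static struct derived derive_metrics(const struct hw_counts *c) {
    assert(c->wall_seconds > 0.0 && c->cycles > 0.0 && c->loads > 0.0);
    struct derived d;
    d.mflops = c->fp_ops / c->wall_seconds / 1.0e6;
    d.ipc = c->instructions / c->cycles;
    d.l1_miss_ratio = c->l1_load_misses / c->loads;
    return d;
}
```

Each derived value ties a count back to one hardware component (floating-point units, pipeline, L1 cache), which is what makes such ratios easier to act on than the raw counts.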
11
HPM Visualization Using PeekPerf
12
MP_profiler
A set of libraries that collect profiling data for MPI and TurboSHMEM applications
– Implements wrappers using the PMPI interface
Reports performance metrics, e.g.,
– time used by MPI function calls
– message sizes
Visualization tools help users identify performance bottlenecks
– peekperf maps performance metrics back to the source code
– peekview gives a visual representation of the overall computation and communication pattern of the system
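The PMPI interface is part of the MPI standard: every MPI routine can also be called under a `PMPI_` name, so a profiling library can define its own `MPI_Send` that records statistics and then calls `PMPI_Send`. The sketch below shows only the bookkeeping such a wrapper might do for the two metrics listed above, call time and message size. The names `call_stats`, `size_bucket` and `record_call` are hypothetical, not MP_profiler internals, and the wrapper itself appears as a comment because it needs an MPI installation to compile.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 32

/* Hypothetical per-routine statistics a PMPI wrapper could accumulate. */
struct call_stats {
    unsigned long calls;
    double seconds;                     /* total time spent inside the MPI call */
    unsigned long long bytes;           /* total payload bytes */
    unsigned long size_hist[NBUCKETS];  /* bucket k: smallest k with bytes <= 2^k */
};

/* Power-of-two message-size bucket, capped at the last bucket. */
static unsigned size_bucket(unsigned long long bytes) {
    unsigned k = 0;
    while (k < NBUCKETS - 1 && (1ULL << k) < bytes)
        k++;
    return k;
}

/* Record one completed call. */
static void record_call(struct call_stats *s, unsigned long long bytes, double seconds) {
    assert(s != NULL && seconds >= 0.0);
    s->calls++;
    s->seconds += seconds;
    s->bytes += bytes;
    s->size_hist[size_bucket(bytes)]++;
}

/* With <mpi.h> available, the wrapper itself could look like this
 * (MPI_Send, PMPI_Send, PMPI_Wtime and PMPI_Type_size are standard MPI):
 *
 *   static struct call_stats send_stats;
 *
 *   int MPI_Send(const void *buf, int count, MPI_Datatype type,
 *                int dest, int tag, MPI_Comm comm) {
 *       int type_size;
 *       double t0 = PMPI_Wtime();
 *       int rc = PMPI_Send(buf, count, type, dest, tag, comm);
 *       double t1 = PMPI_Wtime();
 *       PMPI_Type_size(type, &type_size);
 *       record_call(&send_stats, (unsigned long long)count * type_size, t1 - t0);
 *       return rc;
 *   }
 */
```

Because the interception happens at link time, the application source does not change; it only has to be linked against the profiling library ahead of the MPI library.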
13
MP_Profiler Visualization Using PeekPerf
14
MP_Tracer Visualization Using PeekPerf
15
POMP Profiler (PompProf)
Generates a detailed profile describing the overheads and the time spent by each thread in three key regions of the parallel application:
– Parallel regions
– OpenMP loops inside a parallel region
– User-defined functions
Profile data is presented as an XML file that can be visualized with PeekPerf
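Per-thread times for a parallel region are most useful when summarized into a load-imbalance figure. A minimal sketch, assuming the per-thread seconds for one region are already collected; the names `region_summary` and `summarize_region`, and the imbalance formula, are illustrative choices rather than PompProf's definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one parallel region. */
struct region_summary {
    double max_seconds;   /* slowest thread */
    double mean_seconds;  /* average thread */
    double imbalance;     /* (max - mean) / max */
};

/* Summarize non-negative per-thread times for one region. */
static struct region_summary summarize_region(const double *thread_seconds, size_t nthreads) {
    assert(thread_seconds != NULL && nthreads > 0);
    struct region_summary r = {0.0, 0.0, 0.0};
    double sum = 0.0;
    for (size_t i = 0; i < nthreads; i++) {
        sum += thread_seconds[i];
        if (thread_seconds[i] > r.max_seconds)
            r.max_seconds = thread_seconds[i];
    }
    r.mean_seconds = sum / (double)nthreads;
    r.imbalance = r.max_seconds > 0.0
                      ? (r.max_seconds - r.mean_seconds) / r.max_seconds
                      : 0.0;
    return r;
}
```

Because a region ends only when its slowest thread finishes, (max - mean) / max estimates the fraction of the region's elapsed time that an average thread spends waiting.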
16
DPOMP
Dynamically instruments OpenMP applications
Can add performance instrumentation to binaries without requiring access to source code or recompilation
Based on dynamic probes using DPCL (Dynamic Probe Class Library)
17
PompProf Visualization Using PeekPerf
18
Modular I/O Performance Tool (MIO)
I/O Analysis
– Trace module
– Summary of file I/O activity plus a binary events file
– Low CPU overhead
I/O Performance Enhancement Library
– Prefetch module (optimizes asynchronous prefetch and write-behind)
– System buffer bypass capability
– User-controlled pages (size and number)
Recoverable Error Handling
– Recover module (monitors return values and errno, and reissues failed requests)
Remote Data Server
– Remote module (simple socket protocol for moving data)
Shared object library for AIX
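The recover module's approach of checking return values and `errno` and reissuing failed requests can be illustrated with plain POSIX `read()`. This sketch shows the general technique, not MIO's code: the hypothetical `read_retrying` reissues a read interrupted by a signal (`EINTR`) and continues after short reads, but treats every other error as fatal, since the slide does not say which errors MIO considers recoverable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to `count` bytes, stopping early only at end of file.
   Reissues reads interrupted by a signal and continues after short reads.
   Returns the number of bytes read, or -1 with errno set on any other error
   (bytes already read in that case are not reported to the caller). */
static ssize_t read_retrying(int fd, void *buf, size_t count) {
    assert(buf != NULL || count == 0);
    char *p = buf;
    size_t done = 0;
    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);
        if (n > 0) {
            done += (size_t)n;          /* short read: ask for the remainder */
        } else if (n == 0) {
            break;                      /* end of file */
        } else if (errno == EINTR) {
            continue;                   /* interrupted: reissue the same request */
        } else {
            return -1;                  /* non-recoverable: report to the caller */
        }
    }
    return (ssize_t)done;
}
```

A library that wraps every read this way hides transient failures from the application, which is the benefit the recover module advertises.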
19
MIO User Code Interface
/* Redirect the application's POSIX I/O calls to their MIO equivalents */
#define open64(a,b,c)   MIO_open64(a,b,c,0)
#define read            MIO_read
#define write           MIO_write
#define close           MIO_close
#define lseek64         MIO_lseek64
#define fcntl           MIO_fcntl
#define ftruncate64     MIO_ftruncate64
#define fstat64         MIO_fstat64
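These macros work by textual substitution: once they are in effect, the application's unchanged I/O calls are compiled as calls into MIO. The same pattern can be demonstrated without MIO. In the sketch below, `traced_write` is a hypothetical stand-in for `MIO_write` that counts calls and bytes before forwarding to the real `write()`, and `send_message` plays the role of unmodified user code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

static unsigned long traced_write_calls = 0;
static unsigned long long traced_write_bytes = 0;

/* Hypothetical interposed routine. It is defined before the macro below,
   so its own call to write() still reaches the real system call. */
static ssize_t traced_write(int fd, const void *buf, size_t count) {
    ssize_t n = write(fd, buf, count);
    traced_write_calls++;
    if (n > 0)
        traced_write_bytes += (unsigned long long)n;
    return n;
}

/* Same pattern as "#define write MIO_write": from here on, write(...)
   in this translation unit is rewritten by the preprocessor. */
#define write traced_write

/* "User code": it still calls write(), but now goes through traced_write. */
static int send_message(int fd, const char *msg, size_t len) {
    assert(msg != NULL);
    return write(fd, msg, len) == (ssize_t)len;
}
```

The redirection applies only to source compiled after the `#define`; calls made from already-compiled libraries are not affected, so the application has to be rebuilt with the macros in effect.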
20
MIO Trace Module (sample partial text output)
Trace close : program pf : /bmwfs/cdh108.T20536_13.SCR300 : (281946/2162.61)=130.37 mbytes/s
current size=0 max_size=16277
mode =0777 sector size=4096
oflags =0x302=RDWR CREAT TRUNC
open    1        0.01
write   478193   462.10   59774   59774   131072  131072
read    1777376  1700.48  222172  222172  131072  131072
seek    911572   2.83
fcntl   3        0.00
trunc   16       0.40
close   1        0.03
size    127787
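The bandwidth on the first line of the trace is total data moved divided by time. Reading the first MB column on the write and read lines as megabytes transferred (an interpretation, supported by the fact that 59774 + 222172 is exactly the 281946 in the numerator), the figure can be reproduced as follows; the function and constant names are hypothetical:

```c
#include <assert.h>

/* Figures copied from the trace output. */
static const double written_mb = 59774.0;      /* write line */
static const double read_mb = 222172.0;        /* read line */
static const double traced_seconds = 2162.61;  /* denominator printed in the trace */

/* Aggregate bandwidth in MB/s: total megabytes moved divided by elapsed seconds. */
static double trace_bandwidth(double mb_written, double mb_read, double seconds) {
    assert(seconds > 0.0);
    return (mb_written + mb_read) / seconds;
}
```

Calling `trace_bandwidth(written_mb, read_mb, traced_seconds)` gives about 130.37, matching the value printed by the trace module.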
21
MSC.Nastran V2001 Benchmark
SOL 111, 1.7M DOF, 1578 modes, 146 frequencies, residual flexibility and acoustics; 120 GB of disk space.
Machine: 4-way, 1.3 GHz p655, 32 GB with 16 GB large pages, JFS striped on 16 SCSI disks.
MSC.Nastran: V2001.0.9 with large pages, dmp=2 parallel=2 mem=700mb. The run with MIO used mio=1000mb.
6.8 TB of I/O in 26666 seconds is an average of about 250 MB/sec.
[Chart: elapsed time and CPU time in seconds (axis 0 to 60,000), no MIO vs. with MIO]
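The quoted average rate can be checked directly, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), which the slide does not specify:

```latex
\[
\frac{6.8 \times 10^{12}\ \text{bytes}}{26\,666\ \text{s}}
\approx 2.55 \times 10^{8}\ \text{bytes/s}
\approx 255\ \text{MB/s}
\]
```

This agrees with "about 250 MB/sec". With binary units (2^40 and 2^20 bytes) the same figures give about 267 MB/s.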
24
Problems that we are considering
Performance profiling and monitoring for scientific applications on large systems
– Selectively generate and report profiling data
– Manage and analyze large amounts of performance data
Composite profiling and presentation
– CPU profiling
– Hardware performance counter profiling
– Communication profiling