Advanced Computing Technology Center © 2005 IBM Corporation The IBM High Performance Computing Toolkit Guojing Cong.

IBM High Performance Computing Toolkit (HPCT)
- One consolidated package
- Components:
  - Hardware Performance Monitor (HPM)
  - Simulation Guided Memory Analyzer (SiGMA)
  - MPI Profiler (MP_profiler)
  - OpenMP Profiler (PompProf)
  - Modular I/O Performance Tool (MIO)
  - Xprofiler
  - GUI integration tool with source code traceback (PeekPerf)
  - Watson Sparse Matrix Library (WSMP) included

Our Vision
- A toolkit that spans various aspects of high performance computing:
  - CPU profiling, memory behavior analysis, communication profiling, I/O analysis and optimization
- An integrated performance monitoring and profiling environment:
  - A single, consistent interface for all components
  - Enhanced functionality: binary instrumentation (without source code modification) and dynamic instrumentation
- Available on IBM platforms:
  - AIX, Linux on POWER (LoP), and Blue Gene

Support Matrix
Components: HPMCount & HPMlib; MP-profiler & MP-tracer; Xprofiler; SHMEM & SHMEM-profiler; MIO; PompProfiler; SiGMA; PeekPerf; Watson Sparse Matrix Package
- AIX Power: today (AIX 5L 5.1, 5.3); today (AIX ); today (AIX 5L 5.1); today (AIX ); today (AIX 5L 5.1)
- Linux Power: HPMCount & HPMlib Aug/05 (Linux 2.4 & 2.6); MP-profiler & MP-tracer May/05 (Linux 2.6); Xprofiler Aug-Sep/05 (Linux 2.6); SHMEM & SHMEM-profiler N/A; MIO TBT (Linux 2.6); PompProfiler N/A; SiGMA Aug-Sep/05 (Linux 2.6); PeekPerf TBT; Watson Sparse Matrix Package TBT (Linux 2.6)
- Linux JS20: same entries as Linux Power
- Linux BG/L: Aug/05; today; Aug/05; N/A; TBT; N/A; today; N/A

Outline
- Xprofiler
- HPM
- MP Profiler
- OpenMP Profiler
- MIO

Xprofiler
- CPU profiling tool similar to gprof
- Can profile both serial and parallel applications
- Uses procedure-profiling information to construct a graphical display of the functions within an application
- Provides quick access to the profiled data and helps users identify the most CPU-intensive functions
- Based on sampling (with support from both the compiler and the kernel)
- Charges execution time to source lines and shows disassembly code
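To make the sampling model concrete, here is a minimal C sketch of how a flat profile can be built from per-function sample counts: total the ticks, sort so the most CPU-intensive function comes first, and convert ticks to seconds. The FlatEntry type and the function names are hypothetical, not part of Xprofiler or gprof; the 0.01 s tick length matches the one quoted for the Xprofiler source code window, but real tick lengths depend on the system.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record: one entry per function, holding the number of
   sampling ticks charged to that function itself (self time only). */
typedef struct {
    const char *name;
    long ticks;
} FlatEntry;

/* Assumed tick length in seconds (0.01 s, as in the Xprofiler source window). */
#define SECONDS_PER_TICK 0.01

/* qsort comparator: descending order of ticks. */
static int by_ticks_desc(const void *a, const void *b) {
    const FlatEntry *x = a, *y = b;
    if (x->ticks != y->ticks)
        return (y->ticks > x->ticks) ? 1 : -1;
    return 0;
}

/* Sorts entries so the hottest function is first and returns the total
   number of ticks, which callers can use to compute percentages. */
long flat_profile(FlatEntry *e, size_t n) {
    assert(e != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += e[i].ticks;
    qsort(e, n, sizeof *e, by_ticks_desc);
    return total;
}

/* Prints ticks, seconds, and percentage of total time per function. */
void print_flat_profile(const FlatEntry *e, size_t n, long total) {
    printf("%8s %8s %6s  %s\n", "ticks", "seconds", "%", "function");
    for (size_t i = 0; i < n; i++) {
        double pct = total ? 100.0 * (double)e[i].ticks / (double)total : 0.0;
        printf("%8ld %8.2f %6.1f  %s\n", e[i].ticks,
               (double)e[i].ticks * SECONDS_PER_TICK, pct, e[i].name);
    }
}
```

For example, samples of 5, 120, and 25 ticks for init, solve, and io sort to solve, io, init, with solve accounting for about 1.2 s of the 1.5 s sampled.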

Xprofiler: Main Display
- Width of a bar: time including called routines (inclusive time)
- Height of a bar: time excluding called routines (exclusive time)
- Call arrows are labeled with the number of calls
- Overview window for easy navigation (View → Overview)
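The two bar dimensions correspond to inclusive and exclusive time. A minimal C sketch of the relationship on a call tree, assuming each node records only its own (exclusive) ticks; the Node type is hypothetical. gprof-style tools work on a call graph instead and apportion a callee's time among its callers by call counts, which this tree-only sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical call-tree node. Assumes an acyclic tree; recursive call
   graphs would need cycle handling, which this sketch omits. */
typedef struct Node {
    const char *name;
    long self_ticks;            /* exclusive time: bar height */
    struct Node **children;     /* routines called from this one */
    size_t nchildren;
} Node;

/* Inclusive time = own ticks + inclusive ticks of every callee.
   This is the quantity shown as bar width. */
long inclusive_ticks(const Node *n) {
    assert(n != NULL);
    long total = n->self_ticks;
    for (size_t i = 0; i < n->nchildren; i++)
        total += inclusive_ticks(n->children[i]);
    return total;
}
```

For example, a main routine with 5 self ticks that calls routines with 70 and 10 ticks has an inclusive total of 85 ticks, so its bar would be wide but short.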

Xprofiler: Source Code Window
- Displays source code with its time profile, in ticks (1 tick = 0.01 s)
- Access:
  - Select a function in the main display → context menu
  - Select a function in the flat profile → Code Display → Show Source Code

Xprofiler: Disassembler Code

HPM
- Provides comprehensive reports of hardware events that are critical to performance:
  - Accurate and low overhead
  - Comprehensive, e.g., number of floating-point instructions executed, cache misses, TLB misses
- Derived metrics:
  - Correlate the behavior of the application with one or more of the hardware components
- Thread-level support
- Includes hpmcount, libhpm, and hpmstat

HPM Visualization Using PeekPerf

MP_profiler
- A set of libraries that collect profiling data for MPI and TurboSHMEM applications:
  - Implemented as wrappers using the PMPI profiling interface
- Reports performance metrics, e.g.:
  - Time spent in MPI function calls
  - Message sizes
- Visualization tools help users identify performance bottlenecks:
  - peekperf maps performance metrics back to the source code
  - peekview gives a visual representation of the overall computation and communication pattern of the system
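The PMPI interface works because every MPI routine is also available under a PMPI_ name, so a profiling library can define MPI_Send itself, record what it needs, and forward to PMPI_Send. The sketch below shows that pattern together with a message-size histogram of the kind a profiler might keep. The histogram layout, the record_call helper, and the PROFILE_WITH_MPI build switch are hypothetical and are not MP_profiler's internals; the wrapper is compiled only when that switch is defined, so the histogram code builds without an MPI installation.

```c
#include <assert.h>
#include <stddef.h>

/* Bin k counts messages of size in [2^k, 2^(k+1)); bin 0 also holds
   zero-byte messages. */
#define NBINS 32

typedef struct {
    long calls[NBINS];
    double seconds[NBINS];
} SizeHistogram;

/* Global histogram for MPI_Send (zero-initialized). */
SizeHistogram send_hist;

/* floor(log2(bytes)), clamped to the last bin. */
int size_bin(size_t bytes) {
    int k = 0;
    while (bytes > 1 && k < NBINS - 1) {
        bytes >>= 1;
        k++;
    }
    return k;
}

/* Records one call: its message size and the time spent in it. */
void record_call(SizeHistogram *h, size_t bytes, double seconds) {
    assert(h != NULL);
    int k = size_bin(bytes);
    h->calls[k]++;
    h->seconds[k] += seconds;
}

#ifdef PROFILE_WITH_MPI  /* hypothetical build switch */
#include <mpi.h>
/* PMPI interposition: this definition of MPI_Send is linked in place of
   the library's, times the call, and forwards to PMPI_Send. */
int MPI_Send(const void *buf, int count, MPI_Datatype type, int dest,
             int tag, MPI_Comm comm) {
    int type_size;
    double t0 = PMPI_Wtime();
    int rc = PMPI_Send(buf, count, type, dest, tag, comm);
    double t1 = PMPI_Wtime();
    PMPI_Type_size(type, &type_size);
    record_call(&send_hist, (size_t)count * (size_t)type_size, t1 - t0);
    return rc;
}
#endif
```

Power-of-two bins keep the table small while still separating latency-bound small messages from bandwidth-bound large ones; a 1500-byte and a 1024-byte send both land in bin 10.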

MP_Profiler Visualization Using PeekPerf

MP_Tracer Visualization Using PeekPerf

POMP Profiler (PompProf)
- Generates a detailed profile describing overheads and the time spent by each thread in three key regions of a parallel application:
  - Parallel regions
  - OpenMP loops inside a parallel region
  - User-defined functions
- Profile data is written to an XML file that can be visualized with PeekPerf
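Per-thread times are most useful when summarized. One common summary (not necessarily the one PompProf reports) is load imbalance: the slowest thread sets the region's elapsed time, so (max - mean) / max estimates the fraction of that time that perfect balancing could remove. A minimal C sketch; RegionBalance and region_balance are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double max;        /* slowest thread: the region's elapsed time */
    double mean;       /* average time per thread */
    double imbalance;  /* (max - mean) / max; 0 when perfectly balanced */
} RegionBalance;

/* Summarizes the time each thread spent in one parallel region. */
RegionBalance region_balance(const double *thread_seconds, size_t nthreads) {
    assert(thread_seconds != NULL && nthreads > 0);
    RegionBalance b = {0.0, 0.0, 0.0};
    double sum = 0.0;
    for (size_t i = 0; i < nthreads; i++) {
        sum += thread_seconds[i];
        if (thread_seconds[i] > b.max)
            b.max = thread_seconds[i];
    }
    b.mean = sum / (double)nthreads;
    b.imbalance = b.max > 0.0 ? (b.max - b.mean) / b.max : 0.0;
    return b;
}
```

With four threads taking 1, 1, 1, and 2 seconds, the mean is 1.25 s and the imbalance is 0.375: about 37% of the region's 2 s could be recovered by spreading the work evenly.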

DPOMP
- Dynamically instruments OpenMP applications
- Can insert performance instrumentation into binaries without requiring access to source code or recompilation
- Based on dynamic probes using DPCL

PompProf Visualization Using PeekPerf

Modular I/O Performance Tool (MIO)
- I/O analysis:
  - Trace module
  - Summary of file I/O activity plus a binary events file
  - Low CPU overhead
- I/O performance enhancement library:
  - Prefetch module (optimizes asynchronous prefetch and write-behind)
  - System buffer bypass capability
  - User-controlled pages (size and number)
- Recoverable error handling:
  - Recover module (monitors return values and errno, and reissues failed requests)
- Remote data server:
  - Remote module (simple socket protocol for moving data)
- Shared object library for AIX

MIO User Code Interface
#define open64(a,b,c)  MIO_open64(a,b,c,0)
#define read           MIO_read
#define write          MIO_write
#define close          MIO_close
#define lseek64        MIO_lseek64
#define fcntl          MIO_fcntl
#define ftruncate64    MIO_ftruncate64
#define fstat64        MIO_fstat64
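These #define lines redirect the application's POSIX I/O calls to the library's entry points at compile time, without editing each call site. The sketch below applies the same technique with two hypothetical counting wrappers, traced_read and traced_write, standing in for the MIO functions (whose full signatures are not given here). Unlike the object-like macros above, it uses function-like macros so that only actual calls are rewritten, a choice made for this sketch; the wrappers are defined before the macros so they still reach the real read() and write().

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Counters a trace module might keep (illustrative only). */
typedef struct {
    long reads, writes;
    long long bytes_read, bytes_written;
} IoStats;

IoStats io_stats;

/* Defined before the macros below, so these calls reach the real read(). */
ssize_t traced_read(int fd, void *buf, size_t n) {
    ssize_t r = read(fd, buf, n);
    io_stats.reads++;
    if (r > 0)
        io_stats.bytes_read += r;
    return r;
}

/* Likewise reaches the real write(). */
ssize_t traced_write(int fd, const void *buf, size_t n) {
    ssize_t w = write(fd, buf, n);
    io_stats.writes++;
    if (w > 0)
        io_stats.bytes_written += w;
    return w;
}

/* From here on, calls to read()/write() in application code are redirected. */
#define read(fd, buf, n)  traced_read(fd, buf, n)
#define write(fd, buf, n) traced_write(fd, buf, n)
```

For example, after writing 11 bytes to a file created with mkstemp(), seeking back to the start, and reading it, io_stats shows one write and one read of 11 bytes each.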

MIO Trace Module (sample partial text output)
Trace close : program pf : /bmwfs/cdh108.T20536_13.SCR300 : (281946/ )= mbytes/s
current size=0 max_size=16277 mode =0777 sector size=4096 oflags =0x302=RDWR CREAT TRUNC
open write read seek fcntl trunc close size

MSC.Nastran V2001 Benchmark
- SOL 111, 1.7M DOF, 1578 modes, 146 frequencies, residual flexibility and acoustics; 120 GB of disk space
- Machine: 4-way, 1.3 GHz p655, 32 GB with 16 GB large pages, JFS striped on 16 SCSI disks
- MSC.Nastran: V with large pages, dmp=2 parallel=2 mem=700mb
- The run with MIO used mio=1000mb
- 6.8 TB of I/O in seconds is an average of about 250 MB/sec
[Figure: elapsed and CPU time (seconds), no MIO vs. with MIO]

Problems that we are considering
- Performance profiling and monitoring for scientific applications on large systems:
  - Selectively generating and reporting profiling data
  - Managing and analyzing large amounts of performance data
- Composite profiling and presentation:
  - CPU profiling
  - Hardware performance counter profiling
  - Communication profiling