Debugging, benchmarking, tuning i.e. software development tools

Slides:



Advertisements
Similar presentations
Issues of HPC software From the experience of TH-1A Lu Yutong NUDT.
Advertisements

Module 13: Performance Tuning. Overview Performance tuning methodologies Instance level Database level Application level Overview of tools and techniques.
Introduction to the CUDA Platform
Software & Services Group, Developer Products Division Copyright© 2010, Intel Corporation. All rights reserved. *Other brands and names are the property.
Profiling your application with Intel VTune at NERSC
Intel® performance analyze tools Nikita Panov Idrisov Renat.
Matt Wolfe LC Development Environment Group Lawrence Livermore National Laboratory Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA.
The Path to Multi-core Tools Paul Petersen. Multi-coreToolsThePathTo 2 Outline Motivation Where are we now What is easy to do next What is missing.
Parallel/Concurrent Programming on the SGI Altix Conley Read January 25, 2007 UC Riverside, Department of Computer Science.
Contemporary Languages in Parallel Computing Raymond Hummel.
HPCC Mid-Morning Break Interactive High Performance Computing Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery.
HPCC Mid-Morning Break Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery Introduction to the new GPU (GFX) cluster.
1 1 Profiling & Optimization David Geldreich (DREAM)
ORIGINAL AUTHOR JAMES REINDERS, INTEL PRESENTED BY ADITYA AMBARDEKAR Overview for Intel Xeon Processors and Intel Xeon Phi coprocessors.
Introduction to Symmetric Multiprocessors Süha TUNA Bilişim Enstitüsü UHeM Yaz Çalıştayı
Debugging Cluster Programs using symbolic debuggers.
Lecture 8: Caffe - CPU Optimization
WORK ON CLUSTER HYBRILIT E. Aleksandrov 1, D. Belyakov 1, M. Matveev 1, M. Vala 1,2 1 Joint Institute for nuclear research, LIT, Russia 2 Institute for.
Welcome to the Power of 64-bit Computing …now available on your desktop! © 1998, 1999 Compaq Computer Corporation.
Tools and Utilities for parallel and serial codes in ENEA-GRID environment CRESCO Project: Salvatore Raia SubProject I.2 C.R. ENEA-Portici. 11/12/2007.
Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,
TRACEREP: GATEWAY FOR SHARING AND COLLECTING TRACES IN HPC SYSTEMS Iván Pérez Enrique Vallejo José Luis Bosque University of Cantabria TraceRep IWSG'15.
Debugging and Profiling GMAO Models with Allinea’s DDT/MAP Georgios Britzolakis April 30, 2015.
TotalView Debugging Tool Presentation Josip Jakić
DDT Debugging Techniques Carlos Rosales Scaling to Petascale 2010 July 7, 2010.
Taking the Complexity out of Cluster Computing Vendor Update HPC User Forum Arend Dittmer Director Product Management HPC April,
Copyright © 2002, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners
Martin Schulz Center for Applied Scientific Computing Lawrence Livermore National Laboratory Lawrence Livermore National Laboratory, P. O. Box 808, Livermore,
GPU Architecture and Programming
Profiling, Tracing, Debugging and Monitoring Frameworks Sathish Vadhiyar Courtesy: Dr. Shirley Moore (University of Tennessee)
Issues Autonomic operation (fault tolerance) Minimize interference to applications Hardware support for new operating systems Resource management (global.
National Taiwan University Department of Computer Science and Information Engineering National Taiwan University Department of Computer Science and Information.
Debugging and Profiling With some help from Software Carpentry resources.
Chapter 5 Information Systems in Business Software
Software Overview Environment, libraries, debuggers, programming tools and applications Jonathan Carter NUG Training 3 Oct 2005.
1 The Portland Group, Inc. Brent Leback HPC User Forum, Broomfield, CO September 2009.
A New Parallel Debugger for Franklin: DDT Katie Antypas User Services Group NERSC User Group Meeting September 17, 2007.
Debugging parallel programs. Breakpoint debugging Probably the most widely familiar method of debugging programs is breakpoint debugging. In this method,
GPUs: Overview of Architecture and Programming Options Lee Barford firstname dot lastname at gmail dot com.
Application Debugging. Debugging methodical process of finding and reducing the number of bugs, or defects, in a computer program or a piece of electronic.
Msdevcon.ru#msdevcon. ИЗ ПЕРВЫХ РУК: ДИАГНОСТИКА ПРИЛОЖЕНИЙ С ПОМОЩЮ ИНСТРУМЕНТОВ VISUAL STUDIO 2012 MAXIM GOLDIN Senior Developer, Microsoft.
Shangkar Mayanglambam, Allen D. Malony, Matthew J. Sottile Computer and Information Science Department Performance.
Single Node Optimization Computational Astrophysics.
© 2002 IBM Corporation Confidential | Date | Other Information, if necessary Copyright © 2009 Ericsson, Made available under the Eclipse Public License.
GPU Computing for GIS James Mower Department of Geography and Planning University at Albany.
Debugging Lab Antonio Gómez-Iglesias Texas Advanced Computing Center.
Introduction to HPC Debugging with Allinea DDT Nick Forrington
Visual Programming Borland Delphi. Developing Applications Borland Delphi is an object-oriented, visual programming environment to develop 32-bit applications.
NREL is a national laboratory of the U.S. Department of Energy, Office of Energy Efficiency and Renewable Energy, operated by the Alliance for Sustainable.
QNX® Momentics® Development Suite Tools for Building, Debugging and Optimizing Embedded Systems.
HP-SEE TotalView Debugger Josip Jakić Scientific Computing Laboratory Institute of Physics Belgrade The HP-SEE initiative.
PERFORMANCE OF THE OPENMP AND MPI IMPLEMENTATIONS ON ULTRASPARC SYSTEM Abstract Programmers and developers interested in utilizing parallel programming.
OpenMP Lab Antonio Gómez-Iglesias Texas Advanced Computing Center.
USEIMPROVEEVANGELIZE ● Yue Chao ● Sun Campus Ambassador High-Performance Computing with Sun Studio 12.
Debugging, benchmarking, tuning i.e. software development tools Martin Čuma Center for High Performance Computing University of Utah
HPC usage and software packages
Open Source Software Development Environment
Computer Organization, Eclipse Intro
CSE 374 Programming Concepts & Tools
Performance Analysis and optimization of parallel applications
A (very brief) intro to Eclipse
Chapter 2: System Structures
TAU integration with Score-P
CRESCO Project: Salvatore Raia
NVIDIA Profiler’s Guide
HP C/C++ Remote developer plug-in for Eclipse
Productivity Tools for Scientific Computing
Intel® Parallel Studio and Advisor
F II 1. Background Objectives
Makefiles, GDB, Valgrind
Presentation transcript:

Debugging, benchmarking, tuning i.e. software development tools Martin Čuma Center for High Performance Computing University of Utah m.cuma@utah.edu

SW development tools Development environments Compilers Version control Debuggers Profilers Runtime monitoring Benchmarking 4/25/2018 http://www.chpc.utah.edu

Programming tools 4/25/2018 http://www.chpc.utah.edu

Program editing IDEs Text editors Visual *, Eclipse vim, emacs, atom 4/25/2018 http://www.chpc.utah.edu

Compilers Open source Commercial GNU Open64, clang, Mono Intel Portland Group (PGI, owned by Nvidia) Vendors (IBM XL, Cray) Others (Absoft, CAPS, Lahey) 4/25/2018 http://www.chpc.utah.edu

Language support Languages Interpreters C/C++ - GNU, Intel, PGI Fortran – GNU, Intel, PGI Interpreters Matlab – has its own ecosystem (debugger, profiler) Java – reasonable ecosystem, not so popular in HPC, popular in HTC Python – ecosystem improving, some tools can plug into Python (e.g. Intel VTune) 4/25/2018 http://www.chpc.utah.edu

Language/library support Language extensions OpenMP (4.0+*) – GNU, Intel*, PGI OpenACC – PGI, GNU very experimental CUDA – Nvidia GCC, PGI Fortran Libraries Intel Math Kernel Library (MKL) PGI packages open source (OpenBLAS?). 4/25/2018 http://www.chpc.utah.edu

Version control Copies of programs Version control software http://www.chpc.utah.edu 4/25/2018 Version control Copies of programs Good enough for simple code and quick tests/changes Version control software Allow code merging, branching, etc Essential for collaborative development RCS, CVS, SVN Git – integrated web services, free for open source, can run own server for private code (gitlab) 4/25/2018 http://www.chpc.utah.edu

Debugging 4/25/2018 http://www.chpc.utah.edu

Program errors Crashes Incorrect results Segmentation faults (bad memory access) often writes core file – snapshot of memory at the time of the crash Wrong I/O (missing files) Hardware failures Incorrect results Reasonable but incorrect results NaNs – not a numbers – division by 0, … 4/25/2018 http://www.chpc.utah.edu

write/printf Write variables of interest into the stdout or file Simplest but cumbersome Need to recompile and rerun Need to browse through potentially large output 4/25/2018 http://www.chpc.utah.edu

Terminal debuggers Text only, e.g. gdb, idb Need to remember commands or their abbreviations Need to know lines in the code (or have it opened in other window) Useful for quick code checking on compute nodes and core dump analysis 4/25/2018 http://www.chpc.utah.edu

GUI debuggers Have graphical user interface Some free, mostly commercial Eclipse CDT (C/C++ Development Tooling), PTP (Parallel Tools Platform) - free PGI’s pdbg – part of PGI compiler suite Intel development tools Rogue Wave Totalview - commercial Allinea DDT - commercial 4/25/2018 http://www.chpc.utah.edu

Totalview and DDT The only real alternative for parallel or accelerator debugging Cost a lot of money (thousands of $), but, worth it We had Totalview license (for historical reasons), 32 tokens enough for our needs (renewal ~$1500/yr) In 2017 we switched to DDT which gave us competitive upgrade XSEDE systems have DDT 4/25/2018 http://www.chpc.utah.edu

How to use Totalview/DDT 1. Compile binary with debugging information flag -g gcc –g test.f –o test 2. Load module and run Totalview or DDT module load totalview ; module load ddt TV/DDT + executable totalview executable ; ddt executable TV/DDT + core file totalview executable core_file ; ddt executable corefile Run TV/DDT and choose what to debug in a startup dialog totalview ; ddt also, there are numerous flags that enable TV customized startup – refer to the user’s manual 4/25/2018 http://www.chpc.utah.edu

Totalview windows 4/25/2018 http://www.chpc.utah.edu root – displays what processes are running + access to global menu items process – most commonly used window – go over it in more detail later variable – inspection and modification of variables 4/25/2018 http://www.chpc.utah.edu

DDT screenshot 4/25/2018 http://www.chpc.utah.edu

Debugger basic operations Data examination view data in the variable windows change the values of variables modify display of the variables visualize data Action points breakpoints and barriers (static or conditional) watchpoints evaluation of expressions 4/25/2018 http://www.chpc.utah.edu

Multiprocess debugging Automatic attachment of child processes Create process groups Share breakpoints among processes Process barrier breakpoints Process group single-stepping View variables across procs/threads Display MPI message queue state 4/25/2018 http://www.chpc.utah.edu

Code checkers Compilers check for syntax errors lint based tools Runtime checks through compiler flags (-fbounds-check, -check*, -Mbounds) DDT has a built in syntax checker Matlab does too Memory checking tools - many errors are due to bad memory management valgrind – easy to use, many false positives Intel Inspector – intuitive GUI 4/25/2018 http://www.chpc.utah.edu

Intel software development products We have a 2 concurrent user license + 2 just for compilers One license locks all the tools Cost ~$4000/year + ~$1000 for the compilers Free for students, open source developers, educators Tools for all stages of development Compilers and libraries Verification tools Profilers More info - https://software.intel.com/en-us/intel-parallel-studio-xe 4/25/2018 http://www.chpc.utah.edu

Intel Inspector Thread checking Memory checker Data races and deadlocks Memory checker Like leaks or corruption Good alternative to Totalview or DDT Standalone or GUI integration More info http://software.intel.com/en-us/intel-inspector-xe/ 4/25/2018 http://www.chpc.utah.edu

Intel Inspector Source the environment Compile with –tcheck -g module load inspectorxe Compile with –tcheck -g ifort -openmp -tcheck -g trap.f Run tcheck inspxe-gui – graphical user interface inspxe-cl – command line Tutorial https://software.intel.com/en-us/articles/inspectorxe-tutorials 4/25/2018 http://www.chpc.utah.edu

Intel Trace Analyzer and Collector MPI profiler and correctness checker Detects violations of MPI standard and errors in execution environment To use correctness checker module load intel impi itac setenv VT_CHECK_TRACING 0 mpirun –check-mpi –n 4 ./myApp ITAC documentation https://software.intel.com/en-us/intel-trace-analyzer-support/documentation 4/25/2018 http://www.chpc.utah.edu

Profiling 4/25/2018 http://www.chpc.utah.edu

Why to profile Evaluate performance Find the performance bottlenecks Inefficient programming Array data access, optimized functions, vectorization Memory or I/O bottlenecks Parallel scaling Inefficient parallel decomposition, communication 4/25/2018 http://www.chpc.utah.edu

Program runtime Time program runtime http://www.chpc.utah.edu 4/25/2018 Program runtime Time program runtime get an idea on time to run and parallel scaling Many programs include benchmark problems Some also accessible via “make test” Consider scripts, especially if doing parallel performance evaluation 4/25/2018 http://www.chpc.utah.edu

Profiling categories Hardware counters count events from CPU perspective (# of flops, memory loads, etc) usually need Linux kernel module installed (>2.6.31 has it) Statistical profilers (sampling) interrupt program at given intervals to find what routine/line the program is in Event based profilers (tracing) collect information on each function call 4/25/2018 http://www.chpc.utah.edu

Hardware counters CPUs include counters to count important events http://www.chpc.utah.edu 4/25/2018 Hardware counters CPUs include counters to count important events Flops, instructions, cache/memory access Access through kernel or PAPI (Performance Application Programming Interface) Tools to analyze the counters perf - hardware counter collection, part of Linux oprofile – profiler + hw counters Intel VTune Drawback – harder to analyze the profiling results (exc. VTune) 4/25/2018 http://www.chpc.utah.edu

Serial profiling Discover inefficient programming http://www.chpc.utah.edu 4/25/2018 Serial profiling Discover inefficient programming Computer architecture slowdowns Compiler optimizations evaluation gprof Compiler vendor supplied (e.g. pgprof, nvvp) Intel tools on serial programs AdvisorXE, VTune Pgprof profiles CPU and GPU Intel tools feature workflows for ease of use. 4/25/2018 http://www.chpc.utah.edu

HPC open source tools HPC Toolkit TAU (Tuning and Analysis Utilities) http://www.chpc.utah.edu 4/25/2018 HPC open source tools HPC Toolkit A few years old, did not find it as straightforward to use TAU (Tuning and Analysis Utilities) Lots of features, which makes the learning curve slow Score-P/Scalasca Developed by European consortium, did not try yet 4/25/2018 http://www.chpc.utah.edu

Intel tools Intel Parallel Studio XE 2017 Cluster Edition http://www.chpc.utah.edu 4/25/2018 Intel tools Intel Parallel Studio XE 2017 Cluster Edition Compilers (C/C++, Fortran) Math library (MKL) Threading library (TBB) Thread design and prototype (Advisor) Memory and thread debugging (Inspector) Profiler (VTune Amplifier) MPI library (Intel MPI) MPI analyzer and profiler (ITAC) 4/25/2018 http://www.chpc.utah.edu

Intel VTune Amplifier Serial and parallel profiler Multicore support for OpenMP and OpenCL on CPUs, GPUs and Xeon Phi Quick identification of performance bottlenecks Various analyses and points of view in the GUI Makes choice of analysis and results inspection easier GUI and command line use More info https://software.intel.com/en-us/intel-vtune-amplifier-xe 4/25/2018 http://www.chpc.utah.edu

Intel VTune Amplifier Source the environment Run VTune module load vtune Run VTune amplxe-gui – GUI amplxe-cl – CLI Can be used also for remote profiling (e.g. on Xeon Phi) Tuning guides for specific architectures https://software.intel.com/en-us/articles/processor-specific-performance-analysis-papers 4/25/2018 http://www.chpc.utah.edu

Intel Advisor Vectorization advisor Thread design and prototyping Identify loops that benefit from vectorization, find what is blocking efficient vectorization Useful for speeding up loop performance Thread design and prototyping Analyze, design, tune and check threading design Useful for implementing OpenMP in serial code More info http://software.intel.com/en-us/intel-advisor-xe/ 4/25/2018 http://www.chpc.utah.edu

Intel Advisor Source the environment Run Advisor module load advisorxe Run Advisor advixe-gui – GUI advixe-cl – CLI Create project and choose appropriate modeling Getting started guide https://software.intel.com/en-us/get-started-with-advisor 4/25/2018 http://www.chpc.utah.edu

Intel Trace Analyzer and Collector MPI profiler traces MPI code identifies communication inefficiencies Collector collects the data and Analyzer visualizes them More info https://software.intel.com/en-us/intel-trace-analyzer 4/25/2018 http://www.chpc.utah.edu

Intel TAC Source the environment module load itac Using Intel compilers, can compile with –trace mpiifort -openmp –trace trap.f Run MPI code mpirun –trace –n 4 ./a.out Run visualizer traceanalyzer a.out.stf & Getting started guide https://software.intel.com/en-us/get-started-with-itac-for-linux 4/25/2018 http://www.chpc.utah.edu

Runtime monitoring 4/25/2018 http://www.chpc.utah.edu

Why runtime monitoring? Make sure program is running right Hardware problems Correct parallel mapping / process affinity Careful about overhead 4/25/2018 http://www.chpc.utah.edu

Runtime monitoring Self checking Tools ssh to node(s), run “top”, or look at “sar” logs, “dmesg”, “taskset”, … SLURM (or other scheduler) logs and statistics LBNL’s Node Health Check (nhc) Tools XDMoD – XSEDE Metrics on Demand (through SUPReMM module) REMORA - REsource MOnitoring for Remote Applications 4/25/2018 http://www.chpc.utah.edu

BENCHMARKing 4/25/2018 http://www.chpc.utah.edu

Why to benchmark? Evaluate system’s performance Testing new hardware Verify correct hardware and software installation New cluster/node deployment There are tools for cluster checking (Intel Cluster Checker, cluster distros, …) Checking newly built programs Sometimes we leave this to the users 4/25/2018 http://www.chpc.utah.edu

New system evaluation Simple synthetic benchmarks Synthetic benchmarks FLOPS, STREAM Synthetic benchmarks HPL – High Performance Linpack – dense linear algebra problems – cache friendly HPCC – HPC Challenge Benchmark – collection of dense, sparse and other (FFT) benchmarks NPB – NAS Parallel Benchmarks – mesh based solvers – OpenMP, MPI, OpenACC implementations 4/25/2018 http://www.chpc.utah.edu

New system evaluation Real applications benchmarks Script if possible Depend on local usage Gaussian, VASP Amber, LAMMPS, NAMD, Gromacs ANSYS, Abaqus, StarCCM+ Own codes Script if possible A lot of combinations of test cases vs. number of MPI tasks/OpenMP cores 4/25/2018 http://www.chpc.utah.edu

Cluster deployment Whole cluster New cluster nodes Some vendors have cluster verification tools We have a set of scripts that run basic checks and HPL at the end New cluster nodes Verify received hardware configuration, then rack Basic system tests (node health check) HPL – get expected performance per node (CPU or memory issues), or across more nodes (network issues) 4/25/2018 http://www.chpc.utah.edu

What else do you do at your site? 4/25/2018 http://www.chpc.utah.edu

Backup 4/25/2018 http://www.chpc.utah.edu

Demos Totalview Advisor Inspector VTune 4/25/2018 http://www.chpc.utah.edu