Debugging, benchmarking, tuning i.e. software development tools Martin Čuma Center for High Performance Computing University of Utah


10/1/2016
SW development tools
  Development environments
  Compilers
  Version control
  Debuggers
  Profilers
  Runtime monitoring
  Benchmarking

PROGRAMMING TOOLS

Program editing
  Text editors
    – vim, emacs, atom
  IDEs
    – Visual *, Eclipse

Compilers
  Open source
    – GNU
    – Open64, clang
  Commercial
    – Intel
    – Portland Group (PGI, owned by Nvidia)
    – Vendors (IBM XL, Cray)
    – Others (Absoft, CAPS, PathScale)

Language support
  Languages
    – C/C++ – GNU, Intel, PGI
    – Fortran – GNU, Intel, PGI
  Interpreters
    – Matlab – has its own ecosystem
    – Java – reasonable ecosystem, not so popular in HPC, popular in HTC
    – Python – attempts to have its own ecosystem, some tools can plug into Python (e.g. Intel VTune)

Language/library support
  Language extensions
    – OpenMP (4.0+*) – GNU, Intel*, PGI
    – OpenACC – PGI, GNU very experimental
    – CUDA – Nvidia GCC, PGI Fortran
  Libraries
    – Intel Math Kernel Library (MKL)
    – PGI packages open source (OpenBLAS?).

Version control
  Copies of programs
    – Good enough for simple code and quick tests/changes
  Version control software
    – Allows code merging, branching, etc.
    – Essential for collaborative development
    – RCS, CVS, SVN
    – Git – integrated web services, free for open source, can run own server for private code

DEBUGGING

Program errors
  Crashes
    – Segmentation faults (bad memory access), which often write a core file, a snapshot of memory at the time of the crash
    – Wrong I/O (missing files)
    – Hardware failures
  Incorrect results
    – Reasonable but incorrect results
    – NaNs (not a number) – division by 0, …
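As an illustration of the "incorrect results" case, the following is a minimal Python sketch (not from the slides) of scanning a result array for the first NaN. The helper name first_nan_index and the sample data are hypothetical. Python raises an exception on floating-point division by zero instead of returning NaN, so the sketch produces a NaN with inf - inf.

```python
import math


def first_nan_index(values):
    """Return the index of the first NaN in values, or None if there is none."""
    for i, v in enumerate(values):
        if math.isnan(v):
            return i
    return None


inf = float("inf")
# Hypothetical result vector; inf - inf evaluates to NaN.
results = [1.0, 2.5, inf - inf, 4.0]

if __name__ == "__main__":
    print("first NaN at index", first_nan_index(results))
```

A NaN compares unequal to everything, including itself, so a test such as x == float("nan") never succeeds; math.isnan is the reliable check.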

write/printf
  Write variables of interest to stdout or to a file
  Simplest but cumbersome
    – Need to recompile and rerun
    – Need to browse through potentially large output

Terminal debuggers
  Text only, e.g. gdb, idb
  Need to remember commands or their abbreviations
  Need to know line numbers in the code (or have it open in another window)
  Useful for quick code checking on compute nodes and core dump analysis

GUI debuggers
  Have a graphical user interface
  Some free, mostly commercial
  Eclipse CDT (C/C++ Development Tooling), PTP (Parallel Tools Platform) – free
  PGI’s pgdbg – part of PGI compiler suite
  Intel development tools
  Rogue Wave Totalview – commercial
  Allinea DDT – commercial

Totalview and DDT
  The only real alternatives for parallel or accelerator debugging
  Cost a lot of money (thousands of $), but are worth it
  We have a Totalview license (for historical reasons); 32 tokens are enough for our needs (renewal ~$1500/yr).
  XSEDE systems have DDT.

How to use Totalview
  1. Compile binary with debugging information
    – flag -g
        gfortran -g test.f -o test
  2. Load module and run Totalview
        module load totalview
    – TV + executable
        totalview executable
    – TV + core file
        totalview executable core_file
    – Run TV and choose what to debug in a startup dialog
        totalview

Totalview windows (screenshot)

DDT screenshot

Debugger basic operations
  Data examination
    – view data in the variable windows
    – change the values of variables
    – modify display of the variables
    – visualize data
  Action points
    – breakpoints and barriers (static or conditional)
    – watchpoints
    – evaluation of expressions

Multiprocess debugging
  Automatic attachment of child processes
  Create process groups
  Share breakpoints among processes
  Process barrier breakpoints
  Process group single-stepping
  View variables across procs/threads
  Display MPI message queue state

Additional Totalview tools
  MemoryScape
    – Dynamic memory debugging tool
  Replay Engine
    – Allows debugging the code in reverse
  Accelerator debugging
    – CUDA and OpenACC

Code checkers
  Compilers check for syntax errors
    – lint based tools
    – Runtime checks through compiler flags (-fbounds-check, -check*, -Mbounds)
  DDT has a built-in syntax checker
    – Matlab does too
  Memory checking tools – many errors are due to bad memory management
    – valgrind – easy to use, many false positives
    – Intel Inspector – intuitive GUI

Intel software development products
  We have a license for 2 concurrent users
    – One license locks all the tools
    – Cost ~$2000/year
  Tools for all stages of development
    – Compilers and libraries
    – Verification tools
    – Profilers
  More info

Intel Inspector
  Thread checking
    – Data races and deadlocks
  Memory checker
    – E.g. leaks or corruption
    – Good alternative to Totalview MemoryScape
  Standalone or GUI integration
  More info

Intel Inspector
  Source the environment
      module load inspectorxe
  Compile with -tcheck -g
      ifort -openmp -tcheck -g trap.f
  Run tcheck
      inspxe-gui – graphical user interface
      inspxe-cl – command line
  Tutorial

Intel Trace Analyzer and Collector
  MPI profiler and correctness checker
  Detects violations of the MPI standard and errors in the execution environment
  To use the correctness checker
      module load intel impi itac
      setenv VT_CHECK_TRACING 0
      mpirun -check-mpi -n 4 ./myApp
  ITAC documentation support/documentation

PROFILING

Why profile
  Evaluate performance
  Find the performance bottlenecks
    – Inefficient programming
    – Memory or I/O bottlenecks
    – Parallel scaling

Program runtime
  Time program runtime
    – get an idea of the time to run and of parallel scaling
  Many programs include benchmark problems
    – Some also accessible via “make test”
  Consider scripts, especially if doing parallel performance evaluation
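The timing and scaling evaluation above can be scripted. The following is a minimal Python sketch, not taken from the slides: time_call and scaling_table are hypothetical helper names, and the timings in the example are made-up numbers, not measurements. Speedup and parallel efficiency are computed relative to the run with the fewest tasks.

```python
import time


def time_call(func, *args, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs of func(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def scaling_table(timings):
    """Given {tasks: seconds}, return rows of (tasks, seconds, speedup, efficiency).

    Speedup and efficiency are relative to the run with the fewest tasks.
    """
    base_n = min(timings)
    base_t = timings[base_n]
    rows = []
    for n in sorted(timings):
        speedup = base_t / timings[n]
        efficiency = speedup * base_n / n
        rows.append((n, timings[n], speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Made-up wall-clock times (seconds) for 1, 2, 4 and 8 tasks.
    example = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0}
    for n, t, s, e in scaling_table(example):
        print(f"{n:3d} tasks  {t:7.1f} s  speedup {s:5.2f}  efficiency {e:5.2f}")
```

Taking the best of several repeats is one common way to reduce noise from other activity on the node; the mean or median are equally defensible choices.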

Profiling categories
  Hardware counters
    – count events from CPU perspective (# of flops, memory loads, etc.)
    – usually need Linux kernel module installed (> has it)
  Statistical profilers (sampling)
    – interrupt program at given intervals to find what routine/line the program is in
  Event based profilers (tracing)
    – collect information on each function call
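To make the event based category concrete, here is a small Python sketch using the standard library cProfile module, which records every function call rather than sampling. It is an illustration in a different language from the tools in these slides; inner, outer and call_counts are hypothetical names.

```python
import cProfile
import io
import pstats


def inner(n):
    """Toy workload: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def outer(calls=50):
    """Call inner() a fixed number of times."""
    return [inner(1000) for _ in range(calls)]


def call_counts(func):
    """Run func under cProfile and return {function_name: number_of_calls}."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler, stream=io.StringIO())
    counts = {}
    # Keys are (filename, line, name); values start with (primitive calls, total calls, ...).
    for (_filename, _lineno, name), entry in stats.stats.items():
        counts[name] = counts.get(name, 0) + entry[1]
    return counts


if __name__ == "__main__":
    counts = call_counts(outer)
    print("inner called", counts["inner"], "times")
```

Because every call is recorded, the call counts are exact, at the price of overhead on each call; a sampling profiler trades that exactness for lower overhead.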

Hardware counters
  CPUs include counters to count important events
    – Flops, instructions, cache/memory access
    – Access through kernel or PAPI (Performance Application Programming Interface)
  Tools to analyze the counters
    – perf – hardware counter collection, part of Linux
    – oprofile – profiler + hw counters
    – Intel VTune
  Drawback – harder to analyze the profiling results (except VTune)

Serial profiling
  Discover inefficient programming
  Computer architecture slowdowns
  Compiler optimizations evaluation
  gprof
  Compiler vendor supplied (e.g. pgprof, nvvp)
  Intel tools on serial programs
    – AdvisorXE, VTune

HPC open source tools
  HPC Toolkit
    – A few years old, did not find it straightforward to use
  TAU (Tuning and Analysis Utilities)
    – Lots of features, which makes the learning curve steep
  Scalasca
    – Developed by a European consortium, have not tried it yet

Intel tools
  Intel Parallel Studio XE 2016 Cluster Edition
    – Compilers (C/C++, Fortran)
    – Math library (MKL)
    – Threading library (TBB)
    – Thread design and prototype (Advisor)
    – Memory and thread debugging (Inspector)
    – Profiler (VTune Amplifier)
    – MPI library (Intel MPI)
    – MPI analyzer and profiler (ITAC)

Intel VTune Amplifier
  Serial and parallel profiler
    – Multicore support for OpenMP and OpenCL on CPUs, GPUs and Xeon Phi
  Quick identification of performance bottlenecks
    – Various analyses and points of view in the GUI
    – Makes choice of analysis and results inspection easier
  GUI and command line use
  More info

Intel VTune Amplifier
  Source the environment
      module load vtune
  Run VTune
      amplxe-gui – GUI
      amplxe-cl – CLI
  Can be used also for remote profiling (e.g. on Xeon Phi)
  Tuning guides for specific architectures performance-analysis-papers

Intel Advisor
  Vectorization advisor
    – Identify loops that benefit from vectorization, find what is blocking efficient vectorization, and explore the benefit of data reorganization
  Thread design and prototyping
    – Analyze, design, tune and check threading design without disrupting normal development
  More info

Intel Advisor
  Source the environment
      module load advisorxe
  Run Advisor
      advixe-gui – GUI
      advixe-cl – CLI
  Create project and choose appropriate modeling
  Getting started guide

Intel Trace Analyzer and Collector
  MPI profiler
    – traces MPI code
    – identifies communication inefficiencies
  Collector collects the data and Analyzer visualizes them
  More info

Intel TAC
  Source the environment
      module load itac
  Using Intel compilers, can compile with -trace
      mpiifort -openmp -trace trap.f
  Run MPI code
      mpirun -trace -n 4 ./a.out
  Run visualizer
      traceanalyzer a.out.stf &
  Getting started guide

RUNTIME MONITORING

Why runtime monitoring?
  Make sure the program is running right
    – Hardware problems
    – Correct parallel mapping / process affinity
  Be careful about overhead

Runtime monitoring
  Self checking
    – ssh to node(s), run “top”, or look at “sar” logs
    – SLURM (or other scheduler) logs and statistics
  Tools
    – XDMoD – XSEDE Metrics on Demand (through SUPReMM module)
    – REMORA – REsource MOnitoring for Remote Applications
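A quick self check on a node can also be done from inside a process. The sketch below is an assumption-laden illustration in Python, not one of the tools named above: allowed_cpus and load_per_core are hypothetical helpers, and both return None on platforms where the underlying call is unavailable (os.sched_getaffinity is Linux-specific, os.getloadavg is Unix-only).

```python
import os


def allowed_cpus():
    """Return the sorted CPU ids this process may run on, or None if unsupported."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return None


def load_per_core():
    """Return the 1-minute load average divided by the CPU count, or None if unavailable."""
    if not hasattr(os, "getloadavg"):
        return None
    try:
        load1, _load5, _load15 = os.getloadavg()
    except OSError:
        return None
    return load1 / (os.cpu_count() or 1)


if __name__ == "__main__":
    print("allowed CPUs:", allowed_cpus())
    print("load per core:", load_per_core())
```

An affinity set much smaller than the number of threads the job starts, or a load per core well above 1, would suggest an incorrect process mapping or an oversubscribed node.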

BENCHMARKING

Why benchmark?
  Evaluate system’s performance
    – Testing new hardware
  Verify correct hardware and software installation
    – New cluster/node deployment
        There are tools for cluster checking (Intel Cluster Checker, cluster distros, …)
    – Checking newly built programs
        Sometimes we leave this to the users

New system evaluation
  Simple synthetic benchmarks
    – FLOPS, STREAM
  Synthetic benchmarks
    – HPL – High Performance Linpack – dense linear algebra problems – cache friendly
    – HPCC – HPC Challenge Benchmark – collection of dense, sparse and other (FFT) benchmarks
    – NPB – NAS Parallel Benchmarks – mesh based solvers – OpenMP, MPI, OpenACC implementations

New system evaluation
  Real applications benchmarks
    – Depend on local usage
    – Gaussian, VASP
    – Amber, LAMMPS, NAMD, Gromacs
    – ANSYS, Abaqus, StarCCM+
    – Own codes
  Script if possible
    – A lot of combinations of test cases vs. number of MPI tasks/OpenMP threads
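The combinations of test cases and task/thread counts are easy to enumerate in a script. The following Python sketch is one possible shape, under stated assumptions: benchmark_matrix and command_line are hypothetical helpers, the input file names are placeholders, and the command format (OMP_NUM_THREADS plus mpirun -n) is a generic pattern that a real site would replace with its own launcher and scheduler options.

```python
from itertools import product


def benchmark_matrix(cases, mpi_tasks, omp_threads, cores_per_node):
    """Yield (case, tasks, threads) combinations that fit on one node."""
    for case, tasks, threads in product(cases, mpi_tasks, omp_threads):
        if tasks * threads <= cores_per_node:
            yield case, tasks, threads


def command_line(case, tasks, threads, binary="./myApp"):
    """Build a shell command for one run (binary and input names are placeholders)."""
    return f"OMP_NUM_THREADS={threads} mpirun -n {tasks} {binary} {case}"


if __name__ == "__main__":
    # Placeholder inputs and a hypothetical 16-core node.
    for combo in benchmark_matrix(["small.in", "large.in"], [1, 2, 4, 8, 16], [1, 2, 4], 16):
        print(command_line(*combo))
```

Filtering on tasks times threads keeps the matrix from oversubscribing a node; multi-node runs would need a different constraint.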

Cluster deployment
  Whole cluster
    – Some vendors have cluster verification tools
    – We have a set of scripts that run basic checks and HPL at the end
  New cluster nodes
    – Verify received hardware configuration, then rack
    – Basic system tests (node health check)
    – HPL – get expected performance per node (CPU or memory issues), or across more nodes (network issues)
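The per-node HPL check can be reduced to comparing each node against an expected value. The Python sketch below is an illustration, not the scripts mentioned above: peak_gflops and flag_slow_nodes are hypothetical helpers, the node names and HPL numbers are invented, the 10% tolerance is an arbitrary assumption, and the flops-per-cycle figure depends on the CPU architecture and must be supplied by the user.

```python
import statistics


def peak_gflops(cores, ghz, flops_per_cycle):
    """Theoretical peak of one node in GFLOP/s: cores x clock (GHz) x flops per cycle."""
    return cores * ghz * flops_per_cycle


def flag_slow_nodes(measured, expected_gflops, tolerance=0.10):
    """Return sorted names of nodes whose result is below (1 - tolerance) * expected."""
    threshold = (1.0 - tolerance) * expected_gflops
    return sorted(node for node, gflops in measured.items() if gflops < threshold)


if __name__ == "__main__":
    # Invented single-node HPL results in GFLOP/s.
    results = {"node001": 905.0, "node002": 910.0, "node003": 700.0, "node004": 898.0}
    expected = statistics.median(results.values())
    print("expected (median):", expected)
    print("slow nodes:", flag_slow_nodes(results, expected))
```

Using the median of identical nodes as the expected value avoids needing an absolute target; comparing against a fraction of peak_gflops is an alternative when only one node is available.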

BACKUP

Demos
  Totalview
  Advisor
  Inspector
  VTune