Initial Design of a Test Suite for Automatic Performance Analysis Tools
Bernd Mohr, Forschungszentrum Jülich, John von Neumann-Institut für Computing, Germany


Slide 1: Initial Design of a Test Suite for (Automatic) Performance Analysis Tools
Bernd Mohr, Forschungszentrum Jülich, John von Neumann-Institut für Computing, Germany
Jesper Larsson Träff, NEC Europe Ltd., C&C Research Labs, Germany

Slide 2: IST Working Group APART (since 1999)
- APART: Automatic Performance Analysis: Resources and Tools
- Forum for scientists and vendors
- About 20 partners in Europe and the U.S.
- Current automatic performance tool projects: Askalon, Kappa-Pi, KOJAK, Paradyn, Peridot

Slide 3: (Full, Associated, and Former) Members
- European research centers and universities
- U.S. research centers and universities
- Vendors

Slide 4: APART Terminology
- Performance property: an aspect of the performance behavior of an application (e.g., communication dominated by waiting time). It is specified as a condition referring to performance data, and is quantified and normalized in terms of a behavior-independent metric (severity).
- Performance problem: a performance property with “negative” implications.
- Performance bottleneck: the performance problem with the highest severity.

Slide 5: Example: Performance Property “Message in Wrong Order”
[Time-line diagram (Location vs. Time): process A waits in a RECV while processes B and C issue SENDs; the messages arrive in the wrong order, so A incurs waiting time.]

Slide 6: The APART Test Suite (ATS)
- Users rely on the correct working of tools → tools need to be especially well tested → a systematic approach is needed.
- The APART Test Suite is a common project inside the APART group:
  - Every member needs this → minimizes resources
  - Ensures re-usability
  - Will also allow evaluation / comparison of the different member projects
- Main focus: automatic performance analysis tools, but also useful for “regular” performance tools

Slide 7: Desired Functionality
- Tests to determine that the semantics of the original program are not altered
- Tests to see whether the recorded performance data is correct
- Synthetic positive test cases for each known and defined performance property, and combinations of them
- Negative test cases which have no known performance problem
- “Real world” size parallel applications and benchmarks
Implications:
- Can be partially based on existing validation suites (WWW)
- Probably needs to be tool specific
- Collect available benchmarks and applications (WWW)
- Design and implementation of an ATS framework

Slide 8: Validation Suites and Kernel Benchmarks (I)
- MPI validation: MPI test / validation suites from Intel, IBM, ANL
- MPI benchmarks:
  - PARKBENCH (PARallel Kernels and BENCHmarks)
  - PMB (Pallas MPI Benchmarks)
  - SKaMPI (Special Karlsruher MPI Benchmark)

Slide 9: Kernel Benchmarks (II)
- OpenMP benchmarks: EPCC OpenMP Microbenchmarks (… research/openmpbench/openmp_index.html)
- Hybrid benchmarks: The Los Alamos MicroBenchmarks Suite (LAMB); covers MPI and multi-threading (Pthreads and OpenMP) programming models; based on SKaMPI and EPCC

Slide 10: “Real World” Applications and Benchmarks
- The NAS Parallel Benchmarks (NPB)
- The ASCI Purple and Blue benchmark codes:
  - … asci/purple/benchmarks/limited/code_list.html
  - … asci_benchmarks/asci/asci_code_list.html
- NCAR benchmarks

Slide 11: Current Design of ATS Framework
[Module diagram: a DISTRIBUTION module providing df_same(), df_cyclic2(), df_block2(), df_linear(), df_peak(), df_cyclic3(), and df_block3(), layered on top of a WORK module providing do_work().]
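The slides do not show do_work() itself; as orientation, a minimal sketch, assuming it simply burns CPU for a given number of seconds, timed with MPI_Wtime() (the parameter meaning and the timing mechanism are assumptions, not the actual ATS source):

    #include <mpi.h>

    /* Sketch of the WORK module: busy-wait for roughly "seconds"
     * wall-clock seconds, using MPI_Wtime() as the clock. */
    void do_work(double seconds)
    {
        volatile double x = 0.0;            /* volatile keeps the loop alive */
        double end = MPI_Wtime() + seconds;
        while (MPI_Wtime() < end)
            x += 1.0;                       /* meaningless arithmetic */
    }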

Slide 12: The Distribution Module
- A distribution is specified by a distribution function and distribution parameters.
- All distribution functions have the same signature:

      double distr_func(int me, int size, double sf, distr_t* dd)

  - me, size: member me of a group of size size
  - sf: scaling factor
  - dd: distribution parameter descriptor
  - Returns the value for me, calculated from me, size, and dd, scaled by sf.
- ATS provides a set of predefined distribution functions; the set can easily be extended if needed (see the sketch below).
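Since extensions share the signature above, a user-defined distribution is just another function of that shape. A hypothetical example, assuming a one-value descriptor type (val1_distr_t and its field val are invented for illustration, not taken from the ATS sources):

    /* Hypothetical user extension: rank 0 gets twice the work of
     * all other ranks. */
    double df_root_heavy(int me, int size, double sf, distr_t* dd)
    {
        val1_distr_t* d = (val1_distr_t*)dd;  /* invented descriptor type */
        (void)size;                           /* group size not needed here */
        return sf * (me == 0 ? 2.0 * d->val : d->val);
    }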

Slide 13: Predefined Distribution Functions
[Seven small graphs of work value over rank: same (constant val), block2 (low/high in two blocks), cyclic2 (alternating low/high), linear (ramp from low to high), peak (low everywhere with a high spike at one rank), block3 (low/med/high in three blocks), cyclic3 (cycling low/med/high).]
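Only the shapes are sketched on the slide; as an illustration, df_cyclic2 presumably alternates the two descriptor values over the ranks. A guess based on its use in late_sender on slide 15, where the val2_distr_t descriptor carries low and high fields:

    /* Sketch of cyclic2, assuming it alternates the two descriptor
     * values: even ranks get dd->low, odd ranks dd->high, scaled by sf. */
    double df_cyclic2(int me, int size, double sf, distr_t* dd)
    {
        val2_distr_t* d = (val2_distr_t*)dd;
        (void)size;                         /* not needed for cyclic2 */
        return sf * (me % 2 == 0 ? d->low : d->high);
    }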

Slide 14: Current Design of ATS Framework
[Extended module diagram: MPI PROPERTIES and OpenMP PROPERTIES modules on top; MPI UTILS providing par_do_mpi_work(), alloc_mpi_buf(), free_mpi_buf(), alloc_mpi_vbuf(), free_mpi_vbuf(), mpi_commpattern_sendrecv(), and mpi_commpattern_shift(); OpenMP UTILS providing par_do_omp_work(); all layered on the DISTRIBUTION and WORK modules.]
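The slides do not define the MPI UTILS types. A speculative sketch of what the message-buffer helpers could look like, judging only from the call alloc_mpi_buf(base_type, base_cnt) on slide 15; all field names are guesses:

    #include <stdlib.h>
    #include <mpi.h>

    /* Hypothetical buffer descriptor for the send/receive patterns. */
    typedef struct {
        void*        buf;    /* message payload               */
        MPI_Datatype type;   /* element type, e.g. MPI_DOUBLE */
        int          cnt;    /* number of elements            */
    } mpi_buf_t;

    mpi_buf_t* alloc_mpi_buf(MPI_Datatype type, int cnt)
    {
        int size;
        mpi_buf_t* b = malloc(sizeof(*b));
        MPI_Type_size(type, &size);          /* bytes per element */
        b->buf  = malloc((size_t)cnt * (size_t)size);
        b->type = type;
        b->cnt  = cnt;
        return b;
    }

    void free_mpi_buf(mpi_buf_t* b)
    {
        free(b->buf);
        free(b);
    }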

Slide 15: Example: MPI Property Function late_sender

    void par_do_mpi_work(distr_func_t df, distr_t* dd, MPI_Comm c)
    {
        int me, sz;
        MPI_Comm_rank(c, &me);
        MPI_Comm_size(c, &sz);
        /* each rank performs the amount of work its distribution value assigns */
        do_work(df(me, sz, 1.0, dd));
    }

    void late_sender(double bwork, double ework, int r, MPI_Comm c)
    {
        val2_distr_t dd;
        int i;
        mpi_buf_t* buf = alloc_mpi_buf(base_type, base_cnt);

        dd.low  = bwork + ework;   /* one half of the ranks: base plus extra work */
        dd.high = bwork;           /* the other half: base work only */
        for (i = 0; i < r; ++i) {
            /* the extra work delays the sending ranks, producing the
             * "late sender" pattern at the following send/receive */
            par_do_mpi_work(df_cyclic2, (distr_t*)&dd, c);
            mpi_commpattern_sendrecv(buf, DIR_UP, 0, 0, c);
        }
        free_mpi_buf(buf);
    }
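A minimal driver for this property function might look as follows (a sketch, not from the slides; the work amounts and the repetition count are illustrative):

    #include <mpi.h>
    #include "mpi_pattern.h"   /* ATS header, as used on slide 19 */

    int main(int argc, char* argv[])
    {
        MPI_Init(&argc, &argv);
        /* 0.1 units of base work on every rank, 0.5 extra on the
         * sending ranks, repeated 10 times */
        late_sender(0.1, 0.5, 10, MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }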

Slide 16: Currently Implemented Performance Property Functions
MPI point-to-point communication performance properties:

    late_sender(basework, extrawork, rf, MPI_Comm);
    late_receiver(basework, extrawork, rf, MPI_Comm);

MPI collective communication performance properties:

    imbalance_at_mpi_barrier(distr_func, distr_param, rf, MPI_Comm);
    imbalance_at_mpi_alltoall(distr_func, distr_param, rf, MPI_Comm);
    late_broadcast(basework, rootextrawork, root, rf, MPI_Comm);
    late_scatter(basework, rootextrawork, root, rf, MPI_Comm);
    late_scatterv(basework, rootextrawork, root, rf, MPI_Comm);
    early_reduce(rootwork, baseextrawork, root, rf, MPI_Comm);
    early_gather(rootwork, baseextrawork, root, rf, MPI_Comm);
    early_gatherv(rootwork, baseextrawork, root, rf, MPI_Comm);

OpenMP performance properties:

    imbalance_in_parallel_region(distr_func, distr_param, rf);
    imbalance_at_barrier(distr_func, distr_param, rf);
    imbalance_in_loop(distr_func, distr_param, rf);
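Judging from late_sender on slide 15, the collective properties are presumably built from the same building blocks. A sketch of how imbalance_at_mpi_barrier could be implemented (an assumption, not the actual ATS source):

    /* Sketch: unevenly distributed work followed by a barrier makes
     * the lightly loaded ranks wait, i.e. barrier imbalance. */
    void imbalance_at_mpi_barrier(distr_func_t df, distr_t* dd,
                                  int r, MPI_Comm c)
    {
        int i;
        for (i = 0; i < r; ++i) {
            par_do_mpi_work(df, dd, c);   /* per-rank work from df/dd */
            MPI_Barrier(c);               /* fast ranks wait here */
        }
    }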

Slide 17: Current Design of ATS Framework
[Full module diagram: TEST PROGRAMS on top of the MPI PROPERTIES and OpenMP PROPERTIES modules, which in turn use the MPI UTILS, OpenMP UTILS, DISTRIBUTION, and WORK modules shown before.]

Slide 18: Performance Property Test Programs
- Single performance property testing:
  - Programs can be generated automatically from the performance property function signature; the generator is based on the Program Database Toolkit (PDT)
  - Property parameters become test program arguments
  - More extensive tests through scripting languages or an experiment management system (e.g., Zenturio)
- Composite performance property testing (see the sketch below):
  - Programs containing multiple performance property functions
  - Complexity only limited by imagination
  - Currently: manually implemented
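A hand-written composite test in the spirit described above might chain several property functions over one communicator (a sketch; the work values, repetition counts, and block2 parameters are illustrative):

    #include <mpi.h>
    #include "mpi_pattern.h"

    int main(int argc, char* argv[])
    {
        val2_distr_t dd;
        MPI_Init(&argc, &argv);

        /* phase 1: late-sender pattern, 5 repetitions */
        late_sender(0.1, 0.5, 5, MPI_COMM_WORLD);

        /* phase 2: barrier imbalance with a block2 distribution,
         * matching the "b2:0.5:1.0" spec from slide 19 */
        dd.low  = 0.5;
        dd.high = 1.0;
        imbalance_at_mpi_barrier(df_block2, (distr_t*)&dd, 5,
                                 MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }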

Slide 19: Example: Single Performance Property Test Program

    #include <stdio.h>    /* fprintf */
    #include <stdlib.h>   /* atoi */
    #include "mpi_pattern.h"

    int main(int argc, char *argv[])
    {
        /* defaults: block2 distribution, low = 0.5, high = 1.0, 1 repetition */
        distr_func_t df = atodf("b2:0.5:1.0");
        distr_t*     dd = atodd("b2:0.5:1.0");
        int          r  = 1;

        MPI_Init(&argc, &argv);
        switch (argc) {
        case 3:                        /* repetition count given ... */
            r = atoi(argv[2]);         /* ... fall through */
        case 2:                        /* distribution spec given ... */
            df = atodf(argv[1]);
            dd = atodd(argv[1]);       /* ... fall through */
        case 1:
            break;
        default:
            fprintf(stderr, "usage: %s [distribution [repetitions]]\n", argv[0]);
            break;
        }
        imbalance_at_mpi_barrier(df, dd, r, MPI_COMM_WORLD);
        MPI_Finalize();
    }

Slide 20: Example: Single Performance Property Test Program (invocation)

    imbalance_at_mpi_barrier b2:0.5:1.0 2
    imbalance_at_mpi_barrier b2:0.1:2.0 5

Problem: the additional property “MPI Setup/Termination Overhead” also holds!

Slide 21: Example: Collection of MPI Performance Properties

Slide 22: Examples: Detail MPI Properties

Slide 23: Example: MPI Properties in 2 Communicators

Slide 24: EXPERT Analysis of the MPI 2-Communicator Example

Slide 25: Example: OpenMP Performance Property

Slide 26: ATS: Status and Future Work
Status:
- Initial prototype available from the APART website
- List of MPI, OpenMP, and hybrid validation and benchmark suites
- 1st version of the ATS framework, including:
  - C version of the code
  - Single-property test program generator
Future work:
- More complete collection of validation and benchmark suites
- Real “real world” applications
- ATS framework:
  - Fortran version
  - More complete list of property functions for MPI, OpenMP, hybrid, and sequential performance properties
  - Documentation