CEPBA-Tools experiences with MRNet and Dyninst
Judit Gimenez, German Llort, Harald Servat
Paradyn Week, April-May 2007

Outline
- CEPBA-Tools environment
- OpenMP instrumentation using Dyninst
- Tracing control through MRNet
- Our wish list

Where we live: Traceland [map of the tools landscape] ... aiming at detailed analysis and flexibility in the tools.

Importance of details
Variance is important: along time and across processors. These are highly non-linear systems: microscopic effects are important and may have a large macroscopic impact.

CEPBA-Tools
[Tool ecosystem diagram: tracing packages (MPtrace, OMPItrace, MPIDtrace, TraceDriver for Java/WAS/GT4/JIS, Nanos Compiler, aixtrace2prv/AIXtrace, LTT2prv/LTTtrace, GPFS2prv/GPFStrace) produce .prv/.pcf/.cfg traces for Paraver and Paramedir, and .trf traces for Dimemas; trace2trace utilities and data display tools complete the environment. Sample Paramedir output: miss ratio 0.8, IPC 0.5, efficiency 0.4, bandwidth 520.]

CEPBA-Tools: the challenge
What can we say, in a short time, about an unknown application/system without looking at the source code?

OpenMP instrumentation: OMPtrace
Instrumentation of OpenMP; insight on the application, the run time, and scheduling (a sketch of the LD_PRELOAD approach follows this list). Based on:
- DiTools (SGI/IRIX): only calls to dynamic libraries
- DPCL (IBM/AIX): functions and calls referenced within the binary
- Dyninst (Itanium): functions and calls referenced within the binary
- LD_PRELOAD (some Linux): only calls to dynamic libraries
"Evolution" through the available platforms, except for Itanium (a NASA-Ames request).
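
As an illustration of the LD_PRELOAD mechanism mentioned above, here is a minimal interposition sketch. It is not taken from OMPtrace; the wrapped function omp_set_num_threads is just a convenient, well-known OpenMP entry point, and the trace record is a plain fprintf.

// preload_wrap.cpp -- minimal LD_PRELOAD interposition sketch (illustrative only,
// not the OMPtrace implementation).
// Build:  g++ -shared -fPIC -o preload_wrap.so preload_wrap.cpp -ldl
// Run:    LD_PRELOAD=./preload_wrap.so ./app
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              // needed for RTLD_NEXT on glibc
#endif
#include <dlfcn.h>
#include <cstdio>

extern "C" void omp_set_num_threads(int n) {
    // Emit a trace record before forwarding to the real OpenMP runtime.
    std::fprintf(stderr, "[trace] omp_set_num_threads(%d)\n", n);

    // Look up the next definition of the symbol, i.e. the real runtime's one.
    using fn_t = void (*)(int);
    static fn_t real = reinterpret_cast<fn_t>(dlsym(RTLD_NEXT, "omp_set_num_threads"));
    if (real)
        real(n);
}

Calls resolved statically inside the binary never go through the dynamic linker, which is why binary-level instrumenters such as DPCL or Dyninst are needed to also reach functions and call sites referenced within the binary.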

OpenMP compilation and run time
The compiler outlines the parallel region into a separate routine and replaces it with a call into the OpenMP run time (libomp). Schematically:
  Source program:
    call A
    A() {
      !$omp parallel do
      do I = 1, N
        loop body
      enddo
    }
  Compiler generated (calls into libomp):
    call A
    A() {
      kmpc_fork_call(_A_LN_par_regionID)
    }
    _A_LN_par_regionID() {
      do I = start, end
        loop body
      enddo
    }
    Idle() { ... }    (run-time idle loop)

OpenMP instrumentation points
[Timeline figure: on the main thread, events are emitted at the entry and exit of the user function (USR_FCT with the function id), at the fork and join of the parallel region (OMP_PAR), and at the entry and exit of the outlined parallel routine on each thread (PAR_FCT with A_LN_par_regionID); each event carries hardware-counter deltas (HWC_i, Delta).]
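
The events above carry hardware-counter deltas. A minimal sketch of how such deltas can be read at instrumentation points with PAPI follows; the probe() helper, the chosen counters and the text output are assumptions for illustration, not the OMPtrace record format.

// probe_hwc.cpp -- sketch of emitting events with hardware-counter deltas
// at instrumentation points (assumption: emit_event and the event set are
// illustrative, not the real OMPtrace layout).
#include <papi.h>
#include <cstdio>
#include <cstdlib>

static int event_set = PAPI_NULL;
static long long last[2];

static void emit_event(const char *type, long long d_ins, long long d_cyc) {
    std::printf("%s delta_ins=%lld delta_cyc=%lld\n", type, d_ins, d_cyc);
}

void probe(const char *type) {              // called at each instrumentation point
    long long now[2];
    PAPI_read(event_set, now);
    emit_event(type, now[0] - last[0], now[1] - last[1]);
    last[0] = now[0]; last[1] = now[1];
}

int main() {
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) std::exit(1);
    PAPI_create_eventset(&event_set);
    PAPI_add_event(event_set, PAPI_TOT_INS);   // instructions retired
    PAPI_add_event(event_set, PAPI_TOT_CYC);   // cycles
    PAPI_start(event_set);
    PAPI_read(event_set, last);

    probe("USR_FCT enter");                    // e.g. entry of user function A
    /* ... user code, parallel region ... */
    probe("USR_FCT exit");
    return 0;
}

Taking deltas between consecutive probes keeps each record small while still attributing the counters to the enclosing region.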

CEPBA-Tools: the issue
Sufficient information, sufficiently detailed, yet usable by the presentation tool. The environment has evolved: from a few processes to instrumenting hours of execution, including more and more information (hardware counters, call stack, network counters, system resource usage, MPI collective internals, ...), and from traces of a few MB to hundreds of GB.

Scalability of tracing
Techniques for achieving scalability (a circular-buffer sketch follows this list):
- User-specified on/off
- Limited file size (stop when the limit is reached, or use a circular buffer)
- Only computing bursts + counters + statistics
- Library summarization (software counters, e.g. for MPI_Iprobe / MPI_Test)
- Trace2trace utilities
- Partial views
... towards an autonomic tracing library
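
To make the circular-buffer option concrete, here is a minimal ring-buffer sketch for fixed-size event records. The Event layout and capacity are assumptions for illustration, not the MPItrace format.

// ring_buffer.cpp -- minimal circular trace buffer sketch.
#include <array>
#include <cstddef>
#include <cstdint>

struct Event {
    std::uint64_t time;
    std::uint32_t type;
    std::uint64_t value;
};

template <std::size_t N>
class RingBuffer {
    std::array<Event, N> buf_{};
    std::size_t next_ = 0;     // next slot to write
    std::size_t count_ = 0;    // number of valid events (saturates at N)
public:
    void record(const Event &e) {
        buf_[next_] = e;               // overwrite the oldest event once full
        next_ = (next_ + 1) % N;
        if (count_ < N) ++count_;
    }
    std::size_t size() const { return count_; }
    // Visit events from oldest to newest, e.g. to flush a selected range.
    template <typename F>
    void for_each(F f) const {
        std::size_t start = (count_ == N) ? next_ : 0;
        for (std::size_t i = 0; i < count_; ++i)
            f(buf_[(start + i) % N]);
    }
};

Reaching the size limit simply overwrites the oldest records, which is what keeps the memory footprint bounded; the for_each walk is what a later flush or selection pass would use.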

MPItrace + MRNet
[Diagram showing the user on the login node and the MPItrace/MRNet setup.]

First target with MRNet
A real problem scenario on MareNostrum: some large runs occasionally show very large, degraded collectives, and instrumenting the full run, including details of the collectives implementation, would produce a huge trace.
Solution: MPItrace + MRNet, controlling which information is flushed to disk and discarding all details except those related to the large collectives.

Paradyn Week, April-May 2007 …i+m …1 Implementation Instrumenting on a circular buffer Periodically the MRNet front-end requests information on the collectives duration the “spy” thread stops the main thread analyze the tracing buffer –collects information on the collectives –sends details on the range and duration the root sends back a mask of selection the “spy” thread flushes to disk the selected data resumes the application i …i+n 10…300 i 0 …

First traces: CPMD
[Trace screenshots: the full trace is 245 MB with >15500 collectives; with LIMIT >= 35 ms, the filtered traces shrink to <1 MB and 25 MB, each with <85 collectives.]

First traces: MRNet front-end analysis

Next steps for MPItrace + MRNet
- Analysis of MRNet: evaluate the impact of topology / mapping
- Library control: maximum information, minimum data
- Automatic switching driven by on-line analysis: tracing level, type of data (counter set, instrumentation points), on/off
- Clustering, periodicity detection

Our wish list
Dyninst:
- Support for MPI+OpenMP instrumentation
- Availability on PowerPC
MRNet:
- Automatically compute the best topology based on the available resources, maybe considering user preferences about mapping, dispersion degree (fan-out)...
- Improve MRNet integration with MPI applications