Performance Analysis Necessity or Add-on in Grid Computing
Michael Gerndt
Technische Universität München

Presentation transcript:

LRR at Technische Universität München
Chair for Computer Hardware & Organisation / Parallel Computer Architecture (Prof. A. Bode)
Three groups in parallel & distributed architectures:
- Architectures
  – SCI Smile project
  – DAB
  – Hotswap
- Tools
  – CrossGrid
  – APART
- Applications
  – CFD
  – Medicine
  – Bioinformatics

New Campus at Garching

Outline
- PA on parallel systems
- Scenarios for PA in Grids
- PA support in Grid projects
- APART

Performance Analysis for Parallel Systems
- Development cycle
  – Assumption: reproducibility
- Instrumentation
  – Static vs. dynamic
  – Source-level vs. object-level
- Monitoring
  – Software vs. hardware
  – Statistical profiles vs. event traces
- Analysis
  – Source-based tools
  – Visualization tools
  – Automatic analysis tools
[Figure: development cycle with the stages Coding, Performance Monitoring and Analysis, Production, Program Tuning]

Grid Computing
Grids enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals, assuming the absence of
  – central location,
  – central control,
  – omniscience,
  – existing trust relationships.
[Globus Tutorial]
Major differences from parallel systems:
- Dynamic system of resources
- Large number of diverse systems
- Sharing of resources
- Transparent resource allocation

Scenarios for Performance Monitoring and Analysis
- Post-mortem application analysis
- Self-tuning applications
- Grid scheduling
- Grid management
[GGF performance working group, DataGrid, CrossGrid]

Post-Mortem Application Analysis
Requires
- either resources with known performance characteristics (QoS) or system-level information to assess the performance data
- scalability of performance tools
Focus will be on interacting components.
1. George submits a job to the Grid.
2. The job is executed on some resources.
3. George receives performance data.
4. George analyzes the performance.

Self-Tuning Applications
Requires
- Integration of system and application monitoring
- On-the-fly performance analysis
- API for accessing monitor data (if PA is done by the application)
- Performance model and interface to steer adaptation (if PA and the tuning decision are done by an external component)
1. Chris submits a job.
2. The application adapts to the assigned resources.
3. The application starts.
4. The application monitors performance and adapts to resource changes.

Grid Scheduling
Requires
- PA of the grid application
- Possibly benchmarking the application
- Access to the current performance capabilities of resources
- Even better: access to predicted capabilities
1. Gloria determines performance-critical application properties.
2. She specifies a performance model.
3. The Grid scheduler selects resources.
4. The application is started.

Grid Management
Requires
- PA of historical system information
- Needs to be done in a distributed fashion
1. George reports that he has seen poor performance for the past week.
2. The helpdesk runs the Grid performance analysis software.
3. Periodic saturation of connections is detected.

New Aspects of Performance Analysis
- Transparent resource allocation
- Dynamism in resource availability
Approaches in the following projects:
- DAMIEN
- DataGrid
- CrossGrid
- GrADS

Analyzing Meta-Computing Applications
DAMIEN (IST-25406), 5 partners
Goals
- Analysis of GRID-enabled applications
  – using MpCCI (
  – using PACX-MPI (
- Analysis of GRID components
  – PACX-MPI and MpCCI
- Extend Vampir/Vampirtrace technology

MetaVampirtrace for Application Analysis
[Figure: call chain. Application code (MPI_Send) is mapped by a name shift (CPP) to the MetaVT wrapper (PACX_Send), which writes the tracefile and calls the GRID-MPI profiling routine (PPACX_Send). Also shown: compiled code (PACX_Send), native MPI, and the GRID communication layer.]
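The wrapper idea on this slide can be illustrated with a minimal sketch. It is written in Python rather than with the C preprocessor, and it is not the MetaVampirtrace API: `traced`, `pacx_send`, and the in-memory `trace` list are hypothetical stand-ins chosen for illustration. The point is only that the application keeps calling the same name while the call is routed through a wrapper that records enter and exit events before delegating to the real routine.

```python
import time
from functools import wraps

trace = []  # in-memory stand-in for a tracefile (assumption)


def traced(name):
    """Wrap a routine so that each call appends enter/exit events to the trace."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            trace.append(("enter", name, time.perf_counter()))
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even if the wrapped routine raises.
                trace.append(("exit", name, time.perf_counter()))
        return wrapper
    return decorator


def pacx_send(dest, payload):
    """Hypothetical stand-in for the underlying communication routine."""
    return len(payload)


# "Name shift": the application calls send(), which now resolves to the wrapper.
send = traced("send")(pacx_send)

sent = send(1, b"hello")
```

In the C setting described on the slide, the redirection is done at compile time by the preprocessor instead of at run time, so the application source does not change.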

MetaVampirtrace for GRID Component Analysis
[Figure: call chain. Application code (MPI_Send) is mapped by a name shift (CPP); the MetaVT wrapper (MPI_Send) writes the tracefile and calls the MPI profiling routine (PMPI_Send). Also shown: compiled code (PACX_Send), the GRID-MPI layer (PACX_Send), and the TCP/IP GRID-MPI communication layer.]

MetaVampir
- General counter support
  – Grid component metrics
- Hierarchical analysis
  – Analysis at each level
  – Aggregate data for groups
  – Improves scalability
- Structured tracefiles
  – Subdivided into frames
  – Stripe data across multiple files
[Figure: hierarchy of a metacomputer: Node 1 and Node 2, SMP node 1 and SMP node 2, GRID daemons and MPI processes (Send, Recv), processes P_1 to P_n, all MPI processes]

Process Level

System Level

Grid Monitoring Architecture
- Developed by the GGF Performance working group
- Separation of data discovery and data transfer
  – Data discovery via a (possibly distributed) directory service
  – Data transfer between producer and consumer
- GMA interactions
  – Publish/subscribe
  – Query/response
  – Notification
- Directory includes
  – Types of events
  – Accepted protocols
  – Security mechanisms
[Figure: Consumer, Producer, and Directory Service; event publication information flows to the directory, events flow from producer to consumer]
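The separation of data discovery from data transfer can be made concrete with a small in-process sketch. The class and method names (`Directory`, `Producer`, `publish`, `lookup`, `query`, `subscribe`) are assumptions for illustration and do not come from the GMA documents; a real deployment would also carry the protocol and security information listed above.

```python
class Directory:
    """Directory service: stores only which producers offer which event type."""

    def __init__(self):
        self._producers = {}  # event type -> list of producers

    def publish(self, event_type, producer):
        self._producers.setdefault(event_type, []).append(producer)

    def lookup(self, event_type):
        return list(self._producers.get(event_type, []))


class Producer:
    """Holds the actual data and transfers it directly to consumers."""

    def __init__(self, name):
        self.name = name
        self._events = []
        self._subscribers = []

    def record(self, value):
        self._events.append(value)
        for callback in self._subscribers:
            callback(self.name, value)

    def query(self):
        """Query/response: return everything recorded so far."""
        return list(self._events)

    def subscribe(self, callback):
        """Publish/subscribe: deliver future events to the callback."""
        self._subscribers.append(callback)


directory = Directory()
cpu = Producer("host1")
directory.publish("cpu_load", cpu)  # discovery information only
cpu.record(0.4)

received = []
for producer in directory.lookup("cpu_load"):  # consumer discovers the producer
    history = producer.query()                 # data comes from the producer itself
    producer.subscribe(lambda source, value: received.append((source, value)))

cpu.record(0.9)
```

Note that the directory never sees a measurement value; it only answers the question of where to ask.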

R-GMA in DataGrid
DataGrid R-GMA, DataGrid WP3: hepunx.rl.ac.uk/edg/wp3
- Relational approach to GMA
  – Producers announce: SQL “CREATE TABLE”
  – Producers publish: SQL “INSERT”
  – Consumers collect: SQL “SELECT”
- An approach to using the relational model in a distributed environment
- It can be used for information services as well as for system and application monitoring.

P-GRADE and R-GMA
- P-GRADE environment developed at MTA SZTAKI
  – GRM (distributed monitor)
  – PROVE (visualization tool)
- GRM creates two tables in R-GMA
  – GRMTrace (String appName, String event): all events
  – GRMHeader (String appName, String event): important header events only
- GRM Main Monitor
  – SELECT “*” FROM GRMHeader WHERE appName=“...”
  – SELECT “*” FROM GRMTrace WHERE appName=“...”
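The announce / publish / collect pattern with the two GRM tables can be tried out locally. The sketch below uses Python's built-in `sqlite3` with an in-memory database as a stand-in for R-GMA, which is an assumption made purely for illustration: R-GMA itself is distributed and has its own API. The table and column names are taken from the slide; the sample rows and the helper `events_for` are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # local stand-in for R-GMA (assumption)

# Producer announces: CREATE TABLE
conn.execute("CREATE TABLE GRMTrace (appName TEXT, event TEXT)")
conn.execute("CREATE TABLE GRMHeader (appName TEXT, event TEXT)")

# Producer publishes: INSERT (sample events are invented)
conn.executemany(
    "INSERT INTO GRMTrace VALUES (?, ?)",
    [("sim", "start"), ("sim", "send 0->1"), ("other", "start")],
)
conn.execute("INSERT INTO GRMHeader VALUES (?, ?)", ("sim", "processes=2"))


def events_for(table, app_name):
    """Consumer collects: SELECT all events of one application, in insert order."""
    if table not in ("GRMTrace", "GRMHeader"):
        raise ValueError(f"unknown table: {table}")
    rows = conn.execute(
        f"SELECT * FROM {table} WHERE appName = ? ORDER BY rowid", (app_name,)
    )
    return [event for _, event in rows]


header_events = events_for("GRMHeader", "sim")
trace_events = events_for("GRMTrace", "sim")
```

Filtering on `appName` is what lets several applications share the same two tables.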

[Figure: Main Monitor Site, User’s Host, Host 1, and Host 2, showing the Main Monitor, the application processes, PROVE, R-GMA, and the connection to R-GMA]

Analyzing Interactive Applications in CrossGrid
- CrossGrid funded by EU: 03/2002 – 02/
- Simulation of vascular blood flow
  – Interactive visualization and simulation
  – Response times are critical
  – 0.1 sec (head movement) to 5 min (change in simulation)
- Performance analysis
  – Response time and its breakdown
  – Performance data for specific interactions

CrossGrid Application Monitoring Architecture
- OCM-G = Grid-enabled OMIS-Compliant Monitor
- OMIS = On-line Monitoring Interface Specification
- Application-oriented
  – Information about running applications
- On-line
  – Information collected at runtime
  – Immediately delivered to consumers
- Information collected via instrumentation
  – Activated / deactivated on demand
  – Information of interest defined at runtime (lower overhead)
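On-demand instrumentation can be sketched in a few lines. This is not the OCM-G or OMIS interface; the `Monitor` class and its methods are hypothetical. It only shows why activation at runtime lowers overhead: an instrumentation point that nobody has asked for does nothing beyond a set lookup.

```python
class Monitor:
    """Toy on-line monitor: probes deliver events only while activated."""

    def __init__(self):
        self.active = set()   # names of probes currently switched on
        self.delivered = []   # events handed to the consumer, in order

    def activate(self, probe):
        self.active.add(probe)

    def deactivate(self, probe):
        self.active.discard(probe)

    def hit(self, probe, data):
        """Called from instrumented code; a no-op unless the probe is active."""
        if probe in self.active:
            self.delivered.append((probe, data))


monitor = Monitor()
monitor.hit("send", 1)      # ignored: nothing has been requested yet
monitor.activate("send")    # a tool asks for "send" events at runtime
monitor.hit("send", 2)      # delivered immediately
monitor.hit("recv", 3)      # ignored: "recv" was never requested
monitor.deactivate("send")
monitor.hit("send", 4)      # ignored again
```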

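OMIS
[Figure: a performance tool issues a Stop request as th_stop(Sim) to the Service Manager, which forwards th_stop(P1,P2), th_stop(P4,P5), and th_stop(P3) to the local monitors (LM) responsible for processes P1 to P5]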

G-PM

Application-Specific Measurement
- G-PM offers standard metrics
  – CPU time, communication time, disk I/O, ...
- Application programmer provides
  – Relevant events inside the application (probes)
  – Relevant data computed by the application
  – Association between events in different processes
- G-PM allows new metrics to be defined
  – Based on existing ones and application-specific information
  – Metric Definition Language under development
  – Compilation or interpretation will be done by the High-Level Analysis Component.
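As an illustration of a metric built from a standard metric plus application-specific information, the sketch below derives "communication time per iteration" from a standard `comm_time` value and a count of probe events. Because the Metric Definition Language was still under development, plain Python functions are used as a stand-in; the metric names, probe name, and numbers are all assumptions.

```python
# Standard metrics as a tool might report them (illustrative values, in seconds).
standard_metrics = {"cpu_time": 12.0, "comm_time": 3.0}

# Number of times an application-defined probe fired (illustrative value).
probe_counts = {"iteration_done": 6}


def comm_time_per_iteration(metrics, probes):
    """User-defined metric: standard metric divided by an application event count."""
    iterations = probes["iteration_done"]
    if iterations == 0:
        return 0.0
    return metrics["comm_time"] / iterations


# Registry of user-defined metrics, keyed by name.
user_metrics = {"comm_time_per_iteration": comm_time_per_iteration}


def evaluate(name, metrics, probes):
    """Look up a user-defined metric by name and compute its current value."""
    return user_metrics[name](metrics, probes)


value = evaluate("comm_time_per_iteration", standard_metrics, probe_counts)
```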

Managing Dynamism: The GrADS Approach
- GrADS (Grid Application Development Software)
  – Funded by the National Science Foundation, started 2000
- Goal: Provide application development technologies that make it easy to construct and execute applications with reliable [and often high] performance in the constantly changing environment of the Grid.
- Major techniques to handle transparency and dynamism:
  – Dynamic configuration to available resources (configurable object programs)
  – Performance contracts and dynamic reconfiguration

GrADS Software Architecture
[Figure: Program Preparation System (PSE, source application, whole-program compiler, libraries, software components, configurable object program) and Execution Environment (scheduler/service negotiator, negotiation, dynamic optimizer, real-time performance monitor, Grid runtime system (Globus)), linked by performance feedback]

Configurable Object Programs
- Integrated mapping strategy and cost model
- Performance enhanced by context-dependent variants
- Context includes potential execution platforms
- Dynamic Optimizer performs final binding
  – Implements the mapping strategy
  – Chooses machine-specific variants
  – Inserts sensors and actuators
  – Performs final compilation and optimization

Performance Contracts
A performance contract specifies the measurable performance of a grid application.
Given
- a set of resources,
- the capabilities of those resources,
- the problem parameters,
the application will achieve a specified, measurable performance.

Creation of Performance Contracts
[Figure: the developer, the compiler, and measurements yield the program's performance model; the resource broker, using MDS and NWS, yields the resource assignment; both feed into the performance contract]

History-Based Contracts
- Resources
  – Given by the broker
- Capabilities of resources
  – Given by measurements of this code on those resources
  – Possibly scaled by the Network Weather Service
  – e.g. Flops/second and Bytes/second
- Problem parameters
  – Given by the input data set
- Application intrinsic parameters
  – Independent of the execution platform
  – Measurements of this code with the same problem parameters
  – e.g. floating point operation count, message count, message bytes count
- Measurable performance prediction
  – Combining application parameters and resource capabilities
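One simple way to combine application-intrinsic parameters with resource capabilities is sketched below. The additive model (compute time plus communication time, with no overlap) and the scaling of measured capabilities by an availability fraction are assumptions made for illustration, not the GrADS formula; all names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Capabilities:
    flops_per_sec: float   # measured with this code on these resources
    bytes_per_sec: float


@dataclass
class AppParameters:
    flop_count: float      # platform-independent, from earlier runs
    message_bytes: float


def scale(measured, cpu_fraction, bandwidth_fraction):
    """Scale historical capabilities by forecast availability (e.g. from NWS)."""
    return Capabilities(
        measured.flops_per_sec * cpu_fraction,
        measured.bytes_per_sec * bandwidth_fraction,
    )


def predicted_seconds(app, caps):
    """Assumed model: compute and communication do not overlap."""
    return app.flop_count / caps.flops_per_sec + app.message_bytes / caps.bytes_per_sec


measured = Capabilities(flops_per_sec=2e9, bytes_per_sec=1e8)
forecast = scale(measured, cpu_fraction=0.5, bandwidth_fraction=1.0)
app = AppParameters(flop_count=4e9, message_bytes=2e8)

contract_seconds = predicted_seconds(app, forecast)  # 4 s compute + 2 s communication
```

The resulting number is the "specified, measurable performance" that a contract monitor can later compare against observed behavior.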

Application and System Space Signature
- Application signature
  – Trajectory of values through an N-dimensional metric space
  – One trajectory per process
  – e.g. one point per iteration
  – e.g. metric: iterations/flop
- System signature
  – Trajectory of values through an N-dimensional metric space
  – Will vary across application executions, even on the same resources
  – e.g. metric: iterations/second
  – Resource capabilities

Verification of Performance Contracts
[Figure: during execution, sensor data reaches the contract monitor, which performs violation detection and fault detection; on a violation it triggers rescheduling or steers the dynamic optimizer]
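Violation detection can be sketched as a comparison of sensor data against the contracted rate. The tolerance band and the requirement of several consecutive bad samples are assumptions introduced here to avoid reacting to a single noisy measurement; they are not taken from GrADS, and `check_contract` is a hypothetical name.

```python
def check_contract(predicted_rate, measured_rates, tolerance=0.2, patience=3):
    """Return the sample index at which a violation is declared, or None.

    A sample is "bad" if it falls more than `tolerance` below the predicted
    rate; a violation is declared after `patience` consecutive bad samples.
    """
    threshold = predicted_rate * (1 - tolerance)
    consecutive = 0
    for index, rate in enumerate(measured_rates):
        if rate < threshold:
            consecutive += 1
            if consecutive >= patience:
                return index
        else:
            consecutive = 0  # a good sample resets the count
    return None


# Contract: 10 iterations/second. Sensor data in iterations/second.
violation_at = check_contract(10.0, [10.0, 7.0, 9.5, 7.5, 7.0, 6.0])
no_violation = check_contract(10.0, [10.0, 9.0, 8.5])
```

In the first run the single dip at index 1 is tolerated, and the violation is declared at index 5 after three consecutive low samples; a real monitor would then trigger rescheduling or steer the optimizer.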

APART
- ESPRIT IV Working Group, 01/1999 – 12/2000
- IST Working Group, 08/2001 – 07/
- Focus:
  – Network European development projects for automatic performance analysis tools
  – Testsuite for automatic analysis tools
  – Automatic Performance Analysis and Grid Computing (WP3 – Peter Kacsuk)

Summary
- Scenarios
  – Post-mortem application tuning
  – Self-tuning applications
  – Grid scheduling
  – Grid management
- How to handle transparency and dynamism?
- Approaches here:
  – DAMIEN: Provide a static environment
  – DataGrid: Combining system and application monitoring
  – CrossGrid: On-line analysis
  – GrADS: Performance models and contracts