OMIS Approach to Grid Application Monitoring
Bartosz Baliś, Marian Bubak, Włodzimierz Funika, Roland Wismueller

X# AGENDA
Introduction
Monitoring architecture
  – sensors (local monitors, application monitors)
  – service managers
Performance
  – efficient data gathering
  – scalability of grid-scale monitoring
Producer / consumer communication protocol
Comparison to DATAGRID
Experience
Conclusion

X# Introduction
Need for monitoring applications
  – improve performance
  – localize bugs
For these purposes
  – specialized tools needed
  – debuggers, performance analyzers, visualizers, etc.
Tools composed of two modules
  – user interface
  – monitoring module

X# Introduction (cont’d)
Main issues of monitoring on the Grid
  – enormous scale of the Grid
  – many applications, many users, high distribution, high heterogeneity
  – simply porting existing environments is not sufficient!
A solution:
  – an underlying universal monitoring system
  – a well-defined interface to tools
Experience with OMIS / OCM: port from PVM to MPI, port of tools
  – next step: move to the Grid?

X# Monitoring architecture
Compliance with GMA (Grid Monitoring Architecture)
  – producer / consumer model
Sensors
  – producers of performance data
Tools
  – consumers of the data
Direct communication between producers and consumers
Producers located via, e.g., a directory service
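
The GMA pattern on this slide can be illustrated with a minimal in-memory sketch: producers publish themselves in a directory, a consumer uses the directory only to locate a producer, and the data then flows directly between the two. The class names `DirectoryService`, `Producer` and `Consumer` are hypothetical; a real Grid deployment would use a networked directory and remote calls rather than Python object references.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Producer:
    """A sensor that produces performance data (hypothetical name)."""
    name: str
    samples: List[float] = field(default_factory=list)

    def record(self, value: float) -> None:
        self.samples.append(value)

    def query(self) -> List[float]:
        # Served directly to the consumer; the directory is not on the data path.
        return list(self.samples)


class DirectoryService:
    """Hypothetical registry: producers publish themselves, consumers look them up."""

    def __init__(self) -> None:
        self._entries: Dict[str, Producer] = {}

    def publish(self, name: str, producer: Producer) -> None:
        self._entries[name] = producer

    def lookup(self, name: str) -> Producer:
        return self._entries[name]


class Consumer:
    """A tool: locates a producer via the directory, then talks to it directly."""

    def __init__(self, directory: DirectoryService) -> None:
        self.directory = directory

    def fetch(self, producer_name: str) -> List[float]:
        producer = self.directory.lookup(producer_name)
        return producer.query()


directory = DirectoryService()
sensor = Producer("node1/local-monitor")
directory.publish(sensor.name, sensor)
sensor.record(0.5)
tool = Consumer(directory)
print(tool.fetch("node1/local-monitor"))  # [0.5]
```

Keeping the directory off the data path is what lets the number of producer/consumer pairs grow without the directory becoming a bottleneck.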

X# Sensors
Collect performance data from applications
Two types of sensors
  – local monitors (process sensors)
  – application monitors

X# Sensors (cont’d)
Local monitors
  – one per node
  – collect data only from processes on this node
  – publish themselves in the directory service
Application monitors
  – embedded parts of applications
  – collect data on various events, e.g. function calls
  – may improve efficiency and portability
  – interact with local monitors
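
The division of labour between the two sensor types can be sketched as follows, assuming an application monitor that wraps functions to count calls inside the process, and a local monitor that answers queries only for processes attached on its own node. All names are hypothetical, and the direct object reference stands in for whatever shared-memory or IPC channel a real monitor would use between processes.

```python
import functools
from collections import defaultdict
from typing import Callable, Dict


class ApplicationMonitor:
    """Hypothetical monitor embedded in one application process."""

    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.call_counts: Dict[str, int] = defaultdict(int)

    def trace(self, func: Callable) -> Callable:
        # Records a function-call event in the process's own memory.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            self.call_counts[func.__name__] += 1
            return func(*args, **kwargs)
        return wrapper


class LocalMonitor:
    """One per node; serves data only for processes running on that node."""

    def __init__(self, node: str) -> None:
        self.node = node
        self._app_monitors: Dict[int, ApplicationMonitor] = {}

    def attach(self, monitor: ApplicationMonitor) -> None:
        self._app_monitors[monitor.pid] = monitor

    def get_call_counts(self, pid: int) -> Dict[str, int]:
        if pid not in self._app_monitors:
            raise KeyError(f"process {pid} is not on node {self.node}")
        return dict(self._app_monitors[pid].call_counts)


am = ApplicationMonitor(pid=101)


@am.trace
def compute(x: int) -> int:
    return x * x


lm = LocalMonitor("node1")
lm.attach(am)
compute(2)
compute(3)
print(lm.get_call_counts(101))  # {'compute': 2}
```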

X# Monitoring Architecture

X# Service managers
Tool + local monitors
  – one consumer, multiple producers
Intermediate entity: service manager
  – handles requests coming from a tool
  – splits them into sub-requests for local monitors
  – collects replies from local monitors
  – assembles them into a single reply for the tool
Both a producer (of data for tools) and a consumer (of data from local monitors)
Offers the functionality of local monitors, but on a per-application basis
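
The split / collect / assemble cycle of a service manager can be shown with a small sketch. It assumes a request names its target processes as `(node, pid)` pairs and that each local monitor returns one value per process; the service name `get_msg_count` and the per-process data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Target = Tuple[str, int]  # (node, pid)


@dataclass
class Request:
    """A tool request naming the processes it targets (hypothetical format)."""
    service: str
    targets: List[Target]


class LocalMonitor:
    def __init__(self, node: str, data: Dict[int, int]) -> None:
        self.node = node
        self.data = data  # hypothetical per-process metric, e.g. messages sent

    def handle(self, service: str, pids: List[int]) -> Dict[int, int]:
        return {pid: self.data[pid] for pid in pids}


class ServiceManager:
    def __init__(self, monitors: Dict[str, LocalMonitor]) -> None:
        self.monitors = monitors

    def handle(self, request: Request) -> Dict[Target, int]:
        # 1. Split the request into one sub-request per node.
        sub_requests: Dict[str, List[int]] = {}
        for node, pid in request.targets:
            sub_requests.setdefault(node, []).append(pid)
        # 2. Send each sub-request to that node's local monitor and collect the reply.
        reply: Dict[Target, int] = {}
        for node, pids in sub_requests.items():
            partial = self.monitors[node].handle(request.service, pids)
            # 3. Assemble the partial replies into a single reply for the tool.
            for pid, value in partial.items():
                reply[(node, pid)] = value
        return reply


sm = ServiceManager({
    "n1": LocalMonitor("n1", {1: 10, 2: 20}),
    "n2": LocalMonitor("n2", {7: 5}),
})
print(sm.handle(Request("get_msg_count", [("n1", 1), ("n2", 7), ("n1", 2)])))
# {('n1', 1): 10, ('n1', 2): 20, ('n2', 7): 5}
```

Grouping targets by node means each local monitor receives exactly one sub-request per tool request, however many of its processes are involved.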

X# Application Monitors
Part of the monitoring system embedded in the application’s processes
  – have access to the application’s address space!
Many possible uses
  – efficient data gathering and storing
  – may take over some of the local monitor’s tasks
  – may be used to dynamically load monitoring extensions
  – even more for multithreaded applications

X# Application Monitors – debugging example
A debugger wants to access a process’s address space
Standard system mechanisms: ptrace, /proc
  – /proc more powerful, yet platform-dependent
  – synchronous control
Via application monitors: a request from the debugger to access the data
  – portable, asynchronous
  – question: how to ensure that application monitors are not corrupted by the application?
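
The asynchronous access path can be sketched by assuming the debugger's request is placed in a queue owned by the application monitor, which serves it the next time the application reaches a safe point (`poll()` here). The names `ApplicationMonitor`, `PendingRead` and `poll` are hypothetical, the `bytearray` stands in for the process's address space, and the sketch does not address the open question of protecting the monitor from a misbehaving application.

```python
import queue
from typing import Optional


class PendingRead:
    """Handle returned to the debugger; filled in later by the application monitor."""

    def __init__(self, address: int, length: int) -> None:
        self.address = address
        self.length = length
        self.data: Optional[bytes] = None
        self.error: Optional[Exception] = None

    def done(self) -> bool:
        return self.data is not None or self.error is not None


class ApplicationMonitor:
    """Hypothetical in-process agent that serves memory-read requests."""

    def __init__(self, memory: bytearray) -> None:
        self.memory = memory  # stands in for the process's address space
        self._requests = queue.Queue()  # thread-safe inbox for requests

    def submit_read(self, address: int, length: int) -> PendingRead:
        """Used on behalf of the debugger; returns immediately (asynchronous)."""
        request = PendingRead(address, length)
        self._requests.put(request)
        return request

    def poll(self) -> None:
        """Called by the application at safe points; serves all pending requests."""
        while True:
            try:
                request = self._requests.get_nowait()
            except queue.Empty:
                return
            end = request.address + request.length
            if request.address < 0 or end > len(self.memory):
                request.error = IndexError("read outside the address space")
            else:
                request.data = bytes(self.memory[request.address:end])


memory = bytearray(b"hello, grid")
am = ApplicationMonitor(memory)
req = am.submit_read(7, 4)
print(req.done())  # False: nothing is read until the application calls poll()
am.poll()
print(req.data)  # b'grid'
```

Unlike `ptrace`, the debugger never stops the process here; the price is that a request is only served when the application cooperates by reaching a safe point.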

X# Performance
Efficient data gathering
  – data production much more frequent than retrieval
  – frequency and time of access difficult to predict
Scalability
  – grid-scale monitoring system
  – distributed vs. centralized

X# Efficient data gathering
Local storing
  – performance data first stored locally, in the context of application processes
  – on request, passed to local monitors
  – saves communication and context switches between application and local monitor processes
Efficient data structures
  – performance data initially preprocessed
  – summarized information stored in, e.g., counters and integrators
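
Counters and integrators keep a constant amount of data per metric instead of one trace record per event, which is why they suit the case where data is produced far more often than it is read. The sketch below reads "integrator" as a time-weighted accumulator (for example, total time spent inside a function); that interpretation, the class names, and the explicit timestamps are assumptions for illustration.

```python
class Counter:
    """Counts events, e.g. calls to a library function."""

    def __init__(self) -> None:
        self.value = 0

    def increment(self, amount: int = 1) -> None:
        self.value += amount


class Integrator:
    """Accumulates the time integral of a piecewise-constant quantity.

    With level 1 while inside a function and 0 outside, the integral is
    the total time spent in that function.
    """

    def __init__(self, start_time: float = 0.0) -> None:
        self._level = 0.0
        self._last_time = start_time
        self._area = 0.0

    def set_level(self, level: float, now: float) -> None:
        # Close the interval that ended at `now`, then switch to the new level.
        self._area += self._level * (now - self._last_time)
        self._level = level
        self._last_time = now

    def read(self, now: float) -> float:
        # Include the still-open interval without modifying state.
        return self._area + self._level * (now - self._last_time)


# Hypothetical usage: updates done in place, in the application process,
# on entry to and exit from a send routine (timestamps given explicitly).
calls = Counter()
time_in_send = Integrator()
for enter, leave in [(1.0, 1.5), (3.0, 3.25)]:
    calls.increment()
    time_in_send.set_level(1, enter)
    time_in_send.set_level(0, leave)

print(calls.value, time_in_send.read(4.0))  # 2 0.75
```

A local monitor that later reads these values retrieves two numbers, no matter how many calls took place.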

X# Scalability
Decentralization: multiple service managers instead of one
Possible approaches
  – a fixed number of service managers, each responsible for part of the system
  – one service manager started for every monitored application

X# Fixed number of SMs

X# One SM per application

X# Scalability (cont’d)
In the first approach
  – tighter cooperation between service managers will be necessary
In the second approach
  – local monitors must be able to serve multiple service managers
  – service managers locate local monitors via the directory service
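
The two approaches can be contrasted in a short sketch. For the fixed pool, nodes are assigned round-robin over a sorted node list (an assumed partitioning rule); for the per-application approach, a new service manager looks up the application's local monitors in a directory and registers with each, so one local monitor can end up serving several managers. All function and class names are hypothetical.

```python
from typing import Dict, List, Set


def partition_nodes(nodes: List[str], num_managers: int) -> Dict[int, List[str]]:
    """Approach 1: a fixed pool of service managers, each owning a slice of the nodes."""
    assignment: Dict[int, List[str]] = {i: [] for i in range(num_managers)}
    for index, node in enumerate(sorted(nodes)):
        assignment[index % num_managers].append(node)
    return assignment


class LocalMonitor:
    """Approach 2: one local monitor may serve several service managers."""

    def __init__(self, node: str) -> None:
        self.node = node
        self.clients: Set[str] = set()

    def connect(self, manager_id: str) -> None:
        self.clients.add(manager_id)


def start_manager_for(app: str, app_nodes: List[str],
                      directory: Dict[str, LocalMonitor]) -> List[LocalMonitor]:
    """Start a per-application service manager and register it with its local monitors."""
    manager_id = f"sm-{app}"
    monitors = [directory[node] for node in app_nodes]  # located via the directory
    for lm in monitors:
        lm.connect(manager_id)
    return monitors


print(partition_nodes(["n3", "n1", "n2", "n4"], 2))  # {0: ['n1', 'n3'], 1: ['n2', 'n4']}

directory = {n: LocalMonitor(n) for n in ["n1", "n2", "n3"]}
start_manager_for("appA", ["n1", "n2"], directory)
start_manager_for("appB", ["n2", "n3"], directory)
print(sorted(directory["n2"].clients))  # ['sm-appA', 'sm-appB']
```

In the first approach, a request touching nodes owned by different managers needs those managers to cooperate; in the second, the shared local monitor on `n2` is where the coordination burden moves instead.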

X# Communication protocol
Based on the OMIS specification
OMIS = On-line Monitoring Interface Specification
  – specification of a universal interface between tools and a monitoring system
  – supports various types of tools
  – allows for easy extension
Grid-specific extensions necessary (e.g. for authentication)
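
OMIS requests pair an optional event definition with a list of actions, separated by a colon; a request with an empty event part is executed unconditionally. The parser below is a minimal sketch of that structure under a simplified grammar that is our assumption (actions separated by `;`, arguments kept as raw text, no colons or semicolons inside arguments); it is not the full OMIS grammar, and the service names in the example are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# service_name(arguments): arguments are kept as unparsed text in this sketch.
_CALL = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*)\s*\((.*)\)\s*$")


@dataclass
class Call:
    service: str
    args: str


@dataclass
class OmisRequest:
    event: Optional[Call]  # None for an unconditional request
    actions: List[Call]


def parse_call(text: str) -> Call:
    match = _CALL.match(text)
    if not match:
        raise ValueError(f"malformed service call: {text!r}")
    return Call(match.group(1), match.group(2).strip())


def parse_request(text: str) -> OmisRequest:
    event_part, sep, action_part = text.partition(":")
    if not sep:
        raise ValueError("request must contain ':'")
    event = parse_call(event_part) if event_part.strip() else None
    actions = [parse_call(a) for a in action_part.split(";") if a.strip()]
    if not actions:
        raise ValueError("request needs at least one action")
    return OmisRequest(event, actions)


# Unconditional request: executed once, immediately.
print(parse_request(":thread_stop([p_1])"))
# Conditional request: actions run each time the event occurs.
req = parse_request('thread_has_started_lib_call([p_1], "MPI_Send"): '
                    "thread_get_info([$thread], 0); thread_continue([$thread])")
print(req.event.service, [a.service for a in req.actions])
```

Because new services only add names to this event/action vocabulary, the request format itself does not change when the interface is extended, e.g. with Grid-specific authentication services.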

X# Comparison to DATAGRID
(DG = DataGrid, CG = CrossGrid)
Monitoring approach
  – DG: (semi-)on-line
  – CG: on-line
Architecture
  – DG: hybrid centralized / distributed (local monitors and one main monitor)
  – CG: distributed (local monitors and multiple service managers)

X# Comparison to DATAGRID (cont’d)
Data collection
  – DG: local storing with trace buffering or counters
  – CG: local storing with preprocessing (counters, integrators)
Communication protocol
  – DG: not specified
  – CG: OMIS

X# Experience
OMIS-based monitoring system for clusters of workstations: OCM
OMIS-based tools
  – PATOP (performance analysis), DETOP (debugging), others...
Local storing and efficient data structures (counters and integrators) proved very efficient
  – full monitoring overhead of about 4%
The instrumentation techniques used induce zero overhead when monitoring is inactive
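
The slides do not say which instrumentation technique gives OCM its zero overhead when monitoring is inactive. One way to obtain that property is to install a wrapper only while monitoring is active and to restore the original function afterwards, so that no monitoring code runs at all in the inactive state. The sketch below does this by swapping entries in a namespace dictionary; the `Instrumenter` class and the `app` namespace are hypothetical stand-ins for patching library entry points in a real process.

```python
from typing import Callable, Dict


class Instrumenter:
    """Installs a counting wrapper only while monitoring is active."""

    def __init__(self, namespace: Dict[str, Callable]) -> None:
        self.namespace = namespace
        self.originals: Dict[str, Callable] = {}
        self.counts: Dict[str, int] = {}

    def activate(self, name: str) -> None:
        if name in self.originals:
            return  # already instrumented
        original = self.namespace[name]
        self.originals[name] = original
        self.counts.setdefault(name, 0)

        def wrapper(*args, **kwargs):
            self.counts[name] += 1
            return original(*args, **kwargs)

        self.namespace[name] = wrapper

    def deactivate(self, name: str) -> None:
        # Restore the untouched original: calls no longer pass through any monitor code.
        original = self.originals.pop(name, None)
        if original is not None:
            self.namespace[name] = original


app = {"send": lambda msg: len(msg)}
inst = Instrumenter(app)
original_send = app["send"]

app["send"]("a")       # monitoring inactive: original function, nothing counted
inst.activate("send")
app["send"]("bb")      # monitoring active: counted
inst.deactivate("send")
print(inst.counts)  # {'send': 1}
```

A flag checked inside permanent instrumentation would still cost a test and branch per call; replacing the entry point avoids even that.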

X# Summary
Demand for accurate data from monitoring tools
Monitoring data handling: production / consumption
A general monitoring scheme compliant with GMA
Need for an advanced monitoring infrastructure
Concepts of OMIS will be extended to fit the Grid