GRM + Mercury in P-GRADE: Monitoring of P-GRADE applications in the Grid using GRM and Mercury

Target
● Collect trace about a P-GRADE application
  – running on one cluster, but
  – migrating among several clusters.
● P-GRADE uses a Grid Resource Broker (e.g. Condor) to submit the application.

Tools
● GRM
  – Instrumentation library to generate trace
  – PROVE visualisation tool
  – Main process to gather traces from Mercury, maintain the trace file and communicate with PROVE
● Mercury Monitor
  – Monitoring service on each cluster
  – Receives trace records from the application and
  – delivers them to GRM (on the user's machine)
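To make GRM's instrumentation role concrete, here is a minimal C sketch of how an application process might emit trace events through a GRM-style library. The grm_event() name and the record layout are assumptions for illustration, not the actual GRM API.

```c
/* Hypothetical sketch of GRM-style instrumentation in an
 * application process. The grm_event() name and record format
 * are illustrative only; they are not the real GRM API. */
#include <stdio.h>
#include <time.h>

/* Emit one trace record to stdout; a real library would send it
 * to the Local Monitor instead. */
static void grm_event(int process_id, const char *event, const char *detail)
{
    printf("%ld %d %s %s\n", (long)time(NULL), process_id, event, detail);
}

int main(void)
{
    int my_id = 42;                       /* stable P-GRADE process ID */
    grm_event(my_id, "START", "-");
    grm_event(my_id, "SEND",  "to=43");   /* communication event */
    grm_event(my_id, "RECV",  "from=43");
    grm_event(my_id, "EXIT",  "-");
    return 0;
}
```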

Structure
● GRM
  – Instrumentation library in the application processes
  – Main process and PROVE on the user's machine
● Mercury Monitor
  – Local Monitor (LM) as a service on each machine of each cluster
  – Main Monitor (MM) on the frontend node of each cluster (connecting to all LMs on that cluster)

GRM and Mercury
[Architecture diagram: P-GRADE, GRM and PROVE on the user's host; application processes and Local Monitors (LM) on the hosts of Cluster 1 and Cluster 2; a Main Monitor (MM) on the cluster frontend forwards trace from the LMs to GRM.]

Issues of monitoring
● Start-up
  – P-GRADE submits the job.
  – The Resource Broker places the job somewhere.
  – Currently we do not use any Information Service to find out where the job is running (in DataGrid we use R-GMA for this purpose).
  – Therefore GRM must subscribe for trace at each Mercury Main Monitor, i.e. connect to all clusters at the beginning, which is not scalable for large grids (see the sketch below).
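A rough C sketch of this subscribe-everywhere start-up; the frontend list and the subscribe_at() helper are hypothetical stand-ins for the real Mercury subscription protocol.

```c
/* Sketch: without an Information Service, GRM subscribes for the
 * job's trace at every known Mercury Main Monitor up front.
 * subscribe_at() is a hypothetical helper standing in for the
 * real connect-and-subscribe protocol. */
#include <stdio.h>

static int subscribe_at(const char *frontend, const char *job_id)
{
    /* In reality: open a connection to the Main Monitor on
     * `frontend` and register interest in `job_id`'s trace. */
    printf("subscribed for %s at %s\n", job_id, frontend);
    return 0;
}

int main(void)
{
    /* Every cluster frontend in the grid must be contacted,
     * which is what makes this approach non-scalable. */
    const char *frontends[] = { "cluster1.example.org",
                                "cluster2.example.org" };
    size_t n = sizeof frontends / sizeof frontends[0];
    for (size_t i = 0; i < n; i++)
        subscribe_at(frontends[i], "pgrade-job-1");
    return 0;
}
```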

Issues of monitoring
● Collecting trace
  – Mercury stops a process if it generates trace and there is no subscription for it.
  – Therefore there is NO timing problem: GRM can start before or after the application starts.
  – Mercury currently supports the push model, so GRM (and PROVE) receives the trace as a stream (as opposed to the pull model of the original GRM in P-GRADE).
  – I.e. on-line monitoring now, not semi-on-line.
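With the push model, GRM's main process simply consumes the stream as it arrives and maintains the trace file for PROVE. The following minimal C sketch illustrates the idea; the line-oriented record format and the use of stdin to stand in for the Main Monitor connection are assumptions.

```c
/* Sketch of the push model: GRM's main process reads trace
 * records as a stream and appends them to the trace file for
 * PROVE. The record layout and file names are illustrative. */
#include <stdio.h>

int main(void)
{
    char record[256];
    FILE *trace = fopen("app.trace", "a");
    if (!trace)
        return 1;
    /* stdin stands in for the connection from the Main Monitor;
     * records are pushed, so GRM never has to poll. */
    while (fgets(record, sizeof record, stdin)) {
        fputs(record, trace);   /* maintain the trace file ... */
        fflush(trace);          /* ... so PROVE can read it on-line */
    }
    fclose(trace);
    return 0;
}
```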

Monitoring the migration
● Goal: in the visualisation the same process line is continued for the migrated process, and a colored line shows the time of migration.
● In P-GRADE each process has its own ID, which is used in its communication as well as in the GRM instrumentation. This ID remains the same in the migrated process.
● The migration function generates special trace records for the start and the end of the migration.
● Thus, the generated trace satisfies our goal (see the sketch below).
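As an illustration, a trace around a migration might look like the sketch below: the process ID stays constant while the host changes, and two special records bracket the move. The event names are invented; the real P-GRADE trace format differs.

```c
/* Sketch: the migration function emits special records that
 * bracket the move; the process ID (42) is unchanged, so the
 * visualisation can continue the same process line. Names are
 * illustrative, not the real P-GRADE trace format. */
#include <stdio.h>

static void emit(int pid, const char *event, const char *host)
{
    printf("pid=%d event=%s host=%s\n", pid, event, host);
}

int main(void)
{
    emit(42, "CALC",            "host1.cluster1");
    emit(42, "MIGRATION_BEGIN", "host1.cluster1"); /* before checkpoint */
    /* ... process is checkpointed, moved and restarted ... */
    emit(42, "MIGRATION_END",   "host2.cluster2"); /* after restart */
    emit(42, "CALC",            "host2.cluster2");
    return 0;
}
```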

Monitoring the migration (cont.)
● Mercury monitor requires that
  – the migrating process closes the connection to its Local Monitor, and
  – the migrated process connects as a new process to the Local Monitor on the new machine (possibly in a different cluster).
● Mercury monitor
  – takes care of delivering data received on the new connection to the original subscriber (GRM).
  – Note: GRM is subscribed at each cluster from the beginning (see the sketch below).
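From the migrating process's side, the hand-over reduces to a disconnect and a reconnect, as in this minimal C sketch. lm_disconnect() and lm_connect() are hypothetical names; the key point is that the unchanged process ID lets Mercury route the new connection's records to the already-subscribed GRM.

```c
/* Sketch of the migrating process's side of the hand-over.
 * lm_connect()/lm_disconnect() are hypothetical stand-ins for
 * the real Local Monitor protocol. */
#include <stdio.h>

static void lm_disconnect(const char *host)
{
    printf("closed connection to LM on %s\n", host);
}

static int lm_connect(const char *host, int pid)
{
    /* The process registers as a *new* connection, but with its
     * old P-GRADE process ID, so Mercury can route its records
     * to the original subscriber (GRM). */
    printf("connected to LM on %s as pid %d\n", host, pid);
    return 0;
}

int main(void)
{
    lm_disconnect("host1.cluster1");   /* before migration */
    /* ... process migrates ... */
    lm_connect("host2.cluster2", 42);  /* after restart */
    return 0;
}
```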