All-in-one graphical tool for the management of DIET, a GridRPC middleware. Eddy Caron, Frédéric Desprez, David Loureiro, Benjamin Depardon, Aurélien Cedeyn.

All-in-one graphical tool for the management of DIET, a GridRPC middleware. Eddy Caron, Frédéric Desprez, David Loureiro, Benjamin Depardon, Aurélien Cedeyn. LIP ENS Lyon, INRIA Rhône-Alpes, GRAAL project.

DIET Dashboard -- Motivations
DIET hierarchies are designed to be deployed on grids/clusters of nodes.
Users need several complex tools to manage resources and client/server applications.
A distributed middleware deployment is not easily manageable:
if you deal with a large number of nodes
if you manage your resource reservations by hand
if you need to write each configuration file of your middleware by hand
if you need to launch each component of your middleware by hand

DIET Experiment workflow
Resources Reservation → DIET Platform Design → Resources Mapping → DIET Platform Generation → DIET Platform Deployment → Experiment Workflow Design → Workflow Execution → Results Retrieval

DIET Dashboard
Extensible set of tools for the DIET community, based on seven tools:
 DIET Designer
 DIET Mapping tool
 DIET Deployment tool
 XML GoDIET Generator
 Workflow Designer
 Workflow Log Service
 DIET Resources tool, aka GRUDU
(grouped into DIET tools, workflow tools and grid tools)
The DIET Dashboard is written in Java.
It provides the DIET end user with user-friendly interfaces to design, deploy and monitor the execution of client/server applications.
It also provides the grid user with tools for allocating and monitoring resources on Grid'5000.

DIET Resources Tool
Manages the grid resources used by the application. Currently only used for the Grid'5000 platform; provides several operations to ease access to this platform.
Main goals:
Displaying the status of the platform (grid/site/job level)
Resource allocation through OAR (v1 & v2 are supported)
Resource monitoring through Ganglia (site/job nodes)
Deployment management with a GUI for KaDeploy (multiple sites at a time)
A terminal emulator (connection to the access frontend, a site frontend, or the main node of a job)
A file transfer manager (local/remote transfers and synchronization features)

GRUDU: Grid'5000 Reservation Utility for Deployment Usage
Web:

GRUDU – Resources Allocation
Resources can be reserved through OAR (v1 & v2):
 time parameters: reservation date and walltime
 queue
 OARGrid sub behaviour / script to launch
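For illustration, a minimal Python sketch of the kind of reservation GRUDU drives through OAR. It only builds a command line and assumes the usual oarsub options (-q for the queue, -r for the reservation start date, -l for the resource description); it is a sketch, not GRUDU's actual code, and the exact options accepted by a given OAR installation may differ.

```python
def oarsub_command(nodes, walltime, start_date, queue="default",
                   script="./run_experiment.sh"):
    """Return an OAR advance-reservation command line (illustrative sketch).

    Assumes the usual oarsub options: -q (queue), -r (reservation start date)
    and -l (resource description); check your OAR version for exact syntax.
    """
    return (f"oarsub -q {queue} "
            f"-r '{start_date}' "
            f"-l nodes={nodes},walltime={walltime} "
            f"{script}")

# Example: reserve 16 nodes for 48 hours starting at a given date.
print(oarsub_command(nodes=16, walltime="48:00:00",
                     start_date="2008-06-01 20:00:00"))
```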

GRUDU – Monitoring
Monitoring of the status of the grid, a site, or a job.
Instantaneous and historical data can be obtained through Ganglia.

GRUDU – KaDeploy/JFTP
GUI for KaDeploy job deployment.
File transfer interface (local/remote and rsync on Grid'5000).

DIET Designer/Mapping
- Allows the user to graphically design a DIET hierarchy.
- Only the application characteristics are defined (agent type: Master or Local Agent, and SeD parameters).
- Allows the user to map the DIET components onto the allocated Grid'5000 resources.
- The mapping is done interactively by selecting the site and then the DIET agents or SeDs.

XML GoDIET Generator
Helps the end user create hierarchies from existing frameworks based on the reserved resources.
The user is asked to choose an experiment (a hierarchy framework) among the ones available (personal hierarchies can be added).
For each hierarchy the user has to specify the required elements involved (MA/LA/SeD).
Finally a platform is generated, and the user can deploy it through the DIET Deployment Tool.
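As a rough illustration of what the generator has to do (expand a chosen framework over the reserved resources), here is a small Python sketch with hypothetical names. It produces a flat element-to-node mapping; it is not GoDIET's actual XML format nor the tool's real code.

```python
def expand_hierarchy(reserved_nodes_by_site):
    """Map a simple hierarchy framework onto reserved nodes (sketch).

    Framework assumed here: one Master Agent (MA) on the first node of the
    first site, one Local Agent (LA) per site, and every remaining node
    hosting a SeD. Returns (element name, node) pairs that a generator
    could then turn into a platform description for GoDIET.
    """
    mapping = []
    sites = list(reserved_nodes_by_site.items())
    ma_site, ma_nodes = sites[0]
    ma_node = ma_nodes[0]
    mapping.append(("MA", ma_node))
    for site, nodes in sites:
        # On the MA's site, put the LA on the next node if one is available.
        la_node = nodes[1] if site == ma_site and len(nodes) > 1 else nodes[0]
        mapping.append((f"LA_{site}", la_node))
        for node in nodes:
            if node not in (ma_node, la_node):
                mapping.append((f"SeD_{site}_{node}", node))
    return mapping

reserved = {"lyon": ["node-1", "node-2", "node-3"],
            "sophia": ["node-4", "node-5"]}
for element, node in expand_hierarchy(reserved):
    print(element, "->", node)
```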

DIET Deployment Tool
This tool is a graphical interface to GoDIET.
It provides the basic GoDIET operations (open, launch, stop) and a monitoring mechanism that checks whether the DIET application elements are still alive (three states are available: unknown, dead and running).

Workflow Designer/Log Service
Compose services into a complete application workflow in a drag-and-drop fashion.
Monitor workflow execution by displaying the DAG nodes of each workflow and their states.

Monitoring a DIET experiment
Online/offline experiment monitoring
DIET data management monitoring
Monitoring of DIET services (use, selection, etc.)
DIET platform performance evaluation

Prototype Cosmo – DIET: Gantt

Prototype Cosmo – DIET: impact of DIET

Large scale experiment: the DIET/Ramses case
Validation of the DIET architecture at large scale, over different administrative domains, in the framework of the LEGO project (ANR CICG05-11) on Grid'5000.
 Goal: launch as many Ramses executions as possible (Ramses is a grid-based hydro solver application developed at DAPNIA/CEA for cosmological simulations)
 Stress DIET over a large number of machines and over a long period of time
 But also stress Grid'5000
 KaDeploy image with DIET and all the mandatory tools
 12 clusters on 7 sites: 979 machines for 48 hours
 1 MA, 12 LA, 29 SeDs
 1824 processors dedicated to Ramses

Large scale experiment on Grid'5000: requests submitted via DIET
1824 processors dedicated to Ramses
59 simulations (33 complete, 26 partial)
Equivalent to 368 days on 1 processor
GalaxyMaker & MoMaF:
Web interface for submission of parameter-sweep jobs
Workload modeling for scheduling predictions
Workflow / data management

Ongoing Work
Deploy DIET across many sites
Improve data management
Write a plug-in scheduler

Workflow

Execution time model for GalaxyMaker

Output size model for GalaxyMaker

Execution time model for MoMaF

Large scale experiment: the DIET/Ramses case
Use of the DIET Dashboard:
 20 seconds for the reservation of 979 nodes
 25 minutes for the deployment with KaDeploy
 23 seconds for the deployment of the DIET platform
Main difficulties:
 Disk space on NFS storage
 OmniORB not available on Itanium2
 Sites not available for deployment

Conclusion
 DIET is a grid middleware designed for scheduling application tasks, with a hierarchical architecture
 The DIET Dashboard provides DIET users with:
a full-featured framework for experiments
an easy way to manage Grid'5000
 The DIET Resources Tool provides the Grid'5000 community with a powerful tool dedicated to interaction with the grid:
monitoring
reservation
deployment
etc.
 The DIET Resources Tool exists in a standalone version, known as GRUDU, dedicated to the Grid'5000 community

Future Work
 Web-based version of the DIET Dashboard, used on the Decrypthon project: WebBoard
 GUI for client/server application design
 DIET data management interface
 Support of other batch schedulers (such as LoadLeveler or SGE)
 Plugin-based architecture

Introduction - Context
Climate evolution, global warming effect.
Two problems:
Long-term evolution (needs a supercomputer)
Climate model parametrization (needs numerous simulations)

Introduction - Motivations
The project aims to study the parametrization sensitivity of a climate model.
A better understanding of the parametrization will provide better simulations.
Once good parameters have been found, we will be able to simulate the climate further into the future.
This requires numerous independent simulations.
The focus of this talk is minimizing the execution time of these independent simulations.

Outline
Introduction
Framework: Ocean-Atmosphere application, Grid'5000, Diet
Scheduling Strategies
Experimental Results
Conclusion & Future Work

Ocean-Atmosphere scenarios
Climate simulation over the 21st century.
An experiment is composed of several scenarios.
A scenario is a chain of 1800 monthly simulations (150 years).
The input of the (n+1)th monthly simulation is the output of the nth one.
The scenarios are independent.
(Figure: a scenario as a chain Month 1 → Month 2 → … → Month 1799 → Month 1800.)

Ocean-Atmosphere running
A monthly simulation is composed of a main-task (a parallel task using 4 to 11 processors) followed by a post-processing task.
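To make this structure concrete, a minimal Python sketch (hypothetical names, not the application's actual data model) of a scenario as a chain of 1800 monthly simulations, each one a parallel main-task followed by a post-processing task and depending on the previous month's output:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MonthlySimulation:
    """One monthly simulation: a parallel main-task then a post-processing task."""
    month: int
    main_task_procs: int = 4                            # main-task uses 4 to 11 processors
    depends_on: Optional["MonthlySimulation"] = None    # month n+1 needs month n's output

def build_scenario(n_months: int = 1800):
    """Build one scenario: a chain of n_months dependent monthly simulations."""
    months, previous = [], None
    for m in range(1, n_months + 1):
        sim = MonthlySimulation(month=m, depends_on=previous)
        months.append(sim)
        previous = sim
    return months

scenario = build_scenario()
print(len(scenario), "monthly simulations;",
      "month 2 depends on month", scenario[1].depends_on.month)
```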

Software environment: Diet
GridRPC compliant for interoperability
Client/Agent/Server paradigm
Middleware with a hierarchical architecture designed to provide scalability
Resource finding for the client
Plug-in scheduler with hierarchical behavior
Data management with replication
Easy to deploy
Easy to use

Platform environment: Grid'5000
Congregation of resources composed of numerous clusters distributed over 9 sites all over France.
All nodes of a cluster have access to an NFS server to store data.
Possibility to deploy one's own system image on the nodes.
Well suited to execute our independent scenarios.

Outline
Introduction
Framework
Scheduling Strategies: Cluster Level Scheduling, Grid Level Scheduling
Experimental Results
Conclusion & Future Work

Scheduling Strategies
We use Grid'5000 as the experiment platform.
The platform is composed of several heterogeneous clusters; each cluster is internally homogeneous.
We use Diet to perform the scheduling.
(Figure: the client interacts with the clusters through the Diet hierarchy; the steps are: send request, performance prediction (makespan), distribution of scenarios, computation, experiment end.)

Cluster Level Scheduling (1/5)
We consider a homogeneous platform composed of R resources (processors).
We have NS scenarios.
Execution times take into account the time to get the data, do the computation and store the results.
T[i] is the time needed to execute a main-task on i processors.
All post-processing tasks are left to the end of the execution because of the main-tasks' good speedup.
If there are enough resources, the post-processing tasks will all be executed at the same time.

Cluster Level Scheduling (2/5)
(Table: per-cluster processor type, node count, core count and memory size for the five Grid'5000 clusters used to measure T[i]: Capricorne, Sagittaire, Chicon and Chti with AMD Opteron processors, Grelon with Intel Xeon 1.6 GHz.)
Clusters are heterogeneous.
T[i] measured on 5 clusters of Grid'5000.

Cluster Level Scheduling (3/5)
We need to find the grouping of processors leading to the best makespan.
Find n_i (the number of groups with i resources) such that:
the portion of code executed at each time step is maximized
we have no more than NS groups and use at most R resources
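A plausible way to write this optimization down, assuming the objective is the steady-state throughput of main-tasks (the slide only states it informally, so the paper's exact formulation may differ); the constraints are the two stated above:

```latex
\max_{n_1,\dots,n_R \,\in\, \mathbb{N}} \;\; \sum_{i=1}^{R} \frac{n_i}{T[i]}
\quad \text{subject to} \quad
\sum_{i=1}^{R} n_i \le NS
\quad \text{and} \quad
\sum_{i=1}^{R} i\, n_i \le R .
```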

Cluster Level Scheduling (4/5)
(Figure: Gantt chart of resources (processors) versus time on cluster c, showing scenarios 1 to 4.)
Example of grouping: 3 groups (4, 4 and 7 processors).
Fairness among scenarios: when a group becomes idle, the task of the least advanced scenario is scheduled.

Cluster Level Scheduling (5/5)
Every resource is taken into account.
The makespan strictly decreases when more resources are added, but its decrease rate diminishes.

Grid Level Scheduling (1/2)
Aim: reduce the makespan by distributing the NS scenarios among nbClusters clusters.
When performance prediction is performed, the makespans for 1 to NS scenarios on cluster C are sent to the client (performance[C]).
Algorithm complexity: O(NS × nbClusters).
One experiment: NS = 10, and nbClusters is small on Grid'5000 (≈ 20).

makespan = 0
initialize the number of scenarios on each cluster to 0
while there are scenarios to schedule do
    find the cluster C where the makespan increases the least
    increment NSC, the number of scenarios on C
    update makespan with performance[C][NSC]
end while
send scenarios to the SeDs
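The same greedy loop written as a small Python sketch. performance[C][k] is assumed to give the predicted makespan of k scenarios on cluster C, with performance[C][0] = 0; the numbers in the example are made up, and "increases the least" is read here as "smallest new per-cluster makespan".

```python
def distribute_scenarios(ns, performance):
    """Greedy grid-level distribution of scenarios (sketch of the slide's loop).

    ns          -- total number of scenarios to schedule
    performance -- dict: cluster -> list where performance[c][k] is the
                   predicted makespan of k scenarios on cluster c
                   (with performance[c][0] == 0)
    Returns (scenarios assigned per cluster, predicted overall makespan).
    """
    assigned = {c: 0 for c in performance}
    makespan = 0.0
    for _ in range(ns):
        # Pick the cluster whose makespan is smallest once it receives
        # one more scenario, i.e. where the makespan increases the least.
        best = min(performance, key=lambda c: performance[c][assigned[c] + 1])
        assigned[best] += 1
        makespan = max(makespan, performance[best][assigned[best]])
    return assigned, makespan

# Toy example: 10 scenarios over 3 clusters with made-up makespan predictions.
perf = {
    "lyon":   [0, 10, 19, 27, 40, 55, 70, 90, 110, 140, 170],
    "nancy":  [0, 12, 22, 35, 50, 70, 90, 110, 140, 170, 200],
    "rennes": [0, 15, 28, 45, 60, 80, 100, 130, 160, 190, 220],
}
print(distribute_scenarios(10, perf))
```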

Grid Level Scheduling (2/2)
Comparison with Round-Robin on 5 clusters:
Maximum speedup (25%): equal to the speedup obtained when executing one main-task on the slowest versus the fastest cluster
With a higher load, the algorithm behaves better with few resources
Convergence of the gains
A gain of 25% ≈ 230 h on an ≈ 822 h long experiment

Outline
Introduction
Framework
Scheduling Strategies
Experimental Results
Conclusion & Future Work

Experimental Results (1/2)
Because of technical limitations, no more than one scenario can be executed on a single node.
All nodes on Grid'5000 are bi-core or quad-core.
New constraint: the size of a group has to be divisible by the number of cores per node of the cluster.
Possibility to make groups of 12 processors to reduce the loss.
Loss due to this technical difficulty:
few resources: loss between 1% and 13%
more resources: loss between 1% and 5%
lots of resources: no loss
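A tiny sketch of the arithmetic behind this constraint (hypothetical helper, not code from the experiment): a group size is rounded up to a whole number of nodes, and the difference is the loss quoted above; groups of 12 fit both node types exactly.

```python
import math

def round_group(group_size, cores_per_node):
    """Round a processor group up to whole nodes and report the wasted cores."""
    nodes_needed = math.ceil(group_size / cores_per_node)
    allocated = nodes_needed * cores_per_node
    return allocated, allocated - group_size

# A main-task group of 11 processors on quad-core nodes occupies 12 cores.
print(round_group(11, 4))   # (12, 1): one core is lost
# Groups of 12 fit both bi-core and quad-core nodes exactly, hence no loss.
print(round_group(12, 4))   # (12, 0)
print(round_group(12, 2))   # (12, 0)
```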

Experimental Results (2/2)
Accuracy of the simulations on 7 experiments:
bad with all post-processing tasks at the end (20.8% difference)
good if we consider only the main-tasks (6.3% difference)
Keeping one resource to execute post-processing tasks during the experiment removes the simulation inaccuracy.
A positive difference means the real execution was slower than expected.

Outline
Introduction
Framework
Scheduling Strategies
Experimental Results
Conclusion & Future Work

Conclusion
Improved performance of a climate prediction application
Modeling of the application
Proof of usage of Grid'5000 and Diet
Scheduling on a real application
Scheduling done at two levels: groups of processors at cluster level, distribution of scenarios at grid level
The real implementation suffered from technical limitations
Simulations are quite precise, but we need to keep one resource for post-processing tasks

Future Work
Extension of this work to generic independent chains of DAGs composed of moldable tasks.
Resource reservation is currently done manually, so we want to use tools such as SimGrid/SimBatch to determine how many resources to reserve, and then use the SeD Batch to make the reservation automatically.