June 28, 2015: Resource and Test Management in Grids. Rapid Prototyping in e-Science, VL-e Workshop, Amsterdam, NL. Dick Epema, Catalin Dumitrescu, Hashim Mohamed, Alexandru Iosup, Ozan Sonmez.

Resource and Test Management in Grids
Rapid Prototyping in e-Science, VL-e Workshop, Amsterdam, NL
Dick Epema, Catalin Dumitrescu, Hashim Mohamed, Alexandru Iosup, Ozan Sonmez
Parallel and Distributed Systems Group, Delft University of Technology

Outline
- A Brief Introduction to Grid Computing
- Koala: Processor and Data Co-Allocation in Grids
  - The Co-Allocation Problem in Grids
  - The Koala Design
  - Koala and the DAS Community
  - The Future of Koala
- GrenchMark: Analyzing, Testing, and Comparing Grids
  - Grid Performance Evaluation Issues
  - The GrenchMark Architecture
  - Experience with GrenchMark
- Take home message

A Brief Introduction to Grid Computing
Typical grid environment, e.g., the DAS:
- Applications [!]
- Resources: compute (clusters), storage (dedicated), network
- Virtual Organizations, projects (e.g., VL-e), groups, users
Grids vs. (traditional) parallel production environments:
- Dynamic
- Heterogeneous
- Very large-scale (world)
- No central administration
→ Most problems are NP-hard and need experimental validation


The Co-allocation Problem in Grids (1): Motivation
Co-allocation = the simultaneous allocation of resources in multiple clusters to a single application that consists of multiple components.
Reasons:
- Use more resources than are available at a single cluster at a given time
- Create a specific virtual environment (e.g., a visualization cluster, geographically spread data)
- Achieve reliability through replication on multiple clusters
- Avoid resource contention at the same site (e.g., batches)
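A minimal Python sketch of what "simultaneous" means in practice, assuming a hypothetical snapshot of idle processors per cluster and a job given as a list of component sizes. It uses plain first-fit (chosen for brevity, not one of Koala's policies) and is all-or-nothing: either every component gets processors at the same time, or the job gets none.

```python
from typing import Dict, List, Optional


def co_allocate(components: List[int], idle: Dict[str, int]) -> Optional[List[str]]:
    """Place every component of a job at once, or return None.

    components: processors needed by each job component (hypothetical format).
    idle: idle processors per cluster at this moment (hypothetical snapshot).
    """
    free = dict(idle)  # work on a copy so a failed attempt claims nothing
    placement: List[str] = []
    for need in components:
        # First fit: the first cluster (by name) with enough idle processors.
        site = next((c for c in sorted(free) if free[c] >= need), None)
        if site is None:
            return None  # all-or-nothing: a partial allocation cannot run
        free[site] -= need
        placement.append(site)
    return placement


# Hypothetical clusters: no single cluster can host a 48-processor job.
idle = {"A": 32, "B": 20, "C": 10}
print(co_allocate([16, 16, 16], idle))      # ['A', 'A', 'B']
print(co_allocate([16, 16, 16, 16], idle))  # None
```

The example job needs 48 processors while the largest hypothetical cluster has only 32 idle, which is the first reason listed above for co-allocating.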

The Co-allocation Problem in Grids (2): Overall Example
[Figure: KOALA with a global queue for global jobs; clusters with local queues and local schedulers (LS) for local jobs; load sharing vs. co-allocation. Source: Dick Epema]

The Co-allocation Problem in Grids (3): Processor and Data Co-Allocation
- Jobs have access to processors and data at many sites
- Files are stored at different file sites; replicas may exist
- The scheduler decides on the placement of job components at execution sites
- Jobs can be of high or low priority
Source: Hashim Mohamed
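Choosing an execution site when input files have replicas can be sketched as follows. This is a simplified illustration of taking file locations into account, assuming made-up file sizes, replica locations, and inter-site bandwidths, and assuming files are staged one after another; it is not Koala's actual cost model.

```python
from typing import List, Tuple

# Hypothetical inputs: file sizes (MB), replica locations, and symmetric
# inter-site bandwidths (MB/s). None of these values come from the slides.
FILE_SIZE_MB = {"in1.dat": 800.0, "in2.dat": 200.0}
REPLICAS = {"in1.dat": ["delft", "leiden"], "in2.dat": ["amsterdam"]}
BANDWIDTH_MBPS = {
    ("delft", "leiden"): 100.0,
    ("delft", "amsterdam"): 50.0,
    ("leiden", "amsterdam"): 100.0,
}


def bandwidth(a: str, b: str) -> float:
    if a == b:
        return float("inf")  # a local replica needs no transfer
    if (a, b) in BANDWIDTH_MBPS:
        return BANDWIDTH_MBPS[(a, b)]
    return BANDWIDTH_MBPS[(b, a)]


def staging_time(exec_site: str) -> float:
    """Seconds to stage all input files, using the fastest replica of each."""
    return sum(
        min(size / bandwidth(src, exec_site) for src in REPLICAS[name])
        for name, size in FILE_SIZE_MB.items()
    )


def best_site(candidates: List[str]) -> Tuple[str, float]:
    """Candidate execution site with the smallest estimated staging time."""
    return min(((s, staging_time(s)) for s in candidates), key=lambda t: t[1])


print(best_site(["delft", "leiden", "amsterdam"]))  # ('leiden', 2.0)
```

Leiden wins because it holds a replica of the large file and has the faster link to the site holding the small one.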

The Co-allocation Problem in Grids (4): Co-Allocated Job Types
- Fixed jobs: job component sizes and placement are fixed by the user
- Non-fixed jobs: job component sizes are fixed by the user; placement is decided by the scheduler
- Semi-fixed jobs: job component sizes are fixed by the user; placement is fixed by the user for some components and decided by the scheduler for the others
- Flexible jobs: job component sizes and placement are decided by the scheduler
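One way to encode the four job types as data, assuming that a job request either lists its components (with sizes and optional clusters) or gives only a total size. The class and field names are hypothetical, and semi-fixed is read here as "sizes fixed, some components pinned to clusters by the user, the rest placed by the scheduler".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Component:
    size: int                      # processors, always given by the user here
    cluster: Optional[str] = None  # None: placement is left to the scheduler


@dataclass
class Job:
    total: int                                    # total processors requested
    components: Optional[List[Component]] = None  # None: scheduler splits the job


def job_type(job: Job) -> str:
    """Classify a co-allocated job request by what the user has fixed."""
    if job.components is None:
        return "flexible"    # scheduler chooses component sizes and placement
    pinned = [c.cluster is not None for c in job.components]
    if all(pinned):
        return "fixed"       # user fixes sizes and placement
    if any(pinned):
        return "semi-fixed"  # user fixes sizes; placement is mixed
    return "non-fixed"       # user fixes sizes; scheduler places everything


print(job_type(Job(32, [Component(16, "A"), Component(16, "B")])))  # fixed
print(job_type(Job(32, [Component(16, "A"), Component(16)])))       # semi-fixed
print(job_type(Job(32, [Component(16), Component(16)])))            # non-fixed
print(job_type(Job(32)))                                            # flexible
```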

The Koala Design
- Selection: placing job components
- Control: transferring the executable and input files
- Instantiation: claiming the resources selected for each job component
- Run: submitting, then monitoring job execution (fault tolerance)
Source: Hashim Mohamed
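The four steps can be read as a small state machine. The sketch below is an illustration, not Koala's implementation: the rule that any failed step sends the job back to Selection, and the retry limit, are assumptions standing in for the fault tolerance mentioned in the Run step.

```python
from enum import Enum, auto
from typing import Callable, Dict, List


class Phase(Enum):
    SELECTION = auto()      # place the job components
    CONTROL = auto()        # transfer the executable and input files
    INSTANTIATION = auto()  # claim the selected resources
    RUN = auto()            # submit, then monitor execution
    DONE = auto()
    FAILED = auto()


NEXT: Dict[Phase, Phase] = {
    Phase.SELECTION: Phase.CONTROL,
    Phase.CONTROL: Phase.INSTANTIATION,
    Phase.INSTANTIATION: Phase.RUN,
    Phase.RUN: Phase.DONE,
}


def drive(step: Callable[[Phase], bool], max_retries: int = 3) -> List[Phase]:
    """Walk a job through the phases; on any failure, restart from SELECTION
    (an assumed fault-tolerance rule) until max_retries is exhausted."""
    trace: List[Phase] = []
    phase, retries = Phase.SELECTION, 0
    while phase not in (Phase.DONE, Phase.FAILED):
        trace.append(phase)
        if step(phase):
            phase = NEXT[phase]
        elif retries < max_retries:
            retries += 1
            phase = Phase.SELECTION
        else:
            phase = Phase.FAILED
    trace.append(phase)
    return trace


# Hypothetical example: claiming the resources fails once.
failures = {Phase.INSTANTIATION: 1}


def flaky(phase: Phase) -> bool:
    if failures.get(phase, 0) > 0:
        failures[phase] -= 1
        return False
    return True


print([p.name for p in drive(flaky)])
# ['SELECTION', 'CONTROL', 'INSTANTIATION', 'SELECTION', 'CONTROL',
#  'INSTANTIATION', 'RUN', 'DONE']
```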

The Koala Selection Step: Many Placement Policies
Originally supported co-allocation policies:
- Worst-Fit: balance job components across sites
- Close-to-Files: take into account the locations of input files to minimize transfer times
- (Flexible) Cluster Minimization: mitigate inter-cluster communication; can also split the job automatically
But different application types require different ways of placing components. So:
- Modular structure with pluggable policies
- Take into account the internal structure of applications
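Two of the policies above, sketched in Python under simple assumptions: a hypothetical snapshot of idle processors per cluster, no network or file-transfer costs, and ties broken by cluster name. Close-to-Files would additionally weigh input-file transfer times and is left out. Each policy is a plain function of the job request and the idle snapshot, which is one simple way to keep policies pluggable; these are illustrations, not Koala's code.

```python
from typing import Dict, List, Optional

Idle = Dict[str, int]  # idle processors per cluster (hypothetical snapshot)


def worst_fit(components: List[int], idle: Idle) -> Optional[List[str]]:
    """Put each component on the cluster that currently has the most idle
    processors, which spreads (balances) components across sites."""
    free = dict(idle)
    placement: List[str] = []
    for need in components:
        site = max(sorted(free), key=lambda c: free[c])  # ties: first by name
        if free[site] < need:
            return None  # if the cluster with most idle processors cannot fit it, none can
        free[site] -= need
        placement.append(site)
    return placement


def flexible_cluster_min(total: int, idle: Idle) -> Optional[Dict[str, int]]:
    """Split a flexible job over as few clusters as possible by taking
    processors from the clusters with the most idle processors first."""
    split: Dict[str, int] = {}
    remaining = total
    for site in sorted(idle, key=lambda c: (-idle[c], c)):
        if remaining == 0:
            break
        take = min(idle[site], remaining)
        if take > 0:
            split[site] = take
            remaining -= take
    return split if remaining == 0 else None


idle = {"A": 32, "B": 20, "C": 10}
print(worst_fit([16, 16, 16], idle))   # ['A', 'B', 'A']
print(flexible_cluster_min(40, idle))  # {'A': 32, 'B': 8}
print(flexible_cluster_min(70, idle))  # None
```

Worst-Fit spreads the three components over A and B, while the flexible variant covers 40 processors with only two clusters by draining the largest one first.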

The Koala Selection Step: HOCs, Exploiting Application Structure
Higher-Order Components (HOCs):
- Pre-packaged software components with generic patterns of parallel behavior
- Patterns: master-worker, pipelines, wavefront
Benefits:
- Facilitate parallel programming in grids
- Enable user-transparent scheduling in grids
Most important additional middleware:
- A translation layer that builds a performance model from the HOC patterns and the user-supplied application parameters
Supported by KOALA (with Univ. of Münster)
Initial results: up to 50% reduction in runtimes
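To illustrate what a performance model built from a pattern and user-supplied parameters could look like, here is a deliberately toy model for the master-worker pattern. The cost terms (a sequential start-up cost per worker, then rounds of equal-length tasks) and all parameter values are assumptions for illustration; the model actually used with KOALA is not described on the slide.

```python
import math
from typing import Tuple


def predicted_time(tasks: int, workers: int, t_task: float, t_start: float) -> float:
    """Toy master-worker model (an assumption, not KOALA's model):
    starting workers costs t_start each, one after another, and the tasks are
    then processed in ceil(tasks / workers) rounds of t_task seconds."""
    return workers * t_start + math.ceil(tasks / workers) * t_task


def best_workers(tasks: int, t_task: float, t_start: float, max_workers: int) -> Tuple[int, float]:
    """Number of workers (1..max_workers) with the lowest predicted runtime."""
    return min(
        ((p, predicted_time(tasks, p, t_task, t_start)) for p in range(1, max_workers + 1)),
        key=lambda pt: pt[1],
    )


# Hypothetical parameters: 100 tasks of 10 s each, 1 s to start a worker,
# at most 64 processors available.
print(best_workers(100, 10.0, 1.0, 64))  # (34, 64.0)
```

The trade-off is the point: more workers shorten the compute rounds but cost more to start, so a scheduler with such a model can choose a size instead of the user guessing one.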

The Koala Instantiation Step: The Runners
Problem: how to support many application types, each with specific (and difficult) requirements?
Solution: runners (= interface modules)
Currently supported:
- Any type of single-component job
- MPI/DUROC jobs
- Ibis jobs
- HOC applications
API for extensions: write your own!
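A runner can be sketched as a small interface plus a registry keyed by application type. Everything here (the Runner class, the register decorator, the kinds "single" and "mpi", and the handle format) is hypothetical and only mirrors the idea of pluggable interface modules; it does not show Koala's real runner API and does not call DUROC or Ibis.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class Runner(ABC):
    """Hypothetical interface module between the scheduler and one application type."""

    @abstractmethod
    def submit(self, placement: Dict[str, int]) -> List[str]:
        """Start the job on {cluster: processors}; return one handle per component."""


RUNNERS: Dict[str, Type[Runner]] = {}


def register(kind: str) -> Callable[[Type[Runner]], Type[Runner]]:
    """Class decorator that makes a runner available under an application kind."""
    def wrap(cls: Type[Runner]) -> Type[Runner]:
        RUNNERS[kind] = cls
        return cls
    return wrap


@register("single")
class SingleComponentRunner(Runner):
    def submit(self, placement: Dict[str, int]) -> List[str]:
        (cluster, procs), = placement.items()  # ValueError unless exactly one component
        return [f"{cluster}:{procs}"]


@register("mpi")
class CoAllocatedRunner(Runner):
    def submit(self, placement: Dict[str, int]) -> List[str]:
        # A real runner would start all components together through the
        # middleware; this sketch only returns one handle per cluster.
        return [f"{c}:{n}" for c, n in sorted(placement.items())]


def run(kind: str, placement: Dict[str, int]) -> List[str]:
    return RUNNERS[kind]().submit(placement)


print(run("single", {"A": 8}))       # ['A:8']
print(run("mpi", {"B": 4, "A": 8}))  # ['A:8', 'B:4']
```

Writing your own runner then means subclassing Runner and registering it under a new kind, without touching the scheduler.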

Koala and the DAS Community
- Extensive experience gathered while assessing various co-allocation policies: over 25,000 completed jobs!
- Koala was released on the DAS in Sep 2005 [ ]
- Hands-on tutorials (last in Spring 2006)
- Documentation (web site)
- Papers: IEEE Cluster’04, Dagstuhl FGG’04, EGC’05, IEEE CCGrid’05, IEEE Cluster’06, etc.
- Koala helps you get results: IEEE CCGrid’06, others submitted

The Future of Koala
- Support for more application types, e.g., workflows and parameter sweep applications. Scheduling your application?
- Communication-aware and application-aware scheduling policies:
  - Take into account the communication pattern of applications when co-allocating
  - Also schedule bandwidth (in DAS3)
- Support heterogeneity: DAS3; DAS2 + DAS3; DAS3 + Grid’ RoGRID
- A peer-to-peer structure instead of a hierarchical grid scheduler

Outline
- A Brief Introduction to Grid Computing
- Koala: Processor and Data Co-Allocation in Grids
  - The Co-Allocation Problem in Grids
  - The Koala Design
  - Koala and the DAS Community
  - The Future of Koala
- GrenchMark: Analyzing, Testing, and Comparing Grids
  - Grid Performance Evaluation Issues
  - The GrenchMark Architecture
  - GrenchMark and the DAS Community
- Take home message

Grid Performance Evaluation: Current Practice
Performance indicators:
- Define my own metrics, or use U (utilization) and AWT/ART (average wait/response time), or both
Workload structure:
- Run my own workload
- Mostly the "all users are created equal" assumption (unrealistic)
- Do not make comparisons (incompatible workloads)
- No repeatability of results (e.g., background load)
Need a common performance evaluation framework for grids: GrenchMark
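The indicators named above can be computed from per-job records. This sketch assumes one common set of definitions: wait time is start minus submit, response time is finish minus submit, and utilization U is busy processor-time over available processor-time between the first submission and the last completion. Other definitions exist, which is part of the comparability problem this slide describes; the job records are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobRecord:
    submit: float  # time the job entered the queue
    start: float   # time it started running
    finish: float  # time it completed
    procs: int     # processors it used


def avg_wait_time(jobs: List[JobRecord]) -> float:
    return sum(j.start - j.submit for j in jobs) / len(jobs)


def avg_response_time(jobs: List[JobRecord]) -> float:
    return sum(j.finish - j.submit for j in jobs) / len(jobs)


def utilization(jobs: List[JobRecord], total_procs: int) -> float:
    """Busy processor-time divided by available processor-time over the
    window from the first submission to the last completion."""
    t0 = min(j.submit for j in jobs)
    t1 = max(j.finish for j in jobs)
    busy = sum((j.finish - j.start) * j.procs for j in jobs)
    return busy / (total_procs * (t1 - t0))


# Hypothetical trace on a 16-processor system.
jobs = [JobRecord(0, 0, 10, 4), JobRecord(2, 5, 15, 8), JobRecord(4, 10, 20, 4)]
print(avg_wait_time(jobs), avg_response_time(jobs), utilization(jobs, 16))
# 3.0 13.0 0.5
```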

GrenchMark: a Framework for Analyzing, Testing, and Comparing Grids
What’s in a name? grid benchmark → working towards a generic tool for the whole community that helps standardize testing procedures; benchmarks would be premature, so we use synthetic grid workloads instead.
What’s it about?
- A systematic approach to analyzing, testing, and comparing grid settings, based on synthetic workloads
- A set of metrics and workload units for analyzing grid settings [JSSPP’06]
- A set of representative grid applications, both real and synthetic
- Easy-to-use tools to create synthetic grid workloads
- A flexible, extensible framework

GrenchMark Overview: Easy to Generate and Run Synthetic Workloads
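A minimal workload generator in the spirit of "easy to generate": Poisson arrivals (exponentially distributed inter-arrival times) with a random job mix. This is a sketch, assuming a hypothetical set of application types and component sizes; it does not reproduce GrenchMark's workload definition language. The fixed seed makes the same workload reproducible, which speaks to the repeatability issue raised on the current-practice slide.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class SyntheticJob:
    submit_time: float  # seconds after the start of the workload
    app: str            # hypothetical application type
    components: int     # number of job components
    procs: int          # total processors over all components


def generate(n_jobs: int, arrival_rate: float, seed: int = 42) -> List[SyntheticJob]:
    """Poisson arrivals (exponential inter-arrival times, mean 1/arrival_rate)
    with a uniformly drawn job mix; the seed makes the workload repeatable."""
    rng = random.Random(seed)
    apps = ["sleep", "cpu-bound", "io-bound"]  # hypothetical application types
    jobs: List[SyntheticJob] = []
    t = 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(arrival_rate)
        components = rng.choice([1, 2, 4])
        per_component = rng.choice([2, 4, 8])
        jobs.append(SyntheticJob(t, rng.choice(apps), components, components * per_component))
    return jobs


workload = generate(n_jobs=100, arrival_rate=0.5)  # on average one job every 2 s
for job in workload[:3]:
    print(f"{job.submit_time:8.2f}  {job.app:9}  {job.components} x {job.procs // job.components}")
```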

… but More Complicated Than You Think
Workload structure:
- User-defined and statistical models
- Dynamic job arrival
- Burstiness and self-similarity
- Feedback, background load
- Machine usage assumptions
- Users, VOs
Metrics:
- A(W) Run/Wait/Response Time
- Efficiency, makespan
- Failure rate [!]
(Grid) notions:
- Co-allocation, interactive jobs, malleable, moldable, …
Measurement methods:
- Long workloads
- Saturated / non-saturated system
- Start-up, production, and cool-down scenarios
- Scaling the workload to the system
Applications:
- Synthetic
- Real
Workload definition language:
- Base language layer
- Extended language layer
Other:
- The same workload can be used for both simulations and real environments
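A few of the metrics and workload properties above, sketched in Python under stated assumptions: makespan is taken as first submission to last completion, failure rate as the fraction of unsuccessful jobs, and burstiness is estimated with the index of dispersion of arrival counts per time window (a standard measure chosen here for illustration, not necessarily the one GrenchMark uses). The job outcomes and arrival times are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobOutcome:
    submit: float
    finish: float
    succeeded: bool


def makespan(jobs: List[JobOutcome]) -> float:
    """Time from the first submission to the last completion."""
    return max(j.finish for j in jobs) - min(j.submit for j in jobs)


def failure_rate(jobs: List[JobOutcome]) -> float:
    return sum(not j.succeeded for j in jobs) / len(jobs)


def index_of_dispersion(arrivals: List[float], window: float) -> float:
    """Variance-to-mean ratio of arrival counts per time window: about 1 for
    Poisson arrivals, above 1 for bursty ones (arrivals assumed to be >= 0)."""
    counts = [0] * (int(max(arrivals) // window) + 1)
    for a in arrivals:
        counts[int(a // window)] += 1
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / len(counts)
    return var / mean


jobs = [JobOutcome(0, 10, True), JobOutcome(3, 25, False),
        JobOutcome(5, 12, True), JobOutcome(6, 9, True)]
print(makespan(jobs), failure_rate(jobs))                  # 25 0.25
print(index_of_dispersion([0.5, 1.5, 2.5, 3.5], 1.0))      # 0.0 (perfectly regular)
print(index_of_dispersion([0.1, 0.2, 0.3, 0.4, 3.5], 1.0))  # about 2.15 (bursty)
```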

GrenchMark and the DAS Community
- Generic performance evaluation [IEEE CCGrid’06]
- Grid system analysis: performance testing, what-if analysis
- Functionality testing in grid environments: system functionality testing, periodic testing
- Comparing grid settings: single-site vs. co-allocated jobs
- Releasing the Koala grid scheduler on the DAS: 5,000+ jobs successfully run (in all workloads); functionality tests for 3 different job submission modules
- GrenchMark was released in Nov 2005 [ grenchmark.st.ewi.tudelft.nl ]

GrenchMark: Iterative Research Roadmap
- Simple functional system: A. Iosup, J. Maassen, R. V. van Nieuwpoort, D. H. J. Epema, Synthetic Grid Workloads with Ibis, KOALA, and GrenchMark, CoreGRID IW, Nov
- Complex extensible system: A. Iosup, D. H. J. Epema, GrenchMark: A Framework for Analyzing, Testing, and Comparing Grids, IEEE CCGrid'06, May 2006
- JSSPP’06, University of Dortmund
- Open-GrenchMark: community effort

Take home message
PDS Group/TU Delft: resource and test management in grid systems
Koala: Processor and Data Co-Allocation in Grids [ ]
- Grid scheduling with co-allocation and fault tolerance
- Many placement policies available
- Extensible runners system
- Easy to use, flexible
- Tutorials, on-line documentation, papers
GrenchMark: Analyzing, Testing, and Comparing Grids [ grenchmark.st.ewi.tudelft.nl ]
- Generic tool for the whole community
- Generates diverse grid workloads
- Easy to use, flexible, portable, extensible, …

Thank you! Questions? Remarks? Observations? All welcome!
grenchmark.st.ewi.tudelft.nl/