June 2, 2015 1 GrenchMark : A Framework for Analyzing, Testing, and Comparing Grids CCGrid 2006 A. Iosup, D.H.J. Epema PDS Group, ST/EWI, TU Delft.

Presentation transcript:

GrenchMark: A Framework for Analyzing, Testing, and Comparing Grids
CCGrid 2006
A. Iosup, D.H.J. Epema, PDS Group, ST/EWI, TU Delft

Outline
- Introduction and Motivation
- The GrenchMark Framework
- Past and Current Experience with GrenchMark
- A GrenchMark Success Story
- Future Work
- Conclusions

The Generic Problem of Analyzing, Testing, and Comparing Grids
- Use cases for automatically analyzing, testing, and comparing grids
  - Comparisons for system design and procurement
  - Functionality testing and system tuning
  - Performance testing/analysis of grid applications
  - …
- For grids, this problem is hard!
  - Testing in real environments is difficult
  - Grids change rapidly
  - Validity of tests
  - …

A Generic Solution to Analyzing, Testing, and Comparing Grids
"Generate and run synthetic grid workloads, based on real and synthetic applications"
- Current alternatives (not covering all problems)
  - Benchmarking with real/synthetic applications (representative?)
  - User-defined test management (statistically sound?)
- Advantages of using synthetic grid workloads
  - Statistically sound composition of benchmarks
  - Statistically sound test management
  - Generic: cover the use cases' broad spectrum (to be shown)

GrenchMark: a Framework for Analyzing, Testing, and Comparing Grids
- What's in a name?
  - grid benchmark → working towards a generic tool for the whole community: help standardize the testing procedures, but it is too early for benchmarks; we use synthetic grid workloads instead
- What's it about?
  - A systematic approach to analyzing, testing, and comparing grid settings, based on synthetic workloads
  - A set of metrics for analyzing grid settings
  - A set of representative grid applications, both real and synthetic
  - Easy-to-use tools to create synthetic grid workloads
  - Flexible, extensible framework

GrenchMark: Iterative Research Roadmap
- Simple functional system: A. Iosup, J. Maassen, R.V. van Nieuwpoort, D.H.J. Epema, Synthetic Grid Workloads with Ibis, KOALA, and GrenchMark, CoreGRID IW, Nov 2005.
- Complex extensible system: this work
- Open-GrenchMark: community effort

GrenchMark Overview: Generate and Run Synthetic Workloads

GrenchMark Overview: Easy to Generate and Run Synthetic Workloads

… but More Complicated Than You Think
- Workload structure
  - User-defined and statistical models
  - Dynamic jobs arrival
  - Burstiness and self-similarity
  - Feedback, background load
  - Machine usage assumptions
  - Users, VOs
- Metrics
  - A(W) Run/Wait/Resp. Time
  - Efficiency, MakeSpan
  - Failure rate [!]
- (Grid) notions
  - Co-allocation, interactive jobs, malleable, moldable, …
- Measurement methods
  - Long workloads
  - Saturated / non-saturated system
  - Start-up, production, and cool-down scenarios
  - Scaling workload to system
- Applications
  - Synthetic
  - Real
- Workload definition language
  - Base language layer
  - Extended language layer
- Other
  - Can use the same workload for both simulations and real environments
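The metrics named on this slide (run, wait, and response time, makespan, failure rate) can be illustrated with a minimal Python sketch. The `JobRecord` type and the `summarize` function are hypothetical names introduced here for illustration and are not part of GrenchMark; makespan is taken, as an assumption, to be the time from the first submission to the last successful completion.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class JobRecord:
    """One job as seen by a hypothetical results analyzer (times in seconds)."""
    submit: float
    start: Optional[float]   # None if the job never started
    finish: Optional[float]  # None if the job did not finish
    failed: bool = False


def summarize(jobs: Sequence[JobRecord]) -> dict:
    """Compute average wait/run/response time, makespan, and failure rate."""
    done = [j for j in jobs
            if not j.failed and j.start is not None and j.finish is not None]
    n = len(done)
    if n == 0:
        return {"avg_wait": None, "avg_run": None, "avg_response": None,
                "makespan": None,
                "failure_rate": 1.0 if jobs else None}
    return {
        "avg_wait": sum(j.start - j.submit for j in done) / n,
        "avg_run": sum(j.finish - j.start for j in done) / n,
        "avg_response": sum(j.finish - j.submit for j in done) / n,
        # Assumed definition: first submission to last successful completion.
        "makespan": max(j.finish for j in done) - min(j.submit for j in jobs),
        "failure_rate": (len(jobs) - n) / len(jobs),
    }


# Made-up records, for illustration only.
stats = summarize([
    JobRecord(submit=0, start=10, finish=40),
    JobRecord(submit=5, start=20, finish=50),
    JobRecord(submit=10, start=None, finish=None, failed=True),
    JobRecord(submit=15, start=30, finish=90),
])
print(stats)
```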

GrenchMark Overview: Unitary and Composite Applications
- Unitary applications
  - sequential, MPI, Java RMI, Ibis, …
- Composite applications
  - Bag of tasks
  - Chain of jobs
  - Directed Acyclic Graph-based (Standard Task Graph Archive)
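One way to picture the three composite shapes is as dependency maps over unitary jobs: a bag of tasks has no dependencies, a chain has one predecessor per job, and a DAG allows several. The sketch below is an illustration under that assumption, not GrenchMark's internal representation; the function names are hypothetical, and it relies on the standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def bag_of_tasks(jobs):
    """Independent jobs: nobody waits for anybody."""
    return {job: set() for job in jobs}


def chain_of_jobs(jobs):
    """Each job depends on the one before it."""
    deps = {jobs[0]: set()} if jobs else {}
    for prev, cur in zip(jobs, jobs[1:]):
        deps[cur] = {prev}
    return deps


def submission_order(deps):
    """Return one valid order in which the jobs could be submitted.

    Raises graphlib.CycleError if the dependencies are not acyclic.
    """
    return list(TopologicalSorter(deps).static_order())


# A small diamond-shaped DAG: b and c need a, d needs both b and c.
dag = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}

chain_order = submission_order(chain_of_jobs(["j1", "j2", "j3"]))
dag_order = submission_order(dag)
print(chain_order, dag_order)
```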

GrenchMark Overview: Workload Description Files
- Format (example: combining four workloads into one)
  - Number of jobs
  - Composition and application types
  - Co-allocation and number of components
  - Inter-arrival and start time
  - Language extensions
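The actual description file syntax is not preserved in this transcript, so the following Python sketch does not reproduce it. It only illustrates, with hypothetical names (`JobSpec`, `generate_workload`, `combine`) and an assumed exponential inter-arrival model, how the listed parameters (number of jobs, application mix, number of components, inter-arrival and start time) could drive a generator, and how four workloads could be merged into one.

```python
import random
from dataclasses import dataclass


@dataclass
class JobSpec:
    """Hypothetical job description; not GrenchMark's real format."""
    job_id: int
    app_type: str
    components: int      # >1 would mean a co-allocated job
    submit_time: float   # seconds since the start of the experiment


def generate_workload(n_jobs, app_mix, mean_interarrival,
                      max_components=1, start_time=0.0, seed=0):
    """Draw n_jobs jobs with exponential inter-arrival times (an assumption)."""
    rng = random.Random(seed)
    apps = list(app_mix)
    weights = [app_mix[a] for a in apps]
    t = start_time
    jobs = []
    for i in range(n_jobs):
        t += rng.expovariate(1.0 / mean_interarrival)
        jobs.append(JobSpec(i, rng.choices(apps, weights)[0],
                            rng.randint(1, max_components), t))
    return jobs


def combine(*workloads):
    """Merge several workloads into one, ordered by submission time."""
    merged = sorted((j for w in workloads for j in w),
                    key=lambda j: j.submit_time)
    return [JobSpec(i, j.app_type, j.components, j.submit_time)
            for i, j in enumerate(merged)]


# Four small workloads with the same mix but different seeds, merged into one.
parts = [generate_workload(10, {"sequential": 2, "mpi": 1}, 30.0,
                           max_components=4, seed=s) for s in range(4)]
workload = combine(*parts)
print(len(workload), workload[0])
```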

Using GrenchMark: Grid System Analysis
- Performance testing: test the performance of an application (for sequential, MPI, Ibis applications)
  - Report runtimes, waiting times, grid middleware overhead
  - Automatic results analysis
- What-if analysis: evaluate potential situations
  - System change
  - Grid inter-operability
  - Special situations: spikes in demand

Using GrenchMark: Functionality Testing in Grid Environments
- System functionality testing: show the ability of the system to run various types of applications
  - Report failure rates [arguably, functionality in grids is even more important than performance! 10% job failure rate in a controlled system like the DAS]
- Periodic system testing: evaluate the current state of the grid
  - Replay workloads

Using GrenchMark: Comparing Grid Settings
- Single-site vs. co-allocated jobs: compare the success rate of single-site and co-allocated jobs, in a system without reservation capabilities
  - Single-site jobs 20% better vs. small co-allocated jobs (<32 CPUs), 30% better vs. large co-allocated jobs [setting and workload-dependent!]
- Unitary vs. composite jobs: compare the success rate of unitary and composite jobs, with and without failure handling mechanisms
  - Both 100% with simple retry mechanism [setting and workload-dependent!]
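The slide does not say how the "simple retry mechanism" was implemented. As a minimal sketch of the general idea only, the hypothetical `run_with_retry` helper below resubmits a job until it succeeds or a maximum number of attempts is reached; `success_rate` estimates the effect on simulated jobs whose failure probability is an invented parameter, not a measured value.

```python
import random


def run_with_retry(submit, max_attempts=3):
    """Call submit() until it returns True or attempts run out.

    Returns (succeeded, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        if submit():
            return True, attempt
    return False, max_attempts


def success_rate(n_jobs, failure_prob, max_attempts, seed=0):
    """Fraction of simulated jobs that eventually succeed."""
    rng = random.Random(seed)

    def flaky_submit():
        # Each attempt fails independently with probability failure_prob.
        return rng.random() >= failure_prob

    ok = sum(run_with_retry(flaky_submit, max_attempts)[0]
             for _ in range(n_jobs))
    return ok / n_jobs


# A job that fails twice and then succeeds needs exactly three attempts.
outcomes = iter([False, False, True])
three_tries = run_with_retry(lambda: next(outcomes), max_attempts=3)
print(three_tries)
```

As the slide stresses, any success rate obtained this way depends on the setting and the workload.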

A GrenchMark Success Story: Releasing the Koala Grid Scheduler on the DAS
- Koala: grid scheduler with co-allocation capabilities
- DAS: the Dutch grid, ~200 researchers
- Initially: Koala, a tested (!) scheduler, pre-release version
- Test specifics
  - 3 different job submission modules
  - Workloads with different job requirements, inter-arrival rates, co-allocated vs. single-site jobs, …
  - Evaluate: job success rate, Koala overhead and bottlenecks
- Results
  - 5,000+ jobs successfully run (all workloads); functionality tests
  - 2 major bugs first day, 10+ bugs overall (all fixed)
  - KOALA is now officially released on the DAS (full credit to KOALA developers, 10x for testing with GrenchMark)

GrenchMark's Current Status: pre-"Open-GrenchMark"
- Already done, in Python
  - Workload Generator
  - Generic Workload Submitter (Koala, Globus GRAM, option to extend for JSDL, Condor, PBS, LSF, SGE, …)
  - Applications
    - Unitary, 3 types: sequential, MPI, Ibis (Java)
    - 35+ real and synthetic applications
    - Composite applications: DAG-based
- Extending modeling capabilities
  - A. Iosup, D.H.J. Epema (TU Delft), C. Franke, A. Papaspyrou, L. Schley, B. Song, R. Yahyapour (U Dortmund), On modeling synthetic workloads for Grid performance evaluation, 12th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP), held in conjunction with SIGMETRICS 2006, Saint Malo, France, June 2006 (accepted).

Towards Open-GrenchMark: Grid traces, Simulators, Benchmarks
- Distributed testing
  - Integrate with DiPerF (C. Dumitrescu, I. Raicu, M. Ripeanu)
- Grid traces analysis
  - Automatic tools for grid traces analysis
  - A. Iosup, C. Dumitrescu, D.H.J. Epema (TU Delft), H. Li, L. Wolters (U Leiden), How are Real Grids Used? The Analysis of Four Grid Traces and Its Implications, (submitted).
- Use in conjunction with simulators
  - Ability to generate workloads which can be used in simulated environments (e.g., GangSim, GridSim, …)
- Grid benchmarks
  - Analyze the requirements for domain-specific grid benchmarks

Conclusion
- GrenchMark generates diverse grid workloads
  - Easy-to-use, flexible, portable, extensible, …
- Experience
  - Used GrenchMark to test KOALA's functionality and performance.
  - Used GrenchMark to analyze, test, and compare grid settings.
  - 15,000+ jobs generated and run … and counting.
- (more) advertisement
  - Have a specific grid setting you would like to test? Test with GrenchMark!

Thank you! Questions? Remarks? Observations? All welcome!
- GrenchMark [10x Paulo]
- Alexandru IOSUP, TU Delft [google: "iosup"]
- Many thanks to Hashim Mohamed (Koala), Jason Maassen and Rob van Nieuwpoort (Ibis).

Outline
- Introduction
- The GrenchMark framework
- Experience with GrenchMark
- Extending GrenchMark
- Conclusions [here]

Representative Grid applications (1/4)
- Unitary applications
  - Just one scheduling unit (otherwise recursive definition)
  - Examples: sequential, MPI, Java RMI, Ibis, …
- Composite applications
  - Composed of several unitary or composite applications
  - Examples: parameter sweeps, chains of tasks, DAGs, workflows, …

Representative Grid applications (2/4)
Unitary: synthetic sequential

    [I] read I1 data elements from stdin
    [O] write O1 data elements to stdout
    for each N steps (i)  # superstep
        [I] read I2 data elements from stdin
        for each M memory locations (j)  # computation step
            [M] get item j of size S
            [P] compute C computation units per memory item (fmul), and store results into temp memory location
            [M] put values from temp to location j
        [O] write O2 data elements to stdout
    [O] write O3 data elements to stdout

- System reqs
  - Processor (float*N*S*C)
  - Memory (float*(M+1)*S)
  - stdin: N+1 reads (I1, N*I2)
  - stdout: N+2 writes (O1, N*O2, O3)
- Also version with computation and I/O
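The pseudocode for the synthetic sequential application can be turned into a small runnable Python sketch. This is an illustration of the structure only, not the GrenchMark application itself: it assumes one character per "data element", uses an arbitrary multiplier for the `fmul` step, and returns operation counts so the N+1 reads and N+2 writes from the slide can be checked.

```python
import io
import sys


def synthetic_sequential(N, M, S, C, I1, I2, O1, O2, O3,
                         stdin=None, stdout=None):
    """Run N supersteps over M memory items of S floats, C fmuls per float."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    counts = {"reads": 0, "writes": 0, "fmuls": 0}

    def read_items(count):
        stdin.read(count)          # assumption: one character per element
        counts["reads"] += 1

    def write_items(count):
        stdout.write("x" * count)  # assumption: one character per element
        counts["writes"] += 1

    memory = [[1.0] * S for _ in range(M)]
    read_items(I1)                       # [I]
    write_items(O1)                      # [O]
    for _ in range(N):                   # superstep
        read_items(I2)                   # [I]
        for j in range(M):               # computation step
            temp = list(memory[j])       # [M] get item j of size S
            for _ in range(C):           # [P] C multiplications per element
                temp = [v * 1.000001 for v in temp]
                counts["fmuls"] += S
            memory[j] = temp             # [M] put values back to location j
        write_items(O2)                  # [O]
    write_items(O3)                      # [O]
    return counts


# In-memory streams stand in for the real stdin/stdout.
out = io.StringIO()
counts = synthetic_sequential(N=3, M=4, S=2, C=5, I1=10, I2=5,
                              O1=2, O2=3, O3=4,
                              stdin=io.StringIO("a" * 100), stdout=out)
print(counts, len(out.getvalue()))
```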

Representative Grid applications (2/4)
Unitary: synthetic parallel

    [I] read I1 data from in$ProcID
    [O] write O1 data to out$ProcID
    for each N steps (i)  # superstep
        [B] synchronize with barrier
        [I] read I2 data from in$ProcID
        for each M memory locations (j)
            [M] get item j of size S
            [P] compute C fmul/mem
            [M] put values from temp to j
        [C] communicate to all other processes X values and receive X values from every other processor
        [O] write O2 data elements to out$ProcID
    [O] write O3 data elements to out$ProcID

- Per process (Machine i, Process i)
  - Processor i k (float*N*S*C)
  - Memory (float*N*2S)
  - Input app-input.i: N+1 reads (I1, N*I2)
  - Output app-output.i: N+2 writes (O1, N*O2, O3)
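The barrier and all-to-all exchange in the parallel pseudocode can be sketched with Python threads standing in for MPI or Ibis processes. This is a simplified illustration under stated assumptions: file I/O is omitted, each "process" is a thread, messages travel through in-memory queues, and the function name `synthetic_parallel` is hypothetical.

```python
import queue
import threading


def synthetic_parallel(P, N, M, S, C, X):
    """Run P threads for N supersteps; each sends X values to every peer."""
    barrier = threading.Barrier(P)
    inboxes = [queue.Queue() for _ in range(P)]
    received = [0] * P      # messages received per process
    out_of_step = [0] * P   # messages that arrived in the wrong superstep

    def process(rank):
        memory = [[1.0] * S for _ in range(M)]
        for step in range(N):
            barrier.wait()                        # [B] synchronize
            for j in range(M):
                temp = list(memory[j])            # [M] get
                for _ in range(C):                # [P] compute
                    temp = [v * 1.000001 for v in temp]
                memory[j] = temp                  # [M] put
            for other in range(P):                # [C] send X values to peers
                if other != rank:
                    for x in range(X):
                        inboxes[other].put((step, rank, x))
            for _ in range((P - 1) * X):          # [C] receive from peers
                msg_step, _sender, _value = inboxes[rank].get()
                received[rank] += 1
                if msg_step != step:
                    out_of_step[rank] += 1

    threads = [threading.Thread(target=process, args=(r,)) for r in range(P)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received, out_of_step


received, out_of_step = synthetic_parallel(P=3, N=4, M=2, S=2, C=2, X=2)
print(received, out_of_step)
```

The barrier at the top of each superstep is what keeps messages from different supersteps apart: no thread can start sending for step i+1 until every thread has finished receiving for step i.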

Representative Grid applications (3/4)
Unitary: Ibis jobs
- No modeling, just real applications
  - physical simulation
  - parallel rendering
  - computational mathematics
  - state space search
  - bioinformatics
  - data compression
  - grid methods
  - optimization

Representative Grid applications (3/4)
Composite: DAG-based
- DAG-based applications
  - Real DAG
  - Chain of tools
  - Try to model real or predicted (use) cases
- Figure legend: Input, Output, User task, Linker, Identity (one task's output = other's input, unmodified)
- Example chain: App1 > Linker1 > App2 > Final result
- Files shown in the figure: out_1-2.dat, param1.in, out_1-1.dat, huge-data.out, perf2.dat, param2.in, some-list.in, out2.res, l1p.dat, perf1.dat
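The App1 > Linker1 > App2 chain can be sketched as a pipeline in which an identity linker hands one task's output to the next task's input unmodified, changing only the name. The figure's actual file wiring is not recoverable from this transcript, so the pairing of `out_1-1.dat` with `param2.in` below, the two toy tasks, and all function names are assumptions made for illustration.

```python
def identity_linker(outputs, mapping):
    """Pass outputs on unmodified, renaming them to the next task's inputs."""
    return {dst: outputs[src] for src, dst in mapping.items()}


def run_chain(stages, initial_inputs):
    """Feed each stage the named data produced by the previous stage."""
    data = dict(initial_inputs)
    for stage in stages:
        data = stage(data)
    return data


# Toy stand-ins for the user tasks; real tasks would run applications.
def app1(inputs):
    return {"out_1-1.dat": inputs["param1.in"].upper()}


def linker1(outputs):
    # Assumed wiring: App1's output file becomes App2's parameter file.
    return identity_linker(outputs, {"out_1-1.dat": "param2.in"})


def app2(inputs):
    return {"out2.res": "result(" + inputs["param2.in"] + ")"}


final = run_chain([app1, linker1, app2], {"param1.in": "abc"})
print(final)
```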