Euro-Par 2007, Rennes, 29th August
The Characteristics and Performance of Groups of Jobs in Grids
Alexandru Iosup, Mathieu Jan*, Ozan Sonmez and Dick Epema
PDS Group, Delft University of Technology, The Netherlands
* Now a postdoc at LRI/INRIA Futurs, Orsay (Paris South), France

Outline
Why look at groups of jobs?
Grid traces and environment summary
Definitions of groups of jobs
The characteristics of job groupings: workload-level, group-level and job-level analysis
Conclusion and future work

Why look at groups of jobs?
Current grids run almost exclusively single-node jobs [Grid2006], according to trace analyses of LCG, Grid3, TeraGrid and DAS-2.
How, then, are jobs related? What is their structure? Batches of identical jobs? Something else?
No such analysis exists using long-term data from production and research grid environments.
No analysis exists of the impact of groups of jobs on the performance of grids.

Our research questions
What are the dependencies among the jobs submitted by a single user?
What is the physical structure of such groupings?
What is the impact of job groupings on the performance of grids?

Grid traces: Grid’5000 (1/3)
Experimental platform. Grid’5000: 9 sites, 15 clusters, all managed by OAR.
Trace period: 05/ /2006. CPUs: ~2500. Jobs: 951 K. Users: 473. Groups: 10. Consumed CPU time: 651 years.

Grid traces: NorduGrid (2/3)
Large-scale production grid. NorduGrid: ~75 sites, managed via the ARC (Advanced Resource Connector) middleware.
Trace period: 05/ /2006. CPUs: ~2000. Jobs: 781 K. Users: 387. Groups: 106. Consumed CPU time: 2443 years.

Grid traces: GLOW (3/3)
Grid Laboratory Of Wisconsin: a campus-wide, Condor-based distributed computing environment.
Trace period: 09/ /2007. CPUs: ~1400. Jobs: 216 K. Users: 18. Groups: 1. Consumed CPU time: 55 years.

Grid traces summary
Trace | Grid’5000 | NorduGrid | GLOW
Period | 05/ /2006 | 05/ /2006 | 09/ /2007
Sites | 9 (15 clusters) | ~75 | 1
CPUs | ~2500 | ~2000 | ~1400
Jobs | 951 K | 781 K | 216 K
Groups | 10 | 106 | 1
Users | 473 | 387 | 18
Consumed CPU time | 651 years | 2443 years | 55 years

Groups of jobs: definitions (1/2)
Batch submission: a maximal contiguous subsequence G of the jobs submitted by a single user, ordered by submission time, such that for any two successive jobs J, J’ in G, J’ is submitted at most Δ after J.
Parameter Sweep Application (PSA): a batch submission whose jobs all execute the same application.
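A minimal sketch of how batch submissions and PSAs could be extracted from a trace, assuming that "successive" refers to submission times of the same user and that the talk's Δ = 120 s is used. The `Job` record and its `application` field are hypothetical, and treating one-job groups as single jobs (not PSAs) is an assumption, not necessarily the authors' exact procedure.

```python
from dataclasses import dataclass
from typing import Dict, List

DELTA = 120.0  # seconds, the threshold used in the talk


@dataclass
class Job:
    user: str
    submit_time: float  # seconds since the start of the trace
    application: str    # hypothetical field, only needed for the PSA check


def batch_submissions(jobs: List[Job], delta: float = DELTA) -> List[List[Job]]:
    """Split each user's submissions into maximal runs in which successive
    jobs are submitted at most `delta` seconds apart."""
    by_user: Dict[str, List[Job]] = {}
    for job in jobs:
        by_user.setdefault(job.user, []).append(job)

    groups: List[List[Job]] = []
    for user_jobs in by_user.values():
        user_jobs.sort(key=lambda j: j.submit_time)
        current = [user_jobs[0]]
        for prev, job in zip(user_jobs, user_jobs[1:]):
            if job.submit_time - prev.submit_time <= delta:
                current.append(job)      # still within the same batch
            else:
                groups.append(current)   # a gap larger than delta closes the batch
                current = [job]
        groups.append(current)
    return groups


def is_psa(group: List[Job]) -> bool:
    """Assumed PSA test: two or more jobs that all run the same application."""
    return len(group) >= 2 and len({j.application for j in group}) == 1


# Hypothetical example: alice's first two jobs are 100 s apart (one batch),
# her third job arrives 400 s later (new group), bob submits a single job.
example = [
    Job("alice", 0, "sim"),
    Job("alice", 100, "sim"),
    Job("alice", 500, "render"),
    Job("bob", 10, "sim"),
]
print([len(g) for g in batch_submissions(example)])  # [2, 1, 1]
```

Grouping per user before scanning keeps interleaved submissions from different users from merging into one batch, which matches the "jobs submitted by a single user" wording of the research questions.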

Groups of jobs: definitions (2/2)
In this talk, we focus on batch submissions.

Characteristics of job groupings
In our analysis, Δ = 120 seconds.

Workload-level analysis
Batches | Grid’5000 | NorduGrid | GLOW
Submissions | 26k | 50k | 13k
Jobs (trace total) | 808k (951k) | 738k (781k) | 205k (216k)
CPU time (trace total) | 193 y (651 y) | 2192 y (2443 y) | 53 y (55 y)
Continued batches: identical to batches for NorduGrid and GLOW; for Grid’5000, 14k submissions, 910k jobs and 462 years of CPU time.
Bursty submissions: fewer submissions, more jobs.
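The batch shares implied by the table above can be checked with a few lines of arithmetic. This sketch only uses the rounded figures from the slide, so the resulting percentages are approximate; the helper name `batch_share` is illustrative.

```python
# Rounded figures from the workload-level table: (batches, whole trace).
TRACES = {
    "Grid'5000": {"jobs": (808_000, 951_000), "cpu_years": (193, 651)},
    "NorduGrid": {"jobs": (738_000, 781_000), "cpu_years": (2192, 2443)},
    "GLOW": {"jobs": (205_000, 216_000), "cpu_years": (53, 55)},
}


def batch_share(part: float, total: float) -> float:
    """Percentage of the trace total that belongs to batch submissions."""
    return 100.0 * part / total


for name, t in TRACES.items():
    jobs_pct = batch_share(*t["jobs"])
    cpu_pct = batch_share(*t["cpu_years"])
    print(f"{name}: {jobs_pct:.1f}% of jobs, {cpu_pct:.1f}% of CPU time in batches")
```

With these inputs, batches hold about 85%, 94.5% and 94.9% of the jobs, and about 29.6%, 89.7% and 96.4% of the CPU time (Grid’5000, NorduGrid, GLOW), which is where the "up to 96% of CPU time" figure in the conclusion comes from.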

Group-level analysis: size of batches
75% of batches are of size … (Grid’5000 and NorduGrid) or < 10 (GLOW).
Average: 31 ± 110 (Grid’5000), 15 ± 33 (NorduGrid) and 15 ± 38 (GLOW).
Heavy-tailed distribution.
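A sketch of the summary statistics quoted on this slide (mean ± standard deviation, the 75th percentile, the share of batches below a size threshold), assuming the sample standard deviation and Python's default "exclusive" quartile method; the slide does not say which estimators the authors used. The batch sizes in the example are hypothetical.

```python
import statistics
from typing import Dict, List


def size_summary(sizes: List[int], threshold: int = 10) -> Dict[str, float]:
    """Mean, sample standard deviation, median, third quartile (75th
    percentile) and the share of batches smaller than `threshold` jobs."""
    _, median, q3 = statistics.quantiles(sizes, n=4)  # needs >= 2 values
    return {
        "mean": statistics.mean(sizes),
        "stdev": statistics.stdev(sizes),
        "median": median,
        "q3": q3,
        "share_below_threshold": sum(s < threshold for s in sizes) / len(sizes),
    }


# Hypothetical batch sizes with one very large batch (a heavy tail).
summary = size_summary([2, 2, 3, 4, 5, 8, 12, 400])
print(summary)
```

In a heavy-tailed sample like this one, a single large batch pulls the mean (54.5) far above the median (4.5) and makes the standard deviation exceed the mean, the same pattern as the "31 ± 110" reported for Grid’5000.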

Group-level analysis: inter-arrival time (seconds)
As expected, inter-arrival times between batches are high: 50% of the values lie between 400 and 700 seconds.
Reminder: Δ = 120 seconds.
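A sketch of how batch inter-arrival times and their interquartile range could be computed, assuming the arrival of a batch is the submission time of its first job and that times are taken per user. Under the batch definition, two consecutive batches of the same user start more than Δ apart, which is why high values are expected. The start times below are hypothetical.

```python
import statistics
from typing import List, Tuple

DELTA = 120.0  # seconds, the threshold used in the talk


def inter_arrival_times(batch_starts: List[float]) -> List[float]:
    """Differences between consecutive batch start times (one user's batches)."""
    starts = sorted(batch_starts)
    return [b - a for a, b in zip(starts, starts[1:])]


def interquartile_range(values: List[float]) -> Tuple[float, float]:
    """First and third quartiles: the interval holding the middle 50%."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q1, q3


# Hypothetical start times (seconds) of one user's consecutive batches.
iat = inter_arrival_times([0, 450, 1000, 1500, 2300, 2900])
print(iat, interquartile_range(iat))
```

A range such as (475, 700) is read the same way as the slide's "50% of the values lie between 400 and 700 seconds".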

Group-level analysis: duration (seconds)
Batches last longer than single jobs.
For NorduGrid, the average duration of batches is 1.5 days vs. 1 day for single jobs.

Group-level analysis: consumed CPU time (KCPUs)
Batches consume much more CPU time than single jobs!

Job-level analysis: run time (seconds)
Average run time for batches: Grid’5000: 0.66 ± 6.65 days; GLOW: 1.04 ± 3.18 days; NorduGrid: 2.27 ± 5.59 days.

Job-level analysis: wait time (seconds)
NorduGrid: no wait time information in the trace.
The average wait time of batches is higher than both the run time of batches and the wait time of single jobs.
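A sketch of the job-level wait-time comparison, assuming wait time is start time minus submission time and that jobs without a recorded start time (as in the NorduGrid trace) are skipped. The `JobRecord` type and the `in_batch` flag are hypothetical; the flag would come from a batch-detection step.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class JobRecord:
    submit: float           # submission time (s)
    start: Optional[float]  # start time (s); None if the trace lacks it
    in_batch: bool          # hypothetical flag: job belongs to a batch


def average_wait(jobs: List[JobRecord], in_batch: bool) -> Optional[float]:
    """Average wait time (start - submit) of batch or single jobs, ignoring
    jobs without a start time; None if no job qualifies."""
    waits = [j.start - j.submit for j in jobs
             if j.in_batch == in_batch and j.start is not None]
    return mean(waits) if waits else None


# Hypothetical records: two batch jobs, one single job, one job without start time.
records = [
    JobRecord(0, 600, True),
    JobRecord(10, 410, True),
    JobRecord(0, 50, False),
    JobRecord(5, None, False),
]
print(average_wait(records, True), average_wait(records, False))
```

Returning None rather than 0 when no start times exist keeps a trace like NorduGrid from appearing to have zero wait time.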

Job-level analysis: consumed CPU time (KCPUs)
No clear distinction between batches and single jobs.

Other analyses
Do parallel jobs exist inside batches? Average parallelism: 1 ± 1 (Grid’5000), 2 ± 7 (NorduGrid) and 1 (GLOW). Grid’5000: 37% of batches are of size 2, 9% of size > 2, max. = 325.
To what extent are batches PSAs? In Grid’5000, 75% of batches are PSAs. Compared to batches, PSAs increase the average group size by 9 and divide the average duration by 5.7.

Performance impact of grouped submissions
Performance metrics: group run time (RT), group duration (DT) and group idle time IT = DT - RT; ART and AIT denote the average RT and the average IT.
Batches display a high AIT value: over 4000% of the ART!
Research direction for designing scheduling policies for batches: minimization of the AIT of batches.
[Table: ART (s) and AIT (s) for batches and for single jobs, per trace]
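A sketch of the idle-time metrics, IT = DT - RT, ART and AIT, and AIT expressed as a percentage of ART. The slide does not define how DT and RT are measured; here DT is assumed to run from the first submission to the last completion in a group, and RT is taken as a given input. All numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def group_duration(submit_times: Sequence[float], end_times: Sequence[float]) -> float:
    """Assumed DT: from the first submission to the last completion in the group."""
    return max(end_times) - min(submit_times)


def idle_stats(groups: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """`groups` holds (DT, RT) pairs. Returns (ART, AIT, AIT as % of ART),
    where each group's idle time is IT = DT - RT."""
    idle_times = [dt - rt for dt, rt in groups]
    art = sum(rt for _, rt in groups) / len(groups)
    ait = sum(idle_times) / len(groups)
    return art, ait, 100.0 * ait / art


# Hypothetical groups: DT computed from timestamps, RT given directly.
dt1 = group_duration([0, 60], [2000, 5000])  # 5000 s
art, ait, pct = idle_stats([(dt1, 100), (3000, 50)])
print(art, ait, round(pct, 1))
```

An AIT of "over 4000% of the ART" simply means AIT > 40 × ART: groups spend far more time between their first submission and last completion than their jobs spend running.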

Conclusion & future work
Formally defined 3 types of groups of jobs: batch (and PSAs), continued and bursty.
Analyzed 3 long-term traces from large and diverse platforms.
Up to 96% of CPU time is consumed by batch submissions.
Performance analysis of batches compared to single jobs.
Future work: deeper analysis (Grid Workloads Archive); research direction: minimization of idle time in groups; trace-driven simulations; dynamic resource availability [Grid2007].

Thank you! Questions? Remarks? Observations?
Help build our community’s Grid Workloads Archive: