"How are Real Grids Used?" The Analysis of Four Grid Traces and Its Implications (IEEE Grid 2006), by Alexandru Iosup, Catalin Dumitrescu, and Dick H.J. Epema

Presentation transcript:

"How are Real Grids Used?" The Analysis of Four Grid Traces and Its Implications
IEEE Grid 2006
Alexandru Iosup, Catalin Dumitrescu, and Dick H.J. Epema (PDS Group, EEMCS, TU Delft)
Hui Li and Lex Wolters (LIACS, U. Leiden)

Outline
- Why "How are Real Grids Used?"?
  - What do we expect from Grids?
  - Grid Computing works!
  - Our approach
- Grid Traces Summary
- Towards "How are Real Grids Used?"
  - System-wide characteristics
  - VO, group, and user characteristics
  - Evolution of grid environments over time
  - Grid systems performance
- The Answer to "How are Real Grids Used?"
- Applications, Future Work, Conclusion

Outline
- What do we expect from Grids?
- The Analysis
- Answers for "How are Real Grids Used?"
- Final Words

What do we expect from Grids?

What do we expect from grids? (1/3)
Goal: grid systems will be dynamic, heterogeneous, very large-scale (world), and have no central administration.
- The "all users are interested" assumption: grids support many VOs and many users, and they all use the grid intensively.
- The "0-day learning curve" assumption: a grid must support many different projects in a way that facilitates an immediate jump to productivity.
The theory sounds good, but how about the practice? Let's look at real grid traces!

What do we expect from grids? (2/3): Grids vs. Parallel Production Environments
- The "HPC successor" assumption: the grid's typical user comes from the HPC world, and the typical workload can be described with the well-known models of parallel production environment workloads:
  - We will have, from the beginning, lots of parallel jobs
  - We will have, from the beginning, lots of power-of-2 sized jobs
  - We will handle, from the beginning, tightly-coupled parallel jobs
- The "infinite power" assumption: the grid can deliver any computing power demand, and in particular will offer at least computing power comparable to traditional parallel production environments.
Promising in theory, but how about in practice? Let's look at real grid traces!

What do we expect from grids? (3/3): Grid Computing Works!
- The "Power of Sums is Greater Than the Sum of Powers" assumption: by coupling individual environments, the grid will provide better performance than using the individual environments alone:
  - Higher throughput
  - Shorter wait times
  - Shorter response times
  - Same or lower job slowdown
- The "there is a need for the Power of Sums" assumption: institutions have the will and the need to couple their environments into a larger, grid-based environment.
I'm sure it's true, but what happens in practice? Let's look at real grid traces!

The Analysis

The Grid Traces: LCG, Grid3, TeraGrid, and the DAS
- Production grids: LCG, Grid3, TeraGrid
- Academic grid: the DAS (two overlapping traces: local, and shared as a grid platform)
- Features: long traces (6+ months), active environments (500K+ jobs per trace, hundreds of users)
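
To make the analysis concrete, the sketch below (Python) shows one way such a trace could be loaded for the computations on the following slides. The whitespace-separated format, the column names, and the file name grid_trace.txt are illustrative assumptions only; the real LCG, Grid3, TeraGrid, and DAS traces each come in their own formats and would need per-trace parsers.

# A minimal sketch of a trace loader, assuming a hypothetical whitespace-
# separated format with one job per line and the columns:
#   job_id submit_time wait_time run_time num_cpus user group site
# (times in seconds). This is not the format of any of the real traces.

def load_trace(path):
    jobs = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):      # skip blanks and comments
                continue
            jid, submit, wait, run, cpus, user, group, site = line.split()[:8]
            jobs.append({
                "id": jid, "submit": int(submit), "wait": int(wait),
                "run": int(run), "cpus": int(cpus),
                "user": user, "group": group, "site": site,
            })
    return jobs

if __name__ == "__main__":
    jobs = load_trace("grid_trace.txt")               # hypothetical file name
    print(len(jobs), "jobs from", len({j["user"] for j in jobs}), "distinct users")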

Trace Analysis: System-Wide Characteristics
- System utilization is on average 60-80% for the production grids, and below 20% for the academic grid
- The average job size is 1 (that is, there are no [!] tightly-coupled jobs, only conveniently parallel jobs)
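
A rough sketch of how these two system-wide numbers could be computed from such a trace, over the job dictionaries produced by the loader sketched above; this is an illustrative calculation, not the exact method of the paper.

def utilization(jobs, total_cpus):
    """Consumed CPU time divided by available CPU time over the trace span."""
    first_start = min(j["submit"] + j["wait"] for j in jobs)
    last_end = max(j["submit"] + j["wait"] + j["run"] for j in jobs)
    consumed = sum(j["run"] * j["cpus"] for j in jobs)
    return consumed / ((last_end - first_start) * total_cpus)

def average_job_size(jobs):
    """Mean CPUs per job; a value near 1 means no tightly-coupled parallel jobs."""
    return sum(j["cpus"] for j in jobs) / len(jobs)

# Example use (1024 CPUs is a made-up system size, not a real trace value):
#   print(utilization(jobs, total_cpus=1024), average_job_size(jobs))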

Trace Analysis: VO, Group, and User Characteristics
- The top 2-5 groups/users dominate the workload
- The top groups/users are constant submitters
- The week's top group/user is not always the same
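
One way to check a claim such as "the top 2-5 groups/users dominate the workload" is to compute which share of the consumed CPU time the k heaviest submitters account for. The sketch below assumes the same job dictionaries as above; it is an illustrative calculation, not the exact procedure used in the paper.

from collections import Counter

def top_k_share(jobs, key="user", k=5):
    """Fraction of the total consumed CPU time submitted by the k heaviest
    users (key="user") or groups (key="group")."""
    totals = Counter()
    for j in jobs:
        totals[j[key]] += j["run"] * j["cpus"]
    top = sum(count for _, count in totals.most_common(k))
    return top / sum(totals.values())

# Example: a result near 1.0 for top_k_share(jobs, key="group", k=5) means
# the top 5 groups indeed dominate the workload.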

Trace Analysis: Evolution of Grid Environments
- Everything evolves! The infrastructure, the projects, and the users
- Similar submission patterns across users
- Stages: learning, automated tool development, production phase, cool-down phase
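
These stages become visible when submissions are binned per user per week of the trace. The sketch below builds such a weekly series under the same assumed trace representation; inspecting or plotting it for the top users is one plausible way to see the learning, tool-development, production, and cool-down phases, though not necessarily the method used in the paper.

from collections import defaultdict

WEEK = 7 * 24 * 3600                                  # seconds in a week

def weekly_submissions(jobs, user):
    """Number of jobs the given user submitted in each week of the trace."""
    t0 = min(j["submit"] for j in jobs)               # start of the trace
    weeks = defaultdict(int)
    for j in jobs:
        if j["user"] == user:
            weeks[(j["submit"] - t0) // WEEK] += 1
    return [weeks[w] for w in range(max(weeks) + 1)] if weeks else []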

Trace Analysis: The Performance of Real Grid Systems
- With the notable exceptions of the average waiting time and of the occurrence of errors, the studied grid systems perform in the normal ranges predicted by simulation work [Shan et al.'03, Ernemann et al.'04]
- The average waiting time (AWT) is much higher in real grids than expected (see paper)
- The occurrence of errors in real grids seems much higher than expected (future work to prove this)
[Figure: running jobs vs. waiting jobs]
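
The waiting-time observation rests on the average waiting time (AWT), the mean time from submission to start. A small sketch of AWT and of a failure-rate estimate follows, on the same assumed job dictionaries; the per-job "failed" flag is an assumption, since the real traces encode errors in their own ways.

def average_wait_time(jobs):
    """Mean time between job submission and job start, in seconds."""
    return sum(j["wait"] for j in jobs) / len(jobs)

def failure_rate(jobs):
    """Fraction of jobs marked as failed; assumes a per-job 'failed' flag."""
    return sum(1 for j in jobs if j.get("failed")) / len(jobs)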

Trace Analysis: Grids vs. Parallel Production Systems
From A. Iosup, D.H.J. Epema, C. Franke, A. Papaspyrou, L. Schley, B. Song, R. Yahyapour, "On Grid Performance Evaluation using Synthetic Workloads", JSSPP'06:
- Similar CPU-seconds per year
- Higher job-arrival spikes (5x) for grid systems
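
The comparison uses consumed CPU seconds normalized per year and the size of job-arrival spikes. The sketch below computes both under the same assumed trace representation; quantifying a spike as peak hourly arrivals over the mean hourly arrival rate is one plausible reading of the "5x" figure, not necessarily the paper's exact definition.

from collections import Counter

YEAR = 365 * 24 * 3600                                # seconds in a year
HOUR = 3600

def cpu_seconds_per_year(jobs):
    """Total consumed CPU seconds, normalized to a one-year trace span."""
    span = (max(j["submit"] + j["wait"] + j["run"] for j in jobs)
            - min(j["submit"] for j in jobs))
    return sum(j["run"] * j["cpus"] for j in jobs) * YEAR / span

def arrival_spike_factor(jobs):
    """Peak hourly job arrivals divided by the mean hourly arrival rate."""
    per_hour = Counter(j["submit"] // HOUR for j in jobs)
    span_hours = max(per_hour) - min(per_hour) + 1
    return max(per_hour.values()) / (len(jobs) / span_hours)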

Answers for "How are Real Grids Used?"

What did we expect from grids? (1/3)
- The "all users are interested" assumption: grids support many VOs and many users, and they all use the grid intensively
- The "0-day learning curve" assumption: a grid must support many different projects in a way that facilitates an immediate jump to productivity

What did we expect from grids? (1/3) … and what have we observed
- The "all users are interested" assumption: grids support many VOs and many users, and they all use the grid intensively (observed: the top 2-5 groups/users dominate the workload)
- The "0-day learning curve" assumption: a grid must support many different projects in a way that facilitates an immediate jump to productivity

What did we expect from grids? (1/3) … and what have we observed
- The "all users are interested" assumption: grids support many VOs and many users, and they all use the grid intensively (observed: the top 2-5 groups/users dominate the workload)
- The "0-day learning curve" assumption: a grid must support many different projects in a way that facilitates an immediate jump to productivity (observed: learning curves of up to 60 days, and 100 days for automated tool development)

What did we expect from grids? (2/3): Grids vs. Parallel Production Environments
- The "HPC successor" assumption: the grid's typical user comes from the HPC world, and the typical workload can be described with the well-known models of parallel production environment workloads:
  - We will have, from the beginning, lots of parallel jobs
  - We will have, from the beginning, lots of power-of-2 sized jobs
  - We will handle, from the beginning, tightly-coupled parallel jobs
- The "infinite power" assumption: the grid can deliver any computing power demand, and in particular will offer at least computing power comparable to traditional parallel production environments

July 13, Grids vs. Parallel Production Envs. … and what have we observed The HPC successor assumption: the Grid’s typical user comes from the HPC world, and the typical workload can be described with well-known models of parallel production environments workloads  No parallel jobs  No power-of-2 sized jobs  No tightly-coupled parallel jobs The infinite power assumption: the Grid can deliver any computing power demand, and in particular will offer at least computing power comparable to traditional Parallel Production Environments

Grids vs. Parallel Production Environments … and what have we observed
- The "HPC successor" assumption: the grid's typical user comes from the HPC world, and the typical workload can be described with the well-known models of parallel production environment workloads
  - No parallel jobs
  - No power-of-2 sized jobs
  - No tightly-coupled parallel jobs
- The "infinite power" assumption: the grid can deliver any computing power demand, and in particular offers at least computing power comparable to traditional parallel production environments

Grid Computing Works! … and what have we observed
- The "Power of Sums is Greater Than the Sum of Powers" assumption: by coupling individual environments, the grid will provide better performance than using the individual environments alone
  - Higher throughput
  - Shorter wait times
  - Shorter response times
  - Same or lower job slowdown
- The "there is a need for the Power of Sums" assumption: institutions have the will and the need to couple their environments into a larger, grid-based environment

Grid Computing Works! … and what have we observed
- The "Power of Sums is Greater Than the Sum of Powers" assumption: by coupling individual environments, the grid will provide better performance than using the individual environments alone
  - Higher throughput (as predicted)
  - Shorter wait times (NOT as predicted)
  - Shorter response times
  - Same or lower job slowdown (as predicted)
- The "there is a need for the Power of Sums" assumption: institutions have the will and the need to couple their environments into a larger, grid-based environment

Grid Computing Works! … and what have we observed
- The "Power of Sums is Greater Than the Sum of Powers" assumption: by coupling individual environments, the grid will provide better performance than using the individual environments alone
  - Higher throughput (as predicted)
  - Shorter wait times (NOT as predicted)
  - Shorter response times
  - Same or lower job slowdown (as predicted)
- The "there is a need for the Power of Sums" assumption: (there already is a large community using them!)

Final Words

How does this affect your work?
- We identify or quantitatively assess several challenges for grid scheduling, monitoring, and benchmarking (in the paper)
- We suggest more realistic simulation assumptions, based on these findings
- We are currently building a Grid Workloads Archive for the benefit of the whole community. We need your help!

Take-home message: "How are real grids used?" Not like we used to believe!
- NO tightly-coupled parallel jobs; the top groups/users dominate the workload; etc.
- Immediate impact:
  - Community goals: try to improve on the learning curve, waiting times, …
  - Simulation work: try to use more realistic setups
- Grid Workloads Archive: a community effort. We need your help!

Thank you! Questions? Remarks? Observations?
Help us build our community's Grid Workloads Archive.
Contact: Alexandru Iosup, Dick H.J. Epema