
Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing
August 28, Berkeley, CA, USA
Alexandru Iosup, Nezih Yigitbasi, Dick Epema (Parallel and Distributed Systems Group, Delft University of Technology, The Netherlands)
Simon Ostermann, Radu Prodan, Thomas Fahringer (Distributed and Parallel Systems, University of Innsbruck, Austria)

About the Team
Speaker: Alexandru Iosup
Team's recent work in performance:
- The Grid Workloads Archive (Nov 2006)
- The Failure Trace Archive (Nov 2009)
- The Peer-to-Peer Trace Archive (Apr 2010)
- Tools: GrenchMark workload-based grid benchmarking, other monitoring and performance evaluation tools
- Grid and peer-to-peer workload characterization and modeling
Systems work: Tribler (P2P file sharing), Koala (grid scheduling), POGGI and CAMEO (massively multiplayer online gaming)

Many-Tasks Scientific Computing
- Jobs comprising many tasks (1,000s) necessary to achieve some meaningful scientific goal
- Jobs submitted as bags-of-tasks or over short periods of time
- High-volume users over long periods of time
- Common in grid workloads [Ios06][Ios08]
- No practical definition (from "many" to "10,000/h")

The Real Cloud
"The path to abundance": on-demand capacity, cheap for short-term tasks, great for web apps (EIP, web crawl, DB ops, I/O)
vs
"The killer cyclone": not-so-great performance for scientific applications [1] (compute- or data-intensive), long-term performance variability [2]
(Slide image: Tropical Cyclone Nargis, NASA, ISSS, 04/29/08)
[1] Iosup et al., Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing (under submission).
[2] Iosup et al., On the Performance Variability of Production Cloud Services, Technical Report PDS. [Online].

Research Question and Previous Work
Research question: do clouds and many-tasks scientific computing fit well, performance-wise?
Virtualization overhead:
- Loss below 5% for computation [Barham03] [Clark04]
- Loss below 15% for networking [Barham03] [Menon05]
- Loss below 30% for parallel I/O [Vetter08]
- Negligible for compute-intensive HPC kernels [You06] [Panda06]
Cloud performance evaluation:
- Performance and cost of executing scientific workflows [Dee08]
- Study of Amazon S3 [Palankar08]
- Amazon EC2 for the NPB benchmark suite [Walker08] or selected HPC benchmarks [Hill08]
Theory: just reuse the virtualization-overhead results. Practice?

Agenda
1. Introduction & Motivation
2. Proto-Many-Task Users
3. Performance Evaluation of Four Clouds
4. Clouds vs Other Environments
5. Take-Home Message

Proto-Many-Task Users
MTC user: submits at least J jobs in B bags-of-tasks
Trace-based analysis: 6 grid traces, 4 parallel production environment traces; various criteria (combinations of values for J and B)
Resulting criterion: "number of BoTs submitted ≥ 1,000 & number of tasks submitted ≥ 10,000"
+ Easy to grasp
+ Dominate most traces (jobs and CPU time)
+ 1-CPU jobs
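To make the criterion concrete, here is a minimal sketch of this kind of trace filter, assuming a CSV job trace with hypothetical columns user and bot_id (bag-of-tasks identifier); the column names and file layout are illustrative, not the Grid Workloads Archive format.

```python
from collections import defaultdict
import csv

# Thresholds from the slide: a proto-MTC user submits at least
# 1,000 bags-of-tasks and 10,000 tasks over the whole trace.
MIN_BOTS = 1_000
MIN_TASKS = 10_000

def proto_mtc_users(trace_path):
    """Return the set of users meeting the proto-MTC criterion.

    Assumes a CSV trace with (at least) the columns 'user' and
    'bot_id'; these names are placeholders, not the GWA schema.
    """
    tasks = defaultdict(int)   # user -> number of tasks submitted
    bots = defaultdict(set)    # user -> set of bag-of-tasks ids
    with open(trace_path, newline="") as f:
        for row in csv.DictReader(f):
            user = row["user"]
            tasks[user] += 1
            bots[user].add(row["bot_id"])
    return {u for u in tasks
            if tasks[u] >= MIN_TASKS and len(bots[u]) >= MIN_BOTS}
```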

Agenda
1. Introduction & Motivation
2. Proto-Many-Task Users
3. Performance Evaluation of Four Clouds
   1. Experimental Setup
   2. Selected Results
4. Clouds vs Other Environments
5. Take-Home Message

Experimental Setup: Environments
Four commercial IaaS clouds (per the NIST definition):
- Amazon EC2
- GoGrid
- Elastic Hosts
- Mosso
No Cluster Compute instances (not yet released at the time of the experiments, Dec '08-Jan '09)

Experimental Setup: Experiment Design Principles
- Use complete test suites
- Repeat each experiment 10 times
- Use defaults, not tuning
- Use common benchmarks, so results can be compared with results for other systems
Types of experiments:
- Resource acquisition and release
- Single-Instance (SI) benchmarking
- Multiple-Instance (MI) benchmarking
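As an illustration of these principles (not the harness actually used in the study), the sketch below repeats a single-instance benchmark ten times and also times resource acquisition and release; the acquire(), run(), and release() methods are hypothetical placeholders to be replaced by the SDK of the provider under test.

```python
import statistics
import time

def measure(client, benchmark_cmd, repetitions=10):
    """Repeat a single-instance benchmark and time resource acquisition.

    `client` is a hypothetical object exposing acquire(), run(instance, cmd),
    and release(instance); substitute the real SDK of the cloud under test.
    Instance and benchmark defaults are left untouched (no tuning).
    """
    acquisition_times, results = [], []
    for _ in range(repetitions):
        t0 = time.time()
        instance = client.acquire()                # resource acquisition
        acquisition_times.append(time.time() - t0)
        try:
            results.append(client.run(instance, benchmark_cmd))
        finally:
            client.release(instance)               # resource release
    return statistics.median(acquisition_times), results
```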

Resource Acquisition: Can Matter
- Acquisition time can be significant: for single instances (GoGrid) and for multiple instances (all clouds)
- Short-term variability can be high (GoGrid)
- Slow long-term growth

Single Instances: Compute Performance Lower Than Expected
- ECU = 4.4 GFLOPS (at 100% efficient code) = 1.1 GHz 2007 Opteron x 4 FLOPS/cycle (full pipeline)
- In our tests, measured GFLOPS fall well below this peak: sharing of the same physical machines (working set), lack of code optimizations beyond -O3 -funroll-loops
- Metering requires more clarification
- Instances with excellent float/double addition performance may have poor multiplication performance (c1.medium, c1.xlarge)
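A minimal arithmetic check of the ECU figure quoted above; the measured GFLOPS value would come from an actual benchmark run, and the instance's ECU rating is whatever the provider advertises.

```python
def ecu_peak_gflops(clock_ghz=1.1, flops_per_cycle=4):
    """Theoretical per-ECU peak, assuming a fully pipelined FPU."""
    return clock_ghz * flops_per_cycle     # 1.1 GHz x 4 = 4.4 GFLOPS

def efficiency(measured_gflops, ecus):
    """Fraction of the advertised peak actually delivered by one instance."""
    return measured_gflops / (ecus * ecu_peak_gflops())
```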

Multi-Instance: Low Efficiency in HPL
Peak performance:
- 2 x c1.xlarge: ... GFLOPS, vs HPCC-227 (Cisco, 102 ...) and HPCC-286 (Intel, ...)
- 16 x c1.xlarge (128 cores): 1,408 GFLOPS, vs HPCC-224 (Cisco, 819 ...) and HPCC-289 (Intel, 1,433 ...)
Efficiency:
- Cloud: 15-50%, even for small (<128) instance counts
- HPC: 60-70%
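The 1,408 GFLOPS peak can be reproduced from the per-ECU rate on the previous slide, assuming the advertised rating of 20 ECUs per c1.xlarge instance; HPL efficiency is then simply the achieved Rmax divided by this theoretical Rpeak. A small sketch, with the achieved Rmax left as an input from an actual HPL run:

```python
ECU_GFLOPS = 4.4              # per-ECU peak: 1.1 GHz x 4 FLOPS/cycle
C1_XLARGE_ECUS = 20           # advertised ECU rating of a c1.xlarge instance

def hpl_peak_gflops(instances, ecus_per_instance=C1_XLARGE_ECUS):
    """Theoretical HPL peak (Rpeak) of a homogeneous instance pool."""
    return instances * ecus_per_instance * ECU_GFLOPS

def hpl_efficiency(rmax_gflops, instances):
    """HPL efficiency: achieved Rmax divided by theoretical Rpeak."""
    return rmax_gflops / hpl_peak_gflops(instances)

print(round(hpl_peak_gflops(16)))   # 1408, the multi-instance figure on this slide
# hpl_efficiency(rmax_from_run, 16) lands around 0.15-0.50 in the cloud
# experiments, versus roughly 0.60-0.70 for the HPCC reference systems.
```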

Cloud Performance Variability
Performance variability of production cloud services:
- Infrastructure: Amazon Web Services
- Platform: Google App Engine
- Year-long performance information for nine services
Finding: about half of the cloud services investigated in this work exhibit yearly and daily patterns; the impact of performance variability depends on the application.
A. Iosup, N. Yigitbasi, and D. Epema, On the Performance Variability of Production Cloud Services (under submission).
(Figure: Amazon S3, GET US HI operations)

Agenda
1. Introduction & Motivation
2. Proto-Many-Task Users
3. Performance Evaluation of Four Clouds
4. Clouds vs Other Environments
5. Take-Home Message

Clouds vs Other Environments
Method: trace-based simulation with the DGSim (grid) simulator; compute-intensive workloads, no data I/O
Compared: source environment vs cloud with source-like performance vs cloud with real (measured) performance
Slowdown applied for the real cloud: sequential jobs 7 times, parallel jobs 1-10 times
Results: response time 4-10 times higher in real clouds; clouds remain good for short-term, deadline-driven projects
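A toy version of this comparison, assuming per-job runtimes taken from a trace and the slowdown factors quoted above (7x for sequential jobs, an illustrative 4x for parallel jobs within the reported 1-10x range); unlike DGSim, the sketch ignores queueing and scheduling, so it shows only runtime inflation rather than full response times.

```python
def cloud_runtime(runtime_s, cpus, seq_slowdown=7.0, par_slowdown=4.0):
    """Scale a job's runtime from the source environment to the 'real cloud'.

    seq_slowdown (7x) is the figure from the slide for 1-CPU jobs;
    par_slowdown is an illustrative value inside the reported 1-10x range.
    """
    factor = seq_slowdown if cpus == 1 else par_slowdown
    return runtime_s * factor

# Each entry: (runtime in the source environment [s], number of CPUs).
jobs = [(120.0, 1), (3600.0, 8), (45.0, 1)]     # illustrative trace fragment
source = sum(runtime for runtime, _ in jobs)
cloud = sum(cloud_runtime(runtime, cpus) for runtime, cpus in jobs)
print(f"aggregate runtime inflation: {cloud / source:.1f}x")
```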

Take-Home Message
Many-Tasks Scientific Computing:
- Quantitative definition: J jobs and B bags-of-tasks
- Extracted proto-MTC users from grid and parallel production environments
Performance evaluation of four commercial clouds:
- Amazon EC2, GoGrid, Elastic Hosts, Mosso
- Resource acquisition, Single- and Multi-Instance benchmarking
- Low compute and networking performance
Clouds vs other environments:
- An order of magnitude better performance is needed for clouds
- Clouds are already good for short-term, deadline-driven scientific computing

Potential for Collaboration
- Other performance evaluation studies of clouds: the new Amazon EC2 instance type (Cluster Compute), other clouds?
- Data-intensive benchmarks
- General logs: Failure Trace Archive, Grid Workloads Archive, …

Thank you! Questions? Observations?
More information:
- The Grid Workloads Archive: gwa.ewi.tudelft.nl
- The Failure Trace Archive: fta.inria.fr
- The GrenchMark perf. eval. tool: grenchmark.st.ewi.tudelft.nl
- Cloud research: see the PDS publication database
Big thanks to our collaborators: U. Wisc.-Madison, U Chicago, U Dortmund, U Innsbruck, LRI/INRIA Paris, INRIA Grenoble, U Leiden, Politehnica University of Bucharest, Technion, …