Performance Analysis of High Performance Parallel Applications on Virtualized Resources
Jaliya Ekanayake and Geoffrey Fox
Indiana University, 501 N Morton, Suite 224, Bloomington IN
{Jekanaya,
Private Cloud Infrastructure
Eucalyptus and Xen based private cloud infrastructure
  – Eucalyptus version 1.4 and Xen version
  – Deployed on 16 nodes, each with 2 quad-core Intel Xeon processors and 32 GB of memory
  – All nodes are connected via 1 gigabit Ethernet connections
Bare-metal nodes and VMs use exactly the same software environment
  – Red Hat Enterprise Linux Server release 5.2 (Tikanga) operating system
  – OpenMPI version with gcc version
MPI Applications
Different Hardware/VM Configurations
Invariant used in selecting the number of MPI processes:
  Number of MPI processes = Number of CPU cores used

Ref          Description                          CPU cores*  Memory (GB)*                  Nodes deployed*
BM           Bare-metal node                      8           32                            16
1-VM-8-core  1 VM instance per bare-metal node    8           30 (2 GB reserved for Dom0)   16
2-VM-4-core  2 VM instances per bare-metal node   4           15                            32
4-VM-2-core  4 VM instances per bare-metal node   2           7.5                           64
8-VM-1-core  8 VM instances per bare-metal node   1           3.75                          128
* CPU cores and memory are per virtual or bare-metal node; the last column is the number of virtual or bare-metal nodes deployed.
Matrix Multiplication
Implements Cannon's algorithm [1]
  – Exchanges large messages, so it is more sensitive to bandwidth than to latency
  – At 81 MPI processes, at least a 14% reduction in speedup is noticeable
Figures: Performance (64 CPU cores); Speedup for a fixed matrix size (5184 x 5184)
[1] S. Johnsson, T. Harris, and K. Mathur, "Matrix multiplication on the Connection Machine," in Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89), Reno, Nevada, November 1989, ACM, New York, NY.
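Cannon's algorithm is bandwidth-bound because every step circulates whole sub-blocks of A and B around the process grid. The C/MPI code below is a minimal sketch of that communication pattern (not the authors' implementation; the function and variable names are illustrative): on a q x q grid, each MPI_Sendrecv_replace moves an (n/q) x (n/q) block, e.g. 648 x 648 doubles, about 3.4 MB, for the 5184 x 5184 matrix on 64 processes.

/* A minimal sketch of the communication pattern in Cannon's algorithm,
 * assuming a q x q process grid (not the authors' implementation; names
 * are illustrative).  Every step shifts a whole (n/q) x (n/q) block of A
 * and of B to a neighbour, so the messages are large and bandwidth-bound. */
#include <mpi.h>

void cannon(double *A, double *B, double *C, int nb /* block edge n/q */, int q)
{
    MPI_Comm grid;
    int dims[2] = {q, q}, periods[2] = {1, 1}, coords[2], rank;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid);
    MPI_Comm_rank(grid, &rank);
    MPI_Cart_coords(grid, rank, 2, coords);

    int left, right, up, down, src, dst;
    MPI_Cart_shift(grid, 1, -1, &right, &left);   /* row neighbours    */
    MPI_Cart_shift(grid, 0, -1, &down, &up);      /* column neighbours */

    /* Initial skew: shift A left by the row index, B up by the column index */
    MPI_Cart_shift(grid, 1, -coords[0], &src, &dst);
    MPI_Sendrecv_replace(A, nb * nb, MPI_DOUBLE, dst, 0, src, 0, grid, MPI_STATUS_IGNORE);
    MPI_Cart_shift(grid, 0, -coords[1], &src, &dst);
    MPI_Sendrecv_replace(B, nb * nb, MPI_DOUBLE, dst, 0, src, 0, grid, MPI_STATUS_IGNORE);

    for (int step = 0; step < q; step++) {
        /* Local multiply-accumulate: C += A * B on the resident blocks */
        for (int i = 0; i < nb; i++)
            for (int k = 0; k < nb; k++)
                for (int j = 0; j < nb; j++)
                    C[i * nb + j] += A[i * nb + k] * B[k * nb + j];

        /* Circulate the blocks: A one step left, B one step up */
        MPI_Sendrecv_replace(A, nb * nb, MPI_DOUBLE, left, 0, right, 0, grid, MPI_STATUS_IGNORE);
        MPI_Sendrecv_replace(B, nb * nb, MPI_DOUBLE, up, 0, down, 0, grid, MPI_STATUS_IGNORE);
    }
    MPI_Comm_free(&grid);
}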
Kmeans Clustering
Performs Kmeans clustering on up to 40 million 3D data points
  – The amount of communication depends only on the number of cluster centers
  – Communication << computation and the amount of data processed
At the highest granularity, VMs show at least 3.5 times the overhead of bare-metal
Extremely large overheads for smaller grain sizes
Figures: Performance (128 CPU cores); Overhead
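To make concrete why the communication volume tracks the number of cluster centers rather than the 40 million points, the sketch below shows one Kmeans iteration over MPI (an assumed, simplified version, not the authors' code; kmeans_step, D, and the variable names are illustrative): the only communication is a single MPI_Allreduce of k*(D+1) doubles.

/* A minimal sketch of one Kmeans iteration over MPI (not the authors' code).
 * Each process owns a slice of the points; the only communication per
 * iteration is one MPI_Allreduce of k*(D+1) doubles (partial coordinate
 * sums plus counts), so the message size depends on the number of centers
 * k, not on the number of data points. */
#include <mpi.h>
#include <string.h>

#define D 3   /* 3D data points, as in the experiment */

void kmeans_step(const double *pts, int n_local, double *centers, int k)
{
    /* partial[c*(D+1)+0 .. D-1] = coordinate sums for centre c, [D] = count */
    double partial[k * (D + 1)];
    memset(partial, 0, sizeof partial);

    for (int i = 0; i < n_local; i++) {
        int best = 0;
        double best_d = 1e300;
        for (int c = 0; c < k; c++) {             /* find the nearest centre */
            double d = 0.0;
            for (int j = 0; j < D; j++) {
                double t = pts[i * D + j] - centers[c * D + j];
                d += t * t;
            }
            if (d < best_d) { best_d = d; best = c; }
        }
        for (int j = 0; j < D; j++)
            partial[best * (D + 1) + j] += pts[i * D + j];
        partial[best * (D + 1) + D] += 1.0;
    }

    /* One collective of k*(D+1) doubles, independent of n_local */
    MPI_Allreduce(MPI_IN_PLACE, partial, k * (D + 1), MPI_DOUBLE,
                  MPI_SUM, MPI_COMM_WORLD);

    for (int c = 0; c < k; c++) {                 /* compute the new centres */
        double cnt = partial[c * (D + 1) + D];
        if (cnt > 0.0)
            for (int j = 0; j < D; j++)
                centers[c * D + j] = partial[c * (D + 1) + j] / cnt;
    }
}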
Concurrent Wave Equation Solver
Clear difference in performance and speedup between VMs and bare-metal
  – Very small messages: the message size in each MPI_Sendrecv() call is only 8 bytes
  – More sensitive to latency than to bandwidth
At the measured number of data points, at least a 40% decrease in performance is observed on VMs
Figures: Performance (64 CPU cores); Total speedup (fixed number of data points)
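The latency sensitivity comes from the halo exchange: each MPI_Sendrecv() moves a single 8-byte double per neighbour per time step. A minimal sketch of that exchange, assuming a 1-D domain decomposition with one ghost cell on each side (not the authors' code; names are illustrative):

/* Each process owns points u[1..n_local] with ghost cells u[0] and
 * u[n_local+1]; every time step it trades a single double (8 bytes) with
 * each neighbour, so per-message cost is almost entirely latency. */
#include <mpi.h>

void exchange_halos(double *u, int n_local, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    /* Send the first owned point left, receive the right neighbour's first point */
    MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                 &u[n_local + 1], 1, MPI_DOUBLE, right, 0,
                 comm, MPI_STATUS_IGNORE);
    /* Send the last owned point right, receive the left neighbour's last point */
    MPI_Sendrecv(&u[n_local], 1, MPI_DOUBLE, right, 1,
                 &u[0], 1, MPI_DOUBLE, left, 1,
                 comm, MPI_STATUS_IGNORE);
}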
Higher Latencies - 1
domUs (VMs that run on top of Xen para-virtualization) are not capable of performing I/O operations themselves
  – dom0 (the privileged OS) schedules and executes I/O operations on behalf of the domUs
  – More VMs per node => more scheduling => higher latencies
Xen configuration for 1 VM per node: 8 MPI processes inside the VM
Xen configuration for 8 VMs per node: 1 MPI process inside each VM
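Running two ranks on different VMs (or on bare-metal nodes) and timing a small-message round trip makes the dom0 scheduling cost directly visible. The ping-pong sketch below is an illustration of that idea, not the measurement code behind these results:

/* A simple ping-pong microbenchmark sketch for the point-to-point latency
 * between two MPI ranks (an assumed illustration of how the latency gap
 * could be measured). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps = 10000;
    char byte = 0;
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    if (rank == 0)
        printf("one-way latency: %.2f microseconds\n",
               (MPI_Wtime() - t0) / (2.0 * reps) * 1e6);
    MPI_Finalize();
    return 0;
}

Placing the two ranks on different VMs of the same physical node, versus on different physical nodes, separates the in-node case discussed on the next slide from the cross-node case.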
Higher Latencies - 2
Lack of support for in-node communication => "sequentializing" parallel communication
Better support for in-node communication in OpenMPI
  – OpenMPI outperforms LAM-MPI in the 1-VM-per-node configuration
  – With 8 VMs per node and 1 MPI process per VM, OpenMPI and LAM-MPI perform equally well
Figure: Kmeans Clustering
Conclusions and Future Work
It is plausible to use virtualized resources for HPC applications
MPI applications experience moderate to high overheads when run on virtualized resources
  – Applications sensitive to latency experience higher overheads
  – Bandwidth does not seem to be an issue
  – More VMs per node => higher overheads
In-node communication support is crucial when multiple parallel processes run on a single VM
Applications such as MapReduce may perform well on VMs?
  – The millisecond-to-second latencies they already have in communication may absorb the extra latency of VMs without much effect