CloudComp 09, Munich, Germany
Jaliya Ekanayake and Geoffrey Fox
School of Informatics and Computing, Pervasive Technology Institute, Indiana University Bloomington

Acknowledgements to:
- Joe Rinkovsky and Jenett Tillotson at IU UITS
- SALSA Team, Pervasive Technology Institute, Indiana University: Scott Beason, Xiaohong Qiu, Thilina Gunarathne

Computing in Clouds
- On-demand allocation of resources (pay per use)
- Customizable Virtual Machines (VMs): any software configuration, root/administrative privileges
- Provisioning happens in minutes, compared to hours in traditional job queues
- Better resource utilization: no need to allocate a whole 24-core machine to run a single-threaded R analysis
- Commercial clouds: Amazon EC2, GoGrid, 3Tera
- Private clouds: Eucalyptus (open source), Nimbus, Xen
Some benefits: access to computational power is no longer a barrier.

Cloud Technologies / Parallel Runtimes
Cloud technologies, e.g. Apache Hadoop (MapReduce), Microsoft DryadLINQ, and MapReduce++ (earlier known as CGL-MapReduce):
- Moving computation to data
- Distributed file systems (HDFS, GFS)
- Better quality of service (QoS) support
- Simple communication topologies
Most HPC applications use MPI:
- Variety of communication topologies
- Typically use fast (or dedicated) network settings
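As a rough illustration of the classic MapReduce pattern these runtimes implement (this is not the Hadoop or DryadLINQ API, just a minimal single-process Python sketch), the two phases can be written as below; the histogramming map and reduce functions are hypothetical stand-ins for the HEP use case discussed later.

```python
from collections import defaultdict

def map_phase(records, map_fn):
    """Apply the user map function to every input record, emitting (key, value) pairs."""
    for record in records:
        yield from map_fn(record)

def reduce_phase(pairs, reduce_fn):
    """Group intermediate values by key, then apply the user reduce function per key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Hypothetical example in the spirit of HEP histogramming: bin event values, count per bin.
def map_fn(event):
    yield (round(event, 1), 1)        # key = histogram bin, value = one count

def reduce_fn(key, values):
    return sum(values)                # total count for the bin

histogram = reduce_phase(map_phase([1.23, 1.27, 4.56], map_fn), reduce_fn)
```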

Applications & Different Interconnection Patterns
- Map Only (embarrassingly parallel): CAP3 analysis, document conversion (PDF -> HTML), brute-force searches in cryptography, parametric sweeps. Examples: CAP3 gene assembly, PolarGrid MATLAB data analysis.
- Classic MapReduce: High Energy Physics (HEP) histograms, SWG gene alignment, distributed search, distributed sorting, information retrieval. Examples: information retrieval, HEP data analysis, calculation of pairwise distances for ALU sequences.
- Iterative Reductions (MapReduce++): expectation-maximization algorithms, clustering, linear algebra. Examples: K-means, deterministic annealing clustering, multidimensional scaling (MDS).
- Loosely Synchronous: many MPI scientific applications utilizing a wide variety of communication constructs, including local interactions. Examples: solving differential equations, particle dynamics with short-range forces.
The first three categories fall in the domain of MapReduce and its iterative extensions; the last is the domain of MPI.

MapReduce++ (earlier known as CGL-MapReduce)
- In-memory MapReduce
- Streaming-based communication: avoids file-based communication mechanisms
- Cacheable map/reduce tasks: static data remains in memory
- Combine phase to combine reductions
- Extends the MapReduce programming model to iterative MapReduce applications
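The iterative pattern can be pictured with the minimal sketch below. The function and parameter names (map_fn, combine_fn, converged, and so on) are hypothetical and do not reflect the actual MapReduce++/Twister API; the point is only that the static data stays cached across iterations while the small variable data travels each round and a combine phase returns a single result to the driver.

```python
def iterative_mapreduce(static_partitions, variable_data, map_fn, reduce_fn,
                        combine_fn, converged, max_iters=100):
    """Skeleton of iterative MapReduce: static partitions are loaded once and
    kept in memory by long-running map tasks; only the variable data changes."""
    for _ in range(max_iters):
        # Map over cached static data, passing in the current variable data.
        intermediate = [map_fn(part, variable_data) for part in static_partitions]
        reduced = reduce_fn(intermediate)          # e.g. merge partial sums
        new_variable = combine_fn(reduced)         # combine phase: one value back to the driver
        if converged(variable_data, new_variable):
            break
        variable_data = new_variable               # feed the result into the next iteration
    return variable_data

# Hypothetical usage, in the spirit of K-means: static_partitions = cached point blocks,
# variable_data = the current cluster centers recomputed every iteration.
```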

What I Will Present Next
1. Our experience in applying cloud technologies to:
- EST (Expressed Sequence Tag) sequence assembly program: CAP3
- HEP: processing large volumes of physics data using ROOT
- K-means clustering
- Matrix multiplication
2. Performance analysis of MPI applications using a private cloud environment

Cluster Configurations

Feature               | Windows cluster (IU)                      | Linux cluster (IU)
CPU                   | Intel Xeon (L-series)                     | Intel Xeon (L-series)
# CPUs / # cores      | 2 / 8                                     | 2 / 8
Memory                | 16 GB                                     | 32 GB
# Disks               | 2                                         | 1
Network               | Gigabit Ethernet                          | Gigabit Ethernet
Operating system      | Windows Server 2008 Enterprise (64-bit)   | Red Hat Enterprise Linux Server (64-bit)
# Nodes used          | 32                                        | 32
Total CPU cores used  | 256                                       | 256
Runtimes              | DryadLINQ                                 | Hadoop / MPI / Eucalyptus

Pleasingly Parallel Applications
Figures: performance of CAP3; performance of High Energy Physics (HEP) data analysis.

Iterative Computations
Figures: performance of K-means clustering; parallel overhead of matrix multiplication.

Performance Analysis of MPI Applications Using a Private Cloud Environment
- Eucalyptus- and Xen-based private cloud infrastructure: Eucalyptus (version 1.4) with Xen, deployed on 16 nodes, each with 2 quad-core Intel Xeon processors and 32 GB of memory
- All nodes are connected via 1 Gbit/s Ethernet
- Bare-metal and VMs use exactly the same software configuration: Red Hat Enterprise Linux Server release 5.2 (Tikanga), with OpenMPI compiled with gcc

Different Hardware/VM Configurations

Ref         | Description                                                   | CPU cores per virtual or bare-metal node | Memory (GB) per virtual or bare-metal node | Number of virtual or bare-metal nodes
BM          | Bare-metal node                                               | 8 | 32                            | 16
1-VM-8-core | 1 VM instance per bare-metal node (High-CPU Extra Large Instance) | 8 | 30 (2 GB reserved for Dom0) | 16
2-VM-4-core | 2 VM instances per bare-metal node                            | 4 | 15                            | 32
4-VM-2-core | 4 VM instances per bare-metal node                            | 2 | 7.5                           | 64
8-VM-1-core | 8 VM instances per bare-metal node                            | 1 | 3.75                          | 128

Invariant used in selecting the number of MPI processes: number of MPI processes = number of CPU cores used.

MPI Applications

Feature                  | Matrix multiplication                      | K-means clustering                              | Concurrent Wave Equation
Description              | Cannon's algorithm on a square process grid | K-means clustering with a fixed number of iterations | A vibrating string is split into points; each MPI process updates the amplitude over time
Computational complexity | O(n^3)                                     | O(n)                                            | O(n)
Communication complexity | O(n^2)                                     | O(1)                                            | O(1)

Matrix Multiplication
- Implements Cannon's algorithm [1]
- Exchanges large messages: more susceptible to bandwidth than to latency
- At least 14% reduction in speedup between bare-metal and the 1-VM-per-node configuration
Figures: performance on 64 CPU cores; speedup for a fixed matrix size (5184 x 5184).
[1] S. Johnsson, T. Harris, and K. Mathur, "Matrix multiplication on the Connection Machine," in Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89), Reno, Nevada, USA, November 1989. ACM, New York, NY.
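The communication pattern that makes this benchmark bandwidth-sensitive is sketched below. The measured applications used OpenMPI compiled with gcc; this mpi4py/NumPy version is only an illustrative sketch of Cannon's algorithm (assuming a perfect-square process count), in which every process ships its whole A and B blocks to a neighbour in each of the sqrt(p) rounds.

```python
from mpi4py import MPI
import numpy as np

def cannon_multiply(A, B, comm=MPI.COMM_WORLD):
    """C = A x B with one block per MPI process on a q x q periodic grid.
    Assumes comm.Get_size() is a perfect square; A and B are rotated in place."""
    q = int(round(comm.Get_size() ** 0.5))
    cart = comm.Create_cart(dims=[q, q], periods=[True, True], reorder=False)
    row, col = cart.Get_coords(cart.Get_rank())
    C = np.zeros_like(A)

    # Initial alignment: shift the A block left by `row`, the B block up by `col`.
    src, dst = cart.Shift(1, -row)
    cart.Sendrecv_replace(A, dest=dst, source=src)
    src, dst = cart.Shift(0, -col)
    cart.Sendrecv_replace(B, dest=dst, source=src)

    for _ in range(q):
        C += A @ B                                    # local block multiply
        src, dst = cart.Shift(1, -1)                  # shift A one block left
        cart.Sendrecv_replace(A, dest=dst, source=src)
        src, dst = cart.Shift(0, -1)                  # shift B one block up
        cart.Sendrecv_replace(B, dest=dst, source=src)
    return C
```

Each exchange moves an entire matrix block, so the message sizes grow with the problem size; this is why the benchmark stresses bandwidth rather than latency.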

K-means Clustering
- Up to 40 million 3D data points
- The amount of communication depends only on the number of cluster centers
- Amount of communication << computation and the amount of data processed
- At the highest granularity, VMs show at least ~33% total overhead
- Extremely large overheads for smaller grain sizes
Figures: performance on 128 CPU cores; overhead, where Overhead = (P * T(P) - T(1)) / T(1).
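The reason communication depends only on the number of centers can be seen in a minimal sketch of one parallel K-means iteration: the only collective operation is an Allreduce over the k x d partial center sums and the k counts, regardless of how many points each process holds. This mpi4py/NumPy version is illustrative only, not the measured implementation.

```python
from mpi4py import MPI
import numpy as np

def kmeans_step(local_points, centers, comm=MPI.COMM_WORLD):
    """One K-means iteration; communication is ~ k*(d+1) doubles per process."""
    k, d = centers.shape
    # Assign each local point to its nearest center (pure local computation).
    dists = np.linalg.norm(local_points[:, None, :] - centers[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)

    # Local partial sums and counts per center.
    sums = np.zeros((k, d))
    counts = np.zeros(k)
    for j in range(k):
        members = local_points[nearest == j]
        sums[j] = members.sum(axis=0)
        counts[j] = len(members)

    # Global reduction: message size independent of the number of data points.
    global_sums = np.empty_like(sums)
    global_counts = np.empty_like(counts)
    comm.Allreduce(sums, global_sums, op=MPI.SUM)
    comm.Allreduce(counts, global_counts, op=MPI.SUM)
    return global_sums / np.maximum(global_counts, 1)[:, None]
```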

Concurrent Wave Equation Solver
- Clear difference in performance and overheads between VMs and bare-metal
- Very small messages: the message size in each MPI_Sendrecv() call is only 8 bytes, so the application is more susceptible to latency
- At the number of data points tested, at least ~37% of total overhead in VMs
Figures: performance on 64 CPU cores; overhead, where Overhead = (P * T(P) - T(1)) / T(1).
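The latency sensitivity comes from exchanging a single 8-byte double with each neighbour at every time step. The hedged mpi4py sketch below shows that boundary exchange for a non-periodic string (the measured code was a C/MPI application; this is only an illustration of the communication pattern).

```python
from mpi4py import MPI
import numpy as np

def exchange_boundaries(amplitude, comm=MPI.COMM_WORLD):
    """Exchange one boundary value (an 8-byte double) with each neighbour.
    With such tiny messages the cost is dominated by latency, not bandwidth."""
    rank, size = comm.Get_rank(), comm.Get_size()
    left = rank - 1 if rank > 0 else MPI.PROC_NULL     # no neighbour at the string ends
    right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    recv_left = np.zeros(1)
    recv_right = np.zeros(1)
    # Send my first point to the left neighbour, receive the right neighbour's edge.
    comm.Sendrecv(amplitude[:1], dest=left, recvbuf=recv_right, source=right)
    # Send my last point to the right neighbour, receive the left neighbour's edge.
    comm.Sendrecv(amplitude[-1:], dest=right, recvbuf=recv_left, source=left)
    return recv_left[0], recv_right[0]
```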

Higher Latencies (1)
- domUs (VMs that run on top of Xen para-virtualization) are not capable of performing I/O operations directly
- dom0 (the privileged OS) schedules and executes I/O operations on behalf of the domUs
- More VMs per node => more scheduling => higher latencies
Diagrams: 1 VM per node with 8 MPI processes inside the VM; 8 VMs per node with 1 MPI process inside each VM.

Higher Latencies (2)
- Lack of support for in-node communication => "sequentializing" parallel communication
- Better support for in-node communication in OpenMPI: the sm BTL (shared-memory byte transfer layer)
- Both OpenMPI and LAM-MPI perform equally well in the 8-VMs-per-node configuration
Figure: K-means clustering.

Conclusions and Future Work
- Cloud technologies work for most pleasingly parallel applications
- Runtimes such as MapReduce++ extend MapReduce to the iterative MapReduce domain
- MPI applications experience moderate to high performance degradation (10% ~ 40%) in a private cloud; Walker observed 40% ~ 1000% performance degradation in commercial clouds [1]
- Applications sensitive to latency experience higher overheads
- Bandwidth does not seem to be an issue in private clouds
- More VMs per node => higher overheads
- In-node communication support is crucial
- Applications such as MapReduce may perform well on VMs?
[1] Walker, E.: Benchmarking Amazon EC2 for high-performance scientific computing. USENIX ;login:, 2008.

Questions?

Thank You!