SALSA HPC Group School of Informatics and Computing Indiana University

Twister: Bingjing Zhang. Funded by a Microsoft Foundation Grant, Indiana University's Faculty Research Support Program, and an NSF OCI Grant.
Twister4Azure: Thilina Gunarathne. Funded by a Microsoft Azure Grant.
High-Performance Visualization Algorithms for Data-Intensive Analysis: Seung-Hee Bae and Jong Youl Choi. Funded by NIH Grant 1RC2HG.

DryadLINQ CTP Evaluation: Hui Li, Yang Ruan, and Yuduo Zhou. Funded by a Microsoft Foundation Grant.
Million Sequence Challenge: Saliya Ekanayake, Adam Hughes, and Yang Ruan. Funded by NIH Grant 1RC2HG.
Cyberinfrastructure for Remote Sensing of Ice Sheets: Jerome Mitchell. Funded by NSF Grant OCI.

Architecture stack (top to bottom):
Applications: Kernels; Genomics; Proteomics; Information Retrieval; Polar Science; Scientific Simulation Data Analysis and Management; Dissimilarity Computation; Clustering; Multidimensional Scaling; Generative Topographic Mapping.
Programming Model: Cross-Platform Iterative MapReduce (Collectives, Fault Tolerance, Scheduling); Services and Workflow; High Level Language.
Runtime and Storage: Distributed File Systems; Object Store; Data Parallel File System.
Infrastructure: Linux HPC bare-system; Windows Server HPC bare-system; Amazon Cloud; Azure Cloud; Grid Appliance; Virtualization.
Hardware: CPU Nodes; GPU Nodes.
Cross-cutting concerns: Security, Provenance, Portal.

GTM vs. MDS (SMACOF):
Purpose (both): non-linear dimension reduction; find an optimal configuration in a lower dimension via an iterative optimization method.
Objective function: GTM maximizes the log-likelihood; MDS (SMACOF) minimizes STRESS or SSTRESS.
Complexity: GTM is O(KN) (K << N); MDS is O(N^2).
Optimization method: GTM uses EM; MDS uses Iterative Majorization (EM-like).
Input: GTM takes vector-based data; MDS takes non-vector input (a pairwise similarity matrix).
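For reference, both objective functions can be written out explicitly. These are the standard textbook formulations (not transcribed from the slides), with delta_ij the input dissimilarities, d_ij(X) the pairwise distances in the embedding X, w_ij optional weights, x_n the N data points of dimension D, y_k the K latent grid points mapped into data space, and beta the noise precision:

```latex
% MDS (SMACOF) minimizes the STRESS of an embedding X:
\sigma(X) = \sum_{i<j} w_{ij}\,\bigl(d_{ij}(X) - \delta_{ij}\bigr)^{2}

% GTM maximizes the log-likelihood of the data under a mixture of K
% Gaussians centred on the mapped latent grid points y_k:
\mathcal{L} = \sum_{n=1}^{N} \ln\!\left[\frac{1}{K}\sum_{k=1}^{K}
    \left(\frac{\beta}{2\pi}\right)^{D/2}
    \exp\!\left(-\frac{\beta}{2}\,\lVert x_n - y_k \rVert^{2}\right)\right]
```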

Parallel Visualization Algorithms: PlotViz

Twister's key features:
- Distinction between static and variable data
- Configurable long-running (cacheable) map/reduce tasks
- Pub/sub messaging based communication/data transfers
- A broker network for facilitating communication

Structure of the main program (the main program's process space may contain many MapReduce invocations or iterative MapReduce invocations):

configureMaps(..)
configureReduce(..)
while(condition){
    runMapReduce(..)
    updateCondition()
} //end while
close()

From the accompanying diagram: Map() and Reduce() tasks are cacheable and run on worker nodes with local disks; a Combine() operation returns results to the main program; communications/data transfers go via the pub/sub broker network and direct TCP; iterations may send (key,value) pairs directly.
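As a concrete illustration of this pattern, here is a minimal, self-contained Java sketch of an iterative-MapReduce driver loop. The class and method names are hypothetical stand-ins invented for illustration; this is not the actual Twister API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Minimal sketch of the iterative MapReduce driver pattern shown above.
public class IterativeDriver {
    static final int PARTITIONS = 4;

    public static void main(String[] args) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(PARTITIONS);

        // "configureMaps": static data is loaded once and cached across
        // iterations, mirroring Twister's cacheable map tasks.
        double[][] staticData = loadPartitions(PARTITIONS);

        double variableData = 1.0;   // broadcast to every map task each iteration
        double delta = Double.MAX_VALUE;

        while (delta > 1e-6) {       // "while(condition)"
            // "runMapReduce": map over cached partitions with the current
            // variable data, then reduce/combine the partial results.
            List<Future<Double>> partials = new ArrayList<>();
            final double broadcast = variableData;
            for (double[] part : staticData) {
                partials.add(workers.submit(() -> map(part, broadcast)));
            }
            double reduced = 0;
            for (Future<Double> f : partials) reduced += f.get(); // reduce+combine

            // "updateCondition": derive the next iteration's variable data.
            double next = reduced / (PARTITIONS * staticData[0].length);
            delta = Math.abs(next - variableData);
            variableData = next;
        }
        workers.shutdown();          // "close()"
        System.out.println("converged to " + variableData);
    }

    static double map(double[] partition, double broadcast) {
        double sum = 0;
        for (double v : partition) sum += 0.5 * (v + broadcast); // toy kernel
        return sum;
    }

    static double[][] loadPartitions(int n) {
        double[][] parts = new double[n][100];
        for (double[] p : parts)
            for (int i = 0; i < p.length; i++) p[i] = Math.random();
        return parts;
    }
}
```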

Twister runtime architecture (from the diagram): the master node runs the Twister driver and the main program; each worker node runs a Twister daemon that manages a worker pool of cacheable map/reduce tasks and a local disk. A pub/sub broker network connects the driver to the daemons, with one broker serving several Twister daemons. Scripts perform data distribution, data collection, and partition file creation.

Twister-MDS demo setup (from the diagram): the master node runs the Twister driver and Twister-MDS; the client node runs the MDS monitor and PlotViz; ActiveMQ brokers sit between them. (I) The client sends a message to start the job; (II) intermediate results are sent back to the client.

Three data broadcasting strategies were compared:
- Method A: Hierarchical Sending
- Method B: Improved Hierarchical Sending
- Method C: All-to-All Sending
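One plausible reading of the difference between hierarchical and all-to-all sending can be sketched as follows (hypothetical Node type, illustrative only; the real implementation routes messages through the ActiveMQ broker network):

```java
import java.util.List;

// Illustrative fan-out patterns for broadcasting data to daemon nodes.
public class BroadcastSketch {
    interface Node { void deliver(byte[] msg); }

    // Hierarchical sending (in the spirit of Methods A/B): the driver sends
    // one copy per broker and each broker forwards to the daemons it serves,
    // so the driver's outbound traffic is O(#brokers), not O(#daemons).
    static void hierarchical(List<List<Node>> daemonsPerBroker, byte[] msg) {
        for (List<Node> group : daemonsPerBroker) {   // one send per broker
            for (Node daemon : group) {               // broker-side fan-out
                daemon.deliver(msg);
            }
        }
    }

    // All-to-all sending (in the spirit of Method C): the sender transmits
    // directly to every node, so its outbound traffic is O(#daemons).
    static void allToAll(List<Node> daemons, byte[] msg) {
        for (Node daemon : daemons) {
            daemon.deliver(msg);
        }
    }
}
```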

[Topology diagram: 8 brokers and 32 daemon nodes in total, showing broker-driver, broker-broker, and broker-daemon connections between the Twister driver node, ActiveMQ broker nodes, and Twister daemon nodes.]

[Topology diagram (variant): 8 brokers and 32 daemon nodes in total, with the same broker-driver, broker-broker, and broker-daemon connections.]

[Topology diagram: 7 brokers and 32 daemon nodes in total, showing broker-driver, broker-broker, and broker-daemon connections.]

[Topology diagram (variant): 7 brokers and 32 computing nodes in total, with the same connection types.]

[Topology diagram: 5 brokers and 4 computing nodes in total, showing broker-driver and broker-daemon connections only.]

[Diagram: the set of centroids to broadcast, Centroid 1 through Centroid N.]

[Diagram: the Twister driver node broadcasts Centroids 1-4 through ActiveMQ broker nodes; every Twister daemon node receives all four centroids.]

[Diagram: Twister map tasks send Centroids 1-4 through ActiveMQ broker nodes; every Twister reduce task receives all four centroids.]

Twister4Azure design goals:
- Distributed, highly scalable, and highly available cloud services as the building blocks
- Utilize eventually-consistent, high-latency cloud services effectively to deliver performance comparable to traditional MapReduce runtimes
- Decentralized architecture with global-queue-based dynamic task scheduling
- Minimal management and maintenance overhead
- Support for dynamically scaling the compute resources up and down
- MapReduce fault tolerance

Twister4Azure uses Azure Queues for scheduling, Azure Tables to store metadata and monitoring data, and Azure Blobs for input/output/intermediate data storage.
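The decentralized, global-queue scheduling model can be sketched in a few lines. This is a simplified in-memory stand-in, using a Java BlockingQueue where the real system uses Azure Queues (whose visibility timeouts make a task reappear if its worker dies, giving fault tolerance):

```java
import java.util.concurrent.*;

// Simplified sketch of global-queue dynamic task scheduling: workers pull
// tasks from a shared queue at their own pace, so faster workers naturally
// take more tasks and nodes can join or leave at any time.
public class GlobalQueueScheduler {
    public static void main(String[] args) throws Exception {
        BlockingQueue<Integer> taskQueue = new LinkedBlockingQueue<>();
        for (int t = 0; t < 100; t++) taskQueue.add(t);

        int workerCount = 4;
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        CountDownLatch done = new CountDownLatch(workerCount);

        for (int w = 0; w < workerCount; w++) {
            final int id = w;
            workers.submit(() -> {
                Integer task;
                // Poll until the queue is drained; in the cloud setting a
                // worker keeps polling and deletes the queue message only
                // after the task's output has been committed.
                while ((task = taskQueue.poll()) != null) {
                    process(id, task);
                }
                done.countDown();
            });
        }
        done.await();
        workers.shutdown();
    }

    static void process(int worker, int task) {
        System.out.printf("worker %d executed task %d%n", worker, task);
    }
}
```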

[Performance charts: performance with/without data caching; speedup gained using the data cache; scaling speedup with increasing number of iterations; histogram of the number of executing map tasks; strong scaling with 128M data points; weak scaling; task execution time histogram.]

[Performance charts: performance with/without data caching; speedup gained using the data cache; scaling speedup with increasing number of iterations; Azure instance type study; histogram of the number of executing map tasks; weak scaling; data size scaling; task execution time histogram.]

Applications: Cap3 Sequence Assembly, Smith-Waterman Sequence Alignment, and BLAST Sequence Search.

Chris Hemmerich, Adam Hughes, Yang Ruan, Aaron Buechlein, Judy Qiu, and Geoffrey Fox. Map-Reduce Expansion of the ISGA Genomic Analysis Web Server (2010). The 2nd IEEE International Conference on Cloud Computing Technology and Science.
[Diagram: the ISGA stack: ISGA over Ergatis over the TIGR Workflow, dispatching to SGE and Condor clusters as well as clouds and other DCEs.]

Analysis pipeline: Gene Sequences → Pairwise Alignment & Distance Calculation (O(N×N)) → Distance Matrix; the distance matrix feeds Pairwise Clustering (producing cluster indices) and Multi-Dimensional Scaling (producing coordinates), which are combined in a 3D-plot visualization.

Interpolation pipeline for N = 1 million gene sequences: select a reference sequence set (M = 100K) and run Pairwise Alignment & Distance Calculation (O(N^2)) on it to build a distance matrix; Multi-Dimensional Scaling (MDS) maps the reference set to reference coordinates (x, y, z); the remaining N - M sequences (900K) are placed by interpolative MDS with pairwise distance calculation, yielding the N - M coordinates (x, y, z); the combined result is visualized as a 3D plot.

Input data size: 680k sequences; sample data size: 100k; out-of-sample data size: 580k. Test environment: PolarGrid with 100 nodes and 800 workers. [Plots compare the 100k sample data against the full 680k data.]

GTM / GTM-Interpolation: finding K clusters for N data points; the relationship is a bipartite graph (bi-graph) represented by a K-by-N matrix (K << N). Decomposition onto a P-by-Q compute grid reduces the memory requirement by 1/PQ. Software stack: MPI / MPI-IO with Parallel HDF5 and ScaLAPACK over a parallel file system, on Cray / Linux / Windows clusters.

Parallel MDS: requires O(N^2) memory and computation (100k data points need 480GB of memory). A balanced decomposition of the N×N matrices onto a P-by-Q grid reduces the memory and computing requirement by 1/PQ; processes communicate via MPI primitives. [Diagram: an N×N matrix split into row blocks r1, r2 and column blocks c1, c2, c3.]
MDS Interpolation: finds an approximate mapping position with respect to the k-NNs' prior mappings. Per point it requires O(M) memory and O(k) computation, and it is pleasingly parallel. Mapping 2M points took 1450 sec., versus 100k points in … sec. by full MDS; this is roughly 7500 times faster than the estimated time for a full MDS run.
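To make the per-point interpolation concrete, here is a minimal Java sketch (my own simplified rendering, not the group's MI-MDS code, which uses iterative majorization): each out-of-sample point finds its k nearest reference points by original-space distance and then descends the local STRESS gradient against those k anchors.

```java
import java.util.Arrays;
import java.util.Comparator;

// Simplified sketch of MDS interpolation: place one out-of-sample point
// relative to its k nearest reference points' existing 3D coordinates.
public class MdsInterpolation {

    /** origDist[i] = dissimilarity between the new point and reference i. */
    static double[] interpolate(double[][] refCoords, double[] origDist, int k) {
        // Indices of the k nearest references in the original space.
        Integer[] idx = new Integer[origDist.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> origDist[i]));

        // Start from the centroid of the k anchors.
        double[] x = new double[3];
        for (int j = 0; j < k; j++)
            for (int d = 0; d < 3; d++) x[d] += refCoords[idx[j]][d] / k;

        // Gradient descent on sum_j (||x - y_j|| - delta_j)^2 over the k anchors.
        double step = 0.05;
        for (int iter = 0; iter < 200; iter++) {
            double[] grad = new double[3];
            for (int j = 0; j < k; j++) {
                double[] y = refCoords[idx[j]];
                double dist = 0;
                for (int d = 0; d < 3; d++) dist += (x[d] - y[d]) * (x[d] - y[d]);
                dist = Math.sqrt(dist) + 1e-12;          // avoid divide-by-zero
                double coeff = 2 * (dist - origDist[idx[j]]) / dist;
                for (int d = 0; d < 3; d++) grad[d] += coeff * (x[d] - y[d]);
            }
            for (int d = 0; d < 3; d++) x[d] -= step * grad[d];
        }
        return x;   // interpolated 3D position; O(k) work per descent step
    }
}
```

Because each out-of-sample point depends only on the fixed reference mapping, every point can be interpolated independently, which is what makes the step pleasingly parallel.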

Full data processing by GTM or MDS is computing- and memory-intensive, so a two-step procedure is used:
- Training: train on M samples out of the N data points.
- Interpolation: approximate the remaining N - M out-of-sample points without training.
[Diagram: the total N data points split into an in-sample set and an out-of-sample set; training produces the trained data and interpolation produces the interpolated map, implemented with MPI and Twister MapReduce across P processes.]

PubChem data with CTD, visualized by MDS (left) and GTM (right): about 930,000 chemical compounds are shown as points in 3D space, annotated with the related genes in the Comparative Toxicogenomics Database (CTD).
Chemical compounds appearing in the literature, visualized by MDS (left) and GTM (right): 234,000 chemical compounds that may be related to a set of 5 genes of interest (ABCB1, CHRNB2, DRD2, ESR1, and F2), based on a dataset collected from major journal literature and also stored in the Chem2Bio2RDF system.

[Visualizations: ALU sequences and Metagenomics (30000 sequences).]

100K training and 2M interpolation of PubChem data: interpolative MDS (left) and GTM (right).

Investigate the applicability and performance of DryadLINQ CTP for developing scientific applications. Goals: evaluate key features and interfaces; probe parallel programming models. Three applications: the SW-G bioinformatics application, matrix multiplication, and PageRank.

Parallel algorithms for matrix multiplication: row partition; row-column partition; two-dimensional block decomposition as in the Fox algorithm. Multi-core technologies: PLINQ, TPL, and the thread pool. Hybrid parallel model: port multi-core parallelism into the Dryad tasks to improve performance. A timing model for matrix multiplication was also developed.
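As a sketch of the simplest of these schemes, row partition assigns each worker a contiguous stripe of rows of A and has it compute the corresponding rows of C = A × B. This is a minimal Java-threads version for illustration; the evaluated implementations used DryadLINQ with PLINQ/TPL:

```java
// Minimal row-partition parallel matrix multiply: worker t computes the
// rows of C = A * B in its assigned stripe. Rows are disjoint, so the
// workers write to C without synchronization.
public class RowPartitionMM {
    public static double[][] multiply(double[][] a, double[][] b, int workers)
            throws InterruptedException {
        int n = a.length, m = b[0].length, k = b.length;
        double[][] c = new double[n][m];
        Thread[] pool = new Thread[workers];
        for (int t = 0; t < workers; t++) {
            final int lo = t * n / workers, hi = (t + 1) * n / workers;
            pool[t] = new Thread(() -> {
                for (int i = lo; i < hi; i++)          // this worker's rows
                    for (int p = 0; p < k; p++)        // i-p-j loop order for
                        for (int j = 0; j < m; j++)    // cache-friendly access
                            c[i][j] += a[i][p] * b[p][j];
            });
            pool[t].start();
        }
        for (Thread t : pool) t.join();
        return c;
    }
}
```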

The workload of SW-G, a pleasingly parallel application, is heterogeneous due to differences among the input gene sequences, so workload balancing becomes an issue. Two approaches alleviate it (sketched below): randomized distribution of the input data, and partitioning the job into finer-granularity tasks.
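A minimal sketch of the first approach (illustrative helper, not the SW-G code): shuffling the sequences before splitting them into blocks spreads long and short sequences evenly across tasks, and choosing a smaller block size yields the finer granularity of the second approach.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative load balancing for a pleasingly parallel job: randomizing
// the order of heterogeneous inputs before partitioning makes each block's
// expected workload roughly equal; smaller blocks further smooth out the
// remaining imbalance at the cost of more scheduling overhead.
public class RandomizedPartition {
    static <T> List<List<T>> partition(List<T> inputs, int blockSize, long seed) {
        List<T> shuffled = new ArrayList<>(inputs);
        Collections.shuffle(shuffled, new Random(seed));   // randomized distribution
        List<List<T>> blocks = new ArrayList<>();
        for (int i = 0; i < shuffled.size(); i += blockSize) {
            blocks.add(shuffled.subList(i, Math.min(i + blockSize, shuffled.size())));
        }
        return blocks;
    }
}
```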

SALSA HPC Group Indiana University