Dynamic Virtual Cluster provisioning via XCAT on iDataPlex

Science Cloud (Dynamic Virtual Cluster) Architecture
Dynamic Virtual Cluster provisioning via XCAT on iDataPlex; supports both stateful and stateless OS images.
Services and Workflow
Applications: Smith Waterman Dissimilarities, CAP-3 Gene Assembly, PhyloD using DryadLINQ, High Energy Physics, Clustering, Multidimensional Scaling, Generative Topological Mapping
Runtimes: Apache Hadoop / Twister / MPI (Linux); Microsoft DryadLINQ / Twister / MPI (Windows Server 2008 HPC)
Infrastructure software: Linux bare-system, Linux virtual machines on Xen virtualization, Windows Server 2008 HPC bare-system, XCAT Infrastructure
Hardware: iDataPlex bare-metal nodes

SALSAHPC Dynamic Virtual Cluster on FutureGrid: Demo at SC09
Demonstrate the concept of Science on Clouds on FutureGrid.
Switchable clusters on the same hardware (~5 minutes to switch between operating systems, such as from Linux+Xen to Windows+HPCS)
Support for virtual clusters
SW-G: Smith Waterman Gotoh dissimilarity computation, a pleasingly parallel problem suitable for MapReduce-style applications
[Figure: Dynamic Cluster Architecture. The XCAT Infrastructure provisions two sets of iDataPlex bare-metal nodes (32 nodes each) as virtual/physical clusters running Linux bare-system, Linux on Xen, or Windows Server 2008 bare-system, with SW-G using Hadoop and SW-G using DryadLINQ as workloads. The Monitoring & Control Infrastructure comprises a Pub/Sub Broker Network, Summarizer, Switcher, and Monitoring Interface.]
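
Because every pair of sequences can be compared independently, the all-pairs SW-G computation splits naturally into blocks of the N x N dissimilarity matrix, and each block can run as one map task. The sketch below assumes a square block decomposition that computes only the upper-triangle blocks and mirrors them, since the matrix is symmetric; the slides do not state the block size or scheduling. The names `make_blocks`, `map_block`, `assemble`, and `toy_dissimilarity` are hypothetical, and `toy_dissimilarity` is a placeholder standing in for a real SW-G alignment.

```python
from itertools import combinations_with_replacement
from typing import Callable, Dict, List, Sequence, Tuple

PairFn = Callable[[str, str], float]


def make_blocks(n: int, block_size: int) -> List[Tuple[int, int]]:
    """Upper-triangle block pairs (I, J) with I <= J; symmetry makes the rest redundant."""
    num_blocks = (n + block_size - 1) // block_size
    return list(combinations_with_replacement(range(num_blocks), 2))


def map_block(block: Tuple[int, int], seqs: Sequence[str], block_size: int,
              pair_fn: PairFn) -> Dict[Tuple[int, int], float]:
    """One independent map task: compute every (i, j) pair inside one block."""
    bi, bj = block
    rows = range(bi * block_size, min((bi + 1) * block_size, len(seqs)))
    cols = range(bj * block_size, min((bj + 1) * block_size, len(seqs)))
    return {(i, j): pair_fn(seqs[i], seqs[j]) for i in rows for j in cols}


def assemble(n: int, partials: List[Dict[Tuple[int, int], float]]) -> List[List[float]]:
    """Collect block outputs and mirror them across the diagonal."""
    matrix = [[0.0] * n for _ in range(n)]
    for partial in partials:
        for (i, j), value in partial.items():
            matrix[i][j] = value
            matrix[j][i] = value
    return matrix


def toy_dissimilarity(a: str, b: str) -> float:
    """Hypothetical stand-in for SW-G: fraction of non-matching positions."""
    length = max(len(a), len(b))
    if length == 0:
        return 0.0
    same = sum(x == y for x, y in zip(a, b))
    return 1.0 - same / length


seqs = ["ACGT", "ACGA", "TTGT", "ACGT", "GGGG"]
blocks = make_blocks(len(seqs), block_size=2)  # 3 row blocks -> 6 block tasks
# Each call below is independent, so a runtime could schedule them as parallel map tasks.
partials = [map_block(b, seqs, 2, toy_dissimilarity) for b in blocks]
D = assemble(len(seqs), partials)
```

Skipping the lower-triangle blocks roughly halves the work, which is why only n_b(n_b + 1)/2 block tasks are generated for n_b row blocks.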

SALSAHPC Dynamic Virtual Cluster on FutureGrid: Demo at SC09
This demonstrates the concept of Science on Clouds using a FutureGrid iDataPlex cluster.
Top: three clusters switch applications in a fixed environment, which takes approximately 30 seconds.
Bottom: a cluster switches between environments (Linux; Linux + Xen; Windows + HPCS), which takes approximately 7 minutes.
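
The demo's Monitoring & Control Infrastructure lists a Pub/Sub Broker Network, a Summarizer, and a Monitoring Interface. A minimal in-process sketch of that publish/subscribe pattern appears below, assuming nodes publish their current environment on a topic and a summarizer keeps the latest state per node. The `Broker` and `Summarizer` classes, the `"node.status"` topic, and the message fields are all hypothetical and do not describe the actual SALSAHPC software or its broker.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class Broker:
    """Toy in-process publish/subscribe broker standing in for a broker network."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Deliver the message to every handler subscribed to this topic.
        for handler in self._subscribers[topic]:
            handler(message)


class Summarizer:
    """Hypothetical summarizer: latest environment per node, counted per environment."""

    def __init__(self, broker: Broker) -> None:
        self.latest: Dict[str, str] = {}
        broker.subscribe("node.status", self.on_status)

    def on_status(self, message: dict) -> None:
        self.latest[message["node"]] = message["environment"]

    def summary(self) -> Dict[str, int]:
        return dict(Counter(self.latest.values()))


broker = Broker()
summarizer = Summarizer(broker)
for i in range(4):
    broker.publish("node.status", {"node": f"node{i:02d}", "environment": "Linux bare-system"})
# Two nodes report again after being switched to another environment.
for i in (2, 3):
    broker.publish("node.status", {"node": f"node{i:02d}", "environment": "Windows Server 2008 HPC"})
```

Decoupling publishers from subscribers this way lets a monitoring interface read the summary without contacting individual nodes while clusters are being switched.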

Streaming-based communication: intermediate results are transferred directly from the map tasks to the reduce tasks, eliminating local files
Cacheable map/reduce tasks: static data remains in memory
Combine phase to combine reductions
The user program is the composer of MapReduce computations
Extends the MapReduce model to iterative computations
[Figure: The user program drives the MR Driver, which communicates through a Pub/Sub Broker Network with an MRDaemon on each worker node hosting map workers (M) and reduce workers (R); data splits (D) are read from and written to the file system. Programming model: Configure() loads static data; each iteration runs Map(Key, Value), Reduce(Key, List<Value>), and Combine(Key, List<Value>), and the δ flow returns to the user program for the next iteration; Close() ends the computation.]
Different synchronization and intercommunication mechanisms used by the parallel runtimes
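
The programming model in the figure (Configure, then iterate Map, Reduce, Combine, with only the δ flowing back, then Close) can be illustrated with K-means, an iterative clustering workload of the kind listed under Clustering on the architecture slide. This is a minimal in-process Python sketch under stated assumptions, not Twister's Java API: `KMeansMapTask`, `reduce_task`, `combine_task`, and `run_kmeans` are hypothetical names, and the map-to-reduce transfer is a plain in-memory grouping rather than a streaming broker.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Tuple[Point, int]  # (coordinate sum, point count)


def squared_distance(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


class KMeansMapTask:
    """Cacheable map task: its static data is loaded once and reused every iteration."""

    def configure(self, partition: Sequence[Point]) -> None:
        self.points = list(partition)  # static data stays in memory

    def map(self, key: int, centroids: Sequence[Point]) -> List[Tuple[int, Partial]]:
        """Value = current centroids (the delta); emit (centroid index, partial sum/count)."""
        dim = len(centroids[0])
        sums = [[0.0] * dim for _ in centroids]
        counts = [0] * len(centroids)
        for p in self.points:
            nearest = min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            counts[nearest] += 1
            for d in range(dim):
                sums[nearest][d] += p[d]
        return [(c, (tuple(sums[c]), counts[c])) for c in range(len(centroids))]


def reduce_task(key: int, values: List[Partial]) -> Partial:
    """Reduce(Key, List<Value>): add up the partial sums for one centroid."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for partial_sum, n in values:
        count += n
        for d in range(dim):
            total[d] += partial_sum[d]
    return tuple(total), count


def combine_task(reduced: Dict[int, Partial], old: Sequence[Point]) -> List[Point]:
    """Combine: merge all reduce outputs into the next centroids."""
    new = []
    for c, prev in enumerate(old):
        total, count = reduced.get(c, ((), 0))
        new.append(tuple(x / count for x in total) if count else prev)  # keep empty clusters
    return new


def run_kmeans(partitions: Sequence[Sequence[Point]], centroids: List[Point],
               max_iter: int = 20, tol: float = 1e-12) -> List[Point]:
    tasks = []
    for part in partitions:  # Configure(): cache static data once
        task = KMeansMapTask()
        task.configure(part)
        tasks.append(task)
    for _ in range(max_iter):  # Iterate: only the centroids change between rounds
        grouped: Dict[int, List[Partial]] = defaultdict(list)
        for key, task in enumerate(tasks):
            for c, value in task.map(key, centroids):
                grouped[c].append(value)  # in-memory stand-in for map-to-reduce transfer
        reduced = {c: reduce_task(c, vals) for c, vals in grouped.items()}
        new_centroids = combine_task(reduced, centroids)
        shift = max(squared_distance(n, o) for n, o in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids  # Close(): nothing to release in this in-memory sketch


partitions = [[(0.0, 0.0), (10.0, 10.0)], [(0.0, 1.0), (10.0, 11.0)]]
result = run_kmeans(partitions, [(1.0, 1.0), (9.0, 9.0)])
```

The point of caching is visible in `run_kmeans`: the point partitions are handed to `configure` once, while each iteration ships only the small centroid list.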

Twister New Release

July 26-30, 2010 NCSA Summer School Workshop
Students learning about Twister & Hadoop MapReduce technologies, supported by FutureGrid.
Institutions represented: University of Arkansas, Indiana University, University of California at Los Angeles, Penn State, Iowa State, Univ. Illinois at Chicago, University of Minnesota, Michigan State, Notre Dame, University of Texas at El Paso, IBM Almaden Research Center, Washington University, San Diego Supercomputer Center, University of Florida, Johns Hopkins

Pairwise Sequence Comparison using Smith Waterman Gotoh
Typical MapReduce computation
Comparable efficiencies; Twister performs the best
Xiaohong Qiu, Jaliya Ekanayake, Scott Beason, Thilina Gunarathne, Geoffrey Fox, Roger Barga, Dennis Gannon, “Cloud Technologies for Bioinformatics Applications”, Proceedings of the 2nd ACM Workshop on Many-Task Computing on Grids and Supercomputers (SC09), Portland, Oregon, November 16th, 2009.
Jaliya Ekanayake, Thilina Gunarathne, Xiaohong Qiu, “Cloud Technologies for Bioinformatics Applications”, invited paper submitted to the Journal of IEEE Transactions on Parallel and Distributed Systems (under review).
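
The per-pair kernel in this computation is a Smith-Waterman local alignment with Gotoh's affine gap penalties. A minimal sketch of the alignment score appears below, using a row-by-row dynamic program. The scoring parameters (match, mismatch, gap open, gap extend) are illustrative assumptions, not the values used in the cited work, and a real protein or DNA pipeline would typically use a substitution matrix. The slides also do not say how scores are converted to dissimilarities, so that step is omitted.

```python
NEG_INF = float("-inf")


def sw_gotoh_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap_open: int = 3, gap_extend: int = 1) -> int:
    """Smith-Waterman local alignment score with Gotoh affine gaps.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    Parameter defaults are illustrative only.
    """
    m = len(b)
    # H: best local score ending at (i, j); E: ending in a gap in `a`; F: ending in a gap in `b`.
    prev_h = [0] * (m + 1)
    prev_f = [NEG_INF] * (m + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur_h = [0] * (m + 1)
        cur_f = [NEG_INF] * (m + 1)
        e = NEG_INF  # E for the current row, carried left to right
        for j in range(1, m + 1):
            e = max(cur_h[j - 1] - gap_open, e - gap_extend)
            cur_f[j] = max(prev_h[j] - gap_open, prev_f[j] - gap_extend)
            diag = prev_h[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur_h[j] = max(0, diag, e, cur_f[j])  # 0 allows the alignment to restart (local)
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best
```

With these defaults, `sw_gotoh_score("ACGTTACG", "ACGACG")` aligns all six characters of the second sequence across one gap of length 2 (cost 3 + 1), which beats the best ungapped match of three characters; keeping only two rows makes memory linear in the length of `b`.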

Sequence Assembly in the Clouds
CAP3: Expressed Sequence Tagging. Input files (FASTA) -> CAP3 -> output files
[Figures: Cap3 parallel efficiency; Cap3 per core per file time to process sequences (458 reads in each file)]
Thilina Gunarathne, Tak-Lon Wu, Judy Qiu, and Geoffrey Fox, “Cloud Computing Paradigms for Pleasingly Parallel Biomedical Applications”, March 21, Proceedings of Emerging Computational Methods for the Life Sciences Workshop of ACM HPDC 2010 conference, Chicago, Illinois, June 20-25, 2010.
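
The two Cap3 plots report parallel efficiency and per core per file time, but the slide does not spell out the formulas. The sketch below assumes the standard definitions: efficiency E = T(1) / (p * T(p)) for sequential time T(1), core count p, and parallel time T(p); and per core per file time = p * T(p) / F for F input files. The function names and the numbers in the tests are hypothetical, not measurements from the slides.

```python
def parallel_efficiency(t_sequential: float, t_parallel: float, cores: int) -> float:
    """Parallel efficiency E = T(1) / (p * T(p)); 1.0 means perfect linear speedup."""
    return t_sequential / (cores * t_parallel)


def per_core_per_file_time(t_parallel: float, cores: int, num_files: int) -> float:
    """Average core-time spent on one input file: p * T(p) / number of files."""
    return cores * t_parallel / num_files
```

For example, a hypothetical job taking 640 s on one core and 50 s on 16 cores has efficiency 640 / (16 * 50) = 0.8; if it processed 200 files, each file consumed on average 16 * 50 / 200 = 4 core-seconds. A flat per core per file curve as cores are added is the signature of a pleasingly parallel workload.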

Acknowledgements
SALSA Group: Judy Qiu, Adam Hughes, Ryan Hartman, Geoffrey Fox, Jaliya Ekanayake, Thilina Gunarathne, Jong Youl Choi, Seung-Hee Bae, Yang Ruan, Hui Li, Bingjing Zhang, Saliya Ekanayake, Stephen Wu
Collaborators: Yves Brun, Peter Cherbas, Dennis Fortenberry, Roger Innes, David Nelson, Homer Twigg, Craig Stewart, Haixu Tang, Mina Rho, David Wild, Bin Cao, Qian Zhu, Maureen Biggers, Gilbert Liu, Neil Devadasan
Supported by Research Technologies of UITS and the School of Informatics and Computing