Experiences Using Cloud Computing for a Scientific Workflow Application
Jens Vöckler, Gideon Juve, Ewa Deelman, Mats Rynge, G. Bruce Berriman
Funded by an NSF OCI grant.

This Talk
An experience talk about cloud computing:
- FutureGrid: hardware, middlewares
- Pegasus WMS
- Periodograms
- Experiments: Periodogram I, comparison of clouds using periodograms, Periodogram II

What Is FutureGrid? Something Different for Everyone
- A test bed for cloud computing (this talk)
- 6 centers across the nation
- Nimbus, Eucalyptus, Moab "bare metal"
- Start here:

What Comprises FutureGrid
Proposed:
- 16-node cluster, 192 GB RAM and 12 TB of disk per node
- 8-node GPU-enhanced cluster

Middlewares in FutureGrid
Table: available resources

Pegasus WMS I: Automating Computational Pipelines
- Funded by NSF/OCI; a collaboration with the Condor group at UW Madison
- Automates data management
- Captures provenance information
- Used by a number of domains, across a variety of applications
- Scalability: handles large data (kB to TB) and many computations (1 to 10^6 tasks)

Pegasus WMS II
- Reliability: retries computations from the point of failure
- Construction of complex workflows based on computational blocks
- Portable, reusable workflow descriptions (sketched below): run purely locally or distributed among institutions, on a laptop, campus cluster, grid, or cloud
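As a concrete, hedged illustration of such a portable workflow description, the sketch below builds a tiny abstract workflow with the Pegasus Python DAX API of that era; the transformation name, file names, and arguments are hypothetical and do not come from the talk.

```python
#!/usr/bin/env python
# Minimal sketch of a Pegasus abstract workflow (DAX), assuming the
# DAX3 Python API. All names are illustrative, not the talk's actual
# periodogram setup.
from Pegasus.DAX3 import ADAG, Job, File, Link

dax = ADAG("periodogram-sketch")

lightcurve = File("lightcurve-000001.tbl")   # hypothetical input light curve
result     = File("periodogram-000001.out")  # hypothetical output

job = Job(name="periodogram")                # hypothetical transformation
job.addArguments("-i", lightcurve, "-o", result)
job.uses(lightcurve, link=Link.INPUT)              # Pegasus stages this in
job.uses(result, link=Link.OUTPUT, transfer=True)  # ...and stages this out
dax.addJob(job)

# The same abstract description is later planned onto a concrete site
# (laptop, cluster, grid, or cloud) without modification.
with open("periodogram.dax", "w") as f:
    dax.writeXML(f)
```

Pegasus then maps this abstract description onto whatever execution site is available, which is what allows the same workflow to run on FutureGrid, Magellan, or EC2.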

How Pegasus Uses FutureGrid
- Focus on Eucalyptus and Nimbus; no Moab "bare metal" at this point
- During the November experiments: Nimbus cores, 744 Eucalyptus cores, for 1,288 total potential cores across 4 clusters in 5 clouds
- Actually used at most 300 physical cores

Figure: Pegasus and FutureGrid Interaction

Periodograms
Find extra-solar planets by:
- wobbles in the radial velocity of the star, or
- dips in the star's brightness
[Figures: radial-velocity diagram (red/blue shift) and transit light curve (brightness vs. time)]

Kepler Workflow
- 210k light curves released in July 2010
- Apply 3 algorithms to each curve
- Run the entire data set 3 times, with 3 different parameter sets
- This talk's experiments: 1 algorithm, 1 parameter set, 1 run, on either a partial or the full data set

Pegasus Periodograms
- 1st experiment is a "ramp-up" to see where things trip: 16k light curves, 33k computations (every light curve twice); already found places needing adjustments
- 2nd experiment: also 16k light curves, across 3 comparable infrastructures
- 3rd experiment: runs the full set, testing hypothesized tunings
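For a rough sense of scale, here is some back-of-the-envelope arithmetic (mine, not from the slides), assuming each curve/algorithm/parameter-set combination is executed as one task:

```python
# Back-of-the-envelope task counts; the combination semantics are an
# assumption, not spelled out in the slides.
curves_full    = 210_000   # Kepler light curves released in July 2010
algorithms     = 3
parameter_sets = 3

print(curves_full * algorithms * parameter_sets)  # ~1.9 million tasks for the full campaign

# Ramp-up experiment: 16k light curves, every curve computed twice.
print(16_000 * 2)  # 32,000 tasks, consistent with the ~33k computations reported
```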

Figure: Periodogram Workflow

Figure: Excerpt of Jobs over Time

Figure: Hosts, Tasks, and Duration (I)

Figure: Resource and Job States (I)

Cloud Comparison
- Compare academic and commercial clouds: NERSC's Magellan cloud (Eucalyptus), Amazon's cloud (EC2), and FutureGrid's sierra cloud (Eucalyptus)
- Constrained node and core selection (because AWS costs money): 6 nodes, 8 cores per node, 1 Condor slot per physical CPU

Cloud Comparison II
- Given 48 physical cores, a speed-up of ≈43 is considered pretty good
- AWS cost ≈ $…: walltime x 6 x c1.large ≈ $…, GB in / GB out ≈ $2

Site        CPU           RAM (swap)   Walltime  Cum. Dur.  Speed-Up
Magellan    8 x 2.6 GHz   19 (0) GB    5.2 h     226.6 h    43.6
Amazon      8 x 2.3 GHz   7 (0) GB     7.2 h     295.8 h    41.1
FutureGrid  8 x 2.5 GHz   29 (½) GB    5.7 h     248.0 h    43.5
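As a sanity check (my arithmetic, not from the talk), the speed-up column matches cumulative task duration divided by workflow walltime:

```python
# Speed-up = cumulative task duration / workflow walltime,
# recomputed from the table above.
runs = {
    "Magellan":   (226.6, 5.2),
    "Amazon":     (295.8, 7.2),
    "FutureGrid": (248.0, 5.7),
}
for site, (cumulative_h, walltime_h) in runs.items():
    print(f"{site:10s} speed-up ≈ {cumulative_h / walltime_h:.1f}")
# Magellan   speed-up ≈ 43.6
# Amazon     speed-up ≈ 41.1
# FutureGrid speed-up ≈ 43.5
```

With 48 physical cores, a speed-up of roughly 43 corresponds to about 90% parallel efficiency, which is why it is considered pretty good.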

Scaling Up I
- Workflow optimizations: Pegasus clustering ✔, compress file transfers
- Submit-host Unix settings: increase the open file-descriptor limit (see the sketch below), increase the firewall's open port range
- Submit-host Condor DAGMan settings: idle job limit ✔
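On the submit host the file-descriptor limit is normally raised with ulimit or /etc/security/limits.conf; purely as a hedged illustration of the knob being discussed, the snippet below inspects and raises the per-process limit from Python. The target value is an example, not the talk's setting.

```python
# Inspect and raise the per-process open file-descriptor limit.
# Example value only; production settings belong in ulimit /
# /etc/security/limits.conf rather than application code.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"current limits: soft={soft}, hard={hard}")

target = 65536                      # example target, not from the talk
new_soft = min(target, hard)        # cannot exceed the hard limit unprivileged
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
print(f"raised soft limit to {new_soft}")
```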

Scaling Up II
- Submit-host Condor settings: increase the socket cache size; file descriptors and ports per daemon; use the condor_shared_port daemon
- Remote VM Condor settings: use CCB for private networks; tune Condor job slots; TCP for collector call-backs

Figure: Hosts, Tasks, and Duration (II)

Figure: Resource and Job States (II)

Loose Ends
- Saturate the requested resources: clustering, better submit-host tuning (requires better monitoring ✔)
- Better data staging

Acknowledgements
- Funded by an NSF OCI grant
- Ewa Deelman, Gideon Juve, Mats Rynge, Bruce Berriman
- FutureGrid help desk ;-)