Portable Resource Management for Data Intensive Workflows Douglas Thain University of Notre Dame

The Cooperative Computing Lab University of Notre Dame

The Cooperative Computing Lab We collaborate with people who have large scale computing problems in science, engineering, and other fields. We operate computer systems at the scale of O(10,000) cores: clusters, clouds, and grids. We conduct computer science research in the context of real people and problems. We release open source software for large scale distributed computing.

A Familiar Problem (×100)

What actually happens, ×100: 1 TB, a GPU, 3M files of 1K each, another 3M files of 1K each, 128 GB.

Some reasonable questions: Will this workload run at all on machine X? How many workloads can I run simultaneously without running out of storage space? Did this workload actually behave as expected when run on a new machine? How is run X different from run Y? If my workload wasn’t able to run on this machine, where can I run it?

End users have no idea what resources their applications actually need. and… Computer systems are terrible at describing their capabilities and limits. and… They don’t know when to say NO.

dV/dt : Accelerating the Rate of Progress Towards Extreme Scale Collaborative Science Miron Livny (UW), Ewa Deelman (USC/ISI), Douglas Thain (ND), Frank Wuerthwein (UCSD), Bill Allcock (ANL) … make it easier for scientists to conduct large-scale computational tasks that use the power of computing resources they do not own to process data they did not collect with applications they did not develop …

dV/dt Project Approach Identify challenging applications. Develop a framework that allows us to characterize the application needs and the resource availability, and to plan for their use. Threads of Research: – High level planning algorithms. – Measurement, representation, analysis. – Resource allocation and enforcement. – Resources: storage, networks, memory, cores…? – Evaluate on major DOE resources: OSG and ALCF.

Stages of Resource Management Estimate the application's resource needs. Find the appropriate computing resources. Acquire those resources. Deploy applications and data on the resources. Manage applications and resources during the run. Can we do it for one task? How about an application composed of many tasks?

Categories of Applications Concurrent Workloads: many independent tasks. Static Workloads: task graphs known in advance, either regular graphs or irregular graphs. Dynamic Workloads: tasks created and submitted on the fly by a driver program:
    while( more work to do) {
        foreach work unit {
            t = create_task();
            submit_task(t);
        }
        t = wait_for_task();
        process_result(t);
    }
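The dynamic-workload loop above maps almost directly onto the Work Queue master API. A minimal sketch, assuming the classic work_queue Python bindings from CCTools; the command, file names, and port number are placeholders rather than anything from the original slides:

    import work_queue as wq

    # Master: create a queue and fill it with tasks (placeholder commands and files).
    q = wq.WorkQueue(port=9123)

    for i in range(100):
        t = wq.Task("./sim input.%d > output.%d" % (i, i))
        t.specify_input_file("sim")                  # ship the executable to the worker
        t.specify_input_file("input.%d" % i)
        t.specify_output_file("output.%d" % i)
        q.submit(t)

    # Harvest results as workers finish them.
    while not q.empty():
        t = q.wait(5)                                # block up to 5 seconds
        if t:
            print("task %d exited with status %d" % (t.id, t.return_status))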

Bioinformatics Portal Generates Workflows for Makeflow BLAST (Small): 17 sub-tasks, ~4 h on 17 nodes. BWA: 825 sub-tasks, ~27 m on 100 nodes. SHRIMP: 5,080 sub-tasks, ~3 h on 200 nodes.

Periodograms: generate an atlas of extra-solar planets. Find extra-solar planets by wobbles in the radial velocity of a star, or by dips in the star's intensity (its light curve: brightness over time). 210K light curves were released in July 2010; 3 algorithms are applied to each curve with 3 different parameter sets, giving 210K input files and 630K output files. 1 super-workflow, 40 sub-workflows, ~5,000 tasks per sub-workflow, 210K tasks total. Pegasus-managed workflows.

CyberShake PSHA Workflow (Southern California Earthquake Center) Description: builders ask seismologists, "What will the peak ground motion be at my new building in the next 50 years?" Seismologists answer this question using Probabilistic Seismic Hazard Analysis (PSHA). 239 workflows; each site in the input map corresponds to one workflow, and each workflow has 820,000 tasks. MPI codes: ~12,000 CPU hours; post-processing: ~2,000 CPU hours. Data footprint: ~800 GB. Pegasus-managed workflows and workflow ensembles.

Task Characterization/Execution Understand the resource needs of a task. Establish expected values and limits for task resource consumption. Launch tasks on the correct resources. Monitor task execution and resource consumption, and interrupt tasks that reach their limits. Possibly re-launch the task on different resources.
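In Work Queue, the "launch on the correct resources" step shows up as per-task resource declarations: the scheduler only dispatches a task to a worker slot with enough cores, memory, and disk. A minimal sketch, again assuming the classic work_queue Python bindings; the command and the numbers are placeholders:

    import work_queue as wq

    q = wq.WorkQueue(port=9123)

    t = wq.Task("./analyze chunk.dat")   # placeholder command
    t.specify_cores(1)                   # expected cores
    t.specify_memory(2048)               # expected memory in MB
    t.specify_disk(4096)                 # expected disk in MB
    q.submit(t)

    t = q.wait(60)
    if t:
        print("exit status:", t.return_status)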

Data Collection and Modeling A monitor attached to each task produces a task record (e.g., RAM: 50M, Disk: 1G, CPU: 4 cores). Records from many tasks are aggregated into a task profile (min / typical / max for each resource, such as RAM), and the task profiles are combined with the workflow structure to produce a workflow profile and a workflow schedule.
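As a concrete illustration of turning many task records into a min/typical/max profile, here is a short Python sketch; the record format and the choice of the median as "typical" are assumptions, not the actual monitor output format:

    from statistics import median

    # Hypothetical task records, one dict per completed task.
    records = [
        {"cores": 4, "memory_mb": 50, "disk_mb": 1024},
        {"cores": 4, "memory_mb": 48, "disk_mb": 990},
        {"cores": 4, "memory_mb": 61, "disk_mb": 1105},
    ]

    def profile(records):
        """Summarize each resource as (min, typical, max)."""
        return {k: (min(r[k] for r in records),
                    median(r[k] for r in records),
                    max(r[k] for r in records))
                for k in records[0]}

    print(profile(records))
    # {'cores': (4, 4, 4), 'memory_mb': (48, 50, 61), 'disk_mb': (990, 1024, 1105)}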

Resource Monitor Invocation: % resource_monitor mysim.exe The monitor wraps the local process tree and writes a summary file with fields such as start, end, exit_type: normal, exit_status: 0, max_concurrent_processes: 16, wall_time, cpu_time, virtual_memory, resident_memory, swap_memory: 0, bytes_read, and bytes_written, plus a log file with one sample per line: wall_clock (useconds), concurrent_processes, cpu_time (useconds), virtual_memory (kB), resident_memory (kB), swap_memory (kB), bytes_read, bytes_written.
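A hedged usage sketch in Python: run a command under the monitor and parse the resulting summary. The -O output-name option and the "key: value" summary layout are assumptions about the installed CCTools version (check resource_monitor -h), and the command and file names are placeholders:

    import subprocess

    # Run the simulation under the monitor (output naming assumed, see above).
    subprocess.run(["resource_monitor", "-O", "mysim", "--", "./mysim.exe"], check=True)

    # Parse the "key: value" summary the monitor leaves behind (assumed name).
    summary = {}
    with open("mysim.summary") as f:
        for line in f:
            key, sep, value = line.partition(":")
            if sep:
                summary[key.strip()] = value.strip()

    print(summary.get("max_concurrent_processes"), summary.get("wall_time"))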

Monitoring Strategies Summaries: getrusage and times; available only at the end of a process. Snapshots: reading /proc and measuring disk at given intervals; blind while waiting for the next interval. Events: a linker wrapper around libc; fragile to modifications of the environment, and does not cover statically linked processes. Summaries and snapshots are indirect: they monitor how the world changes while the process tree is alive. Events are direct: they monitor which functions, and with which arguments, the process tree is calling.
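The snapshot strategy amounts to polling the kernel's view of the process at intervals. The sketch below is a simplified, single-process stand-in for that idea (the real monitor walks the whole process tree); it is Linux-specific and not part of CCTools:

    import os
    import time

    def sample_rss_kb(pid, interval=1.0, samples=5):
        """Poll resident memory (kB) from /proc/<pid>/status and keep the peak.
        Anything that happens between samples is missed, which is exactly the
        weakness of the snapshot approach noted above."""
        peak = 0
        for _ in range(samples):
            with open("/proc/%d/status" % pid) as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        peak = max(peak, int(line.split()[1]))
            time.sleep(interval)
        return peak

    print(sample_rss_kb(os.getpid()))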

Portable Resource Management Work Queue: the master loop (create_task / submit_task / wait_for_task / process_result) dispatches RM-wrapped tasks to its workers and collects per-task details (cpu, ram, disk). Pegasus: RM-wrapped jobs write job-1.res, job-2.res, job-3.res. Makeflow, running on top of another batch system: RM-wrapped rules write rule-1.res, rule-2.res, rule-3.res. In every case the same resource monitor (RM) wraps the task, so the measurements are portable across systems.
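One simple way to get this portability is to prefix each task's command with the monitor, whatever system dispatches it. A minimal Work Queue sketch, assuming the classic Python bindings; the monitor's -O flag, the .summary output name, and the commands are assumptions and placeholders:

    import work_queue as wq

    q = wq.WorkQueue(port=9123)

    for i in range(3):
        # Wrap the real command with the resource monitor; the monitor binary and
        # the application must be available on (or shipped to) the worker.
        cmd = "./resource_monitor -O job-%d -- ./sim part.%d" % (i, i)
        t = wq.Task(cmd)
        t.specify_input_file("resource_monitor")
        t.specify_input_file("sim")
        t.specify_input_file("part.%d" % i)
        t.specify_output_file("job-%d.summary" % i)   # bring the measurements home
        q.submit(t)

    while not q.empty():
        t = q.wait(5)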

Resource Visualization of SHRiMP

Outliers Happen: BWA Example

Completing the Cycle A historical repository stores each task's profile (min / typical / max for each resource, e.g. RAM). From the profile, allocate resources for the task (e.g., CPU: 10 s, RAM: 16 GB, DISK: 100 GB), run it under the resource monitor with measurement and enforcement, and record the observed resources (e.g., CPU: 5 s, RAM: 15 GB, DISK: 90 GB) back into the repository. Exception handling decides: is it an outlier?
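A common policy for closing this loop is to allocate from history first and retry larger only when the task is killed for exceeding its limits. The sketch below is purely illustrative; run_task and the history dictionary are hypothetical stand-ins, not a CCTools API:

    def allocate_and_run(task, history, run_task):
        """history: {'typical': {...}, 'max': {...}} per-resource values from past runs.
        run_task(task, limits) is a hypothetical dispatcher that returns 'ok', or
        'exceeded' when the task is killed for going over its limits."""
        result = run_task(task, limits=history["typical"])   # optimistic first try
        if result == "exceeded":
            # Treat it as an outlier and retry once at the historical maximum.
            result = run_task(task, limits=history["max"])
        return result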

Multi-Party Resource Management The workflow coordinator must coordinate resources across the batch system, the storage allocator, and the software-defined network (SDN).

Application to Work Queue The application logic (the same while loop: create_task / submit_task / wait_for_task / process_result) runs on top of the Work Queue library, which dispatches tasks to workers (W) running on a campus Condor pool, a campus HPC cluster, and purchased instances ($) on Amazon EC2.

Coming up soon in CCTools… Makeflow – Integration with resource management. – Built-in linker pulls in deps to make a portable package. Work Queue – Hierarchy, multi-slot workers, cluster caching. – Automatic scaling of workers with network capacity. Parrot – Integration with CVMFS for CMS and (almost?) ATLAS. – Continuous improvement of syscall support. Chirp – Support for HDFS as a storage backend. – Neat feature: search() system call.

Acknowledgements CCL Graduate Students: Michael Albrecht, Patrick Donnelly, Dinesh Rajan, Casey Robinson, Peter Sempolinski, Li Yu. dV/dt Project PIs: Bill Allcock (ALCF), Ewa Deelman (USC), Miron Livny (UW), Frank Wuerthwein (UCSD). CCL Staff: Ben Tovar.

The Cooperative Computing Lab University of Notre Dame