Case Study 2: Scientific Workflow for Computational Economics
Tiberiu Stef-Praun, Gabriel Madeira, Ian Foster, Robert Townsend

The Challenge
- Expand the capability of economists to develop and validate models of social interactions at large scales
  - Harness large computation systems
  - Simplify the programming model (with an eye toward easy integration of science code)
  - Improve automation
- Requires an end-to-end approach, achieved through integration rather than the silo model

Moral Hazard Problem
- An entity in control of some resources (the entrepreneur) contracts with other entities that use these resources to produce outputs (the workers)
- Two organizational forms are available
  - The workers cooperate on their efforts and divide up their income (thus sharing risks)
  - The workers are independent of each other and are rewarded based on relative performance
- Both are stylized versions of what is observed in tenancy data from villages such as those in Maharashtra, India (Townsend and Mueller 1998)

Moral Hazard Solver
- Five stages, each solved by linear programming
  - Balance between promises for the future and consumption to optimally reward agents
- In each stage, given a set of parameters (consumption, effort, technology, output, wealth):
  - Perform a linear optimization to find the best behavior
  - Parameter sweep over a grid of parameter values
  - The linear solver is run independently on each point of the parameter grid (sketched below)
  - Results are merged at the end of the stage
- Across stages: different organizations (parameter sets) for a similar stage structure
  - Most stages depend on results of other stages
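
To make the per-stage structure concrete, here is a minimal sketch in Python. It is not the authors' MATLAB/Octave code: a hypothetical solve_point() stands in for the linear program solved at one grid point, the points are solved independently, and the results are merged at the end of the stage.

    # Minimal sketch of one solver stage; solve_point() is a placeholder for
    # the real MATLAB/Octave linear program, and all names are hypothetical.
    from itertools import product
    from multiprocessing import Pool

    def solve_point(params):
        """Solve the stage's linear program at a single grid point (placeholder)."""
        wealth, effort = params
        # ... set up and solve the linear program for this (wealth, effort) point ...
        return {"wealth": wealth, "effort": effort, "value": 0.0}

    def merge(solutions):
        """Stand-in for the merge step that combines per-point outputs."""
        return sorted(solutions, key=lambda s: s["value"], reverse=True)

    def run_stage(wealth_grid, effort_grid, workers=4):
        grid = list(product(wealth_grid, effort_grid))   # parameter sweep grid
        with Pool(workers) as pool:
            solutions = pool.map(solve_point, grid)      # independent solves per point
        return merge(solutions)                          # results merged at end of stage

    if __name__ == "__main__":
        merged = run_stage([0.0, 0.5, 1.0], [0.1, 0.2])

In the actual workflow, each grid point becomes a separate solver task dispatched to a cluster node rather than a local process, which is what the Swift machinery below automates.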

[Workflow diagram: *.mat input data files feed Stage One (26 x StageOne.${i}.out), Stage Two (52 x StageTwo.${i}.out), Stage Three (40 x StageThree.${i}.out), Stage Four (40 tasks), and Stage Five; each stage's outputs are merged into MergedStageOne.out through MergedStageFive.out; per-stage run times range from roughly 2 to 50 minutes; the legend distinguishes remote from local execution]

Issues - Technical
- Language
  - Science code written in MATLAB/Octave
  - End-to-end system must be language-independent
- Code prerequisites
  - Each solver task requires MATLAB/Octave pre-installed on the execution node, and solver code staged in prior to execution
  - Each solver task requires files from previous stages
- Automation
  - ~200 tasks must be executed
  - This is a lot of babysitting if performed manually

Issues - Social
- Licensing
  - MATLAB licensing has a per-node cost
  - Expensive if you're using O(10)+ nodes
- Provenance
  - Task execution, data integrity
  - Not a huge concern at this scale, but at larger scales (10,000 tasks) it is important to record how the work is performed
- Provisioning, resource sharing
  - This problem used a shared campus cluster (at U Chicago)
  - We know of problems with 2-3 orders of magnitude more tasks, which require (inter)national-scale resources to complete in a timely fashion

Swift System
- Clean separation of logical/physical concerns
  - XDTM specification of logical data structures
- Concise specification of parallel programs
  - SwiftScript, with iteration, etc.
- Efficient execution on distributed resources
  - Karajan threading, Falkon provisioning, Globus interfaces, pipelining, load balancing
- Rigorous provenance tracking and query
  - Virtual data schema and automated recording
- Improved usability and productivity
  - Demonstrated in numerous applications

Workflow Language - SwiftScript
- Goal: a natural feel for expressing distributed applications
  - Variables (basic types, data structures)
  - Conditional operators (if, foreach, ...)
  - Functions (atomic / compound)
- Used to connect outputs to inputs
- It does not specify invocation order, only dependencies
- It can be seen as metadata for expressing experiments

Execution Engine
- Karajan engine (event-based execution)
- Has a scheduler to map tasks to resources (a conceptual sketch follows this list)
  - Score-based planning
  - Recovers from failures (retries)
- Falkon resource manager creates a virtual private cluster
  - Uses Globus GRAM4 (PBS/Condor/Fork) to acquire resources from Grid systems
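
The sketch below illustrates, in Python, what "score-based planning with retries" can look like. It is an illustration of the idea only, not Karajan's actual scheduler; the Site class, pick_site(), submit(), and the run callback are all invented for this example.

    # Conceptual sketch of score-based task placement with retries
    # (an illustration of the idea, not Karajan's implementation).
    import random

    class Site:
        def __init__(self, name):
            self.name = name
            self.score = 0.0   # raised on success, lowered on failure

    def pick_site(sites):
        """Prefer the highest-scoring site; break ties randomly."""
        best = max(site.score for site in sites)
        return random.choice([site for site in sites if site.score == best])

    def submit(task, sites, run, max_retries=3):
        """Try a task up to max_retries times, adjusting site scores as we go."""
        for _ in range(max_retries):
            site = pick_site(sites)
            try:
                result = run(task, site)   # e.g., hand off to a provisioned worker
                site.score += 1.0
                return result
            except Exception:
                site.score -= 1.0          # demote the failing site and retry
        raise RuntimeError("task failed after %d attempts" % max_retries)

Sites that keep failing accumulate low scores and stop attracting new tasks, which is the essence of the score-based planning described above.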

Dynamic Provisioning: Swift Architecture
[Architecture diagram: a SwiftScript program is compiled into an abstract computation; the execution engine (Karajan with the Swift runtime) schedules application tasks (e.g., App F1, App F2 with their files) onto virtual nodes, records provenance data in a virtual data catalog, reports status, and calls out to the Falkon resource provisioner, which can acquire resources dynamically, including from Amazon EC2. Credit: Yong Zhao, Mihael Hategan, Ioan Raicu, Mike Wilde, Ben Clifford]

The Solution
- Code changes
  - Solver code was broken into modules (atomic blocks) to allow parallel execution
  - Code ported from MATLAB to Octave to avoid per-node licensing fees (see the wrapper sketch after this list)
  - Workflow was described in SwiftScript
- Software installation
  - Swift engine, Karajan, Falkon deployed locally
- Shared resource (already available)
  - Existing compute cluster with GRAM4, GridFTP, etc.
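
As a rough illustration of the Octave port, the sketch below shows how a worker node might invoke one ported solver module non-interactively, which is the kind of command-line call a Swift atomic procedure wraps. The script name, function name, and arguments are hypothetical, not the project's actual ones.

    # Hypothetical wrapper: run one ported Octave solver module on a worker node.
    import subprocess

    def run_solver(position, batch_size=26):
        # -q suppresses Octave's startup banner; --eval runs a single expression,
        # here a call to a solver function assumed to be on Octave's load path.
        expr = "stageone_solver(%d, %d)" % (position, batch_size)
        subprocess.run(["octave", "-q", "--eval", expr], check=True)

    if __name__ == "__main__":
        run_solver(position=0)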

Moral Hazard SwiftScript Code Excerpts

// A second atomic procedure: merge
(file mergeSolutions[]) econMerge (file merging[]) {
    // (procedure body not shown in the transcript)
}

// We define the stage one procedure, a compound procedure
(file solutions[]) stageOne (file inputData[], file prevResults[]) {
    file script;    // the mapper expression (<"...">) was apparently dropped in transcription
    int batch_size = 26;
    int batch_range[] = [0:25];
    string inputName = "IRRELEVANT";
    string outputName = "stageOneSolverOutput";
    // The foreach statement specifies that the calls can be performed concurrently
    foreach i in batch_range {
        int position = i*batch_size;
        solutions[i] = moralhazard_solver(script, batch_size, position,
                                          inputName, outputName, inputData, prevResults);
    }
}

// These get used in the main program as follows
stageOneSolutions = stageOne(stageOneInputFiles, stageOnePrevFiles);
stageOneOutputs = econMerge(stageOneSolutions);

Execution on 40 Processors
[Figure not reproduced in the transcript]

Results - Moral Hazard Solver
- Performance
  - Original run time: ~2 hrs
  - Swift run time: ~28 min (roughly a 4x end-to-end improvement)
  - Depending on the stage structure, speedup of up to 10x, or a slowdown (because of overhead)
  - Only one grid site (UC) was used; running on multiple sites could give better performance
- Execution has been automated
  - Human labor greatly reduced
  - Separation of human concerns (science code, system operation, task management)
  - Easy to repeat, modify, and rerun

Other Applications

Application                                   | #Jobs/computation             | Levels
ATLAS* HEP Event Simulation                   | 500K                          | 1
fMRI DBIC* AIRSN Image Processing             | 100s                          | 12
FOAM Ocean/Atmosphere Model                   | 2000 (core app runs CPU jobs) | 3
GADU* Genomics (14 million seq. analyzed)     | 40K                           | 4
HNL fMRI Aphasia Study                        | 500                           | 4
NVO/NASA* Photorealistic Montage/Morphology   | 1000s                         | 16
QuarkNet/I2U2* Physics Science Education      | 10s                           | 3-6
RadCAD* Radiology Classifier Training         | 1000s                         | 5
SIDGrid EEG Wavelet Proc, Gaze Analysis, …    | 100s                          | 20
SDSS* Coadd, Cluster Search                   | 40K, 500K                     | 2, 8