Adaptive Computing on the Grid – The AppLeS Project Francine Berman U.C. San Diego.

Presentation transcript:

Adaptive Computing on the Grid – The AppLeS Project Francine Berman U.C. San Diego

Computing Today (illustration): data archives, networks, visualization, instruments, MPPs, clusters, PCs, workstations, wireless devices.

The Computational Grid A Computational Grid is a collection of distributed, possibly heterogeneous resources which can be used as an ensemble to execute large-scale applications.

Grid Computing What is it? – Running parallel and distributed programs on multiple resources by coordinating tasks and data – Running any program on whatever resources are available, or on the resources which execute the program best Why is Grid Computing important? Why is Grid Computing hard?

Why is Grid Computing Important? The Internet/Grid is increasingly serving as the execution platform for large-scale computations – Web browsing = large-scale distributed search and data mining application – Walmart uses the network to support massive inventory control applications – Remote instruments and visualization facilities are connected to computers for analysis in real time through networks – Large distributed databases are being developed for science and engineering applications (Digital Sky, weather prediction, Digital Libraries, etc.)

Why is Grid Computing Hard - I Difficult to achieve predictable program performance in dynamic, multi-user environments. To achieve performance, programs must adapt to deliverable resource performance at execution time. (Diagram: sites at OGI, UTK, and UCSD.)

Why is Grid Computing Hard - II Lots of infrastructure needed: – Basic services (Grid middleware): single login, authentication, file transfer, multi-protocol communication – User environments (user-level middleware): development environments and tools, application scheduling and deployment, performance monitoring, analysis, and tuning

Grid Computing Lab Research – Adaptive Grid Computing: the AppLeS Project – User-level middleware: APST – New directions: megacomputing and other projects …

Adaptive Grid Computing with AppLeS Joint project with Rich Wolski (U. Tenn.) Goal: – To develop self-scheduling Grid programs which can adapt to deliverable Grid resource performance at execution time Approach: – Develop adaptive application schedulers which can predict program performance, use these predictions to determine the most performance-efficient schedule, and deploy the “best” schedule on Grid resources within a reasonable timeframe

How Does AppLeS Work? AppLeS + application = self-scheduling application. (Diagram: Resource Discovery yields accessible resources; Resource Selection yields feasible resource sets; Schedule Planning and Performance Modeling yields evaluated schedules; the Decision Model picks the “best” schedule; the application plus scheduler is then deployed, all on top of Grid middleware, the NWS, and the underlying resources.)
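To make the cycle in the diagram concrete, here is a minimal sketch of such a self-scheduling loop in Python. All names (select_and_deploy, performance_model, deploy, the resource dictionaries) are hypothetical stand-ins, not the actual AppLeS API.

```python
# Illustrative sketch of the AppLeS decision cycle (not the real AppLeS code).
# `forecast`, `performance_model`, and `deploy` stand in for the NWS,
# an application-specific performance model, and Grid middleware calls.

from itertools import combinations

def select_and_deploy(app, discovered, forecast, performance_model, deploy):
    # Resource Selection: keep resources the application can actually use.
    accessible = [r for r in discovered if r.get("login_ok")]

    # Build candidate (feasible) resource sets, e.g. singletons, pairs, triples.
    feasible_sets = [list(s)
                     for k in (1, 2, 3)
                     for s in combinations(accessible, k)]

    # Schedule Planning and Performance Modeling: predict execution time
    # for each candidate set using current NWS-style forecasts.
    evaluated = []
    for rset in feasible_sets:
        predicted = performance_model(app, rset, forecast(rset))
        evaluated.append((predicted, rset))

    # Decision Model: choose the schedule with the best predicted performance.
    best_time, best_set = min(evaluated, key=lambda pair: pair[0])

    # Deployment: hand the chosen schedule to the Grid middleware.
    return deploy(app, best_set)
```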

Network Weather Service (Wolski, U. Tenn.) NWS – monitors current system state – provides best forecast of resource load from multiple models. NWS can provide dynamic resource information for AppLeS. NWS is a stand-alone system. (Diagram: a sensor interface and a reporting interface connected to a forecaster that draws on Model 1, Model 2, and Model 3.)
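The idea of a “best forecast from multiple models” can be illustrated with a toy predictor that reports whichever model has had the lowest error on the history so far. This is only a sketch of the concept, not the NWS implementation.

```python
# Toy version of "best forecast from multiple models": keep a history of
# measurements, let each model predict past values, and report the
# prediction of the model with the lowest past error. Not the NWS code.

def mean_model(history):
    return sum(history) / len(history)

def last_value_model(history):
    return history[-1]

def median_model(history):
    return sorted(history)[len(history) // 2]

MODELS = [mean_model, last_value_model, median_model]

def best_forecast(history):
    # Score each model by mean absolute error of its one-step predictions
    # over the observed history (skipping the first few points).
    errors = []
    for model in MODELS:
        errs = [abs(model(history[:i]) - history[i])
                for i in range(3, len(history))]
        errors.append(sum(errs) / len(errs))
    winner = MODELS[errors.index(min(errors))]
    return winner(history)

# Example: forecast the next CPU load value from recent measurements.
print(best_forecast([0.31, 0.35, 0.30, 0.55, 0.52, 0.49, 0.50]))
```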

An Example AppLeS: Simple SARA SARA = Synthetic Aperture Radar Atlas –application developed at JPL and SDSC Goal: Assemble/process files for user’s desired image –Radar organized into tracks –User selects track of interest and properties to be highlighted –Raw data is filtered and converted to an image format –Image displayed in web browser

Simple SARA AppLeS focuses on the resource selection problem: which site can deliver the data fastest? Code developed by Alan Su. Data servers may store replicated files; the network is shared by a variable number of users; the compute server accesses target tracks from one or more data servers. (Diagram: client, compute servers, and data servers.)

Simple SARA Simple Performance Model Prediction of available bandwidth provided by Network Weather Service User’s goal is to optimize performance by minimizing file transfer time Common assumptions: (> = performs better) –vBNS > general internet –geographically close sites > geographically far sites –west coast sites > east coast sites
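Under this performance model, resource selection reduces to choosing the replica server with the smallest predicted transfer time. A hedged sketch: the nws_bandwidth helper and its bandwidth numbers are invented for illustration, while the server names are taken from the experimental setup below.

```python
# Sketch of Simple SARA-style resource selection: choose the replica server
# that minimizes predicted transfer time, using NWS-style bandwidth forecasts.
# The nws_bandwidth() helper and its values are illustrative only.

def nws_bandwidth(server):
    # Stand-in for a Network Weather Service bandwidth forecast (Mbit/s).
    sample = {"spin.cacr.caltech.edu": 5.2,
              "perigee.chpc.utah.edu": 3.8,
              "lolland.cc.gatech.edu": 1.1}
    return sample[server]

def pick_server(servers, file_size_mbits):
    # Predicted transfer time = file size / forecast bandwidth.
    return min(servers,
               key=lambda s: file_size_mbits / nws_bandwidth(s))

servers = ["spin.cacr.caltech.edu",
           "perigee.chpc.utah.edu",
           "lolland.cc.gatech.edu"]
print(pick_server(servers, file_size_mbits=24.0))  # a 3 MB file = 24 Mbit
```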

Experimental Setup Data for the image accessed over shared networks. Data set sizes in megabytes, representative of SARA file sizes. Servers used for experiments (reached via the vBNS or the general internet): – lolland.cc.gatech.edu – sitar.cs.uiuc – perigee.chpc.utah.edu – mead2.uwashington.edu – spin.cacr.caltech.edu

Experimental Results Experiment with a larger data set (3 MB). During this timeframe, farther sites provide data faster than the closer site.

9/21/98 Experiments The Clinton Grand Jury webcast commenced at trial 25. At the beginning of the experiment, the general internet provides data faster than the vBNS.

Supercomputing ’99 From the Portland SC’99 floor during the experimental timeframe, UCSD and UTK were generally “closer” than the Oregon Graduate Institute (OGI) in Portland. (Diagram: AppLeS/NWS client at SC’99 with servers at OGI, UTK, and UCSD.)

AppLeS Applications We’ve developed many AppLeS applications –Simple SARA (Su) –Jacobi2D (Wolski) –PMHD3D (Dail, Obertelli) –MCell (Casanova) –INS2D (Zagorodnov, Casanova) –SOR (Schopf) –Tomography (Smallen, Frey, Cirne, Hayes) –Mandelbrot, Ray tracing (Shao) –Supercomputer AppLeS (Cirne) –…

User-level Middleware AppLeS applications are “point solutions” What if we want to develop schedulers for structurally similar classes of applications? AppLeS “templates” are user-level middleware designed to promote performance and ease of programming for application classes Current GCL template activity: –APST (Casanova) –AMWAT (Shao, Hayes)

Example template: APST – AppLeS Parameter Sweep Template Parameter sweeps = class of applications which are structured as multiple instances of an “experiment” with distinct parameter sets: a large number of independent tasks. Common application structure used in various fields of science and engineering (Monte Carlo and other simulations, etc.). Joint work with Henri Casanova. First AppLeS middleware package to be distributed to users.

Example Parameter Sweep Application – MCell MCell = General simulator for cellular microphysiology Uses Monte Carlo diffusion and chemical reaction algorithm in 3D to simulate complex biochemical interactions of molecules Simulation = many “experiments” conducted on different parameter configurations –Experiments can be performed on separate machines Driving application for APST middleware

APST Programming Model Why isn’t scheduling easy? (Diagram: a pool of independent experiments to be mapped onto Grid resources.)

APST Programming Model Why isn’t scheduling easy? – Staging of large shared files may complicate the scheduling process – Post-processing must minimize file transfer time – Adaptive scheduling necessary to account for dynamic environment

APST Scheduling Approach
Contingency scheduling: allocation developed by dynamically generating a Gantt chart for scheduling unassigned tasks between scheduling events.
Basic skeleton:
1. Compute the next scheduling event
2. Create a Gantt chart G
3. For each computation and file transfer currently underway, compute an estimate of its completion time and fill in the corresponding slots in G
4. Select a subset T of the tasks that have not started execution
5. Until each host has been assigned enough work, heuristically assign tasks to hosts, filling in slots in G
6. Implement the schedule
(Diagram: Gantt chart G plotting resources, network links and hosts in Cluster 1 and Cluster 2, against time, with computations placed between scheduling events.)
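A condensed sketch of this skeleton, assuming placeholder task and host objects and injected estimate/heuristic functions; the real APST scheduler is considerably more involved.

```python
# Sketch of the contingency-scheduling skeleton above. Tasks, hosts, and the
# estimate/heuristic callables are placeholders, not the actual APST code.

def schedule_step(tasks, hosts, estimate_completion, heuristic_assign,
                  now, interval=300):
    next_event = now + interval                      # 1. next scheduling event
    gantt = {h: [] for h in hosts}                   # 2. empty Gantt chart G

    # 3. Fill G with work already underway on each host.
    for h in hosts:
        for job in h.running:
            gantt[h].append((job, estimate_completion(job, h)))

    # 4. Consider a subset T of tasks that have not started yet.
    T = [t for t in tasks if not t.started][:100]

    # 5. Heuristically assign tasks until every host has work up to the event.
    for t in T:
        h = heuristic_assign(t, gantt, next_event)
        if h is None:                                # all hosts busy enough
            break
        gantt[h].append((t, estimate_completion(t, h)))

    return gantt                                     # 6. implement the schedule
```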

APST Scheduling: Free “Parameters”
1. Frequency of scheduling events
2. Accuracy of task completion time estimates
3. Subset T of unexecuted tasks
4. Scheduling heuristic used
(Same Gantt chart diagram as on the previous slide.)

APST Scheduling Heuristics Scheduling algorithms for APST applications fall into two groups. Self-scheduling algorithms (workqueue, workqueue w/ work stealing, workqueue w/ work duplication, ...) are easy to implement and quick and need no performance predictions, but they are insensitive to data placement. Gantt chart heuristics (Min-min, Max-min, Sufferage, XSufferage, ...) are more difficult to implement and need performance predictions, but they are sensitive to data placement. Simulation results (HCW ’00 paper) show that: heuristics are worth it; XSufferage is a good heuristic even when predictions are bad; complex environments require better planning (Gantt chart).
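For contrast with the Gantt-chart planners, the self-scheduling workqueue is nearly trivial. A sketch assuming host objects with an is_idle() method (illustrative only, not APST code):

```python
# Illustrative workqueue self-scheduler: idle hosts pull the next unassigned
# task. No performance predictions are needed, which is why it is easy and
# quick but insensitive to data placement.

import time

def workqueue(tasks, hosts, run_task, poll=1.0):
    queue = list(tasks)
    while queue:
        for host in [h for h in hosts if h.is_idle()]:
            if queue:
                run_task(queue.pop(0), host)   # dispatch next task to idle host
        time.sleep(poll)                       # wait before polling again
```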

APST Architecture (diagram): Grid resources and middleware (NetSolve, Globus, Legion, NWS, Ninf, IBP, Condor) sit below the APST Daemon, which contains a Controller, a Metadata Bookkeeper, an Actuator, and a Scheduler. The Scheduler (Workqueue, Workqueue++, and Gantt chart heuristic algorithms: MinMin, MaxMin, Sufferage, XSufferage) triggers the Actuator; the Actuator uses a transport API (GASS, IBP, NFS) and an execution API (GRAM, NetSolve, Condor, Ninf, Legion, ...) to transfer files and execute tasks; the Metadata Bookkeeper uses a metadata API (NWS) to store, retrieve, and report information; a command-line APST Client interacts with the Controller.

APST APST being used for – INS2D, INS3D (NASA fluid dynamics applications) – MCell (Salk, biological molecular modeling application) – Tphot (SDSC, proton transport application) – NeuralObjects (NSI, neural network simulations) – CS simulation applications for our own research (model validation) Actuator’s APIs are interchangeable and mixable – (NetSolve+IBP) + (GRAM+GASS) + (GRAM+NFS) Scheduler allows for dynamic adaptation, multithreading No Grid software is required – However, lack of it (NWS, GASS, IBP) may lead to poorer performance Details in SC’00 paper Will be released in the next 2 months to PACI, IPG users

How Do We Know the APST Scheduling Heuristics are Good? Experiments: – We ran large-sized instances of MCell across a distributed platform – We compared execution times for both workqueue and Gantt chart heuristics. (Testbed diagram: an APST Client and APST Daemon coordinating resources at the University of Tennessee, Knoxville (NetSolve + IBP), the University of California, San Diego (GRAM + GASS), and the Tokyo Institute of Technology (NetSolve + NFS, NetSolve + IBP).)

Results Experimental setting: MCell simulation with 1,200 tasks, composed of 6 Monte Carlo simulations; input files of 1, 1, 20, 20, 100, and 100 MB. Four scenarios: initially (a) all input files are only in Japan; (b) 100 MB files replicated in California; (c) in addition, one 100 MB file replicated in Tennessee; (d) all input files replicated everywhere. (Chart compares workqueue against the Gantt-chart algorithms.)

New GCL Directions: Megacomputing (Internet Computing) Grid programs – Can reasonably obtain some information about the environment (NWS predictions, MDS, HBM, …) – Can assume that login, authentication, monitoring, etc., are available on target execution machines – Can assume that programs run to completion on the execution platform Mega-programs – Cannot assume any information about the target environment – Must be structured to treat the target device as an unfriendly host (cannot assume ambient services) – Must be structured for “throwaway” end devices – Must be structured to run continuously

Success with Megacomputing – Over 2 million users – Sustains over 22 teraflops in production use – Entropia.com Can we run non-embarrassingly parallel codes successfully at this scale? – Computational biology, genomics …

Joint work with Derrick Kondo, Joy Xin, Matt DeVico Application template for peer-to-peer platforms First algorithm (Needleman-Wunsch global alignment) uses dynamic programming Plan is to use the template with additional genomics applications Being developed for the internet rather than a Grid environment (Diagram: dynamic programming matrix over the sequences GTAAG and ATACCG; optimal alignments determined by traceback.)
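The Needleman-Wunsch recurrence the slide refers to fills a score matrix and then traces back from the bottom-right corner to recover an optimal alignment. A compact sketch with illustrative scoring values (match +1, mismatch -1, gap -1):

```python
# Needleman-Wunsch global alignment via dynamic programming.
# The scores (match=+1, mismatch=-1, gap=-1) are illustrative choices.

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            score[i][j] = max(diag,                 # align a[i-1] with b[j-1]
                              score[i-1][j] + gap,  # gap in b
                              score[i][j-1] + gap)  # gap in a
    # Traceback recovers one optimal alignment.
    i, j, top, bottom = n, m, "", ""
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i-1][j-1] + \
                (match if a[i-1] == b[j-1] else mismatch):
            top, bottom, i, j = a[i-1] + top, b[j-1] + bottom, i - 1, j - 1
        elif i > 0 and score[i][j] == score[i-1][j] + gap:
            top, bottom, i = a[i-1] + top, "-" + bottom, i - 1
        else:
            top, bottom, j = "-" + top, b[j-1] + bottom, j - 1
    return score[n][m], top, bottom

print(needleman_wunsch("GTAAG", "ATACCG"))
```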

Mega-programs Provide the algorithmic/application counterpart for very large scale platforms – peer-to-peer platforms, Entropia, etc. – Condor flocks – Large “free agent” environments – Globus – New platforms: networks of low-level devices, etc. Different computing paradigm than MPP or the Grid. (Diagram: DNA alignment, Algorithm 2, and Algorithm 3 mapped onto Globus, free agents, Entropia, and Condor.)

Thanks! – NSF, NPACI, NASA IPG, TITECH, UTK Coming soon to a computer near you: – Release of APST and AMWAT (AppLeS Master/Worker Application Template) v0.1 by the NPACI All-hands meeting (Feb ’01) – First prototype of 2001 – GCL software and papers: Grid Computing Lab: – Fran Berman – Henri Casanova – Walfredo Cirne – Holly Dail – Matt DeVico – Marcio Faerman – Jim Hayes – Derrick Kondo – Graziano Obertelli – Gary Shao – Otto Sievert – Shava Smallen – Alan Su – Atsuko Takefusa (visiting) – Renata Teixeira – Nadya Williams – Eric Wing – Qiao Xin

Parameter Sweep Heuristics Currently studying scheduling heuristics useful for parameter sweeps in Grid environments HCW 2000 paper compares several heuristics – Min-Min [the task/resource pair that can complete the earliest is assigned first] – Max-Min [the longest of the tasks’ earliest completion times is assigned first] – Sufferage [the task that would “suffer” most if given a poor schedule is assigned first, where sufferage = the difference between its second-best and best completion times] – Extended Sufferage [minimal completion times are computed for each task on each cluster, and the sufferage heuristic is applied to these] – Workqueue [a randomly chosen task is assigned first] Criteria for evaluation: – How sensitive are heuristics to the location of shared input files and the cost of data transmission? – How sensitive are heuristics to inaccurate performance information?
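A small sketch of how Min-Min and Sufferage pick the next task from a matrix of predicted completion times; this is illustrative only, not the APST implementation.

```python
# Illustrative Min-Min and Sufferage selection over a matrix of predicted
# completion times ct[task][host]. Not the actual APST scheduler.

def min_min_pick(ct):
    # For each task, find its best (minimum) completion time and host;
    # assign the task whose best completion time is smallest overall.
    best = {t: min(hosts.items(), key=lambda kv: kv[1]) for t, hosts in ct.items()}
    task = min(best, key=lambda t: best[t][1])
    return task, best[task][0]

def sufferage_pick(ct):
    # Sufferage = second-best completion time minus best completion time;
    # assign the task that would "suffer" most if denied its best host.
    def sufferage(hosts):
        times = sorted(hosts.values())
        return (times[1] - times[0]) if len(times) > 1 else times[0]
    task = max(ct, key=lambda t: sufferage(ct[t]))
    host = min(ct[task], key=ct[task].get)
    return task, host

ct = {"t1": {"h1": 10, "h2": 12},
      "t2": {"h1": 4,  "h2": 25},
      "t3": {"h1": 9,  "h2": 11}}
print(min_min_pick(ct))    # ('t2', 'h1'): earliest possible completion
print(sufferage_pick(ct))  # ('t2', 'h1'): largest gap between best and 2nd-best
```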

APST/MCell Simulation Results with “Quality of Information”