
Achieving Application Performance on the Computational Grid
Francine Berman, U. C. San Diego

Computing Today: data archives, networks, visualization, instruments, MPPs, clusters, PCs, workstations, wireless devices

The Computational Grid
– ensemble of heterogeneous, distributed resources
– emerging platform for high-performance and resource-intensive computing
Computational Grids you know and love:
– your computer lab
– the internet
– the PACI partnership resources
– your home computer, EECS, and all the resources you have access to
– you are already a Grid user …

Programming the Grid I
Basics
– need a way to log in, authenticate in different domains, transfer files, coordinate execution, etc.
[Diagram: application development environment layered over Grid infrastructure (Globus, Legion, Condor, NetSolve, PVM) and resources]

Programming the Grid II
Performance-oriented programming
– need a way to develop and execute performance-efficient programs
– program must achieve performance in an environment which is heterogeneous, dynamic, and shared by other users with competing resource demands
– this can be extremely challenging
– adaptive application scheduling is a fundamental technique for achieving performance

Why scheduling?
– Experience with parallel and distributed codes shows that careful coordination of tasks and data is required to achieve performance.
Why application scheduling?
– There is no centralized scheduler controlling all Grid resources; applications are on their own.
– Resource and job schedulers prioritize utilization or throughput over application performance.

Why adaptive application scheduling?
– Heterogeneity of resources and dynamic load variations cause the performance characteristics of the platform to vary over time and with load.
– To achieve performance, the application must adapt to deliverable resource capacities.

Adaptive Application Scheduling
Fundamental components:
– Application-centric performance model: provides a quantifiable measure of system components in terms of their potential impact on the application
– Prediction of deliverable resource performance at execution time
– User's performance criteria: execution time, convergence, turnaround time
These components form the basis for AppLeS.
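An application-centric performance model in the sense above can be as simple as combining a compute estimate (scaled by forecast CPU availability) with a transfer estimate (scaled by forecast bandwidth). The formula below is a common textbook model offered only as an illustration, not the specific model used in any AppLeS application; all names are invented for the sketch.

```python
# Illustrative application-centric performance model: predicted task time
# on a resource, given forecast CPU availability and network bandwidth.
# This is a generic sketch, not the model from any particular AppLeS app.
def predicted_time(work_ops, cpu_ops_per_sec, cpu_avail_frac,
                   data_mb, bandwidth_mb_per_sec):
    """Estimate execution time = compute time + data transfer time."""
    compute = work_ops / (cpu_ops_per_sec * cpu_avail_frac)  # slowed by load
    transfer = data_mb / bandwidth_mb_per_sec                # forecast bandwidth
    return compute + transfer
```

A scheduler would evaluate this model for each candidate resource and rank resources by predicted time against the user's performance criteria.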

What is AppLeS?
AppLeS = Application Level Scheduler
– joint project with Rich Wolski (U. of Tenn.)
AppLeS is a methodology
– the project has investigated adaptive application scheduling using dynamic information, application-specific performance models, and user preferences
– the AppLeS approach is based on real-world scheduling
AppLeS is software
– we have developed multiple AppLeS-enabled applications and templates which demonstrate the importance and usefulness of adaptive scheduling on the Grid

How Does AppLeS Work?
AppLeS + application = self-scheduling application
[Diagram: Resource Discovery → accessible resources → Resource Selection → feasible resource sets → Schedule Planning and Performance Model → evaluated schedules → Decision Model → "best" schedule → Schedule Deployment, running on top of the Grid infrastructure, NWS, and resources]

Network Weather Service (Wolski, U. Tenn.)
The NWS provides dynamic resource information for AppLeS. NWS is a stand-alone system.
NWS
– monitors current system state
– provides the best forecast of resource load from multiple models
[Diagram: a sensor interface feeds Model 1, Model 2, and Model 3; a forecaster selects among them and reports through the reporting interface]
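The "best forecast from multiple models" idea can be sketched as follows: run several simple predictors over the measurement history, track each predictor's cumulative error, and report the forecast of whichever has erred least so far. This is only a sketch of the strategy; the actual NWS forecasters and their API differ, and all names here are invented.

```python
# Sketch of NWS-style forecasting: compete simple predictors (last value,
# windowed mean, windowed median) and report the forecast of the predictor
# with the lowest cumulative absolute error over the history so far.
from statistics import mean, median

def nws_style_forecast(history, window=5):
    """Return (forecast, name_of_winning_predictor) for the next value."""
    predictors = {
        "last":   lambda h: h[-1],
        "mean":   lambda h: mean(h[-window:]),
        "median": lambda h: median(h[-window:]),
    }
    # Replay the history, charging each predictor for its past mistakes.
    errors = {name: 0.0 for name in predictors}
    for t in range(1, len(history)):
        past = history[:t]
        for name, predict in predictors.items():
            errors[name] += abs(predict(past) - history[t])
    best = min(errors, key=errors.get)
    return predictors[best](history), best
```

The appeal of this design is that no single model wins on all resources: a spiky network link and a steadily loaded CPU favor different predictors, and the competition picks the right one per series.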

AppLeS Example: Simple SARA
SARA = Synthetic Aperture Radar Atlas
– application developed at JPL and SDSC
Goal: assemble/process files for the user's desired image
– radar data organized into tracks
– user selects a track of interest and properties to be highlighted
– raw data is filtered and converted to an image format
– image displayed in a web browser

Simple SARA
AppLeS focuses on the resource selection problem: which site can deliver the data the fastest?
– code developed by Alan Su
– computation servers and data servers are logical entities, not necessarily different nodes
– network shared by a variable number of users
– computation assumed to be performed at the compute server
[Diagram: client connected to compute servers and data servers]

Simple SARA: Simple Performance Model
Prediction of available bandwidth provided by the Network Weather Service.
User's goal is to optimize performance by minimizing file transfer time.
Common assumptions:
– vBNS > general internet
– geographically close sites > geographically far sites
– west coast sites > east coast sites
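Under this model, resource selection reduces to a one-liner: predicted transfer time is file size divided by forecast bandwidth, and the scheduler picks the server minimizing it. The sketch below assumes a dictionary of NWS bandwidth forecasts per server; the interface is invented for illustration.

```python
# Minimal sketch of Simple SARA's resource-selection step: choose the
# data server with the smallest predicted transfer time, where
# predicted time = file size / NWS-forecast bandwidth.
def select_server(file_size_mb, forecasts):
    """forecasts: dict mapping server name -> predicted bandwidth (MB/s)."""
    return min(forecasts, key=lambda server: file_size_mb / forecasts[server])
```

Note that the experiments in the following slides show why the forecast matters more than the static assumptions above: the "closer" or vBNS-connected site is not always the fastest at any given moment.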

Experimental Setup
– data for the image accessed over shared networks
– data sets (megabytes in size) representative of SARA file sizes
– servers used for experiments (reached via vBNS or the general internet):
  lolland.cc.gatech.edu
  sitar.cs.uiuc
  perigee.chpc.utah.edu
  mead2.uwashington.edu
  spin.cacr.caltech.edu

Preliminary Results
Experiment with the larger data set (3 MB).
During this time frame, farther sites provide data faster than the closer site.

9/21/98 Experiments
Clinton Grand Jury webcast commenced at trial 25.
At the beginning of the experiment, the general internet provides data faster than the vBNS.

Supercomputing '99
From the Portland SC'99 floor during the experimental timeframe, UCSD and UTK were generally "closer" than the Oregon Graduate Institute (OGI) in Portland.
[Plot legend: OGI, UTK, UCSD; AppLeS/NWS]

What if File Sizes are Larger?
Storage Resource Broker (SRB)
SRB provides access to distributed, heterogeneous storage systems
– UNIX, HPSS, DB2, Oracle, …
– files can be 16 MB or larger
– resources accessed via a common SRB interface

Predicting Large File Transfer Times
NWS and SRB present distinct behaviors: the NWS probe is 64 KB, while the SRB file size is 16 MB.
Adaptive approach: use adaptive linear regression on a sliding window of NWS bandwidth measurements to track SRB behavior.
SRB performance model being developed by Marcio Faerman.
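The sliding-window regression described above can be sketched with ordinary least squares: keep the most recent (NWS probe bandwidth, measured SRB transfer time) pairs and refit a line each time a new pair arrives. The regression itself is standard; the pairing of small NWS probes with large SRB transfers is the idea from the slide, and the data shapes below are assumptions.

```python
# Sketch of adaptive linear regression over a sliding window: fit
# time = a * bandwidth + b by least squares to the last `window`
# (nws_bandwidth, srb_transfer_time) observations.
def sliding_window_predict(pairs, window=10):
    """pairs: list of (nws_bandwidth, srb_transfer_time) tuples, oldest
    first. Returns the fitted coefficients (a, b) over the last window."""
    recent = pairs[-window:]
    n = len(recent)
    sx = sum(x for x, _ in recent)
    sy = sum(y for _, y in recent)
    sxx = sum(x * x for x, _ in recent)
    sxy = sum(x * y for x, y in recent)
    denom = n * sxx - sx * sx          # assumes bandwidths are not all equal
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b
```

Because the fit uses only the most recent window, the model "forgets" stale network conditions automatically, which is what lets a 64 KB probe track a 16 MB transfer as conditions drift.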

Problems with AppLeS
AppLeS-enabled applications perform well in multi-user environments.
Have developed/are developing AppLeS for
– stencil codes (Jacobi2D, magnetohydrodynamics, LU decomposition, …)
– distributed data codes (SARA, SRB, …)
– master/slave codes (DOT, ray tracing, Mandelbrot, tomography, …)
– parameter sweep codes (MCell, INS2D, CompLib, …)
The methodology is right on target, but …
– AppLeS must be integrated with the application, which is labor-intensive and time-intensive
– you generally can't just take an AppLeS and plug in a new application

AppLeS Templates
Current thrust is to develop AppLeS templates which
– target structurally similar classes of applications
– can be instantiated in a user-friendly timeframe
– provide good application performance
[Diagram: AppLeS template composed of application, performance, scheduling, and deployment modules behind an API, using the Network Weather Service]

Case Study: Parameter Sweep Template
Parameter sweeps = class of applications structured as multiple instances of an "experiment" with distinct parameter sets.
– independent experiments may share input files
Examples: MCell, INS2D
[Diagram: application model]

Example Parameter Sweep Application: MCell
MCell = general simulator for cellular microphysiology.
Uses a Monte Carlo diffusion and chemical reaction algorithm in 3D to simulate complex biochemical interactions of molecules.
– molecular environment represented as a 3D space in which trajectories of ligands against cell membranes are tracked
Researchers plan huge runs which will make it possible to model entire cells at the molecular level:
– 100,000s of tasks
– 10s of gigabytes of output data
– would like to perform execution-time computational steering, data analysis, and visualization

PST AppLeS
Template being developed by Henri Casanova and Graziano Obertelli.
Resource selection:
– for small parameter sweeps, can dynamically select a performance-efficient number of target processors [Gary Shao]
– for large parameter sweeps, can assume that all resources may be used
[Diagram: platform model: user's host and storage connected by network links to clusters, storage, and an MPP]

Scheduling Parameter Sweeps
Contingency scheduling: allocation developed by dynamically generating a Gantt chart for scheduling unassigned tasks between scheduling events.
Basic skeleton:
1. Compute the next scheduling event.
2. Create a Gantt chart G.
3. For each computation and file transfer currently underway, compute an estimate of its completion time and fill in the corresponding slots in G.
4. Select a subset T of the tasks that have not started execution.
5. Until each host has been assigned enough work, heuristically assign tasks to hosts, filling in slots in G.
6. Implement the schedule.
[Diagram: Gantt chart with network links and the hosts of two clusters on the resource axis, computation slots plotted over time between scheduling events]
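Steps 2 through 5 of the skeleton above can be sketched as a greedy fill of the Gantt chart: start each host's timeline at the moment its in-flight work completes (step 3), then place each selected task on the host that would finish it earliest (step 5). The data shapes and the earliest-finish rule are illustrative assumptions; the real template plugs in the heuristics discussed on the next slide.

```python
# Sketch of the Gantt-chart skeleton: between scheduling events, greedily
# place each not-yet-started task on the host that finishes it earliest.
def plan_until_next_event(ready, runtimes, tasks):
    """ready: dict host -> time its current work completes (step 3).
    runtimes: dict host -> estimated runtime of one task on that host.
    tasks: the selected subset T of unstarted tasks (step 4).
    Returns the Gantt chart as (task, host, start, end) slots (step 5)."""
    gantt = []
    ready = dict(ready)  # copy so the caller's state is untouched
    for task in tasks:
        # Earliest-finish host for this task, given work already placed.
        host = min(ready, key=lambda h: ready[h] + runtimes[h])
        start = ready[host]
        end = start + runtimes[host]
        gantt.append((task, host, start, end))
        ready[host] = end  # this host is now busy until `end`
    return gantt
```

At the next scheduling event the chart is discarded and rebuilt from fresh predictions, which is what makes the approach "contingency" scheduling: only the slots before the event are ever committed.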

Parameter Sweep Heuristics
Currently studying scheduling heuristics useful for parameter sweeps in Grid environments.
HCW 2000 paper compares several heuristics:
– Min-min [task/resource pair that can complete the earliest is assigned first]
– Max-min [longest of the tasks' earliest completion times assigned first]
– Sufferage [task that would "suffer" most if given a poor schedule assigned first, as computed by the difference between its second-best and best completion times]
– Extended sufferage [minimal completion times computed for each task on each cluster; the sufferage heuristic applied to these]
– Workqueue [randomly chosen task assigned first]
Criteria for evaluation:
– How sensitive are heuristics to the location of shared input files and the cost of data transmission?
– How sensitive are heuristics to inaccurate performance information?
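Two of these heuristics can be sketched over a completion-time matrix `ct[task][host]` (estimated completion time of each task on each host). The sketches are deliberately simplified: they fold transfer costs into `ct` and do not update host availability after each assignment, both of which the full heuristics handle.

```python
# Simplified sketches of Min-min and Sufferage over a completion-time
# matrix ct[task][host]. Real implementations recompute completion times
# as hosts accumulate work; these sketches rank on a fixed matrix.
def min_min(ct):
    """Repeatedly pick the task whose best completion time is earliest,
    and assign it to that best host."""
    order, remaining = [], set(ct)
    while remaining:
        task = min(remaining, key=lambda t: min(ct[t].values()))
        order.append((task, min(ct[task], key=ct[task].get)))
        remaining.remove(task)
    return order

def sufferage_pick(ct):
    """Pick the task that would 'suffer' most if denied its best host:
    largest gap between its second-best and best completion times."""
    def sufferage(t):
        times = sorted(ct[t].values())
        return times[1] - times[0] if len(times) > 1 else times[0]
    return max(ct, key=sufferage)
```

The contrast matters for shared input files: a task whose best cluster already holds its input has a large sufferage value, so sufferage-style heuristics naturally keep such tasks near their data.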

Preliminary PST/MCell Results
Comparison of the performance of scheduling heuristics when it is up to 40 times more expensive to send a shared file across the network than it is to compute a task.
The "extended sufferage" scheduling heuristic takes advantage of file sharing to achieve good application performance.
[Plot legend: Max-min, Workqueue, XSufferage, Sufferage, Min-min]

Preliminary PST/MCell Results with “Quality of Information”

Using Adaptation in Scheduling
The "best" scheduling heuristic is data dependent.
The scheduler can adapt its heuristic to the performance regime.
[Plot: ray tracing experiments, Gary Shao]

AppLeS in Context
Grids
– application scheduling: Mars, Prophet/Gallop, MSHN, etc.
– programming environments: GrADS, Programmer's Playground, VDCE, Nile, etc.
– scheduling services: Globus GRAM, Legion Scheduler/Collection/Enactor, PVM HeNCE
– PSEs: Nimrod, NEOS, NetSolve, Ninf
Clusters
– PVM, NOW, HPVM, COW, etc.
Scheduling
– high-throughput and resource scheduling: PBS, LSF, Maui Scheduler, Condor, etc.
– traditional scheduling literature
Performance monitoring, prediction, and steering
– Autopilot, SciRun, Network Weather Service, Remos/Remulac, Cumulus
The AppLeS project contributes a careful and useful study of adaptive application scheduling for Grid and clustered environments.

Work-in-Progress: Half-Baked AppLeS
Quality of information
– stochastic scheduling
– AppLePilot / GrADS
Resource economies
– Bushel of AppLeS
– UCSD Active Web
Application flexibility
– computational steering
– co-allocation
– target-less computing

Quality of Information
How can we deal with imperfect or imprecise predictive information?
Quantitative measures of qualitative performance attributes can improve scheduling and execution:
– lifetime
– cost
– accuracy
– penalty

Using Quality of Information
Stochastic scheduling: information about the variability of the target resources can be used by the scheduler to determine allocation.
– resources with more performance variability are assigned slightly less work
– preliminary experiments show that the resulting schedule performs well and can be more predictable
[Plot: SOR experiment, Jenny Schopf]
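One way to realize "more variability, slightly less work" is to discount each resource's forecast speed by its relative variability before computing work shares. The weighting below (speed divided by one plus the coefficient of variation) is purely illustrative; the cited stochastic-scheduling work derives its allocation differently.

```python
# Hedged sketch of variability-aware allocation: each resource receives
# a fraction of the work proportional to speed / (1 + k * cv), where
# cv = stddev / mean speed. The weighting formula is an assumption made
# for illustration, not the one from the SOR experiments.
def variability_aware_shares(speeds, stddevs, k=1.0):
    """speeds, stddevs: dicts resource -> mean speed and its std deviation.
    Returns dict resource -> fraction of total work to assign."""
    weights = {r: speeds[r] / (1.0 + k * stddevs[r] / speeds[r])
               for r in speeds}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}
```

With `k = 0` this degenerates to plain speed-proportional allocation; raising `k` trades expected speed for predictability, which is exactly the knob the slide's "more predictable schedules" observation suggests.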

Quality of Information and "AppLePilot"
AppLePilot combines the AppLeS adaptive scheduling methodology with the "fuzzy logic" decision-making mechanism from Autopilot.
– provides a framework in which to negotiate Grid services and promote application performance
– collaboration with Reed, Aydt, and Wolski
Builds on the software being developed for GrADS.

GrADS: Grid Application Development and Execution Environment
Prototype system which facilitates end-to-end "grid-aware" program development.
Based on the idea of a performance economy in which negotiated contracts bind the application to resources.
Joint project with a large team of researchers: Ken Kennedy, Jack Dongarra, Dennis Gannon, Dan Reed, Lennart Johnsson, Andrew Chien, Rich Wolski, Ian Foster, Carl Kesselman, Fran Berman.
[Diagram: Grid Application Development System. A PSE and whole-program compiler turn the source application and libraries into a configurable object program; a scheduler/service negotiator binds it to the Grid runtime system (Globus) through negotiation, with a real-time performance monitor and dynamic optimizer feeding performance problems back as performance feedback.]

Summary
Development of the AppLeS methodology, applications, templates, and models provides a careful investigation of adaptivity for emerging Grid environments.
The goal of current projects is to use real-world strategies to promote dynamic performance:
– adaptive scheduling
– qualitative and quantitative modeling
– multi-agent environments
– resource economies

AppLeS Corps:
– Fran Berman, UCSD
– Rich Wolski, U. Tenn.
– Jeff Brown
– Henri Casanova
– Walfredo Cirne
– Holly Dail
– Marcio Faerman
– Jim Hayes
– Graziano Obertelli
– Gary Shao
– Otto Sievert
– Shava Smallen
– Alan Su
Thanks to NSF, NASA, NPACI, DARPA, and DoD.
AppLeS Home Page:

Using Dynamic Forecasting in Scheduling
How much work should each processor be given?
The AppLeS performance model solves equations for the area assigned to each processor.
[Diagram: a 2D grid partitioned into strips of different widths for P1, P2, P3]
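The balancing idea behind the slide can be sketched directly: if each processor's time is its assigned area divided by its forecast speed, equalizing times means making strip widths proportional to speed. The sketch below balances computation only; the actual AppLeS model also solves for communication terms, and the rounding rule here is an invented detail.

```python
# Sketch of forecast-driven work allocation for a strip-partitioned 2D
# grid: strip widths proportional to each processor's predicted speed,
# so every processor's predicted time (area / speed) is roughly equal.
def strip_widths(total_cols, speeds):
    """speeds: dict processor -> predicted speed (grid cells per second).
    Returns dict processor -> integer strip width, summing to total_cols."""
    total_speed = sum(speeds.values())
    widths = {p: int(total_cols * s / total_speed) for p, s in speeds.items()}
    # Rounding down can leave a few columns unassigned; hand them to the
    # fastest processors one at a time.
    leftover = total_cols - sum(widths.values())
    for p in sorted(speeds, key=speeds.get, reverse=True)[:leftover]:
        widths[p] += 1
    return widths
```

Because the speeds come from NWS forecasts rather than static machine specs, the partition shifts between runs as load changes, which is the point of the "dynamic forecasting" in the title.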

Good Predictions Promote Good Schedules
[Plot: Jacobi2D experiments]

AppLeS Architecture
[Diagram: Resource Discovery → accessible resources → Resource Selection → feasible resource sets → Schedule Planning (Performance Model) → evaluated schedules → Decision Model → "best" schedule → Schedule Deployment]

How Does AppLeS Work?
Select resources.
For each feasible resource set, plan a schedule:
– for each schedule, predict application performance at execution time
– consider both the prediction and its qualitative attributes
Deploy the "best" of the schedules with respect to the user's performance criteria:
– execution time
– convergence
– turnaround time
[Diagram: AppLeS + application: resource selection, schedule planning, and deployment over the Grid infrastructure, NWS, and resources]