1 Cyber-Research: Meeting the Challenge of a Terascale Computing Infrastructure
Francine Berman
Department of Computer Science and Engineering, U.C. San Diego
San Diego Supercomputer Center
National Partnership for Advanced Computational Infrastructure

2 Cyberinfrastructure: Facing the challenges of computation, collaboration and education in the new millennium
Components of the cyberinfrastructure:
– Computation
– People & Training
– Instrumentation (large and/or many small)
– Large Databases
– Digital Libraries
– Broadband Network Connectivity
– Partnership
Courtesy of Ruzena Bajcsy, NSF

3 Cyber-Research: Synergistic, collaborative research leveraging the cyberinfrastructure's potential
– High performance computation
– High-speed networks create a virtual parallel "machine"
– Collaborative research, education, outreach, training
– Remote instruments linked to computers and data archives
– Large amounts of data, and their collection, analysis and visualization
Courtesy of Ruzena Bajcsy, NSF
[Diagram: the cyberinfrastructure components from slide 2]

4 Value Added – SCALE and SYNERGY
Cyber-research goals are to foster deep impact, broad impact, and individual impact. Cyberinfrastructure enables the scale and synergy necessary for fundamental advances in computational and computer science through collaborative research, development, infrastructure and education activities.

5 Short History of "Cyber-Research"
1980s: Grand Challenge Problems
– Multidisciplinary, cutting-edge computational science
– Platform is the supercomputer/MPP
1990s: Cutting-edge applications defined for each resource
– Communication-intensive applications developed for gigabit networks
– Grid applications linked instruments and computation for large-scale results
– Data management becomes an integral part of applications for many communities

6 Cyber-Research in the Terascale Age
Cyber-research in the Terascale Age targets cutting-edge technologies:
– Terascale computers
– Petascale data
– Terabit networks
Advances in network technology are eliminating the "tyranny of distance":
– distributed environments become viable for an expanded class of applications
How do we write and deploy applications in these environments?

7 Infrastructure for Cyber-Research
The software stack, from bottom to top — hardware resources, base-level infrastructure, system-level middleware, user-level middleware, applications — reduces complexity as you move up and promotes performance as you move down.
– Resources may be heterogeneous, shared by multiple users, exhibit different performance characteristics, and be governed in distinct administrative domains
– Base-level infrastructure provides the illusion of a single aggregate resource (e.g. Globus)
– System-level middleware better enables the user to handle the complexity of the underlying system (e.g. SRB)
– User-level middleware enables the user to achieve performance on dynamic aggregate resource platforms (think programming environments)
– Applications must be anticipatory, adaptive, and responsive to achieve performance

8 Cyber-Research Applications
At any level, the "machine" is heterogeneous and complex, with dynamic performance variation. Performance-efficient applications must be:
– Adaptive to a vast array of possible resource configurations and behavior
– Anticipatory of load and resource availability
– Responsive to dynamic events
We have increasing experience deploying scientific applications in these environments; however, it is difficult to routinely achieve performance.
[Diagram: the layered infrastructure stack from slide 7]

9 The Need for Adaptivity
– Resource load in multi-user distributed environments varies dynamically
– Even homogeneous environments may be experienced as heterogeneous by the user
[Figure: file transfer times from SC'99 (Portland, Ore.) to OGI (Portland), UTK (Tenn.), and UCSD (Ca.), courtesy of Alan Su]
[Figure: Network Weather Service monitored and predicted data from SDSC resources, courtesy of Rich Wolski]
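The Network Weather Service forecasts resource performance from time series of measurements. NWS actually runs a family of competing forecasters and favors whichever has the lowest recent error; as a rough, hypothetical illustration of the idea only, a single sliding-window predictor might look like this:

```python
from collections import deque

class SlidingWindowForecaster:
    """Toy predictor in the spirit of NWS: forecast the next bandwidth
    measurement as the mean of the last k observations. (NWS itself runs
    several forecasters side by side and tracks their errors.)"""

    def __init__(self, window=10):
        self.history = deque(maxlen=window)

    def update(self, measurement):
        self.history.append(measurement)

    def predict(self):
        if not self.history:
            return None
        return sum(self.history) / len(self.history)

# Hypothetical usage with bandwidth samples in Mbit/s
f = SlidingWindowForecaster(window=5)
for sample in [8.1, 7.9, 3.2, 8.0, 7.8]:   # note the load spike at 3.2
    f.update(sample)
print(f.predict())                          # smoothed estimate for the next interval
```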

10 A Tale of Cyber-Research
The beginning: an initial collaboration between disciplinary scientists and computer scientists — MCell + AppLeS/NWS = adaptive, Grid-enabled MCell.

11 A Tale of Cyber-Research: The Application
MCell – a general simulator for cellular microphysiology
– Uses a Monte Carlo diffusion and chemical reaction algorithm in 3D to simulate complex biochemical interactions of molecules
– The molecular environment is represented as a 3D space in which the trajectories of ligands against cell membranes are tracked
Researchers: Tom Bartol, Terry Sejnowski [Salk], Joel Stiles [CMU/PSC], Miriam and Edwin Salpeter [Cornell]
Code in development for over a decade, with a large user community (over 20 sites)
Ultimate goal: a complete molecular model of neuro-transmission at the level of the entire cell
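For intuition about the Monte Carlo approach — and this is an illustrative sketch under invented parameters, not MCell's actual algorithm or geometry — each ligand takes random diffusive steps in 3D, and a reaction is tested probabilistically when it reaches a membrane surface:

```python
import random

def diffuse_ligand(steps=1000, step_len=1.0, membrane_z=20.0, p_bind=0.1):
    """Toy 3D random walk for one ligand: diffuse until the walker crosses
    a plane standing in for a membrane, then test binding. Illustrative
    only; MCell uses far more detailed geometry and reaction chemistry."""
    x = y = z = 0.0
    for _ in range(steps):
        # Gaussian step in each coordinate (crude diffusion model)
        dx, dy, dz = (random.gauss(0, step_len) for _ in range(3))
        x, y, z = x + dx, y + dy, z + dz
        if z >= membrane_z:                     # hit the membrane plane
            return "bound" if random.random() < p_bind else "reflected"
    return "free"

# Many independent ligands: trivially parallel, i.e. a parameter sweep
results = [diffuse_ligand() for _ in range(10000)]
print(results.count("bound") / len(results))
```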

12 A Tale of Cyber-Research: The Software
AppLeS – the Application-Level Scheduling project
Researchers: Fran Berman [UCSD], Rich Wolski [U. Tenn], Henri Casanova [UCSD] and many students
AppLeS is focused on real-world adaptive scheduling in dynamic cluster and Grid environments.
AppLeS + application = self-scheduling application
The AppLeS agent cycles through Resource Discovery (accessible resources) → Resource Selection (feasible resource sets) → Schedule Planning and Performance Modeling (evaluated schedules) → Decision Model ("best" schedule) → Schedule Deployment, all on top of the Grid infrastructure and NWS; a sketch of this control flow follows.
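One hedged way to read the AppLeS cycle as code — all objects, methods, and the pair-wise resource-set enumeration here are hypothetical placeholders, not the real AppLeS API:

```python
from itertools import combinations

def apples_cycle(app, grid, nws):
    """One pass through the AppLeS agent pipeline (hypothetical names)."""
    # 1. Resource discovery: everything this user can access at all
    accessible = grid.discover_resources(app.credentials)

    # 2. Resource selection: prune to feasible resource sets
    #    (here simply all pairs, as a stand-in for real selection logic)
    feasible = [set(c) for c in combinations(accessible, 2)
                if app.fits(set(c))]

    # 3. Schedule planning and performance modeling, driven by NWS
    #    forecasts of CPU load and network bandwidth
    plans = [app.plan_schedule(rset, nws.forecast(rset)) for rset in feasible]

    # 4. Decision model: choose the schedule with the best predicted time
    best = min(plans, key=lambda p: p.predicted_time)

    # 5. Schedule deployment on the underlying Grid infrastructure
    grid.deploy(best)
```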

13 Evolutionary Roadmap
AppLeS and NWS → adaptively scheduled MCell → APST, evolving toward: Entropia/peer-to-peer, Condor, Blue Horizon/MPP, GrADS, Virtual Instruments, LoCI/IBP
[Diagram: the roadmap overlaid on the layered infrastructure stack]

15 Scheduling MCell
[Diagram: MCell computational tasks scheduled from the user's host and storage, across network links, onto clusters with their own storage]

16 Why Isn't Scheduling Easy?
– Good scheduling must consider the location of large shared files
– Computation placement and data transfer should together minimize file transfer time (a back-of-the-envelope sketch follows)
– Adaptive scheduling is necessary to account for the dynamic environment
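To see why file location dominates, consider a hedged back-of-the-envelope example using the cost ratio cited later in the talk (a shared file costing up to 40 times a task's compute time); the specific numbers are invented:

```python
# Toy numbers (hypothetical): one task computes in 2 s; transferring
# its 100 MB shared input across the wide area takes 80 s (a 40x ratio).
compute, transfer, tasks = 2.0, 80.0, 50

# Naive placement: every task pulls its own copy of the shared file
naive = tasks * (transfer + compute)    # 50 * 82 = 4100 s

# Location-aware placement: ship the file once, reuse it for all tasks
shared = transfer + tasks * compute     # 80 + 100 = 180 s
print(naive, shared)
```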

17 Scheduling Model
Scheduling goals:
– Identify heuristics that minimize execution time (computation and data movement) in a multi-cluster environment
– Identify heuristics that are performance-efficient even if load predictions are not always accurate

18 Scheduling Approach
Contingency scheduling: the allocation is developed by dynamically generating a Gantt chart that schedules unassigned tasks between scheduling events.
Basic skeleton:
– Compute the next scheduling event
– Create a Gantt chart G
– For each computation and file transfer currently underway, compute an estimate of its completion time and fill in the corresponding slots in G
– Select a subset T of the tasks that have not started execution
– Until each host has been assigned enough work, heuristically assign tasks to hosts, filling in slots in G
– Implement the schedule
[Figure: Gantt chart of network links and hosts in two clusters over time, with computations placed between scheduling events]
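A rough Python rendering of that skeleton, where the task attributes, the horizon, and the heuristic callback are hypothetical simplifications of the published algorithm:

```python
def contingency_schedule(tasks, hosts, heuristic, now, horizon):
    """One scheduling event: rebuild a Gantt chart G and heuristically
    assign not-yet-started tasks until the next event. Hedged sketch;
    task fields and heuristic() are invented for illustration."""
    G = {h: [] for h in hosts}          # per-host list of (task, start, end)

    # Fill G with work already underway, using estimated completion times
    for t in tasks:
        if t.running:
            G[t.host].append((t, now, now + t.estimated_remaining))

    # Select a subset T of the tasks that have not started execution
    T = [t for t in tasks if not t.running and not t.done]

    # Heuristically assign tasks until each host has enough work to stay
    # busy through the next scheduling event
    while T and any(busy_until(G[h]) < now + horizon for h in hosts):
        t, h, start, end = heuristic(T, G)   # e.g. Min-Min or Sufferage
        G[h].append((t, start, end))
        T.remove(t)
    return G                                 # implement this schedule

def busy_until(slots):
    return max((end for _, _, end in slots), default=0.0)
```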

19 MCell Adaptive Scheduling
Free "parameters" of the approach:
– Frequency of scheduling events
– Accuracy of task completion time estimates
– The subset T of unexecuted tasks
– The scheduling heuristic used
[Figure: the same two-cluster Gantt chart as slide 18]

20 Scheduling Heuristics
Scheduling heuristics useful for parameter sweeps in distributed environments:
– Min-Min [the task/resource pair that can complete the earliest is assigned first]
– Max-Min [the longest of the task/earliest-resource times is assigned first]
– Sufferage [the task that would "suffer" most if given a poor schedule is assigned first, as computed by the gap between its second-best and best completion times]
– Extended Sufferage (XSufferage) [minimal completion times are computed for each task on each cluster, and the Sufferage heuristic is applied to these]
– Workqueue [a randomly chosen task is assigned first]
Criteria for evaluation:
– Which scheduling heuristics minimize execution time?
– How sensitive are the heuristics to inaccurate performance information?
– How well do the heuristics exploit the location of shared input files and the cost of data transmission?
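As a concrete reading of two of these rules, here is a hedged sketch of Min-Min and the Sufferage selection step. The function names are mine, and est(t, h) — the estimated run time of task t on host h, with file transfer cost folded in — is assumed to be given:

```python
def min_min(tasks, hosts, est):
    """Min-Min sketch: repeatedly assign the (task, host) pair with the
    globally earliest estimated completion time, tracking when each host
    frees up. est(t, h) is assumed given (transfer cost included)."""
    ready = {h: 0.0 for h in hosts}     # when each host becomes free
    order, pending = [], list(tasks)
    while pending:
        t, h = min(((t, h) for t in pending for h in hosts),
                   key=lambda p: ready[p[1]] + est(*p))
        ready[h] += est(t, h)
        order.append((t, h))
        pending.remove(t)
    return order

def sufferage_pick(tasks, hosts, est, ready):
    """Sufferage sketch: pick the task with the largest gap between its
    best and second-best completion times -- the one that would 'suffer'
    most if denied its preferred host."""
    def sufferage(t):
        times = sorted(ready[h] + est(t, h) for h in hosts)
        return times[1] - times[0] if len(times) > 1 else 0.0
    return max(tasks, key=sufferage)
```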

21 How Do the Scheduling Heuristics Compare?
– Comparison of heuristic performance when it is up to 40 times more expensive to send a shared file across the network than it is to compute a task
– Extended Sufferage takes advantage of file sharing to achieve good application performance
[Figure: simulation of an MCell run in a multi-cluster environment, comparing Max-Min, Workqueue, XSufferage, Sufferage, and Min-Min]

22 How sensitive are the heuristics to inaccurate performance information?
[Figure: four panels of simulation results — a single scheduling event, and scheduling events every 500, 250, and 125 seconds — comparing Workqueue against the Gantt chart heuristics]

23 "Regime" Scheduling
[Figure: average makespan versus prediction error for Workqueue and the Gantt chart heuristics. More error yields larger average makespan; which family performs better shifts with the amount of error and non-uniformity, suggesting a "regime"-based choice of scheduler]

24 From MCell to Parameter Sweeps
MCell is a parameter sweep application.
– Parameter sweeps = a class of applications structured as multiple instances of an "experiment" with distinct parameter sets
– In PS applications, independent experiments may share input files (see the sketch below)
[Diagram: a client talking to the APST middleware]
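A parameter sweep is easy to picture in code; a minimal sketch with an invented, MCell-flavored experiment structure (file names and parameters are hypothetical):

```python
from itertools import product

# Hypothetical sweep: each experiment is one simulation run with a
# distinct parameter set; experiments with the same geometry file
# share that (potentially large) input.
geometries = ["synapse_small.mesh", "synapse_large.mesh"]   # shared inputs
rates      = [0.1, 0.5, 1.0]
seeds      = range(4)

experiments = [
    {"input_file": g, "binding_rate": r, "seed": s}
    for g, r, s in product(geometries, rates, seeds)
]
# 2 x 3 x 4 = 24 independent experiments; each geometry file is shared
# by 12 of them -- exactly what location-aware scheduling exploits.
print(len(experiments))
```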

25 Evolutionary Roadmap
AppLeS and NWS → adaptively scheduled MCell → APST, evolving toward: Entropia/peer-to-peer, Condor, Blue Horizon/MPP, GrADS, Virtual Instruments, LoCI/IBP
[Diagram: the roadmap overlaid on the layered infrastructure stack]

26 APST User-level Middleware
AppLeS Parameter Sweep Template (APST):
– Targets a structurally similar class of applications (parameter sweeps)
– Can be instantiated in a user-friendly timeframe
– Provides good application performance
– Can be used to target a wide spectrum of platforms adaptively
Joint work with Henri Casanova, Dmitrii Zagorodnov, and Arnaud Legrand. The APST paper was a best paper finalist at SC'00.

27 APST Architecture
– APST Client: a command-line client whose Controller interacts with the APST Daemon
– APST Daemon: a Scheduler, plugged in through a scheduler API (Workqueue, Workqueue++, and the Gantt chart heuristic algorithms MinMin, MaxMin, Sufferage, XSufferage), which triggers the Actuator and queries the Metadata Bookkeeper
– Actuator (transfer/execute): speaks a transport API (GASS, IBP, NFS) and an execution API (GRAM, NetSolve, Condor, Ninf, Legion, ...)
– Metadata Bookkeeper (query/retrieve): speaks a metadata API (NWS)
– Underlying resources: NetSolve, Globus, Blue Horizon, NWS, IBP, clusters

28 Cool things about APST
– The Scheduler can be used for a structurally similar set of parameter sweep applications, in addition to MCell: INS2D, INS3D (NASA fluid dynamics applications); Tphot (SDSC, proton transport application); NeuralObjects (NSI, neural network simulations); CS simulation applications for our own research (model validation); PFAM (JCSG, structural genomics application); etc.
– The Actuator's APIs are interchangeable and mixable: (NetSolve + IBP) + (GRAM + GASS) + (GRAM + NFS) — see the sketch below
– The Scheduler allows for dynamic adaptation and anticipation
– No Grid software is required, although the lack of it (NWS, GASS, IBP) may lead to poorer performance
– APST has been released to NPACI and NASA IPG, and is available at http://gridlab.ucsd.edu/apst/html
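The "interchangeable and mixable" APIs suggest a plug-in design. One hedged way to picture it — the interface names and Python rendering are illustrative, not APST's actual API:

```python
from abc import ABC, abstractmethod

class TransportAPI(ABC):
    """File-movement back end: GASS, IBP, or plain NFS in APST's case."""
    @abstractmethod
    def transfer(self, src, dst): ...

class ExecutionAPI(ABC):
    """Task-launch back end: GRAM, NetSolve, Condor, ..."""
    @abstractmethod
    def launch(self, task, host): ...

class Actuator:
    """Mix-and-match pairing, e.g. (GRAM + GASS) or (NetSolve + IBP):
    any transport can be combined with any execution back end."""
    def __init__(self, transport: TransportAPI, execution: ExecutionAPI):
        self.transport, self.execution = transport, execution

    def run(self, task, host):
        for f in task.input_files:          # stage inputs first
            self.transport.transfer(f, host)
        self.execution.launch(task, host)   # then launch the task
```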

29 Does the location of shared input files matter?
CS experiments:
– We ran instances of APST/MCell across a wide-area distributed platform
– We compared execution times for both Workqueue (location-insensitive) and the Gantt chart heuristics (location-sensitive)
[Diagram: wide-area testbed spanning the University of Tennessee, Knoxville; the University of California, San Diego; and the Tokyo Institute of Technology, using NetSolve + NFS, NetSolve + IBP, and GRAM + GASS combinations]

30 Data Location Matters
Experimental setting: MCell application with 1,200 tasks (6 Monte Carlo simulations); input files of 1, 1, 20, 20, 100, and 100 MB.
Four placement scenarios:
(a) all input files replicated everywhere
(b) 100 MB files in Japan + California + Tennessee
(c) 100 MB files in Japan + California
(d) all input files only in Japan
The Gantt chart heuristics are location-sensitive; the Workqueue scheduling heuristics are location-insensitive.
[Figure: execution times for Workqueue vs. the Gantt chart heuristics across the four scenarios]

31 What Can't We Do with APST?
The only mechanism for steering the computation is the user:
– The user must stop the execution at a particular point to view results
– The system does not alert the user when interesting results have been computed
– The scheduling goal is minimizing total execution time rather than providing real-time feedback to the user
– The scheduling algorithms are not designed to provide good intermediate results, only final results
Also: a rudimentary user interface, and no support for visualization.

33 Evolution
How each piece of the roadmap is evolving:
– MCell serves as the driving application for the Virtual Instrument (VI) project; the AppLeS/NWS adaptive approach influences the design of VI, with integrated computational steering and scheduling
– MCell is used as an example application for the Internet Backplane Protocol; APST adapts to IBP environments
– NPACI Alpha Project: APST targeted to MPPs (Blue Horizon)
– APST targeted to Condor: Master's thesis in progress
– APST will be targeted to the GrADS development and execution environment
– APST will be targeted to Entropia platforms
Overall directions: expanded portability and new functionality.

34 From APST to Virtual Instruments
Interactive steering is critical:
– Large multi-dimensional parameter spaces necessitate user-directed search
– User-directed search requires feedback
APST does not accommodate steering:
– APST scheduling heuristics optimize execution time, not feedback frequency
– There is no capability for visualization
A new approach is required: middleware that combines steering and visualization with effective scheduling.

35 Work-in-Progress: the Virtual (Software) Instrument Project
Virtual software instruments provide "knobs" which allow the user to change the computational goals during execution, in order to follow promising leads:
– The application is perceived as "continuous" by the software and must be scheduled between non-deterministic steering events (see the sketch below)
– The scheduler and steering mechanism must adapt to continuously changing goals
The software and new disciplinary results are the focus of an NSF ITR project.
Project participants: Fran Berman (UCSD/SDSC) [PI], Henri Casanova (UCSD/SDSC), Mark Ellisman (UCSD/SDSC), Terry Sejnowski (Salk), Tom Bartol (Salk), Joel Stiles (CMU/PSC), Jack Dongarra (UTK), Rich Wolski (UCSB)
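One hedged way to picture the steering problem — every name here is a hypothetical placeholder, since the prototype is still being designed: the run becomes a loop that re-plans whenever the user turns a knob.

```python
import queue

def virtual_instrument_loop(tasks, scheduler, steering_events: queue.Queue):
    """Toy steering loop (hypothetical): between non-deterministic user
    steering events, schedule as usual; when a knob changes, re-rank the
    remaining tasks against the new computational goals."""
    goals = {"region_of_interest": None}
    while tasks:
        try:
            # Non-blocking check for a steering event from the user
            event = steering_events.get_nowait()
            goals.update(event)                 # the "knob" was turned
            scheduler.replan(tasks, goals)      # adapt to the new goals
        except queue.Empty:
            pass
        batch = scheduler.next_batch(tasks, goals)
        for t in batch:
            t.run()
            tasks.remove(t)
```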

36 Virtual Instrument Project Status
Project goals:
– To investigate and develop the scheduling and steering heuristics and models necessary for performance of software Virtual Instruments
– To have a robust, mature and working Virtual Instrument prototype at the end of the project
– To enable the MCell user community to use Virtual Instrument technology
Components under coordinated development:
– A dynamic event model, in which steering events are generated by the state of the system, user steering activities, and user-provided criteria
– Adaptive, steering-sensitive scheduling algorithms
– Data management and phased (CS and neuroscience) experiment strategies
– A Virtual Instrument prototype: a more sophisticated and user-friendly user interface and, ultimately, a running prototype of the software suitable for MCell users

37 MCell/APST/VI Cyber-research
How the project maps onto the cyberinfrastructure components of slide 2:
– Computation: targets high-performance distributed environments
– Broadband network connectivity: uses predictions of network load to help stage input and output data
– People & training / partnership: synergistic research, collaboration, education, training
– Instrumentation: remote instruments (electron microscopes) linked to computers and data archives
– Large databases / digital libraries: targets large amounts of data requiring collection, analysis and visualization

38 What Does This Have to Do with SDSC and NPACI?
The MCell/APST/Virtual Instrument projects represent a model for the large-scale research, development and collaborations fostered by the PACI program:
– It is nearly impossible to do work at this scale within a traditional academic department
– These large-scale, multidisciplinary collaborations are becoming increasingly critical to achieving progress in science and engineering

39 Cyber-Research Requires Scale and Synergy
Large-scale, cutting-edge results require:
– Cutting-edge hardware
– Usable software
– Human resources: knowledge, relationships, synergy
– Education, communication, dissemination

40 A Cyber-Research Bibliography
PROJECT HOME PAGES
– MCell: http://www.mcell.cnl.salk.edu
– UCSD Grid Lab and AppLeS: http://gridlab.ucsd.edu
– Virtual Instrument Project: http://gridlab.ucsd.edu/vi_itr/index.html
– Network Weather Service: http://nws.npaci.edu/NWS/
– APST: http://gridlab.ucsd.edu/apst/html
– NPACI MCell Alpha Project: http://www.npaci.edu/Alpha/Monte_Carlo/index.html
PLATFORM HOME PAGES
– SDSC: http://www.sdsc.edu/top.html
– NASA IPG: http://www.ipg.nasa.gov/
– GrADS: http://hipersoft.cs.rice.edu/grads/
– IBP: http://icl.cs.utk.edu/ibp/
– Entropia: http://www.entropia.com

