From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL)

1 From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL) http://grail.sdsc.edu/ San Diego Supercomputer Center (SDSC) Computer Science and Engineering Dept. (CSE) University of California, San Diego (UCSD)

2 Parameter Sweep Applications Many compute tasks No or simple dependencies Several output post-processing stages Potentially large datasets Input data Raw Output Tasks Post-processing Final Output
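As a minimal sketch of the structure on this slide (many independent tasks, followed by a post-processing stage), the Python below expands a parameter grid into tasks and reduces the raw outputs. The names `make_tasks`, `run_task` and `post_process`, and the parameters `rate` and `seed`, are hypothetical and do not come from the presentation.

```python
from itertools import product

def make_tasks(param_grid):
    """Return one independent task (a dict of parameter values) per grid point."""
    names = sorted(param_grid)
    return [dict(zip(names, values))
            for values in product(*(param_grid[n] for n in names))]

def run_task(task):
    # Hypothetical stand-in for one real simulation run.
    return task["rate"] * task["seed"]

def post_process(raw_outputs):
    # Hypothetical post-processing stage: reduce raw outputs to one final output.
    return sum(raw_outputs) / len(raw_outputs)

grid = {"rate": [0.1, 0.2, 0.4], "seed": [1, 2]}
tasks = make_tasks(grid)              # 3 x 2 = 6 tasks
raw = [run_task(t) for t in tasks]    # no dependencies, so any order or host works
final = post_process(raw)
```

Because the tasks share no state, each entry of `tasks` could be shipped to a different resource, which is what makes this class of application a natural fit for the Grid.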

3 Relevance Arise in virtually every field of science and engineering Monte Carlo, Parameter Space Searches, Parameter Studies, etc. Biology, Astrophysics, Physics, Bioinformatics, Economics, etc. Primary candidate for Grid computing Latency-tolerant, amenable to simple fault-tolerance Need huge amounts of resources

4 Outline of the Presentation Parameter Sweep Applications (PSAs) APST The Virtual Instrument BIO@Home

5 Scheduling of PSAs ?

6 Grid Scheduling Practice Ad-hoc solutions: specific to one application, hand-tuned to the environment (e.g. SF-Express demo) Large body of work on Scheduling What can we re-use on the Grid? Heterogeneous resources; dynamic performance characteristics; resource downtimes; complex network topologies; performance prediction errors

7 “DataGrid” Scheduling Goal: Co-locate/replicate data and computation Dynamic Priority List-Scheduling Built on heuristics described in [Ibarra77, Siegel99] Added adaptivity Simulation results List-scheduling works, adaptivity should make it practical Experimental results (Demo at SC’00 and SC’01) [HCW’00] H. Casanova, A. Legrand, et al.
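The slide does not say which list-scheduling heuristic was used, so the following is only a sketch of one well-known member of that family (min-min), assuming a table of estimated run times is available. The function name `min_min_schedule` and the `exec_time` layout are assumptions made for illustration.

```python
def min_min_schedule(exec_time, num_hosts):
    """exec_time[t][h] = estimated run time of task t on host h.

    Returns (assignment, makespan), where assignment maps task -> host.
    """
    ready = [0.0] * num_hosts                  # time at which each host becomes free
    unscheduled = set(range(len(exec_time)))
    assignment = {}
    while unscheduled:
        best = None
        # Dynamic priorities: completion times are recomputed after every assignment.
        for t in sorted(unscheduled):
            for h in range(num_hosts):
                finish = ready[h] + exec_time[t][h]
                if best is None or finish < best[0]:
                    best = (finish, t, h)
        finish, t, h = best                    # task with the smallest completion time
        assignment[t] = h
        ready[h] = finish
        unscheduled.remove(t)
    return assignment, max(ready)

# Three tasks, two heterogeneous hosts (made-up estimates).
plan, makespan = min_min_schedule([[4, 8], [2, 3], [6, 3]], num_hosts=2)
```

In a real Grid setting the estimates would have to be refreshed from monitoring data between rounds, which is where the adaptivity mentioned on the slide would come in.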

8 Lessons Much scheduling work to re-use List-scheduling with Dynamic Priorities seems effective Simulation Experimental Let’s build software that uses it Let’s target scientific communities

9 Motivation for APST Started as scheduling research Evolved into a tool that provides Transparency of Grid execution Data movements Remote job management Multiple Grid middleware back-ends Scheduling Self-scheduling List scheduling w/ dynamic priorities

10 APST Designs The AppLeS Parameter Sweep Template: An Application Execution Environment XML application and resource descriptions APST client Grid Grid Services Scheduler Transport Compute Decisions Actions Metadata Bookkeeper Information APST

11 APST: Lessons The Grid is difficult to use APST provides a simple software layer that does one thing well Minimal user interface (XML, command-line) Used as a building block for domain-specific applications E.g. multi-cluster bio-informatics (Singapore) Ssh: default mechanism Critical for gaining user buy-in Natural way to lead to using the Grid

12 APST Status Version 1.1 released 2 weeks ago Available for public download Used for 10+ applications Bioinformatics (BLAST, HMM, …) Computational Neuro-science Globus, NetSolve, Ssh, Condor GASS, IBP, Scp, GridFTP, SRB, NWS, MDS, Ganglia,… http://grail.sdsc.edu/projects/apst

13 APST Research Directions APST is a research platform Maintained by one staff member Several graduate student contributors Partitionable Workload Bioinformatics (database splitting) Factoring: Decrease chunk size Pipelining: Increase chunk size Combined? Create APST-BLAST (Mario Lauria, OSU; Yang Yang, UCSD)
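To illustrate what "Factoring: Decrease chunk size" can look like for a partitionable workload, here is a sketch that assumes the classic rule of handing out half of the remaining work per batch, split evenly among the workers. The slide names the technique only; the halving factor and the function name `factoring_chunks` are assumptions.

```python
import math

def factoring_chunks(total_units, num_workers, factor=2):
    """Return the sequence of chunk sizes handed out, batch by batch."""
    remaining = total_units
    chunks = []
    while remaining > 0:
        # Each batch covers roughly 1/factor of the remaining work.
        size = max(1, math.ceil(remaining / (factor * num_workers)))
        for _ in range(num_workers):
            if remaining == 0:
                break
            take = min(size, remaining)
            chunks.append(take)
            remaining -= take
    return chunks

sizes = factoring_chunks(100, num_workers=2)
# sizes == [25, 25, 13, 13, 6, 6, 3, 3, 2, 2, 1, 1]
```

Large early chunks keep overhead low, while small late chunks limit the damage when a slow or overloaded host receives the last piece of work.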

14 Outline of the Presentation Parameter Sweep Applications (PSAs) APST Virtual Instrument BIO@home

15 Computational Neuroscience MCell: Monte Carlo Cell simulator Developed at Salk and PSC Gain knowledge about neuro-transmission mechanisms Fundamental for drug design (psychiatry) Large user base (yearly MCell workshop) Parallel MC simulations at the molecular level

16 Traditional MCell usage “By hand” No automatic project management No transparent resource access No automated data management Consequences No interactive simulations No fault-tolerance, scheduling, … MCell limited to resources in the lab

17 MCell and APST APST alleviates some of the limitations Large-scale simulations Fault-tolerance and scheduling Data retrieval from distributed storage XML application descriptions No interactivity MCell is exploratory User interaction is fundamental for many users

18 The Virtual Instrument $2.5M funding from the NSF Salk, PSC, UCSB, UTK, UCSD A running MCell simulation should behave as a lab instrument Computational steering for MCell User interface Grid software Application software Scheduling research (how does one schedule an application that’s being steered interactively?)

19 VI Software (architecture diagram) Components: VI User, VI Interface, VI Daemon, VI Database, OpenDX, Grid Services, Grid Storage and Compute Resources (storage, compute, process) Arrow labels: control, data, control + data

20 Scheduling Goals Reduce the “search” time Let user assign levels of importance to regions of the parameter space Assign fraction of resources with respect to the importance levels; assign priorities to tasks Interesting questions Job control limited on Grid resources Cannot assign exact fractions Interesting trade-offs between control overhead and accuracy of priorities
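One way to picture "assign fraction of resources with respect to the importance levels" is the sketch below, which splits a fixed number of hosts among regions of the parameter space in proportion to user-assigned importance. Since exact fractions cannot be assigned, it rounds with a largest-remainder rule. The function `allocate_hosts`, the region names and the rounding rule are all assumptions, not the Virtual Instrument's actual policy.

```python
def allocate_hosts(importance, num_hosts):
    """Split num_hosts among regions in proportion to their importance."""
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("at least one region needs a positive importance")
    exact = {r: num_hosts * w / total for r, w in importance.items()}
    alloc = {r: int(x) for r, x in exact.items()}       # round down first
    leftover = num_hosts - sum(alloc.values())
    # Give the leftover hosts to the regions with the largest fractional parts.
    by_remainder = sorted(exact,
                          key=lambda r: (exact[r] - alloc[r], importance[r]),
                          reverse=True)
    for r in by_remainder[:leftover]:
        alloc[r] += 1
    return alloc

shares = allocate_hosts({"A": 5, "B": 3, "C": 1}, num_hosts=10)
# shares == {"A": 6, "B": 3, "C": 1}
```

The gap between the exact fractions (5.56, 3.33, 1.11 hosts) and the integer allocation is a small instance of the accuracy issue raised on the slide.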

21 Current Status First software prototype released in Feb 2002 Globus and Ssh MySQL OpenDX priority-based scheduling 20,000 lines of C++ Upcoming papers JPDC submission Scheduling paper (SC submission)

22 Outline of the Presentation Parameter Sweep Applications (PSAs) PSAs on the Grid with APST MCell Virtual Instrument Global Computing

23 SETI@home Over 500,000 active participants, most of whom run the screensaver on a home PC Over a cumulative 20 TeraFlop/sec Versus 12.3 TeraFlop/sec of IBM’s ASCI White Cost: $500,000 + $200,000 in donated hardware Less than 1% of the $110 million required for ASCI White

24 Global vs. Grid Computing Nature of resources Home desktops run Windows and are completely autonomous Machines powered on and off by user Behind firewalls, dynamic IP, transient network connections Programming model Server cannot “push” tasks to clients Server has little or no means for remote job control Server has incomplete information about resources and availability

25 Goal SETI@home limitations: Embarrassingly parallel Infinite amount of input data Pure throughput Can we do something more? Short-lived applications? Parallel applications? Compute service? BIO@Home Smith-Waterman for short/long sequences No real software yet (build on XtremWeb?)

26 Scheduling? Sophisticated scheduling algorithms need information and control At the moment: simple mechanisms 1. Work unit duplication: specifies the max number of times a work unit can be resent 2. Timeouts: time that must elapse before a work unit is resent
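The two mechanisms on this slide can be sketched as a small pull-based server, assuming clients ask for work and the server never pushes it. The class `WorkUnitServer` and its method names are hypothetical; they do not describe SETI@home, XtremWeb or any other real platform.

```python
class WorkUnitServer:
    def __init__(self, unit_ids, max_sends, timeout):
        self.max_sends = max_sends   # duplication knob: max times a unit may be sent
        self.timeout = timeout       # time that must elapse before a unit is resent
        self.state = {u: {"sends": 0, "last_sent": None, "done": False}
                      for u in unit_ids}

    def request_work(self, now):
        """Called when a client asks for work; returns a unit id or None."""
        for unit, s in self.state.items():
            if s["done"] or s["sends"] >= self.max_sends:
                continue
            if s["last_sent"] is None or now - s["last_sent"] >= self.timeout:
                s["sends"] += 1
                s["last_sent"] = now
                return unit
        return None

    def report_result(self, unit):
        """Called when any client returns a result for the unit."""
        self.state[unit]["done"] = True

server = WorkUnitServer(["a", "b"], max_sends=2, timeout=10)
first = server.request_work(now=0)     # "a" is sent for the first time
second = server.request_work(now=1)    # "b" is sent for the first time
third = server.request_work(now=5)     # None: no timeout has elapsed yet
fourth = server.request_work(now=10)   # "a" is resent after its timeout
```

Neither knob requires the server to know anything about the clients, which matches the limited information and control available in this setting.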

27 Simulation Built a simulation model Using statistics/surveys/extrapolations Next: logs from real systems (XtremWeb?, Entropia?) Evaluated the impact of both mechanisms on performance and throughput

28 Early Lessons Trade-off between throughput and turn-around time Duplication: aggressively decreases turn-around time, wastes resources, there is an optimal value Timeouts: moderately lower turn-around times, preserve good throughput; an infinite timeout is of course not a good idea
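The duplication trade-off can be illustrated with a toy Monte Carlo model. This is not the simulation model used in the talk: it assumes exponentially distributed completion times, sends every replica at once, and lets losing replicas run to completion because the server has no remote job control. The function name `simulate` and all parameter values are made up.

```python
import random

def simulate(replicas, num_units=2000, mean_time=1.0, seed=0):
    """Return (mean turn-around time, mean CPU time consumed) per work unit."""
    rng = random.Random(seed)
    turnaround = 0.0
    cpu_used = 0.0
    for _ in range(num_units):
        times = [rng.expovariate(1.0 / mean_time) for _ in range(replicas)]
        turnaround += min(times)   # the unit is done when the first replica returns
        cpu_used += sum(times)     # losing replicas are not cancelled (assumption)
    return turnaround / num_units, cpu_used / num_units

t1, c1 = simulate(replicas=1)
t3, c3 = simulate(replicas=3)
```

Under these assumptions `t3` comes out smaller than `t1` while `c3` is larger than `c1`: extra replicas shorten turn-around time at the price of cycles that could have gone to other work units.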

29 Future work Two knobs Question: A compute service? Mix of applications (SETI, short-lived, …) Singapore Bio-informatics institute Notion of fairness? How do we implement policy with many volatile resources? Software Re-use existing platforms: XtremWeb Entropia

30 Conclusion APST, Virtual Instrument, BIO@Home Other GRAIL activities I didn’t talk about Scientific Computing Simulation Adaptive Scheduling Networking http://grail.sdsc.edu

33 Experimental Results (figure: sites UTK, UCSD, TITECH Tokyo; series Self-scheduling, XSufferage)

