Adaptive Computing on the Grid Using AppLeS Francine Berman, Richard Wolski, Henri Casanova, Walfredo Cirne, Holly Dail, Marcio Faerman, Silvia Figueira, Jim Hayes, Graziano Obertelli, Jennifer Schopf, Gary Shao, Shava Smallen, Neil Spring, Alan Su, and Dmitrii Zagorodnov IEEE Transactions on Parallel and Distributed Systems, Vol. 14, No. 5, May 2003
Agenda Introduction Problems AppLeS and its components Result products Related works Discussions Conclusions
Introduction What is a Grid? –A collection of resources that can be used as an ensemble What are resources? –Computational devices, networks, online instruments, storage archives, etc.
Problems Heterogeneity –Resources differ in performance Inconsistency –Resources are shared –Resources fail –Resources are upgraded
AppLeS Project Application Level Scheduling Goals –Investigate adaptive scheduling for Grid computing –Apply research results to applications for validating the efficacy of the approach and extracting Grid performance for the end-user
Steps (1) Resource Discovery (2) Resource Selection (3) Schedule Generation (4) Schedule Selection (5) Application Execution (6) Schedule Adaptation
Resource Discovery Depends on the Grid –A list of the user’s logins –Resource discovery services of each Grid
Resource Selection Simple SARA –Synthetic Aperture Radar Atlas –Developed by JPL and SDSC –Provides access to satellite images distributed across various repositories –End-to-end available bandwidth is predicted using NWS
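A minimal sketch of bandwidth-driven selection, assuming the goal is to fetch an image from whichever repository has the shortest predicted transfer time. The function name `select_replica`, the dictionary of forecasts, and the repository names are hypothetical; the forecasts stand in for NWS predictions and no real NWS call is made.

```python
def select_replica(file_size_mb, predicted_bandwidth_mbps):
    """Pick the repository with the smallest predicted transfer time.

    file_size_mb: size of the requested image in megabytes.
    predicted_bandwidth_mbps: hypothetical mapping of repository name to
        forecast end-to-end bandwidth in megabits per second.
    Returns (repository, predicted_transfer_seconds).
    """
    # Megabytes -> megabits, then divide by the forecast bandwidth.
    times = {
        repo: file_size_mb * 8 / bw
        for repo, bw in predicted_bandwidth_mbps.items()
        if bw > 0  # skip repositories with no usable forecast
    }
    if not times:
        raise ValueError("no repository has a positive bandwidth forecast")
    best = min(times, key=times.get)
    return best, times[best]


if __name__ == "__main__":
    forecasts = {"repo-a": 10.0, "repo-b": 40.0}  # illustrative values only
    print(select_replica(100.0, forecasts))
```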
Performance Modeling Jacobi 2D
Main loop –Loop until convergence –For all matrix entries $A_{i,j}$: $A_{i,j} = \frac{1}{4}\,(A_{i,j} + A_{i+1,j} + A_{i-1,j} + A_{i,j+1} + A_{i,j-1})$ –Compute local error
Model –$T_i = Area_i \times Oper_i \times AvailCPU_i + C_i$, $1 \le i \le p$
$Area_i$ - the size of strip $i$, $Oper_i$ - execution time to compute one entry, $AvailCPU_i$ - percentage of available CPU, $C_i$ - communication time
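One way such a model could be used is to size the strips so that every processor finishes at the same time. The sketch below assumes that goal (the slide does not state it) and treats the whole per-entry factor of the model as a single number `per_entry_time[i]`, so each strip costs `area * per_entry_time[i] + comm_time[i]`. The function name and arguments are hypothetical.

```python
def balance_strips(total_area, per_entry_time, comm_time):
    """Solve T_1 = ... = T_p with sum(area_i) = total_area.

    Each strip is modeled as T_i = area_i * per_entry_time[i] + comm_time[i].
    Returns (common_time, areas). An area can come out negative when a
    processor's communication time alone exceeds the common time; a real
    scheduler would drop that processor and solve again.
    """
    inv = [1.0 / e for e in per_entry_time]
    # From area_i = (T - C_i) / e_i and sum(area_i) = total_area:
    common = (total_area + sum(c * i for c, i in zip(comm_time, inv))) / sum(inv)
    areas = [(common - c) * i for c, i in zip(comm_time, inv)]
    return common, areas


if __name__ == "__main__":
    # Two processors, the second four times faster, no communication cost.
    print(balance_strips(100.0, [1.0, 0.25], [0.0, 0.0]))
```

With the illustrative inputs above, the faster processor receives the larger strip and both finish together.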
Schedule Generation Complib –A computational biology application –Compares a library of unknown sequences against a database of “known” sequences using the FASTA scoring method Parallelization –Master/Worker –Work unit size Small unit size (Self-scheduling) - high overhead Big unit size - load imbalance
AppLeS’s Approach
Schedule Adaptation MCell –A computational neuroscience application –Studies biochemical interactions within living cells at the molecular level –Multiple independent tasks –Shared input
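The unit-size trade-off can be illustrated with a toy master/worker simulation. This is a sketch under simple assumptions, not Complib's scheduler: every dispatch costs a fixed `overhead`, each worker has a constant speed in work units per second, and the next chunk always goes to the worker that becomes free first. All names are hypothetical.

```python
import heapq


def makespan(num_units, chunk_size, speeds, overhead):
    """Simulate greedy master/worker dispatch and return the finish time."""
    # Heap of (time the worker becomes free, worker index).
    free = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(free)
    remaining = num_units
    finish = 0.0
    while remaining > 0:
        t, w = heapq.heappop(free)
        n = min(chunk_size, remaining)
        remaining -= n
        # Fixed dispatch overhead plus compute time on this worker.
        done = t + overhead + n / speeds[w]
        finish = max(finish, done)
        heapq.heappush(free, (done, w))
    return finish


if __name__ == "__main__":
    speeds = [1.0, 4.0]  # one slow and one fast worker (illustrative)
    for chunk in (1, 10, 50):
        print(chunk, makespan(100, chunk, speeds, overhead=1.0))
```

In this toy setting a chunk of 50 finishes at 51.0 because the slow worker holds half the work, a chunk of 10 finishes at 33.0, and a chunk of 1 is slower than both because the dispatch overhead is paid 100 times.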
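XSufferage Based on the Sufferage heuristic –Sufferage value of a task = second-best completion time - best completion time –The task that would suffer most from losing its best host is scheduled first –XSufferage also accounts for data replication time (zero for data that is locally available)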
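A simplified sketch of the heuristic follows. It assigns one task per round instead of resolving host conflicts in batches, and it models the data-aware variant by adding an optional per-host staging time to each completion time, following the slide's description. The matrices, the function name, and the per-host (not per-site) granularity are assumptions made for illustration.

```python
def sufferage_schedule(exec_time, transfer_time=None):
    """Greedy Sufferage-style assignment.

    exec_time[t][h]: run time of task t on host h.
    transfer_time[t][h]: time to stage task t's input on host h
        (0 when the data is already there); None means ignore data.
    Returns (assignment dict task -> host, makespan).
    """
    n_hosts = len(exec_time[0])
    ready = [0.0] * n_hosts          # when each host becomes free
    unassigned = set(range(len(exec_time)))
    assignment = {}
    while unassigned:
        choice = None                # (sufferage, task, host, completion)
        for t in sorted(unassigned):
            completions = sorted(
                (ready[h] + exec_time[t][h]
                 + (transfer_time[t][h] if transfer_time else 0.0), h)
                for h in range(n_hosts)
            )
            best, host = completions[0]
            second = completions[1][0] if n_hosts > 1 else best
            sufferage = second - best
            if choice is None or sufferage > choice[0]:
                choice = (sufferage, t, host, best)
        _, t, host, done = choice
        assignment[t] = host
        ready[host] = done
        unassigned.remove(t)
    return assignment, max(ready)


if __name__ == "__main__":
    run = [[2, 3], [2, 3]]
    stage = [[5, 0], [0, 5]]         # each task's input sits on one host
    print(sufferage_schedule(run))
    print(sufferage_schedule(run, stage))
```

In the example, ignoring data placement sends task 0 to host 0, while including staging time sends each task to the host that already holds its input.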
Outcome APST - AppLeS Parameter Sweep Template AMWAT - AppLeS Master/Worker Application Template SA - Supercomputer AppLeS
APST Parameter Sweep Applications –Mostly independent Provide –Transparent deployment –Automatic scheduling Capabilities –Launching tasks –Moving and storing data –Discovering and monitoring resources
AMWAT Master/Worker Provide –APIs for discovering, scheduling, and predicting –Scheduling algorithms SS - Self-Scheduling FSC - Fixed Size Chunking GSS - Guided Self-Scheduling TSS - Trapezoidal Self-Scheduling FAC2 - Factoring
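The listed policies differ mainly in how they size each chunk of work handed to a worker. The generators below sketch the textbook chunk-size rules for four of them (SS, FSC, GSS, FAC2); they are illustrations with hypothetical names and do not reproduce AMWAT's actual API. TSS, which decreases chunk sizes linearly, is left out to keep the sketch short.

```python
def fixed_size_chunks(total, chunk):
    """FSC: every chunk has the same size (the last may be smaller)."""
    remaining = total
    while remaining > 0:
        size = min(chunk, remaining)
        yield size
        remaining -= size


def self_scheduling(total):
    """SS: one work unit per request."""
    return fixed_size_chunks(total, 1)


def guided_self_scheduling(total, workers):
    """GSS: each chunk is ceil(remaining / workers)."""
    remaining = total
    while remaining > 0:
        size = -(-remaining // workers)  # integer ceiling
        yield size
        remaining -= size


def factoring(total, workers):
    """FAC2: each batch splits half of the remaining work evenly."""
    remaining = total
    while remaining > 0:
        size = -(-remaining // (2 * workers))
        for _ in range(workers):
            if remaining == 0:
                break
            s = min(size, remaining)
            yield s
            remaining -= s


if __name__ == "__main__":
    print(list(guided_self_scheduling(100, 4)))
    print(list(factoring(100, 4)))
```

Both GSS and FAC2 start with large chunks to limit overhead and end with small ones to even out the finish times, which is the balance the Master/Worker slides describe.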
SA Space-shared Moldable jobs Reduce response times
Related Works Environment –MARS and Dome - Run-time checkpointing environment Structure –MARS - SPMD –VDCE and SEA - Task graph –IOS - Real-time, fine-grained, task graph –Dome and SPP - Abstract language Dome - SPMD SPP - Task graph Performance model –Depends on program structure Objective –Minimize execution time
Related Works
System   Env      Struct   Perf        Approach
AppLeS   -        Any      Provided    Adaptive
MARS     ChkPnt   SPMD     Statistics  Data Dist
Dome     ChkPnt   SPMD     -           Data Dist
VDCE     -        TG       Derived     List Sched
SPP      -        TG       Derived     -
SEA      -        TG       Data Flow   Expert Sys
IOS      -        TG       Derived     GA
GrADS    -        -        -           -
Discussions The performance of distributed applications depends on both application-specific and platform-specific information Storage and service are usually separated Communication must be accounted for in the model Multi-application environments have not been addressed
Conclusions AppLeS –An application-level scheduling framework –Provides adaptive, flexible, and reusable components –Is being integrated into GrADS for building next-generation Grid applications Each component has demonstrated a performance improvement