Workshop on Grid Applications Programming, July 2004
GRID superscalar: a programming paradigm for GRID applications
CEPBA-IBM Research Institute
Raül Sirvent, Josep M. Pérez, Rosa M. Badia, Jesús Labarta

Outline
  Objective
  The essence
  User’s interface
  Automatic code generation
  Run-time features
  Programming experiences
  Ongoing work
  Conclusions

Objective
  Ease the programming of GRID applications
  Basic idea:
  [Figure: diagram labeled "Grid", contrasting time scales of ns with seconds/minutes/hours]

The essence
  Assembly language for the GRID
    –Simple sequential programming, well-defined operations and operands
    –C/C++, Perl, …
  Automatic run-time “parallelization”
    –Use architectural concepts from microprocessor design
      Instruction window (DAG), dependence analysis, scheduling, locality, renaming, forwarding, prediction, speculation, …

The essence
    for (int i = 0; i < MAXITER; i++) {
        newBWd = GenerateRandom();
        subst(referenceCFG, newBWd, newCFG);
        dimemas(newCFG, traceFile, DimemasOUT);
        post(newBWd, DimemasOUT, FinalOUT);
        if (i % 3 == 0) Display(FinalOUT);
    }
    fd = GS_Open(FinalOUT, R);
    printf("Results file:\n");
    present(fd);
    GS_Close(fd);

The essence
  [Figure: task graph for the loop above, with repeated Subst, DIMEMAS and Post task nodes plus Display and GS_open nodes, shown alongside the CIRI Grid]

User’s interface
  Three components:
    –Main program
    –Subroutines/functions
    –Interface Definition Language (IDL) file
  Programming languages: C/C++, Perl

User’s interface
  A typical sequential program
    –Main program:
    for (int i = 0; i < MAXITER; i++) {
        newBWd = GenerateRandom();
        subst(referenceCFG, newBWd, newCFG);
        dimemas(newCFG, traceFile, DimemasOUT);
        post(newBWd, DimemasOUT, FinalOUT);
        if (i % 3 == 0) Display(FinalOUT);
    }
    fd = GS_Open(FinalOUT, R);
    printf("Results file:\n");
    present(fd);
    GS_Close(fd);

User’s interface
  A typical sequential program
    –Subroutines/functions
    void dimemas(in File newCFG, in File traceFile, out File DimemasOUT)
    {
        char command[200];
        putenv("DIMEMAS_HOME=/usr/local/cepba-tools");
        sprintf(command, "/usr/local/cepba-tools/bin/Dimemas -o %s %s", DimemasOUT, newCFG);
        GS_System(command);
    }

    void display(in File toplot)
    {
        char command[500];
        sprintf(command, "./display.sh %s", toplot);
        GS_System(command);
    }

User’s interface
  GRID superscalar programming requirements
    –Main program: open/close files with GS_FOpen, GS_Open, GS_FClose, GS_Close
    –Subroutines/functions
      Temporary files must be created in the local directory, or have a name that is unique per subroutine invocation
      GS_System instead of system
      All required input/output files must be passed as arguments
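
The requirement for temporary file names that are unique per invocation can be met in several ways. The following is a minimal sketch, assuming a POSIX system where `mkstemp` is available; the helper name `make_unique_temp` and the `gs_tmp_` prefix are hypothetical and are not part of GRID superscalar.

```cpp
#include <cassert>
#include <stdlib.h>   // mkstemp (POSIX)
#include <unistd.h>   // close, unlink (POSIX)
#include <string>
#include <vector>

// Hypothetical helper: creates an empty file whose name starts with `prefix`
// and ends in six characters chosen by mkstemp, so that two invocations
// never receive the same name. Returns "" if the file cannot be created.
std::string make_unique_temp(const std::string& prefix) {
    std::string pattern = prefix + "XXXXXX";
    // mkstemp rewrites the template in place, so it needs a writable buffer.
    std::vector<char> buf(pattern.begin(), pattern.end());
    buf.push_back('\0');
    int fd = mkstemp(buf.data());
    if (fd == -1) return "";
    close(fd);  // only the name is needed here; the caller reopens the file
    return std::string(buf.data());
}
```

A subroutine could call `make_unique_temp("./gs_tmp_")` once per invocation and remove the file before returning.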

User’s interface
  Gridifying the sequential program
    –CORBA-IDL-like interface:
      In/Out/InOut files
      Scalar values (in or out)
    –The subroutines/functions listed in this file will be executed on a remote server in the Grid.
    interface MC {
        void subst(in File referenceCFG, in double newBW, out File newCFG);
        void dimemas(in File newCFG, in File traceFile, out File DimemasOUT);
        void post(in File newCFG, in File DimemasOUT, inout File FinalOUT);
        void display(in File toplot);
    };

Automatic code generation: C
  [Figure: code generation flow around the gsstubgen tool. Files shown: app.idl, app.h, app.c, app-stubs.c, app-worker.c, app-functions.c. Side labels: client, server]

Run-time features
  Data dependence analysis
    –Detects RaW, WaR, WaW dependencies based on file parameters
    –Tasks’ Directed Acyclic Graph is built based on these dependencies
  File renaming
    –WaW and WaR dependencies are avoidable with renaming
  Shared disks management
    –Supports shared working directories: NFS
    –Allows shared input directories: mirrors of large DBs
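
To make the dependence analysis concrete, here is a minimal sketch of how RaW, WaR and WaW dependencies could be derived from the file parameters of tasks submitted in program order. It is an illustration only, not the GRID superscalar run-time: the `Task`, `Edge` and `build_dag` names are hypothetical, and inout files are assumed to be listed as both input and output.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical task record: the files a task reads and writes.
struct Task {
    std::string name;
    std::vector<std::string> inputs;   // "in" and "inout" file parameters
    std::vector<std::string> outputs;  // "out" and "inout" file parameters
};

enum class Dep { RaW, WaR, WaW };

// Edge of the task graph: task `to` must wait for task `from`.
struct Edge {
    int from;
    int to;
    Dep kind;
    bool operator<(const Edge& o) const {
        if (from != o.from) return from < o.from;
        if (to != o.to) return to < o.to;
        return kind < o.kind;
    }
};

// Scans the tasks in program order and records every dependence on a file name.
std::set<Edge> build_dag(const std::vector<Task>& tasks) {
    std::set<Edge> edges;
    std::map<std::string, int> last_writer;           // file -> last task that wrote it
    std::map<std::string, std::vector<int>> readers;  // file -> readers since that write
    for (int t = 0; t < static_cast<int>(tasks.size()); ++t) {
        const Task& task = tasks[t];
        for (const std::string& f : task.inputs) {
            auto w = last_writer.find(f);
            if (w != last_writer.end()) edges.insert(Edge{w->second, t, Dep::RaW});
        }
        for (const std::string& f : task.outputs) {
            for (int r : readers[f]) edges.insert(Edge{r, t, Dep::WaR});
            auto w = last_writer.find(f);
            if (w != last_writer.end()) edges.insert(Edge{w->second, t, Dep::WaW});
        }
        // Update the bookkeeping only after all checks for this task are done.
        for (const std::string& f : task.outputs) {
            last_writer[f] = t;
            readers[f].clear();
        }
        for (const std::string& f : task.inputs) readers[f].push_back(t);
    }
    return edges;
}
```

Applied to one subst, dimemas, post chain followed by the subst of the next iteration, this sketch yields two RaW edges along the chain, plus a WaR and a WaW edge on newCFG. The last two are the kind of dependence that renaming the second newCFG would remove.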

Run-time features
  Resource brokering and task scheduling
    –Scheduling policy exploits file locality
    –File transfer time vs execution time tradeoff considered
    –Tasks are submitted for execution as soon as their data dependencies are resolved, if resources are available
    –End of tasks is detected by means of asynchronous callbacks
    –Calls to Globus:
      globus_gram_client_job_request
      globus_gram_client_job_status
      globus_gram_client_job_cancel
      globus_gram_client_callback_allow
      globus_poll_blocking
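
The locality and transfer-versus-execution tradeoff can be illustrated with a simple cost model. The sketch below is an assumption about how such a policy might look, not the broker's actual algorithm: the `Host` and `InputFile` types, the bandwidth and execution-time estimates, and the `pick_host` function are all hypothetical.

```cpp
#include <cassert>
#include <limits>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one input file of a task.
struct InputFile {
    std::string name;
    double size_mb;
};

// Hypothetical description of a candidate host.
struct Host {
    std::string name;
    std::set<std::string> local_files;  // files already present on the host
    double bandwidth_mb_s;              // assumed transfer rate to this host
    double exec_time_s;                 // assumed execution-time estimate for the task
    bool busy;                          // true if no slot is currently free
};

// Estimated completion time: transfer of the missing inputs plus execution.
double estimated_cost(const Host& h, const std::vector<InputFile>& inputs) {
    double transfer_s = 0.0;
    for (const InputFile& f : inputs) {
        if (h.local_files.count(f.name) == 0) transfer_s += f.size_mb / h.bandwidth_mb_s;
    }
    return transfer_s + h.exec_time_s;
}

// Returns the index of the free host with the lowest estimated cost, or -1 if none is free.
int pick_host(const std::vector<Host>& hosts, const std::vector<InputFile>& inputs) {
    int best = -1;
    double best_cost = std::numeric_limits<double>::infinity();
    for (int i = 0; i < static_cast<int>(hosts.size()); ++i) {
        if (hosts[i].busy) continue;
        double cost = estimated_cost(hosts[i], inputs);
        if (cost < best_cost) {
            best_cost = cost;
            best = i;
        }
    }
    return best;
}
```

With this model a slower host that already holds a large input file can win over a faster host that would first have to receive it, which is the effect the locality-aware policy aims for.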

Run-time features
  Communication between workers and master
    –Socket and file mechanisms provided
  Checkpointing at task level
    –Inter-task checkpointing
    –Transparent to application developer
  All based on Globus Toolkit C APIs (version 2.x)
    –Provides authentication and authorization
    –File transfers through gsiftp service
    –Task handling with gram service
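
One way to picture inter-task checkpointing is a log of completed task identifiers that is consulted on restart. The sketch below is a simplified illustration under that assumption. It is not the mechanism used by the run-time, the function names are hypothetical, and it assumes that the output files of completed tasks are still available after a restart.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <ostream>
#include <set>
#include <sstream>
#include <vector>

// Reads the identifiers of tasks that finished in a previous run.
std::set<int> load_checkpoint(std::istream& log) {
    std::set<int> done;
    int id;
    while (log >> id) done.insert(id);
    return done;
}

// Runs the tasks in order, skipping those already recorded as done, and
// appends each newly finished task to the log. Returns the number of tasks run.
int run_tasks(const std::vector<std::function<void()>>& tasks,
              const std::set<int>& done, std::ostream& log) {
    int executed = 0;
    for (int id = 0; id < static_cast<int>(tasks.size()); ++id) {
        if (done.count(id) != 0) continue;  // finished before the restart
        tasks[id]();
        log << id << '\n';
        log.flush();  // record the checkpoint before starting the next task
        ++executed;
    }
    return executed;
}
```

Because the log is written between tasks, a failure in the middle of a task costs only that task, and the application code itself does not change.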

Programming experiences
  Parameter studies (Dimemas, Paramedir)
    –Algorithm flexibility
  NAS Grid Benchmarks
    –Improved flexibility of component programs
    –Fewer Grid-level source code lines
  Bioinformatics application (production)
    –Improved portability (Globus vs just LoadLeveler)
    –Fewer Grid-level source code lines
  Pblade solution for bioinformatics

Ongoing work
  Automatic deployment

Ongoing work
  fastDNAml
    –Computes the likelihood of various phylogenetic trees, starting with aligned DNA sequences from a number of species (Indiana University code)
    –Sequential and MPI (grid-enabled) versions available
    –Porting to GRID superscalar
      Lower pressure on communications than MPI
      Simpler code than MPI

Ongoing work
  Run-time: exception handling
    try {
        for (int n = 0; n <= 10; n++) {
            if (n > 9) throw "Out of range";
            myarray[n] = 'z';
        }
    }
    catch (const char* str) {  // a string literal is thrown as const char*
        cout << "Exception: " << str << endl;
    }
  Interesting case: throw in workers, catch in main program

Ongoing work
  OGSA-oriented resource broker, based on Globus Toolkit 3.x
  Further future work:
    –Bindings to other basic middleware: GAT, Ninf-G2
    –New language bindings (shell script)
    –Run-time performance enhancements guided by performance analysis

Conclusions
  Presentation of the ideas of GRID superscalar
  There is a viable way to ease the programming of Grid applications
  GRID superscalar run-time enables
    –Use of the resources in the Grid
    –Exploitation of the existing parallelism

How GAT can help us
  Middleware at a higher level (skip Globus details)
  Avoid changing when Globus changes
  Abstraction for using other Grid middleware
  Resource Broker
  Intra-task checkpointing mechanism
  Interesting GATObjects:
    –GATFile (GATFile_Copy, GATFile_Delete)
    –GATResourceDescription, GATResourceBroker, GATJob

More information
  GRID superscalar home page:
  Rosa M. Badia, Jesús Labarta, Raül Sirvent, Josep M. Pérez, José M. Cela, Rogeli Grima, “Programming Grid Applications with GRID Superscalar”, Journal of Grid Computing, Volume 1 (Number 2): (2003).