GRID superscalar: a programming paradigm for GRID applications
SC 2004, Pittsburgh, Nov
CEPBA-IBM Research Institute
Rosa M. Badia, Jesús Labarta, Josep M. Pérez, Raül Sirvent
Outline
– Objective
– The essence
– User’s interface
– Automatic code generation
– Run-time features
– Programming experiences
– Ongoing work
– Conclusions
Objective
Ease the programming of GRID applications
Basic idea: apply superscalar processor concepts (nanosecond timescale) to the Grid (seconds/minutes/hours timescale)
The essence
An assembly language for the Grid
– Simple sequential programming with well-defined operations and operands
– C/C++, Perl, …
Automatic run-time “parallelization”
– Uses architectural concepts from microprocessor design: instruction window (DAG), dependence analysis, scheduling, locality, renaming, forwarding, prediction, speculation, …
The essence
Main program (input/output files passed as parameters):

for (int i = 0; i < MAXITER; i++) {
    newBWd = GenerateRandom();
    subst(referenceCFG, newBWd, newCFG);
    dimemas(newCFG, traceFile, DimemasOUT);
    post(newBWd, DimemasOUT, FinalOUT);
    if (i % 3 == 0) Display(FinalOUT);
}
fd = GS_Open(FinalOUT, R);
printf("Results file:\n");
present(fd);
GS_Close(fd);
The essence
[Figure: task dependence graph generated from the loop – one Subst → DIMEMAS → EXTRACT chain per iteration, with Display tasks every third iteration and a final GS_open, executed on the CIRI Grid]
User’s interface
Three components:
– Main program
– Subroutines/functions
– Interface Definition Language (IDL) file
Programming languages: C/C++, Perl
User’s interface
A typical sequential program – main program:

for (int i = 0; i < MAXITER; i++) {
    newBWd = GenerateRandom();
    subst(referenceCFG, newBWd, newCFG);
    dimemas(newCFG, traceFile, DimemasOUT);
    post(newBWd, DimemasOUT, FinalOUT);
    if (i % 3 == 0) Display(FinalOUT);
}
fd = GS_Open(FinalOUT, R);
printf("Results file:\n");
present(fd);
GS_Close(fd);
User’s interface
A typical sequential program – subroutines/functions:

void dimemas(in File newCFG, in File traceFile, out File DimemasOUT) {
    char command[500];
    putenv("DIMEMAS_HOME=/usr/local/cepba-tools");
    sprintf(command, "/usr/local/cepba-tools/bin/Dimemas -o %s %s",
            DimemasOUT, newCFG);
    GS_System(command);
}

void display(in File toplot) {
    char command[500];
    sprintf(command, "./display.sh %s", toplot);
    GS_System(command);
}
User’s interface
GRID superscalar programming requirements
– Main program: open/close files with GS_FOpen, GS_Open, GS_FClose, GS_Close
  (currently required; next versions will provide C library functions with GRID superscalar semantics)
– Subroutines/functions: keep temporary files in the local directory, or ensure name uniqueness per subroutine invocation; use GS_System instead of system; all required input/output files must be passed as arguments
User’s interface
Gridifying the sequential program – a CORBA-IDL-like interface:
– in/out/inout files
– scalar values (in or out)
– The subroutines/functions listed in this file will be executed on a remote server in the Grid

interface MC {
    void subst(in File referenceCFG, in double newBW, out File newCFG);
    void dimemas(in File newCFG, in File traceFile, out File DimemasOUT);
    void post(in File newCFG, in File DimemasOUT, inout File FinalOUT);
    void display(in File toplot);
};
Automatic code generation
From app.idl, gsstubgen generates:
– client side: app.h, app-stubs.c
– server side: app-worker.c
– constraints files: app_constraints.cc, app_constraints_wrapper.cc, app_constraints.h
– app.xml
These are combined with the user’s app.c and app-functions.c.
Sample stubs file

#include …
int gs_result;

void Subst(file referenceCFG, double seed, file newCFG) {
    /* Marshalling/demarshalling buffers */
    char *buff_seed;

    /* Allocate buffers */
    buff_seed = (char *)malloc(atoi(getenv("GS_GENLENGTH")) + 1);

    /* Parameter marshalling */
    sprintf(buff_seed, "%.20g", seed);

    Execute(SubstOp, 1, 1, 1, 0, referenceCFG, buff_seed, newCFG);

    /* Deallocate buffers */
    free(buff_seed);
}
…
Sample worker main file

#include …
int main(int argc, char **argv) {
    enum operationCode opCod = (enum operationCode)atoi(argv[2]);
    IniWorker(argc, argv);
    switch (opCod) {
        case SubstOp:
        {
            double seed;
            seed = strtod(argv[4], NULL);
            Subst(argv[3], seed, argv[5]);
        }
        break;
        …
    }
    EndWorker(gs_result, argc, argv);
    return 0;
}
Sample constraints skeleton file

#include "mcarlo_constraints.h"
#include "user_provided_functions.h"

string Subst_constraints(file referenceCFG, double seed, file newCFG) {
    string constraints = "";
    return constraints;
}

double Subst_cost(file referenceCFG, double seed, file newCFG) {
    return 1.0;
}
…
Sample constraints wrapper file (1)

#include …
typedef ClassAd (*constraints_wrapper)(char **_parameters);
typedef double (*cost_wrapper)(char **_parameters);

// Prototypes
ClassAd Subst_constraints_wrapper(char **_parameters);
double Subst_cost_wrapper(char **_parameters);
…

// Function tables
constraints_wrapper constraints_functions[4] = { Subst_constraints_wrapper, … };
cost_wrapper cost_functions[4] = { Subst_cost_wrapper, … };
Sample constraints wrapper file (2)

ClassAd Subst_constraints_wrapper(char **_parameters) {
    char **_argp;

    // Generic buffers
    char *buff_referenceCFG;
    char *buff_seed;

    // Real parameters
    char *referenceCFG;
    double seed;

    // Read parameters
    _argp = _parameters;
    buff_referenceCFG = *(_argp++);
    buff_seed = *(_argp++);

    // Datatype conversion
    referenceCFG = buff_referenceCFG;
    seed = strtod(buff_seed, NULL);

    string _constraints = Subst_constraints(referenceCFG, seed);

    ClassAd _ad;
    ClassAdParser _parser;
    _ad.Insert("Requirements", _parser.ParseExpression(_constraints));

    // Free buffers
    return _ad;
}
Sample constraints wrapper file (3)

double Subst_cost_wrapper(char **_parameters) {
    char **_argp;

    // Generic buffers
    char *buff_referenceCFG;
    char *buff_seed;

    // Real parameters
    char *referenceCFG;
    double seed;

    // Read parameters
    _argp = _parameters;
    buff_referenceCFG = *(_argp++);
    buff_seed = *(_argp++);

    // Datatype conversion
    referenceCFG = buff_referenceCFG;
    seed = strtod(buff_seed, NULL);

    double _cost = Subst_cost(referenceCFG, seed);

    // Free buffers
    return _cost;
}
…
Binary building
[Figure: the client binary links app.c, app-stubs.c, app_constraints.cc and app_constraints_wrapper.cc against the GRID superscalar runtime; each server binary links app-functions.c and app-worker.c, running on top of GT2 services (gsiftp, gram)]
Calls sequence without GRID superscalar
[Figure: app.c calls app-functions.c directly on the local host]
Calls sequence with GRID superscalar
[Figure: on the local host, app.c calls app-stubs.c, which invokes the GRID superscalar runtime (together with app_constraints.cc and app_constraints_wrapper.cc); through GT2, the runtime reaches app-worker.c on the remote host, which calls app-functions.c]
Run-time features
Previous prototype over Condor and MW
Current prototype over Globus 2.x, using the API
– File transfer, security, … provided by Globus
Run-time implemented primitives:
– GS_on, GS_off
– Execute
– GS_Open, GS_Close, GS_FClose, GS_FOpen
– GS_Barrier
– Worker side: GS_System
Run-time features
– Data dependence analysis
– Renaming
– File forwarding
– Shared disks management and file transfer policy
– Resource brokering
– Task scheduling
– Task submission
– End-of-task notification
– Results collection
– Explicit task synchronization
– File management primitives
– Checkpointing at task level
– Deployer
– Exception handling
Data-dependence analysis
– Detects RaW, WaR, and WaW dependencies based on file parameters
– Oriented to simulations, FET solvers, bioinformatics applications: the main parameters are data files
– The tasks’ directed acyclic graph is built from these dependencies
[Figure: DAG of Subst → DIMEMAS → EXTRACT chains feeding a Display task]
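The file-based dependence detection can be sketched as follows. This is a minimal model, not the actual runtime code: `Task`, `find_deps`, and the in/out file lists are illustrative names. Each task is scanned in program order, and its file parameters are checked against the most recent reader and writer of each file.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: a task reads some files ("in" parameters)
// and writes others ("out" parameters).
struct Task {
    std::vector<std::string> in;   // files read
    std::vector<std::string> out;  // files written
};

enum Dep { RaW, WaR, WaW };

// Scan tasks in program order; record the dependencies each new task
// has on the most recent reader/writer of every file parameter.
std::multiset<Dep> find_deps(const std::vector<Task>& tasks) {
    std::map<std::string, int> last_writer, last_reader;
    std::multiset<Dep> deps;
    for (int t = 0; t < (int)tasks.size(); ++t) {
        for (const auto& f : tasks[t].in)
            if (last_writer.count(f)) deps.insert(RaW);   // read after write
        for (const auto& f : tasks[t].out) {
            if (last_writer.count(f)) deps.insert(WaW);   // write after write
            if (last_reader.count(f)) deps.insert(WaR);   // write after read
        }
        for (const auto& f : tasks[t].in)  last_reader[f] = t;
        for (const auto& f : tasks[t].out) last_writer[f] = t;
    }
    return deps;
}
```

Only the RaW (true) dependencies force ordering; the WaR and WaW ones are name conflicts that renaming removes.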
File renaming
WaW and WaR dependencies are avoidable with renaming: each iteration writes a fresh instance of “f1” (“f1_1”, “f1_2”, …, for tasks T1_1 … T1_N)

while (!end_condition()) {
    T1(…, …, "f1");   // WaW with the next iteration’s T1
    T2("f1", …, …);   // WaR with the next iteration’s T1
    T3(…, …, …);
}
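The renaming scheme can be modelled with a small version table; `RenameTable`, `on_write`, and `on_read` are illustrative names for this sketch, not the runtime's API. Every open for writing allocates a fresh physical instance, and every open for reading binds to the latest one.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical renaming table: every new write to logical file "f"
// is redirected to a fresh physical instance "f_1", "f_2", ...
struct RenameTable {
    std::map<std::string, int> version;

    // A task opens "f" for writing: allocate a new instance.
    std::string on_write(const std::string& f) {
        int v = ++version[f];
        return f + "_" + std::to_string(v);
    }
    // A task opens "f" for reading: bind to the latest instance.
    std::string on_read(const std::string& f) {
        auto it = version.find(f);
        return it == version.end() ? f : f + "_" + std::to_string(it->second);
    }
};
```

Because each iteration's T1 writes a different physical file, the WaW and WaR edges disappear and only the true RaW edge from T1 to T2 remains within each iteration.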
File forwarding
File forwarding reduces the impact of RaW data dependencies: instead of waiting for T1 to finish writing f1, the file’s contents are forwarded to T2 by socket while T1 is still running
File transfer policy
[Figure: working directories on each server; f1 is transferred to server 1 for T1, T1’s output f4 is transferred to server 2 for T6, and T6’s output f7 is returned to the client]
Shared working directories
[Figure: with working directories shared between server 1 and server 2, T1’s output f4 is visible to T6 without an explicit transfer]
Shared input disks
[Figure: input directories on a disk shared by the client, server 1, and server 2]
Disks configuration file
Shared directories:
khafre.cepba.upc.es   SharedDisk0  /app/DB/input_data
kandake0.cepba.upc.es SharedDisk0  /usr/DB/inputs
kandake1.cepba.upc.es SharedDisk0  /usr/DB/inputs
Working directories:
kandake0.cepba.upc.es DiskLocal0   /home/ac/rsirvent/matmul-perl/worker_perl
kandake1.cepba.upc.es DiskLocal0   /home/ac/rsirvent/matmul-perl/worker_perl
khafre.cepba.upc.es   DiskLocal1   /home/ac/rsirvent/matmul_worker/worker
Resource broker
Resource brokering:
– Currently not a main project goal
– Interface between run-time and broker
– A Condor resource ClassAd is built for each resource
Broker configuration file (Machine, LimitOfJobs, Queue, WorkingDirectory, Arch, OpSys, GFlops, Mem, NCPUs, SoftNameList):
khafre.cepba.upc.es 3 none /home/ac/rsirvent/DEMOS/mcarlo i386 Linux Perl560 Dimemas23
kadesh.cepba.upc.es 0 short /user1/uni/upc/ac/rsirvent/DEMOS/mcarlo powerpc AIX Perl560 Dimemas23
kandake.cepba.upc.es /home/ac/rsirvent/McarloClAds workers localhost
Resource selection (1)
Cost and constraints are specified by the user, per IDL task.
The cost (time) of each task instance is estimated:

double Dimem_cost(file cfgFile, file traceFile) {
    double time;
    time = (GS_Filesize(traceFile)/ ) * f(GS_GFlops());
    return (time);
}

A task ClassAd is built at run time for each task instance:

string Dimem_constraints(file cfgFile, file traceFile) {
    return "(member(\"Dimemas\", other.SoftNameList))";
}
Resource selection (2)
Broker receives requests from the run-time:
– ClassAd library used to match resource ClassAds with task ClassAds
– If more than one resource matches, the one minimizing FT + ET is selected, where
  FT = file transfer time to resource r
  ET = execution time of task t on resource r (using the user-provided cost function)
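The selection step reduces to a minimization over the matched resources. A minimal sketch, with `Candidate` and `select_resource` as illustrative names (the real broker works on ClassAds, not a plain struct):

```cpp
#include <cassert>
#include <limits>
#include <string>
#include <vector>

// Hypothetical candidate resource that survived ClassAd matching.
struct Candidate {
    std::string name;
    double ft;  // estimated file transfer time to this resource
    double et;  // estimated execution time (user-provided cost function)
};

// Pick the matching resource that minimizes FT + ET.
std::string select_resource(const std::vector<Candidate>& cands) {
    std::string best;
    double best_cost = std::numeric_limits<double>::infinity();
    for (const auto& c : cands)
        if (c.ft + c.et < best_cost) {
            best_cost = c.ft + c.et;
            best = c.name;
        }
    return best;
}
```

Note that a resource with slower execution can still win if the input files are already close to it, since FT and ET are weighed together.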
Task scheduling
Distributed between the Execute call, the callback function, and the GS_Barrier call
Possibilities:
– The task can be submitted immediately after being created
– Task waiting for a resource
– Task waiting for a data dependency
A GS_Barrier primitive before ending the program waits for all tasks
Task submission
A task is submitted for execution as soon as its data dependencies are solved, if resources are available
Composed of:
– File transfer
– Task submission
All specified in RSL
A temporary directory is created in the server working directory for each task
Globus calls:
– globus_gram_client_job_request
– globus_gram_client_callback_allow
– globus_poll_blocking
End-of-task notification
Asynchronous state-change callbacks monitoring system:
– globus_gram_client_callback_allow()
– callback_func function
Data structures are updated in the Execute function, the GRID superscalar primitives, and GS_Barrier
Results collection
Collection of output parameters which are not files
– Partial barrier synchronization: task generation from the main code cannot continue until the scalar result value is available
Socket and file mechanisms provided
GS_Barrier
Explicit task synchronization: GS_Barrier
– Inserted in the user main program when required
– Main program execution is blocked
– globus_poll_blocking() is called
– Once all tasks are finished, the program may resume
File management primitives
GRID superscalar file management API primitives:
– GS_FOpen
– GS_FClose
– GS_Open
– GS_Close
Mandatory for file management operations in the main program
Opening a file with the write option:
– Data-dependence analysis
– Renaming is applied
Opening a file with the read option:
– Partial barrier until the task generating that file as output finishes
Internally, file management functions are handled as local tasks:
– A task node is inserted
– Data-dependence analysis
– The function is executed locally
Future work: offer a C library with GRID superscalar semantics (source code with the typical calls could then be used)
Task-level checkpointing
Inter-task checkpointing recovers sequential consistency in the out-of-order execution of tasks
[Figures: (1) successful execution – completed tasks are committed in order; (2) failing execution – running tasks are cancelled, tasks completed after the failure point are not committed, tasks before it finished correctly; (3) restart – execution resumes from the last committed task and continues normally]
Checkpointing
On failure: from N versions of a file down to one version (the last committed version)
Transparent to the application developer
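The commit rule behind this can be sketched as a small state machine; `Checkpoint`, `task_done`, and `restart_point` are illustrative names for this model, not the runtime's API. Tasks may complete out of order, but the committed frontier only advances over a contiguous prefix of completed tasks, which is what makes a restart sequentially consistent.

```cpp
#include <cassert>
#include <set>

// Hypothetical model of inter-task checkpointing: tasks complete
// out of order, but commits advance strictly in program order.
struct Checkpoint {
    int committed = 0;        // tasks 0..committed-1 are committed
    std::set<int> completed;  // completed but not yet committed

    void task_done(int t) {
        completed.insert(t);
        // Commit every task that now forms a contiguous prefix.
        while (completed.count(committed)) {
            completed.erase(committed);
            ++committed;
        }
    }
    // On failure, execution restarts from the first uncommitted task.
    int restart_point() const { return committed; }
};
```

A task that finished ahead of an earlier, still-running task stays uncommitted; if the earlier task fails, the later result is discarded rather than committed out of order.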
Deployer
Java-based GUI; allows workers specification: host details, libraries location, …
Selection of Grid configuration
Grid configuration checking process:
– Aliveness of host (ping)
– Globus service checked by submitting a simple test
– Sends a remote job that copies the code needed on the worker and compiles it
Automatic deployment: sends and compiles code on the remote workers and the master
Configuration files generation
Exception handling
GS_Speculative_End(func) / GS_Throw

while (j < MAX_ITERS) {
    getRanges(Lini, BWini, &Lmin, &Lmax, &BWmin, &BWmax);
    for (i = 0; i < ITERS; i++) {
        L[i] = gen_rand(Lmin, Lmax);
        BW[i] = gen_rand(BWmin, BWmax);
        Filter("nsend.cfg", L[i], BW[i], "tmp.cfg");
        Dimemas("tmp.cfg", "nsend_rec_nosm.trf", Elapsed_goal, "dim_out.txt");
        Extract("tmp.cfg", "dim_out.txt", "final_result.txt");
    }
    getNewIniRange("final_result.txt", &Lini, &BWini);
    j++;
}
GS_Speculative_End(my_func);  /* my_func is executed when an exception is thrown */

void Dimemas(char *cfgFile, char *traceFile, double goal, char *DimemasOUT) {
    …
    putenv("DIMEMAS_HOME=/aplic/DIMEMAS");
    sprintf(aux, "/aplic/DIMEMAS/bin/Dimemas -o %s %s", DimemasOUT, cfgFile);
    gs_result = GS_System(aux);
    distance_to_goal = distance(get_time(DimemasOUT), goal);
    if (distance_to_goal < goal * 0.1) {
        printf("Goal Reached!!! Throwing exception.\n");
        GS_Throw;
    }
}
Exception handling (2)
– Any worker can call GS_Throw at any moment
– The task that raises GS_Throw is the last valid task (all sequential tasks after it must be undone)
– The speculative part runs from the task that throws the exception to GS_Speculative_End (no need for a Begin clause)
– Possibility of calling a local function when the exception is detected
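The undo rule can be sketched as follows; `on_throw` and the integer task ids are illustrative, standing in for the runtime's task records. Given the tasks issued in program order, the throwing task remains the last valid one and everything after it is discarded before control reaches GS_Speculative_End.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the speculative region: tasks are recorded in
// program order; when task `throwing_task` raises GS_Throw, it stays
// valid and every later task is undone.
std::vector<int> on_throw(const std::vector<int>& issued, int throwing_task) {
    std::vector<int> valid;
    for (int t : issued) {
        valid.push_back(t);
        if (t == throwing_task) break;  // throwing task is the last valid one
    }
    return valid;
}
```

Combined with renaming and task-level checkpointing, undoing a task amounts to dropping the file instances it produced, so no explicit Begin marker is needed.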
Putting it all together: involved files
User-provided files: app.idl, app.c, app-functions.c
Files generated from the IDL: app.h, app-stubs.c, app-worker.c, app_constraints.cc, app_constraints_wrapper.cc, app_constraints.h
Files generated by the deployer: broker.cfg, diskmaps.cfg
Programming experiences
Performance modelling (Dimemas, Paramedir):
– Algorithm flexibility
NAS Grid Benchmarks:
– Improved component-program flexibility
– Reduced Grid-level source code lines
Bioinformatics application (production):
– Improved portability (Globus vs. just LoadLeveler)
– Reduced Grid-level source code lines
– Pblade solution for bioinformatics
Programming experiences
fastDNAml:
– Computes the likelihood of various phylogenetic trees, starting with aligned DNA sequences from a number of species (Indiana University code)
– Sequential and MPI (grid-enabled) versions available
– Ported to GRID superscalar: lower pressure on communications than MPI, simpler code than MPI
[Figure: rounds of tree-evaluation tasks separated by barriers]
NAS Grid Benchmarks
NAS Grid Benchmarks
– All of them implemented with GRID superscalar
– Run with classes S, W, A
– Results scale as expected
– When several servers are used, ASCII mode is required
Programming experiences
Performance analysis:
– GRID superscalar run-time instrumented
– Paraver tracefiles from the client side
– Measures of task execution time on the servers
Programming experiences
[Figure: overhead of the GRAM Job Manager polling interval]
Programming experiences
[Figure: VP.S task assignment – BT, MF, MG, MF, FT tasks mapped to Kadesh and Khafre, with remote file transfers between them]
Ongoing work
– OGSA-oriented resource broker, based on Globus Toolkit 3.x
– Bindings to Ninf-G2
– Binding to ssh/rsh/scp
– New language bindings (shell script)
And more future work:
– Bindings to other basic middlewares: GAT, …
– Enhancements in run-time performance, guided by the performance analysis
Conclusions
– Presented the ideas of GRID superscalar
– There is a viable way to ease the programming of Grid applications
– The GRID superscalar run-time enables:
  – Use of the resources in the Grid
  – Exploitation of the existent parallelism
More information
GRID superscalar home page:
Rosa M. Badia, Jesús Labarta, Raül Sirvent, Josep M. Pérez, José M. Cela, Rogeli Grima, “Programming Grid Applications with GRID Superscalar”, Journal of Grid Computing, Volume 1 (Number 2), 2003.