GRID superscalar: a programming model for the Grid
Raül Sirvent Pardell
Advisor: Rosa M. Badia Sala
Doctoral Thesis, Computer Architecture Department, Technical University of Catalonia

Outline
1. Introduction
2. Programming interface
3. Runtime
4. Fault tolerance at the programming model level
5. Conclusions and future work

Outline
1. Introduction
   1.1 Motivation
   1.2 Related work
   1.3 Thesis objectives and contributions
2. Programming interface
3. Runtime
4. Fault tolerance at the programming model level
5. Conclusions and future work

1.1 Motivation
• The Grid architecture layers:
– Applications
– Grid Middleware (job management, data transfer, security, information, QoS, ...)
– Distributed Resources

1.1 Motivation
• What middleware should I use?

1.1 Motivation
• Programming tools: are they easy? Grid-AWARE tools vs. Grid-UNAWARE tools [comparison figure omitted]

1.1 Motivation
• Can I run my programs in parallel? Explicit vs. implicit parallelism
– Explicit: the fork/join task graph is drawn by hand
– Implicit: parallelism is extracted from plain sequential code, e.g.

for(i=0; i < MSIZE; i++)
  for(j=0; j < MSIZE; j++)
    for(k=0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j))

1.1 Motivation
• The Grid: a massive, dynamic and heterogeneous environment prone to failures
– Study different techniques to detect and overcome failures: checkpointing, retries, replication

1.2 Related work

System    | Grid unaware | Implicit parallelism | Language
Triana    | No           | No                   | Graphical
Satin     | Yes          | No                   | Java
ProActive | Partial      | Partial              | Java
Pegasus   | Yes          | Partial              | VDL
Swift     | Yes          | Partial              | SwiftScript

1.3 Thesis objectives and contributions
• Objective: create a programming model for the Grid
– Grid unaware
– Implicit parallelism
– Sequential programming
– Allows the use of well-known imperative languages
– Speeds up applications
– Includes fault detection and recovery

1.3 Thesis objectives and contributions
• Contribution: GRID superscalar
– Programming interface
– Runtime environment
– Fault tolerance features

Outline
1. Introduction
2. Programming interface
   2.1 Design
   2.2 User interface
   2.3 Programming comparison
3. Runtime
4. Fault tolerance at the programming model level
5. Conclusions and future work

2.1 Design
• Interface objectives
– Grid unaware
– Implicit parallelism
– Sequential programming
– Allows the use of well-known imperative languages

2.1 Design
• Target applications
– Algorithms that can be easily split into tasks: branch and bound computations, divide and conquer algorithms, recursive algorithms, ...
– Coarse-grained tasks
– Independent tasks: scientific workflows, optimization algorithms, parameter sweeps
– Main parameters are FILES: external simulators, finite element solvers, BLAST, GAMESS

2.1 Design
• Application architecture: a master-worker paradigm
– The master-worker parallel paradigm fits our objectives
– Main program: the master
– Functions: workers (a function is the generic representation of a task)
– Glue to transform a sequential application into a master-worker application: stubs and skeletons (as in RMI, RPC, ...); the stub is a call to the runtime interface, the skeleton is a binary that calls the user function

2.1 Design (local scenario)

app.c:
for(i=0; i < MSIZE; i++)
  for(j=0; j < MSIZE; j++)
    for(k=0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j))

app-functions.c:
void matmul(char *f1, char *f2, char *f3)
{
  getBlocks(f1, f2, f3, A, B, C);
  for (i = 0; i < A->rows; i++)
    for (j = 0; j < B->cols; j++)
      for (k = 0; k < A->cols; k++)
        C->data[i][j] += A->data[i][k] * B->data[k][j];
  putBlocks(f1, f2, f3, A, B, C);
}

2.1 Design
• Master-worker paradigm: app.c (the master) drives app-functions.c (the workers) on remote resources through the Grid middleware [diagram omitted]

2.1 Design
• Intermediate language concept: the workflow plays the role assembler code plays for a processor
– C, C++, ...: compiled to assembler, executed on a processor
– C, C++, ...: translated to a workflow, executed on the Grid
• In GRIDSs: the Execute generic interface (a stub sketch follows below)
– The instruction set is defined by the user
– Single entry point to the runtime
– Allows easy building of programming language bindings (Java, Perl, Shell Script), for easier technology adoption
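Since Execute is the runtime's single entry point, a generated master-side stub reduces to one call. Below is a minimal sketch of the idea, assuming hypothetical names (Execute, MATMUL_OP, FILE_IN, FILE_INOUT); the code actually generated by GRIDSs may differ:

void matmul(char *f1, char *f2, char *f3)
{
  /* The stub computes nothing itself: it emits a "MATMUL" instruction
     to the runtime through the generic entry point. The direction
     flags mirror the IDL declaration (in File, in File, inout File). */
  Execute(MATMUL_OP, f1, FILE_IN, f2, FILE_IN, f3, FILE_INOUT);
}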

2.2 User interface
• Steps to program an application
– Task definition: identify the functions/programs in the application that are going to be executed in the computational Grid; all parameters must be passed in the header (remote execution)
– Interface Definition Language (IDL): for every task defined, identify which parameters are input/output files and which are input/output scalars
– Programming API, master and worker: write the main program and the tasks using the GRIDSs API

2.2 User interface
• Interface Definition Language (IDL) file
– CORBA-IDL-like interface: in/out/inout files, in/out/inout scalar values
– The functions listed in this file will be executed in the Grid

interface MATMUL {
  void matmul(in File f1, in File f2, inout File f3);
};

2.2 User interface
• Programming API: master and worker (a master-side sketch follows below)
– Master side (app.c): GS_On, GS_Off, GS_FOpen/GS_FClose, GS_Open/GS_Close, GS_Barrier, GS_Speculative_End
– Worker side (app-functions.c): GS_System, gs_result, GS_Throw
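As a rough illustration of the master side, here is a minimal sketch of a main program using these calls (the sim task, the file names and the GS_FOpen prototype are illustrative assumptions, not taken from the real GRIDSs headers):

int main()
{
  GS_On();                                /* initialize the runtime */
  sim("input.txt", "output.txt");         /* IDL task: becomes a Grid job */
  GS_Barrier();                           /* wait for all pending tasks */
  FILE *f = GS_FOpen("output.txt", "r");  /* safe local access to a task result */
  /* ... read the result ... */
  GS_FClose(f);
  GS_Off(0);                              /* final barrier, cleanup, undo renaming */
  return 0;
}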

2.2 User interface
• Task constraints and cost specification (sketched below)
– Constraints specify the needs of a task (CPU, memory, architecture, software, ...): build an expression in a constraint function, evaluated for every machine, e.g. other.Mem == 1024
– Cost is the estimated execution time of a task in seconds, useful for scheduling: calculate it in a cost function, e.g. cost = operations / GS_GFlops(); GS_Filesize may be used, and an external estimator can also be called
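A hedged sketch of the two functions for the matmul task (the exact prototypes and the BSIZE constant are assumptions for illustration):

char *matmul_constraints()
{
  /* Evaluated against every machine described in the Grid configuration file. */
  return "other.Mem == 1024";
}

double matmul_cost()
{
  double operations = 2.0 * BSIZE * BSIZE * BSIZE;  /* flops of one block multiply */
  return operations / GS_GFlops();                  /* estimated seconds on this machine */
}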

2.3 Programming comparison
• Globus vs. GRIDSs: the Globus version is Grid-aware, with explicit parallelism

int main()
{
  rsl = "&(executable=/home/user/sim)(arguments=input1.txt output1.txt)(file_stage_in=(gsiftp://bscgrid01.bsc.es/path/input1.txt /home/user/input1.txt))(file_stage_out=/home/user/output1.txt gsiftp://bscgrid01.bsc.es/path/output1.txt)(file_clean_up=/home/user/input1.txt /home/user/output1.txt)";
  globus_gram_client_job_request("bscgrid02.bsc.es", rsl, NULL, NULL);
  rsl = "&(executable=/home/user/sim)(arguments=input2.txt output2.txt)(file_stage_in=(gsiftp://bscgrid01.bsc.es/path/input2.txt /home/user/input2.txt))(file_stage_out=/home/user/output2.txt gsiftp://bscgrid01.bsc.es/path/output2.txt)(file_clean_up=/home/user/input2.txt /home/user/output2.txt)";
  globus_gram_client_job_request("bscgrid03.bsc.es", rsl, NULL, NULL);
  rsl = "&(executable=/home/user/sim)(arguments=input3.txt output3.txt)(file_stage_in=(gsiftp://bscgrid01.bsc.es/path/input3.txt /home/user/input3.txt))(file_stage_out=/home/user/output3.txt gsiftp://bscgrid01.bsc.es/path/output3.txt)(file_clean_up=/home/user/input3.txt /home/user/output3.txt)";
  globus_gram_client_job_request("bscgrid04.bsc.es", rsl, NULL, NULL);
}

2.3 Programming comparison
• Globus vs. GRIDSs: the GRIDSs version is Grid-unaware, with implicit parallelism

void sim(File input, File output)
{
  command = "/home/user/sim " + input + ' ' + output;
  gs_result = GS_System(command);
}

int main()
{
  GS_On();
  sim("/path/input1.txt", "/path/output1.txt");
  sim("/path/input2.txt", "/path/output2.txt");
  sim("/path/input3.txt", "/path/output3.txt");
  GS_Off(0);
}

2.3 Programming comparison
• DAGMan vs. GRIDSs: DAGMan makes the parallelism explicit (the DAG A, then B and C, then D is written by hand) and has no if/while clauses

DAGMan:
JOB A A.condor
JOB B B.condor
JOB C C.condor
JOB D D.condor
PARENT A CHILD B C
PARENT B C CHILD D

GRIDSs:
int main()
{
  GS_On();
  task_A(f1, f2, f3);
  task_B(f2, f4);
  task_C(f3, f5);
  task_D(f4, f5, f6);
  GS_Off(0);
}

2.3 Programming comparison
• Ninf-G vs. GRIDSs: the Ninf-G version is Grid-aware, with explicit parallelism

Ninf-G:
int main()
{
  grpc_initialize("config_file");
  grpc_object_handle_init_np("A", &A_h, "class");
  grpc_object_handle_init_np("B", &B_h, "class");
  for(i = 0; i < 25; i++) {
    grpc_invoke_async_np(A_h, "foo", &sid, f_in[2*i], f_out[2*i]);
    grpc_invoke_async_np(B_h, "foo", &sid, f_in[2*i+1], f_out[2*i+1]);
    grpc_wait_all();
  }
  grpc_object_handle_destruct_np(&A_h);
  grpc_object_handle_destruct_np(&B_h);
  grpc_finalize();
}

GRIDSs:
int main()
{
  GS_On();
  for(i = 0; i < 50; i++)
    foo(f_in[i], f_out[i]);
  GS_Off(0);
}

2.3 Programming comparison
• VDL vs. GRIDSs: VDL has no if/while clauses, so every derivation must be enumerated

VDL:
DV trans1( );
DV trans2( );
DV trans1( );
DV trans2( );
...
DV trans1( );
DV trans2( );

GRIDSs:
int main()
{
  GS_On();
  for(i = 0; i < 1000; i++) {
    tmp = "tmp." + i;
    filein = "filein." + i;
    fileout = "fileout." + i;
    trans1(tmp, filein);
    trans2(fileout, tmp);
  }
  GS_Off(0);
}

Outline
1. Introduction
2. Programming interface
3. Runtime
   3.1 Scientific contributions
   3.2 Developments
   3.3 Evaluation tests
4. Fault tolerance at the programming model level
5. Conclusions and future work

3.1 Scientific contributions
• Runtime objectives
– Extract the implicit parallelism of sequential applications
– Speed up execution using the Grid
• Main requirement: a Grid middleware providing job management, data transfer and security

3.1 Scientific contributions
• Apply computer architecture knowledge to the Grid: what a superscalar processor does at nanosecond scale, the runtime does at the scale of seconds/minutes/hours [processor die diagram omitted: L3 directory/control, L2, and the LSU, IFU, BXU, IDU, FPU, FXU, ISU units]

3.1 Scientific contributions
• Data dependence analysis allows parallelism
– Read after Write: task1(..., f1) followed by task2(f1, ...)
– Write after Read: task1(f1, ...) followed by task2(..., f1)
– Write after Write: task1(..., f1) followed by task2(..., f1)
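To make the three hazard types concrete, here is an illustrative helper (not the GRIDSs runtime code) that names the dependence created when a new task touches a file previously accessed by another task:

typedef enum { READ, WRITE } access_t;

/* Classify the dependence between the previous access to a file and a
   new access to the same file. */
const char *classify(access_t prev, access_t next)
{
  if (prev == WRITE && next == READ)  return "Read after Write (true dependence)";
  if (prev == READ  && next == WRITE) return "Write after Read";
  if (prev == WRITE && next == WRITE) return "Write after Write";
  return "no dependence (Read after Read)";
}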

3.1 Scientific contributions
• Task graph generation for the matmul loop (tasks updating the same C(i,j) block are chained by dependences):

for(i=0; i < MSIZE; i++)
  for(j=0; j < MSIZE; j++)
    for(k=0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j))

i=0, j=0: matmul(A(0,0), B(0,0), C(0,0)), matmul(A(0,1), B(1,0), C(0,0)), matmul(A(0,2), B(2,0), C(0,0))   (k = 0, 1, 2)
i=0, j=1: matmul(A(0,0), B(0,1), C(0,1)), matmul(A(0,1), B(1,1), C(0,1)), matmul(A(0,2), B(2,1), C(0,1))   (k = 0, 1, 2)
... and likewise for i = 0, j = 2; i = 1, j = 0, 1, 2; ...

3.1 Scientific contributions
• File renaming increases parallelism (a renaming sketch follows below)
– Read after Write: task1(..., f1), task2(f1, ...): unavoidable
– Write after Read: task1(f1, ...), task2(..., f1) becomes task2(..., f1_NEW): avoidable
– Write after Write: task1(..., f1), task2(..., f1) becomes task2(..., f1_NEW): avoidable
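A minimal sketch of the renaming idea, under the assumption that the runtime keeps a version counter per file (names and data structures are illustrative):

#include <stdio.h>

static int version = 0;

/* On a WaR or WaW hazard, the new writer gets a fresh name
   (f1 becomes f1_NEW), so it no longer waits for earlier readers or
   writers of f1; later reads of f1 are redirected to the newest version. */
void rename_on_write(const char *f, char *renamed, size_t len)
{
  snprintf(renamed, len, "%s_v%d", f, ++version);
}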

3.2 Developments
• Basic functionality
– Job submission (middleware usage): select sources for input files; submit, monitor or cancel jobs; collect results
– API implementation (a usage sketch of the speculative pair follows below):
  GS_On: read configuration file and environment
  GS_Off: wait for tasks, clean up remote data, undo renaming
  GS_(F)Open: create a local task
  GS_(F)Close: notify end of a local task
  GS_Barrier: wait for all running tasks to finish
  GS_System: translate paths
  GS_Speculative_End: barrier until a throw; if a throw happens, discard the tasks from the throw to GS_Speculative_End
  GS_Throw: uses gs_result to notify the exception
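As an illustration of the speculative pair GS_Throw / GS_Speculative_End, consider an optimization loop that stops as soon as a worker task signals convergence. This is a sketch built from the descriptions above: the task names, the callback argument and the exact GS_Speculative_End prototype are assumptions, not the real GRIDSs headers.

/* Master side: submit candidate evaluations speculatively. */
int main()
{
  GS_On();
  for (int i = 0; i < 100; i++)
    evaluate_candidate(i, "best.txt");  /* IDL task; may throw */
  GS_Speculative_End(on_throw);         /* barrier; tasks submitted after the
                                           throw are discarded */
  GS_Off(0);
  return 0;
}

/* Worker side (app-functions.c): signal the exception. */
void evaluate_candidate(int i, char *out)
{
  if (converged(i, out))
    GS_Throw;   /* gs_result notifies the runtime */
}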

3.2 Developments
• Task scheduling: tasks and their dependences form a Directed Acyclic Graph; ready tasks are submitted to the middleware [diagram omitted]

3.2 Developments
• Task scheduling: resource brokering
– A resource broker is needed (but it is not an objective of this thesis)
– Grid configuration file: information about hosts (hostname, limit of jobs, queue, working directory, quota, ...); initial set of machines (can be changed dynamically)

3.2 Developments
• Task scheduling: resource brokering
– Scheduling policy: estimate the total execution time of a single task on each resource (sketched below)
  FileTransferTime: time to transfer the needed files to the resource, calculated from the hosts information and the location of files; the fastest source is selected for each file
  ExecutionTime: estimation of the task's run time on the resource, via an interface function (calculated, or estimated by an external entity)
– The resource with the smallest estimation is selected
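In code form, the policy described above amounts to minimizing a two-term estimate per candidate machine (a sketch; file_transfer_time and execution_time stand in for the runtime's internal estimators):

/* Pick the machine with the smallest estimated total time for a task. */
int select_resource(int n_machines)
{
  int best = -1;
  double best_time = 1e30;
  for (int m = 0; m < n_machines; m++) {
    double t = file_transfer_time(m)   /* fastest source chosen per input file */
             + execution_time(m);      /* cost function or external estimator */
    if (t < best_time) { best_time = t; best = m; }
  }
  return best;
}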

3.2 Developments
• Task scheduling: resource brokering
– Match task constraints against machine capabilities, implemented with the ClassAd library: machines offer capabilities (from the Grid configuration file: memory, architecture, ...) and tasks demand capabilities
– Candidate machines are filtered for each particular task; e.g. a task demanding Software = BLAST matches a machine offering SoftwareList = BLAST, GAMESS but not one offering SoftwareList = GAMESS

3.2 Developments
• Task scheduling: file locality; tasks are preferably placed on the machine that already holds their files (f1, f2, f3) [diagram omitted]

3.2 Developments
• Other file locality exploitation mechanisms
– Shared input disks (NFS or replicated data)
– Shared working directories (NFS)
– Erasing unused versions of files (decreases disk usage)
– Disk quota control (locality increases disk usage, and the quota may be lower than expected)

3.3 Evaluation

NAS Grid Benchmarks: representative benchmark; includes different types of workflows which emulate a wide range of Grid applications
Simple optimization example: representative of optimization algorithms; workflow with two-level synchronization
New product and process development: production application; workflow with parallel chains of computation
Potential energy hypersurface for acetone: massively parallel, long-running application
Protein comparison: production application; big computational challenge; massively parallel; high number of tasks
fastDNAml: well-known application in the context of MPI for Grids; workflow with synchronization steps

3.3 Evaluation
• NAS Grid Benchmarks: four workflow types (ED, HC, VP, MB) [workflow diagrams omitted]

3.3 Evaluation
• Run with classes S, W, A (2 machines x 4 CPUs)
• The VP benchmark must be analyzed in detail: it does not scale beyond 3 CPUs

3.3 Evaluation
• Performance analysis
– The GRID superscalar runtime was instrumented, producing Paraver tracefiles from the client side
– The lifecycle of all tasks has been studied in detail
– Main overhead found: the GRAM Job Manager polling interval

3.3 Evaluation
• VP.S task assignment [diagram: BT, MF, MG, MF, FT tasks mapped onto the machines Kadesh8 and Khafre, with remote file transfers marked]
– Only 14.7% of the transfers are remote when exploiting locality
– VP is parallel, but its last part is executed sequentially

3.3 Evaluation
• Conclusion: workflow shape and task granularity are important to achieve speed-up

GRID superscalar: a programming model for the Grid 48 Two-dimensional potential energy hypersurface for acetone as a function of the 1, and 2 angles 3.3 Evaluation

3.3 Evaluation
• Number of executed tasks: 1120, each between 45 and 65 minutes
• Speed-up: (32 CPUs), (64 CPUs)
• Long-running test on a heterogeneous, transatlantic Grid [map: sites contributing 22, 14 and 28 CPUs]

3.3 Evaluation
• 15 million protein sequences have been compared using BLAST and GRID superscalar [diagram: genomes vs. 15 million proteins comparison matrix]

3.3 Evaluation
• 100,000 tasks executed on 4,000 CPUs (= 1,000 exclusive nodes)
• A "Grid" of 1,000 machines with very low latency between them: a stress test for the runtime
• Frees the user from dealing with the queuing system directly
• Spares the queuing system from handling a huge set of independent tasks

GRID superscalar: programming interface and runtime
• Publications
– Raül Sirvent, Josep M. Pérez, Rosa M. Badia, Jesús Labarta, "Automatic Grid workflow based on imperative programming languages", Concurrency and Computation: Practice and Experience, John Wiley & Sons, vol. 18, no. 10, 2006.
– Rosa M. Badia, Raul Sirvent, Jesus Labarta, Josep M. Perez, "Programming the GRID: An Imperative Language-based Approach", Engineering The Grid: Status and Perspective, Section 4, Chapter 12, American Scientific Publishers, January 2006.
– Rosa M. Badia, Jesús Labarta, Raül Sirvent, Josep M. Pérez, José M. Cela and Rogeli Grima, "Programming Grid Applications with GRID Superscalar", Journal of Grid Computing, Volume 1, Issue 2, 2003.

GRID superscalar: programming interface and runtime
• Work related to standards
– R.M. Badia, D. Du, E. Huedo, A. Kokossis, I. M. Llorente, R. S. Montero, M. de Palol, R. Sirvent, and C. Vázquez, "Integration of GRID superscalar and GridWay Metascheduler with the DRMAA OGF Standard", Euro-Par, 2008.
– Raül Sirvent, Andre Merzky, Rosa M. Badia, Thilo Kielmann, "GRID superscalar and SAGA: forming a high-level and platform-independent Grid programming environment", CoreGRID Integration Workshop, Integrated Research in Grid Computing, Pisa (Italy), 2005.

Outline
1. Introduction
2. Programming interface
3. Runtime
4. Fault tolerance at the programming model level
   4.1 Checkpointing
   4.2 Retry mechanisms
   4.3 Task replication
5. Conclusions and future work

4.1 Checkpointing
• Inter-task checkpointing (a sketch follows below)
– Recovers sequential consistency in the out-of-order execution of tasks
– A single version of every file is saved
– No need to save any runtime data structures
– Drawback: some completed tasks may be lost; application-level checkpointing can avoid this
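Because tasks commit in sequential order, the checkpoint can be as small as the sequence number of the last committed task. A minimal sketch of that idea (the file name and the helpers are illustrative, not the GRIDSs implementation):

#include <stdio.h>

/* Record that every task up to task_seq has completed and its single
   file version is safely stored. */
void checkpoint_commit(int task_seq)
{
  FILE *f = fopen(".gs_checkpoint", "w");
  if (f) { fprintf(f, "%d\n", task_seq); fclose(f); }
}

/* On restart: tasks with a sequence number up to the stored value are
   skipped; everything later is re-executed. */
int checkpoint_restore(void)
{
  int last = -1;
  FILE *f = fopen(".gs_checkpoint", "r");
  if (f) { if (fscanf(f, "%d", &last) != 1) last = -1; fclose(f); }
  return last;
}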

4.1 Checkpointing
• Conclusions
– Low complexity to checkpoint a task: ~1% overhead introduced
– Can deal with both application-level and Grid-level errors; most important when an unrecoverable error appears
– Transparent for end users

4.2 Retry mechanisms
• Automatic drop of machines: a failing machine is discarded from the pool and its tasks are submitted elsewhere [diagram omitted]

4.2 Retry mechanisms
• Soft and hard timeouts for tasks [timeline diagram: after a soft timeout a task may still end in success or failure; reaching the hard timeout counts as a failure]

4.2 Retry mechanisms
• Retry of operations [diagram: middleware requests and system calls are repeated after a failure until they succeed]
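The retry mechanism in the diagram can be pictured as a small wrapper around any fallible operation (illustrative code, not the GRIDSs implementation):

#include <unistd.h>

/* Repeat an operation until it succeeds (returns 0) or the retry
   budget is exhausted; a fixed delay separates the attempts. */
int with_retries(int (*op)(void), int max_attempts, unsigned delay_seconds)
{
  for (int attempt = 1; attempt <= max_attempts; attempt++) {
    if (op() == 0)
      return 0;             /* success */
    sleep(delay_seconds);   /* wait before retrying */
  }
  return -1;                /* give up: report the failure to the runtime */
}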

4.2 Retry mechanisms
• Conclusions
– Keeps running despite failures
– Dynamic: decides when and where to resubmit
– Detects performance degradations
– No overhead when no failures are detected
– Transparent for end users

4.3 Task replication
• Replicate running tasks depending on their successors [diagram omitted]

4.3 Task replication
• Replicate running tasks to speed up the execution [diagram omitted]; a decision sketch follows below
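A hedged sketch of such a replication decision (the inputs stand for values the runtime already knows from the workflow and from its time estimates; the rule itself is illustrative):

/* Replicate a running task if successors are blocked on it and an idle
   machine is expected to finish a fresh replica earlier than the
   current execution; the first copy to finish wins, the other is
   cancelled. */
int should_replicate(int blocked_successors,
                     double estimated_remaining_time,
                     double estimated_time_on_idle_machine)
{
  return blocked_successors >= 1 &&
         estimated_time_on_idle_machine < estimated_remaining_time;
}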

4.3 Task replication
• Conclusions
– Dynamic replication: application-level knowledge (the workflow) is used
– Replication can deal with failures, hiding the retry overhead
– Replication can speed up applications in heterogeneous Grids
– Transparent for end users
– Drawback: increased usage of resources

Fault tolerance features
• Publications
– Vasilis Dialinos, Rosa M. Badia, Raül Sirvent, Josep M. Pérez and Jesús Labarta, "Implementing Phylogenetic Inference with GRID superscalar", Cluster Computing and Grid 2005 (CCGRID 2005), Cardiff, UK, 2005.
– Raül Sirvent, Rosa M. Badia and Jesús Labarta, "Graph-based task replication for workflow applications", submitted to HPCC 2009.

Outline
1. Introduction
2. Programming interface
3. Runtime
4. Fault tolerance at the programming model level
5. Conclusions and future work

Conclusions and future work
• A Grid-unaware programming model
• Parallelism exploitation and failure treatment are transparent to users
• Used in REAL systems and REAL applications
• Some future research is already ONGOING (StarSs)

Conclusions and future work
• Future work
– Grid of supercomputers (Red Española de Supercomputación)
– Larger-scale tests (hundreds? thousands?)
– More complex brokering: resource discovery/monitoring, new scheduling policies based on the workflow, automatic prediction of execution times
– New policies for task replication
– New architectures for StarSs