EGEE-II INFSO-RI Enabling Grids for E-sciencE Grid application development with gLite and P-GRADE Portal Miklos Kozlovszky MTA SZTAKI
Presenter
MTA SZTAKI (Hungarian Academy of Sciences), Laboratory of Parallel and Distributed Systems
– Miklos Kozlovszky
EGEE-III (Enabling Grids for E-sciencE)
  o GASUC Team
  o Trainings and dissemination activities
SEE-GRID2 / SEE-GRID-SCI (South Eastern European GRID-enabled eInfrastructure Development)
  o Manager of “Dissemination and Training” (WP5/NA3)
Introduction of LPDS (Laboratory of Parallel and Distributed Systems)
Research division of MTA SZTAKI since 1998
Head: Prof. Peter Kacsuk
22 research fellows
Founding member of
– Central European Grid Consortium (2003)
– Hungarian Grid Competence Center (2003)
Participant or coordinator in many European and national Grid research, infrastructure, and educational projects (since 2000)
– FP5: GridLab, DataGrid
– FP6: EGEE I-II, SEE-GRID I-II, CoreGrid, ICEAGE, CancerGrid
– FP7: EGEE III, SEE-GRID-SCI, EDGeS (coordinator), ETICS, S-CUBE
Central European Grid Training Center in EGEE (since 2004)
Webpage
Find it from the EGEE User Forum webpage OR the EGEE Training webpage (Google EGEE NA3)
Events and registration (top menu) ..., Paris, December
Save the direct link!
Long-term storage of training material
– Presentations in PPT
– Tutorials in HTML/DOC/PDF
Feedback form
Your comments and feedback are highly valuable for EGEE training
Please fill in the feedback form and return it at the end of the course
– Anonymous
– Scores: (very bad - very good)
– Comments are highly appreciated
Goals of the day
Basic concepts of
– Workflow
– Parameter study on EGEE
Implementation in P-GRADE Portal
Further information
– How to learn more
– How to get access to EGEE
– How to port your own application to EGEE
Agenda
Application development on gLite *
– Workflow and parameter study concepts on EGEE
– Workload management and data services in gLite
Workflow and parameter study support in P-GRADE Portal
Hands-on
– Workflow exercises
– Parameter study exercises
How to learn more
* = mostly skipped; please refer to yesterday's presentations
Grid vision
[Figure: grid middleware connecting visualising workstations, mobile access, supercomputers and PC clusters, data storage, sensors and experiments over the Internet and other networks]
Problems to solve
Standardised access to resources
– Computers
– Storage
– Special equipment
– Software services
Access policy
Load balancing
Monitoring resources and services
Monitoring applications
Fault management
Programming concepts, level of abstraction
User interfaces
...
EGEE grid, gLite middleware
Where computer science meets the application communities!
The tools and services used by the VO’s applications
NA4 RESPECT: Recommended External Software Packages for Egee CommuniTies
– Current RESPECT tools: GridWay, P-GRADE Portal
– “Grid software” menu
[Figure labels, layered stack: Application; Application toolkits; Higher-level gLite services (WMS, ...); Basic gLite services: CE, SE, info, security; Command line & APIs]
The production infrastructure contains these services
– High-level services: help users build their computing infrastructure, but should not be mandatory
– Basic services: must be complete and robust; should not assume the use of higher-level Grid services
VO concept
gLite middleware runs on each EGEE site to provide
– Computation services: Computing Element
– Data services: Storage Element
– Security service
Sites and users form Virtual Organisations: the basis for collaboration
Each VO can / must have central software services and support groups
[Figure labels: INTERNET; P-GRADE Portal]
Basic gLite use case: Job submission
[Figure: job submission flow.
Components: User Interface; Resource Broker; Information System; File and Replica Catalog; VO Management Service (DB of VO users); Logging and bookkeeping; Site X with its Computing Element and Storage Element.
Arrow labels: create proxy; submit job (executable + small inputs); query; submit job; publish state; process; input file(s); output file(s); register file; job status; logging; retrieve status & (small) output files; retrieve output.]
Main components
User Interface (UI): the place where users log on to the Grid
Computing Element (CE): a batch queue on a site’s computers where the user’s job is executed
Storage Element (SE): provides (large-scale) storage for files
Resource Broker (RB) / Workload Management System (WMS): matches the user requirements with the available resources on the Grid
Information System: characteristics and status of CEs and SEs
File and replica catalog: location of grid files and grid file replicas
Logging and Bookkeeping (LB): log information of jobs
All built upon authorisation, authentication, security
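The matching step performed by the Resource Broker can be illustrated with a toy example. This is only a sketch under stated assumptions: the attribute names (`free_cpus`, `memory_mb`, `waiting_jobs`), the CE names and the ranking rule are hypothetical and are not the schema or the algorithm used by the gLite WMS.

```python
def match(requirements, computing_elements):
    """Return names of CEs that meet every minimum requirement,
    ordered by the number of waiting jobs (fewest first)."""
    suitable = [
        ce for ce in computing_elements
        if all(ce.get(key, 0) >= minimum for key, minimum in requirements.items())
    ]
    return [ce["name"] for ce in sorted(suitable, key=lambda ce: ce["waiting_jobs"])]


# Hypothetical snapshot of what an information system might publish.
published = [
    {"name": "ce-a", "free_cpus": 4, "memory_mb": 2048, "waiting_jobs": 10},
    {"name": "ce-b", "free_cpus": 16, "memory_mb": 4096, "waiting_jobs": 2},
    {"name": "ce-c", "free_cpus": 1, "memory_mb": 8192, "waiting_jobs": 0},
]

print(match({"free_cpus": 2, "memory_mb": 2048}, published))
```

The point of the sketch is the division of labour: the user states requirements, the sites publish their state, and the broker combines the two.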
How can I get access to EGEE?
Obtain a certificate from a recognized CA:
– Find the official CA of your country: www.gridpma.org
  1-year, renewable certificates
  Accepted in every EGEE VO
– GILDA CA: two-week, renewable certificate
  Accepted only in the GILDA training VO (the VO to be used today)
Find and register at a VO
– List of VOs with usage rules: CIC Operations portal
  Scientific discipline
  Geographical region
Use the VO services
– Through (low-level) command line tools of gLite (not today)
– Through high-level tools, e.g. P-GRADE Portal, GENIUS, GANGA, ...
  Access mechanism varies from tool to tool
[Figure labels: CA; VO manager; VOMS database; Grid sites; VO Membership Service; Obtaining certificate: annually; Joining VO: once]
Application developer’s questions
I have a computationally intensive problem. How does it relate to this scenario?
– What is a grid job for me?
– How many jobs do I have, and how do they relate to each other and to my data?
– What is the input / output data for each job?
– How to write a job to access input / output data?
– How to submit and monitor the jobs? How to access their results?
– Do I need to use additional services to meet my application's demands?
Answers
– Now (sometimes specifically on P-GRADE Portal)
– Or any time later, for general purposes, from the Grid Application Support group (GASuC)
Functional vs. data parallelism
Functional decomposition (functional parallelism)
– Decomposing the problem into different jobs which can be distributed to different CEs for simultaneous execution
  Different executables run on different CEs (and may or may not process the same data)
– Good to use when
  the data cannot be partitioned
  there is no static structure or fixed number of calculations to be performed
Functional decomposition
[Figure: the problem is split into Job 1 to Job 4, running on Computing Elements #1 to #4. Stages along the time axis: job submission, job monitoring, result download.]
Functional decomposition in practice: workflow
[Figure: a workflow manager (e.g. P-GRADE Portal server) executes the jobs of the problem along the time axis. Jobs connected by a data dependency run one after the other: job submission, job monitoring and result transfer for the first, then job submission and job monitoring for those that depend on it, ending with result download.]
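Data dependencies decide which jobs of a workflow may be submitted at the same time. A minimal sketch of how a workflow manager could derive that order, assuming a hypothetical four-job workflow; the job names and the `ready_batches` helper are illustrative and not part of P-GRADE Portal. It uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def ready_batches(dependencies):
    """Group jobs into batches that can be submitted together.

    `dependencies` maps each job to the set of jobs whose output it needs.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        batch = sorted(sorter.get_ready())  # jobs whose inputs are all available
        batches.append(batch)
        sorter.done(*batch)                 # treat the whole batch as finished
    return batches


# Hypothetical workflow: job2 and job3 consume job1's output, job4 merges both.
workflow = {
    "job1": set(),
    "job2": {"job1"},
    "job3": {"job1"},
    "job4": {"job2", "job3"},
}

print(ready_batches(workflow))
```

Jobs in the same batch have no dependency on each other, so they can run on different Computing Elements simultaneously.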
Functional vs. data parallelism
Data decomposition (data parallelism)
– Partitioning the problem's data domain and distributing portions to multiple instances of the same job for simultaneous execution
– The same executable runs on different CEs and processes different data
– Good to use for problems where:
  data is static (e.g. factoring, solving large matrix or finite difference calculations, parameter studies)
  a dynamic data structure is tied to a single entity that can be subsetted (large multi-body problems)
  the domain is fixed but computation within various regions of the domain is dynamic (fluid vortices models)
More than 90% of grid applications employ data parallelism (parameter study, parametric study)
Data decomposition
[Figure: the problem's data is split into data segments 1 to 4; the same algorithm runs as Job 1 to Job 4 on Computing Elements #1 to #4. Stages along the time axis: job submission, job monitoring, result download.]
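Splitting the data domain into one segment per job can be sketched as follows. The helper name `split_into_segments` and the choice of contiguous, nearly equal segments are assumptions made for illustration; a real application may need a domain-specific partitioning.

```python
def split_into_segments(items, n_segments):
    """Split `items` into `n_segments` contiguous parts whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_segments)
    segments, start = [], 0
    for index in range(n_segments):
        size = base + (1 if index < extra else 0)  # the first `extra` segments get one more item
        segments.append(items[start:start + size])
        start += size
    return segments


# Ten data items distributed over four jobs.
for job_number, segment in enumerate(split_into_segments(list(range(10)), 4), start=1):
    print(f"Job {job_number}: {segment}")
```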
Data decomposition in practice: Master-slave
[Figure: a master job (master process, e.g. P-GRADE Portal server) generates inputs, spawns slaves, monitors slaves, collects results and generates the final result. Slave jobs receive their inputs at job submit and hand back their results at get job output.]
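The master's steps can be mimicked locally in a few lines. This is a sketch only: threads stand in for grid jobs, the slave simply sums its inputs, and the function names are hypothetical. On the grid, each slave would be a job submitted to a Computing Element.

```python
from concurrent.futures import ThreadPoolExecutor


def slave(segment):
    """Stand-in for a slave job: here it just sums its inputs."""
    return sum(segment)


def master(inputs, n_slaves):
    """Generate inputs, spawn slaves, collect results, generate the final result."""
    segments = [inputs[i::n_slaves] for i in range(n_slaves)]   # generate inputs
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:      # spawn slaves
        partial_results = list(pool.map(slave, segments))       # collect results
    return sum(partial_results)                                 # generate final result


print(master(list(range(1, 101)), n_slaves=4))
```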
Multi-level master-slave
[Figure: two master-slave structures combined into levels. Each master job generates inputs, spawns slaves (job submit), monitors slaves (check job status) and collects results (get job output); each slave job takes an input and returns results. The final result is generated at the end.]
Complex master-slave
[Figure: one master job drives three groups of slave jobs. For each group it generates inputs, spawns slaves, monitors slaves and collects results; each slave job takes an input and returns results. Finally the master generates the final result.]
Complex master-slave = Parameter study workflow
[Figure: the workflow manager takes the role of the master job. For each of three groups of slave jobs it generates local inputs, spawns slaves, monitors slaves and collects local results, then generates the result. Example labels: 3 files on one input and 9 files on another give 3 x 9 = 27 workflow executions (WF) and 27 outputs.]
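The 3 x 9 = 27 arithmetic is a cross product of the input sets: every file of the first input is paired with every file of the second. A short sketch, with hypothetical file names:

```python
from itertools import product


def parameter_study(first_inputs, second_inputs):
    """Return every (first, second) input pairing; each pairing is one workflow execution."""
    return list(product(first_inputs, second_inputs))


first = [f"a{i}.dat" for i in range(3)]    # 3 files on the first input (names are hypothetical)
second = [f"b{i}.dat" for i in range(9)]   # 9 files on the second input
runs = parameter_study(first, second)

print(len(runs))  # number of workflow executions
```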
Defining a job
Executable (EGEE runs Scientific Linux v3 or v4)
– Script:
  No compilation is necessary
  Can invoke a real executable which is statically installed on the CE (VOBox)
– Binary:
  Must be compiled on the User Interface: binary compatibility with EGEE is guaranteed
  Statically linked to avoid errors caused by library versions
Input / output data
– Input files: smaller than 20 MByte?
  If YES, transfer them from the client side (“InputSandbox”)
  If NOT, upload them to a Storage Element before job submission
– Output files: smaller than 20 MByte?
  If YES, transfer them back to the client side (“OutputSandbox”)
  If NOT, upload them to a Storage Element from the Computing Element
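The 20 MByte rule of thumb can be written as a simple decision function. The function name and the interpretation of 1 MByte as 1024 x 1024 bytes are assumptions of this sketch.

```python
# 20 MByte guideline from the slide; 1 MByte = 1024 * 1024 bytes is an assumption.
SANDBOX_LIMIT_BYTES = 20 * 1024 * 1024


def transfer_route(size_bytes):
    """Decide how a file should travel: via the job sandbox or via a Storage Element."""
    if size_bytes < SANDBOX_LIMIT_BYTES:
        return "sandbox"
    return "storage element"


print(transfer_route(5 * 1024 * 1024))     # small file
print(transfer_route(700 * 1024 * 1024))   # large file
```

The same check applies in both directions: to input files before submission and to output files on the Computing Element.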
Distribution of large datasets
Put large files into Storage Elements and register them in the Logical File Catalog (LFC) (already covered in previous sessions)
Large files do not go through the broker
[Figure: the master job generates local inputs, spawns slaves, monitors slaves, collects local results and generates the result. Only Logical File Names travel through the broker (job submit, check job status, get job output); the slave jobs read their inputs from and write their results to the LFC & SEs.]
File services in gLite
Users’ files are stored on Storage Elements
A file on an SE is identified by a Storage URL (SURL), e.g. sfn://grid005.iucc.ac.it/flatfiles/SE00/gilda/generated/ /filec79a9e3c a2a5-235f
Users refer to files by Logical File Names (LFN)
LFC = directory structure of LFNs + pointers to SURLs (files can have replicas)
[Figure: the LFC directory lfn:/grid/gilda/kozlovszky/run2/ contains input1, input2 and input3, which point to files on the Storage Elements:
Storage Element 1: sfn://grid005.iucc.ac.il/storage/gilda/generated/ /fileb233d43f-5bc6-4ede-a5fe-611d48be2ba5
Storage Element 2: srm://aliserv6.ct.infn.it/dpm/ct.infn.it/home/gilda/generated/ /filea21ab3e2-8ff6-4a44-82a7-f2
Storage Element 3: sfn://trigriden01.unime.it/flatfiles/SE00/gilda/generated/ /filec79a9e3c a2a5-235f
Storage Element 4: sfn://grid005.iucc.ac.it/flatfiles/SE00/gilda/generated/ /filec79a9e3c a2a5-235f]
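The relationship between logical names and storage URLs can be modelled as a mapping from one LFN to a list of SURLs. This toy model only illustrates the idea; the class, its methods and the host names are hypothetical and are not the LFC API.

```python
class FileCatalog:
    """Toy catalog: each Logical File Name points to one or more Storage URLs."""

    def __init__(self):
        self._replicas = {}

    def register(self, lfn, surl):
        """Record that a copy of `lfn` is stored at `surl`."""
        self._replicas.setdefault(lfn, []).append(surl)

    def resolve(self, lfn):
        """Return all known replicas of `lfn` (an empty list if it is not registered)."""
        return list(self._replicas.get(lfn, []))


catalog = FileCatalog()
lfn = "lfn:/grid/gilda/kozlovszky/run2/input1"
# Hypothetical storage hosts and paths, used only for this example.
catalog.register(lfn, "srm://se1.example.org/gilda/generated/file-0001")
catalog.register(lfn, "srm://se2.example.org/gilda/generated/file-0001")

print(catalog.resolve(lfn))
```

A job only needs to know the LFN; the catalog tells it where the replicas are.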
Name conventions
Users primarily access and manage files through “logical filenames”
– Defined by the user
LFC Namespace
– LFC has a directory tree structure
– lfn:/grid/ /
– Today: lfn:/grid/gilda/parisXX/...
Managing a workload with gLite command line tools
Log in to the User Interface machine
Write your jobs. Operations in a job:
– Access LFC, resolve LFN
– Access SE, get file content
– Process file
– Write result to SE
– Register file in LFC
(Compile your jobs to get the executables)
Write a job description for each job using Job Description Language (JDL)
– Text file
– Specifies Executable, Input and Output LFNs
– Specifies resource requirements and preferences (which CE)
Write the description of your workload
– Workflow JDL or parametric job JDL (no parametric workflow!): myworkload.jdl
Use shell commands to
– Submit the workload: glite-wms-job-submit myworkload.jdl wlID
– Monitor the status: glite-wms-job-status wlID
– Get the output sandbox: glite-wms-job-output wlID
Write a program (e.g. script) to
– Register input files in LFC before the workload is started
– Resubmit failed jobs
– Download result files from Storage Elements when the workload is finished
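Because a JDL file is plain text, it can be generated by a script. The sketch below renders a minimal job description; the attribute names (Executable, Arguments, StdOutput, StdError, InputSandbox, OutputSandbox) are those commonly shown in gLite JDL examples, but the helper `render_jdl`, the file names and the chosen attribute set are assumptions and should be checked against the gLite user guide.

```python
def render_jdl(executable, arguments, input_files, output_files):
    """Render a minimal job description as JDL text (a sketch, not a complete JDL)."""

    def quoted_list(names):
        return "{" + ", ".join(f'"{name}"' for name in names) + "}"

    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f"InputSandbox = {quoted_list(input_files)};",
        # Standard output and error come back together with the result files.
        f'OutputSandbox = {quoted_list(["std.out", "std.err"] + list(output_files))};',
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: a script shipped in the input sandbox, one small result file.
jdl_text = render_jdl("run.sh", "input1", ["run.sh"], ["result.txt"])
print(jdl_text)
```

Writing `jdl_text` to a file such as myworkload.jdl gives the input for the submit command listed above.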
Further information, references
EGEE
gLite middleware
gLite manuals, documentation (gLite user guide)
Recommended External Software Packages for EGEE Communities (RESPECT)
P-GRADE Grid Portal
P-GRADE Grid Portal (Here to login…)