Porting an application to the EGEE Grid & Data management for Application
Rachel Chen
Enabling Grids for E-sciencE, www.eu-egee.org, EGEE-II INFSO-RI-031688


1 Porting an application to the EGEE Grid & Data management for Application
Rachel Chen, Academia Sinica Grid Computing, hsinyu@twgrid.org
Enabling Grids for E-sciencE, www.eu-egee.org, EGEE-II INFSO-RI-031688

2 Outline
- Introduction
- The workflow of porting an application to the Grid
- Common command list
- The practical without Data management
- The practical with Data management
- References

3 Introduction
The main goal:
- Application porting: to port an existing non-grid application to the Grid and execute it there.
- Application development: to develop a grid application.
This process is commonly called "gridifying". There are many useful applications which need gridifying.

4 Introduction
Many applications already use EGEE.
In HEP
- ATLAS: http://atlas.web.cern.ch/Atlas/index.html
- CMS: http://cms.cern.ch/
- LHCb: http://lhcb.web.cern.ch/
- ALICE: http://aliceinfo.cern.ch/
- ...
In Bioinformatics
- NPS@: http://gpsa-pbil.ibcp.fr/
- BioDCV: http://biodcv.itc.it/
- 3DEM: http://3dem.ucsd.edu/
- ...
In Biomed
- WISDOM: http://wisdom.eu-egee.fr/
- AvianFlu: http://www.twgrid.org/Application/Bioinformatics/AvainFlu-GAP/
- ...
And more.

5 The workflow of Grid Application development
1. Analyze the application.
2. Develop a non-grid application (or inherit and update an existing one).
3. Execute, test and debug the application locally.
4. Construct the job suite: JDL (Job Description Language) files, executables, auxiliary scripts and input/output data files.
5. Upload your data files to the SE (Storage Element).
6. Submit the job to the Grid.
7. Execute, test and debug the application on the Grid.
8. IF something goes wrong THEN GOTO 4 (or 2).

6 Some information
- We are using the GILDA testbed today.
- GILDA is one VO (Virtual Organisation) on EGEE: resources for training and prototyping.
[Figures: the production EGEE grid; current EGEE production middleware]

7 Practicals
An application called add will be ported to the grid environment and executed there.
add is provided in three programming languages: C, Java and Python.

8 Common command list
Command              Meaning
voms-proxy-init      Initialize the user proxy.
voms-proxy-info      Get the user proxy information.
edg-job-submit       Submit a job.
edg-job-list-match   Check whether there is a CE (Computing Element) for the job.
edg-job-status       Get the job status.
edg-job-get-output   Get the job output.
lcg-cp               Copy a grid file to a local destination.
lcg-cr               Copy a file to an SE and register the file in the catalog.
lcg-del              Remove a file/directory.
lfc-ls               List file/directory entries in a directory.
lfc-mkdir            Create a directory.

9 The practical without Data management

10 add program (Version 1, No Data management)
- Reads input data from a file called testFile.txt. This file must be specified in the JDL file.
- Adds the 2 values on each line of the input file, then writes the result to the standard output.
- Needs one parameter, the input file:
    ./add testFile.txt            (C)
    java add testFile.txt         (Java)
    python add.py testFile.txt    (Python)
Example:
    INPUT     OUTPUT
    12 23     35
    1 50      51
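The add.py source is not reproduced in this transcript. As a minimal sketch of the behaviour described above, the core of such a program might look like this in Python; the function name add_lines and the assumption that each line holds two whitespace-separated integers are illustrative, not taken from the tutorial's actual code.

```python
def add_lines(lines):
    """Return the sum of the two integers found on each non-blank line."""
    sums = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        sums.append(int(fields[0]) + int(fields[1]))
    return sums


# Sample values taken from the slide: "12 23" and "1 50".
for total in add_lines(["12 23", "1 50"]):
    print(total)  # prints 35, then 51
```

A complete add.py would additionally open the file named by its first command-line argument and pass the open file to add_lines.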

11 Prerequisites
- File add.c/add.java/add.py: the source code of the programs.
- File testFile.txt: contains sample input values.
- File add.jdl: a prepared JDL (Job Description Language) file.
- File run.sh: a script with which you can execute your executable file manually.
- File readme.txt: introduces all files in the folder.
- A compiler or an interpreter:
    - A standard C compiler and linker. In this case we will use GNU C (gcc), which is already installed.
    - Java
    - Python interpreter

12 add: logon
Step:
1. Log on to the GILDA user interface using the PuTTY SSH (Secure Shell) client located on your Windows Desktop. (User input was shown in red on the original slide.)
    Hostname: glite-tutor.ct.infn.it
    login as: taipeiXX (taipei01~taipei50), where XX is your number
    Password: GridTAIXX (GridTAI01~GridTAI50), where XX is the same number

13 add: getting the prerequisites
Step:
2. Download the prerequisites, stored in the zipped file NoDataManagement.zip, with the following command:
    wget http://t-ap16.grid.sinica.edu.tw/isgc2008/NoDataManagement.zip
Unzip the archive in your current directory:
    unzip NoDataManagement.zip
Change the current directory:
    cd NoDataManagement
There are 3 folders (c, java, python) and 3 files (readme.txt, std.out, testFile.txt) in the current directory. Each folder is named after the language used for this example. Choose one of them and make it your working directory:
    cd c
    cd java
    cd python

14 add: compilation (C)
Step:
3. Compile and link the program using the GNU C compiler / linker:
    gcc -o add add.c
This will create an executable file add. Look at the directory contents:
    ls -l

15 add: compilation (Java)
Step:
3. Compile the program using the Java compiler:
    javac add.java
This will create a class file add.class. Look at the directory contents:
    ls -l

16 add: compilation (Python)
Step:
3. Not needed if you choose Python.

17 add: testing as a non-grid application
Step:
4. Execute your program with the following command:
    ./add testFile.txt            (C)
    java add testFile.txt         (Java)
    python add.py testFile.txt    (Python)
Look at the content of the input file testFile.txt:
    more testFile.txt
You may also examine the source code:
    more add.c       (C)
    more add.java    (Java)
    more add.py      (Python)

18 gLite: entering the Grid!
Step:
5. Log in to the GILDA Grid by initializing your proxy:
    voms-proxy-init --voms gilda
This will ask for the passphrase, which is TAIPEI for all users. Check the proxy status with:
    voms-proxy-info -all

19 gLite: Checking the job requirements
Step:
6. Check whether any resource matches the job:
    edg-job-list-match --vo gilda add.jdl
This command lists all of the Grid Computing Elements, together with their jobmanager queues, that fulfil the requirements of our job.

20 gLite: Checking the job requirements
[Figure: a list of Computing Elements that can execute your program]

21 gLite: Submitting the job to GILDA Grid
Steps:
7. Execute the following command:
    edg-job-submit -o myJobId --vo gilda add.jdl
This will submit the job and store its unique identifier in a file called myJobId. You may look at that file.
8. Monitor the job status with:
    edg-job-status -i myJobId
Repeat this command until the status is "Done (Success)".

22 Practical (continued): retrieving the job results
Step:
9. Execute the following command:
    edg-job-get-output -i myJobId -dir ./
This will retrieve the Output sandbox files and store them in a new directory under the current directory. The directory name will be something like taipeiXX_6tJj5hmisLFXsl9zoSaw6A. Enter the output directory and look at the file std.out:
    cd taipeiXX_6tJj5hmisLFXsl9zoSaw6A
    more std.out
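Steps 6 to 9 can also be driven from a script. The following Python sketch only assembles the command lines shown on the slides and polls the status until the text "Done (Success)" appears. The helper names, the 30-second polling interval and the limit of 40 polls are assumptions for illustration, and the exact output format of edg-job-status is not relied on beyond that one string.

```python
import subprocess
import time


def match_command(jdl, vo="gilda"):
    """Argument list for step 6 (resource matching)."""
    return ["edg-job-list-match", "--vo", vo, jdl]


def submit_command(jdl, id_file="myJobId", vo="gilda"):
    """Argument list for step 7 (submission, job id stored in id_file)."""
    return ["edg-job-submit", "-o", id_file, "--vo", vo, jdl]


def status_command(id_file="myJobId"):
    """Argument list for step 8 (status query)."""
    return ["edg-job-status", "-i", id_file]


def output_command(id_file="myJobId", target_dir="./"):
    """Argument list for step 9 (output retrieval)."""
    return ["edg-job-get-output", "-i", id_file, "-dir", target_dir]


def run_and_capture(argv):
    """Run a command and return its standard output as text."""
    return subprocess.run(argv, capture_output=True, text=True).stdout


def wait_for_success(id_file="myJobId", poll_seconds=30, max_polls=40,
                     run=run_and_capture, sleep=time.sleep):
    """Poll the job status until "Done (Success)" appears or polls run out."""
    for _ in range(max_polls):
        if "Done (Success)" in run(status_command(id_file)):
            return True
        sleep(poll_seconds)
    return False
```

On the user interface machine one would call run_and_capture(submit_command("add.jdl")), then wait_for_success(), then run_and_capture(output_command()). A job that ends in any other final state simply exhausts the polls in this sketch, so a real script would want to inspect the status text more closely.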

23 add: the JDL-file
Step:
10. Look at the supplied add.jdl file:
    more add.jdl
The add.jdl looks like this (if you choose Python):
    Executable = "/usr/bin/env";
    JobType = "Normal";
    Arguments = "python add.py testFile.txt";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = { "add.py", "testFile.txt" };
    OutputSandbox = { "std.out", "std.err" }
- Executable: sets the name of the executable file;
- JobType: the type of the job;
- Arguments: command line arguments of the program;
- StdOutput, StdError: files for storing the standard output and the standard error output;
- InputSandbox: input files needed by the program, including the executable;
- OutputSandbox: output files which will be written during the execution, including standard output and standard error output.
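When several similar jobs are needed, the JDL text above can be generated rather than edited by hand. This is a minimal Python sketch, not part of the tutorial material: render_jdl and jdl_value are hypothetical helper names, and the sketch ends every attribute with a semicolon, whereas the slide omits it after the last one, so acceptance of the trailing semicolon is an assumption here.

```python
def jdl_value(value):
    """Format a string as "..." and a list as { "...", "..." }."""
    if isinstance(value, (list, tuple)):
        return "{ " + ", ".join('"%s"' % item for item in value) + " }"
    return '"%s"' % value


def render_jdl(attributes):
    """Render (name, value) pairs as one JDL attribute per line."""
    return "\n".join("%s = %s;" % (name, jdl_value(value))
                     for name, value in attributes)


# Attribute values copied from the Python variant of add.jdl on the slide.
ADD_JOB = [
    ("Executable", "/usr/bin/env"),
    ("JobType", "Normal"),
    ("Arguments", "python add.py testFile.txt"),
    ("StdOutput", "std.out"),
    ("StdError", "std.err"),
    ("InputSandbox", ["add.py", "testFile.txt"]),
    ("OutputSandbox", ["std.out", "std.err"]),
]

print(render_jdl(ADD_JOB))
```

A list of pairs is used instead of a dictionary so that the attributes come out in the same order as on the slide.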

24 The practical with Data management

25 Data management in gLite
Why do we need data management in our application?
- Sharing
- The size of the data set
- Getting data more efficiently

26 add program (Version 2, Data management)
- Reads input data from a file whose logical file name is /grid/gilda/training/taipei/YOUR_ACCOUNT/testFile.txt.
- Adds the 2 values on each line of the input file, then writes the result to a file called result.txt and some messages to the standard output.
- Needs two parameters, the input file and the output file. To execute it in a non-grid environment:
    ./add testFile.txt result.txt            (C)
    java add testFile.txt result.txt         (Java)
    python add.py testFile.txt result.txt    (Python)
Example:
    INPUT     OUTPUT
    12 23     35
    1 50      51
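The Version 2 behaviour can be sketched in Python as below. This is an illustration under assumptions, not the tutorial's add.py: the function name add_file, the wording of the progress message and the one-sum-per-line format of result.txt are all made up for the example, which runs on a temporary copy of the slide's sample input.

```python
import os
import tempfile


def add_file(input_path, output_path):
    """Write the sum of the two integers on each input line to output_path."""
    count = 0
    with open(input_path) as source, open(output_path, "w") as result:
        for line in source:
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            result.write("%d\n" % (int(fields[0]) + int(fields[1])))
            count += 1
    # Progress message on standard output, as the slide describes.
    print("Wrote %d result(s) to %s" % (count, output_path))
    return count


# Demonstration with the sample values from the slide.
with tempfile.TemporaryDirectory() as workdir:
    input_path = os.path.join(workdir, "testFile.txt")
    output_path = os.path.join(workdir, "result.txt")
    with open(input_path, "w") as handle:
        handle.write("12 23\n1 50\n")
    add_file(input_path, output_path)
    with open(output_path) as handle:
        print(handle.read(), end="")
```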

27 Prerequisites
- File add.c/add.java/add.py: the source code of the programs.
- File testFile.txt: contains sample input values. You have to upload this file to the SE with the lcg-cr command, which also registers it in the LFC (LCG File Catalog).
- File add.jdl: a prepared JDL (Job Description Language) file.
- File run.sh: a script with which you can execute your executable file manually.
- File runJob.sh: a script used on the WN (Worker Node).
- File readme.txt: introduces all files in the folder.
- A compiler or an interpreter:
    - A standard C compiler and linker. In this case we will use GNU C (gcc), which is already installed.
    - Java
    - Python interpreter

28 add: getting the prerequisites
Step:
1. Go to your home directory and download the prerequisites, stored in the zipped file DataManagement.zip, with the following command:
    wget http://t-ap16.grid.sinica.edu.tw/isgc2008/DataManagement.zip
Unzip the archive in your current directory:
    unzip DataManagement.zip
(This will create a subdirectory DataManagement with all of the prerequisite files inside.)
Change the current directory:
    cd DataManagement
There are 3 folders (c, java, python) and 4 files (readme.txt, std.out, testFile.txt and result.txt) in the current directory. Each folder is named after the language used for this example. Choose one of them and make it your working directory:
    cd c
    cd java
    cd python

29 add: export some environment variables
Step:
2. Export the necessary environment variables:
    export LFC_HOST=lfc-gilda.ct.infn.it
    export LCG_GFAL_INFOSYS=glite-rb.ct.infn.it:2170
    export LCG_CATALOG_TYPE=lfc
    export PATH=$PATH:/opt/lcg/bin

30 Modify some files
Step:
3. Open the file add.jdl and change the InputData attribute to match your account name:
    InputData = {"lfn:/grid/gilda/training/taipei/YOUR_ACCOUNT/testFile.txt"};
e.g.:
    InputData = {"lfn:/grid/gilda/training/taipei/taipei01/testFile.txt"};
Then open the file runJob.sh and change the following line to match your account name:
    lcg-cp -v --vo gilda lfn:/grid/gilda/training/taipei/YOUR_ACCOUNT/testFile.txt file:`pwd`/testFile.txt
e.g.:
    lcg-cp -v --vo gilda lfn:/grid/gilda/training/taipei/taipei01/testFile.txt file:`pwd`/testFile.txt
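Both edits in this step amount to replacing the YOUR_ACCOUNT placeholder. As a small sketch (the helper name personalize is hypothetical, and the tutorial itself expects the files to be edited by hand), the substitution could be written in Python as:

```python
PLACEHOLDER = "YOUR_ACCOUNT"


def personalize(text, account):
    """Replace every YOUR_ACCOUNT placeholder with the given account name."""
    return text.replace(PLACEHOLDER, account)


# The InputData line from add.jdl, as shown on the slide.
jdl_line = 'InputData = {"lfn:/grid/gilda/training/taipei/YOUR_ACCOUNT/testFile.txt"};'
print(personalize(jdl_line, "taipei01"))
```

The same function applies unchanged to the lcg-cp line in runJob.sh, since it only operates on text.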

31 Upload the input file
Step:
4. Make your own directory in the file catalog:
    lfc-mkdir /grid/gilda/training/taipei/YOUR_ACCOUNT/
e.g.:
    lfc-mkdir /grid/gilda/training/taipei/taipei01/
Upload the file testFile.txt to the SE and register it in the file catalog:
    lcg-cr --vo gilda -v -d iceage-se-01.ct.infn.it -l lfn:/grid/gilda/training/taipei/YOUR_ACCOUNT/testFile.txt file:/home/YOUR_ACCOUNT/DataManagement/testFile.txt
e.g.:
    lcg-cr --vo gilda -v -d iceage-se-01.ct.infn.it -l lfn:/grid/gilda/training/taipei/taipei01/testFile.txt file:/home/taipei01/DataManagement/testFile.txt
Check the directory contents:
    lfc-ls /grid/gilda/training/taipei/YOUR_ACCOUNT/
e.g.:
    lfc-ls /grid/gilda/training/taipei/taipei01/
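The commands of this step differ only in the account name, so they can be derived from it. The Python sketch below rebuilds the command lines from the slide; the helper names and the constant CATALOG_ROOT are assumptions introduced for the example, while the flags, the SE host name and the paths are those shown above.

```python
import posixpath

# Catalog directory used for this tutorial, as shown on the slide.
CATALOG_ROOT = "/grid/gilda/training/taipei"


def lfn_for(account, filename):
    """Logical file name of a file inside the account's catalog directory."""
    return "lfn:" + posixpath.join(CATALOG_ROOT, account, filename)


def mkdir_command(account):
    """Argument list that creates the account's catalog directory."""
    return ["lfc-mkdir", posixpath.join(CATALOG_ROOT, account) + "/"]


def upload_command(account, local_path,
                   se="iceage-se-01.ct.infn.it", vo="gilda"):
    """Argument list that copies local_path to the SE and registers it."""
    filename = posixpath.basename(local_path)
    return ["lcg-cr", "--vo", vo, "-v", "-d", se,
            "-l", lfn_for(account, filename), "file:" + local_path]


print(" ".join(mkdir_command("taipei01")))
print(" ".join(upload_command(
    "taipei01", "/home/taipei01/DataManagement/testFile.txt")))
```

Building argument lists rather than shell strings keeps each option and its value separate, which also guards against the typographic dashes that word processors tend to substitute for "-".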

32 add: compilation (C)
Step:
5. Compile and link the program using the GNU C compiler / linker:
    gcc -o add add.c
This will create an executable file add. Look at the directory contents:
    ls -l

33 add: compilation (Java)
Step:
5. Compile the program using the Java compiler:
    javac add.java
This will create a class file add.class. Look at the directory contents:
    ls -l

34 add: compilation (Python)
Step:
5. Not needed if you choose Python.

35 add: testing as a non-grid application
Step:
6. Execute your program with the following command:
    ./add testFile.txt result.txt            (C)
    java add testFile.txt result.txt         (Java)
    python add.py testFile.txt result.txt    (Python)
Look at the content of the input file testFile.txt:
    more testFile.txt
Look at the content of the output file result.txt:
    more result.txt
You may also examine the source code:
    more add.c       (C)
    more add.java    (Java)
    more add.py      (Python)

36 gLite: Checking the job requirements
Step:
7. Check whether any resource matches the job:
    edg-job-list-match --vo gilda add.jdl
This command lists all of the Grid Computing Elements, together with their jobmanager queues, that fulfil the requirements of our job.

37 gLite: Checking the job requirements
[Figure: a list of Computing Elements that can execute your program]

38 gLite: Submitting the job to GILDA Grid
Steps:
8. Execute the following command:
    edg-job-submit -o myJobId --vo gilda add.jdl
This will submit the job and store its unique identifier in a file called myJobId. You may look at that file.
9. Monitor the job status with:
    edg-job-status -i myJobId
Repeat this command until the status is "Done (Success)".

39 Practical (continued): retrieving the job results
Step:
10. Execute the following command:
    edg-job-get-output -i myJobId -dir ./
This will retrieve the Output sandbox files and store them in a new directory under the current directory. The directory name will be something like taipeiXX_6tJj5hmisLFXsl9zoSaw6A. Enter the output directory and look at the files std.out and result.txt:
    cd taipeiXX_6tJj5hmisLFXsl9zoSaw6A
    more std.out
    more result.txt

40 add: the JDL-file
Look at the supplied add.jdl file:
    more add.jdl
The add.jdl looks like this (if you choose Python):
    Executable = "/bin/sh";
    JobType = "Normal";
    Arguments = "runJob.sh";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = { "add.py", "runJob.sh" };
    InputData = {"lfn:/grid/gilda/training/taipei/YOUR_ACCOUNT/testFile.txt"};
    DataAccessProtocol = {"rfio","gridftp","gsiftp"};
    OutputSandbox = { "std.out", "std.err", "result.txt" }
- InputData: the Logical File Name (LFN) or Grid Unique Identifier (GUID) of the data needed by the job as input;
- DataAccessProtocol: the protocols that the application is able to "speak" for accessing the files listed in InputData on a given SE.

41 add: runJob.sh
Step:
11. Look at the supplied runJob.sh file:
    more runJob.sh
The runJob.sh looks like this (if you choose Python):
    #!/bin/sh
    export LFC_HOST=lfc-gilda.ct.infn.it
    export LCG_GFAL_INFOSYS=glite-rb.ct.infn.it:2170
    export LCG_CATALOG_TYPE=lfc
    export PATH=$PATH:/opt/lcg/bin
    # get the file from SE
    lcg-cp -v --vo gilda lfn:/grid/gilda/training/taipei/YOUR_ACCOUNT/testFile.txt file:`pwd`/testFile.txt
    # execute it
    /usr/bin/env python add.py testFile.txt result.txt

42 Summary
- Understand the grid infrastructure
    - Data management
    - Information management
    - WMS
    - ...
- Understand the requirement and the workflow
    - Requirement analysis.
    - Modify your program.
    - Keep close communication with domain experts.
- Adapt existing grid applications
    - NA4 in EGEE: http://egee-na4.ct.infn.it/index.php

43 Support for application gridification
- SZTAKI operates as the Grid Application Support Centre.
- More information: http://www.lpds.sztaki.hu/gasuc

44 References
- JDL Attributes: https://edms.cern.ch/document/590869/1
- gLite 3.0 User Guide: https://edms.cern.ch/file/722398/1.1/
- R-GMA overview page: http://www.r-gma.org/
- GLUE Schema: http://infnforge.cnaf.infn.it/glueinfomodel/
- JDL attributes specification for WM proxy: https://edms.cern.ch/document/590869/1

45 More exercises...
Another example: this program is very similar to add, but it returns the product of the 2 values on each line of the input file testFile.txt.
Download sample code, input files and some scripts:
- Without Data management: http://t-ap16.grid.sinica.edu.tw/isgc2008/hw/NoDataManagement.zip
- With Data management: http://t-ap16.grid.sinica.edu.tw/isgc2008/hw/DataManagement.zip
Write your own JDL file and, if needed, the script used on the WN (Worker Node).
You have to upload your output file (result.txt) to the SE as /grid/gilda/training/taipei/YOUR_ACCOUNT/resultFile.txt

46 Questions?

