Information System testing for LCG-1


Elena Slabospitskaya
Institute for High Energy Physics, Protvino, Russia
18.07.2003

Task to do:
A test suite has to be developed to allow extensive stress testing of the information system via rapid submission of a very large number of user jobs, placing many enquiries in a very short time.

Current situation:
We have to use MDS for the current LCG-1 release. R-GMA is not ready for deployment now, but it may be used in the future. Hence, our goal is to develop the test suite for both information systems, R-GMA and MDS.

What is the problem?
- Developers have unit testing suites for both MDS and R-GMA.
- However, these are usable only by the developers themselves.
- We need a test suite executable in the user environment under production conditions.

Goal
- To check the accuracy of the different types of MDS or R-GMA data during the job enquiry.
- To allow choosing different InfoProviders: top MDS, CE or MON.
- To write the test suite in Perl.

The algorithm
- Obtain all information tuples (a tuple is a row of the database tables) from the Glue Schema.
- If the tuple is full, prepare the job for submission.
- The tuple is included in the job description file as a Condor ClassAd expression.
- Automatically generate jdl, rsl or sub files (whichever is required).
- Submit the job either:
    - via the RB (edg-job-submit), or
    - directly to the CE via Globus GRAM (globusrun and CondorG commands: Gilbert Grosdidier's idea).
- Optionally send many jobs sequentially or in parallel (stress test).
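The tuple-to-job step can be sketched roughly as follows. This is an illustration in Python, not the Perl code of the test suite: the helper names (is_full, classad_requirement, make_jdl) are hypothetical, and the reading of a "full" tuple as one with a non-empty attribute name and value is an assumption. The JDL attributes mirror the example jdl file in this presentation.

```python
# Sketch only: these helper names are hypothetical, not taken from JobInfo.pl.

def is_full(tuple_):
    """Assumed meaning of a 'full' tuple: name and value are both non-empty."""
    name, value = tuple_
    return bool(name.strip()) and bool(value.strip())


def classad_requirement(tuple_):
    """Render a (name, value) tuple as a ClassAd expression for JDL."""
    name, value = tuple_
    return 'other.{} == "{}"'.format(name, value)


def make_jdl(tuple_):
    """Build JDL text modelled on the example jdl file in the slides."""
    lines = [
        'Executable = "data.pl";',
        'Arguments = "none";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        'InputSandbox = {"data.pl","data.dat"};',
        'OutputSandbox = {"std.out","std.err"};',
        'Requirements = {};'.format(classad_requirement(tuple_)),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    tuples = [("GlueCEInfoLRMSVersion", "OpenPBS_2.4"), ("GlueCEStateStatus", "")]
    for t in tuples:
        if is_full(t):          # the second tuple is skipped: its value is empty
            print(make_jdl(t))
```

Each full tuple yields one job description, so a mismatch between the published value and what the submission path actually matches shows up as a job that finds no CE.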

[Figure: The schema of job submission via the RB and directly to the CE via Globus GRAM. Labels in the diagram: UI, edg-job-submit, globusrun, CondorG, RB (Network Server, Workload Manager), Globus EDG Gatekeeper, CE (PBS, LSF, ...), WN.]

Usage

    JobInfo.pl [-help | -h]
    JobInfo.pl [-rb|-gr|-cg] [-MDS|-RGMA] [-t time] [-seq jobs | -par jobs] [-host info] [-CE ce] [-size size] [-dir dirname]

Where:

    -rb             Submit jobs (edg-job-submit) through the Resource Broker.
    -gr             Check direct job submission to the CE (without the RB) by globusrun commands.
    -cg             Check direct job submission to the CE by CondorG commands.
                    (The client part of the CondorG software must be installed on the UI under this user name.)
    -MDS or -RGMA   The type of the information system (default: MDS).
    -t time         The time interval (seconds) between jobs (default: 5).
    -seq jobs       How many jobs are run sequentially.
    -par jobs       How many jobs are run in parallel.
                    Only one of -seq and -par can be given. If neither is given, one job will be run.
    -host info      Top MDS or other grid information provider hostname.
    -CE ce          CE host name.
    -size size      The size of the input file (in MB), up to 2 GB.
    -dir dirname    Directory for the log files (default: JobInfo).

E.g.:

    ./JobInfo.pl -rb -MDS -t 5 -par 3 -host lxshare0242 -CE lxshare0290 -size 8
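To make the option rules concrete (one submission mode, one information system, -seq or -par but not both, and the stated defaults), here is a hypothetical Python counterpart of the command line built with argparse. It is not the original Perl option handling; the destination names (interval, ce, logdir, info_system) and the job_count helper are invented for the sketch, and treating -rb, -gr and -cg as mutually exclusive is an assumption read off the usage line.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the JobInfo.pl options; not the original Perl code.
    p = argparse.ArgumentParser(prog="JobInfo.pl", add_help=False)
    p.add_argument("-h", "-help", action="help", help="show this message and exit")

    mode = p.add_mutually_exclusive_group()
    mode.add_argument("-rb", action="store_true", help="submit through the Resource Broker")
    mode.add_argument("-gr", action="store_true", help="submit directly with globusrun")
    mode.add_argument("-cg", action="store_true", help="submit directly with CondorG")

    info = p.add_mutually_exclusive_group()
    info.add_argument("-MDS", dest="info_system", action="store_const", const="MDS")
    info.add_argument("-RGMA", dest="info_system", action="store_const", const="RGMA")
    p.set_defaults(info_system="MDS")           # default information system

    p.add_argument("-t", dest="interval", type=int, default=5)

    count = p.add_mutually_exclusive_group()    # only one of -seq / -par
    count.add_argument("-seq", type=int)
    count.add_argument("-par", type=int)

    p.add_argument("-host")
    p.add_argument("-CE", dest="ce")
    p.add_argument("-size", type=int)
    p.add_argument("-dir", dest="logdir", default="JobInfo")
    return p


def job_count(args):
    """If neither -seq nor -par is given, one job is run."""
    return args.seq or args.par or 1


if __name__ == "__main__":
    example = "-rb -MDS -t 5 -par 3 -host lxshare0242 -CE lxshare0290 -size 8"
    args = build_parser().parse_args(example.split())
    print(args.info_system, args.interval, job_count(args), args.logdir)
```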

Source of the test suite is in:
http://datagrid.in2p3.fr/cgi-bin/cvsweb.cgi/edg-tests/tests/Stress/Info

Example tuples:

    GlueCEInfoLRMSType: pbs
    GlueCEInfoLRMSVersion: OpenPBS_2.4
    GlueCEInfoTotalCPUs: 4
    GlueCEStateEstimatedResponseTime: 0
    GlueCEStateFreeCPUs: 4
    GlueCEStateRunningJobs: 0
    GlueCEStateStatus: Production
    GlueCEStateTotalJobs: 0
    GlueCEStateWaitingJobs: 0
    GlueCEStateWorstResponseTime: 0
    GlueCEPolicyMaxCPUTime: 172800
    GlueCEPolicyMaxRunningJobs: 99999
    GlueCEPolicyMaxTotalJobs: 999999
    GlueCEPolicyMaxWallClockTime: 69120
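Tuples in this "Attribute: value" form are easy to split into pairs before building job requirements. The following Python sketch shows one way to do it; how JobInfo.pl itself reads the tuples is not shown in the slides, so parse_tuples and SAMPLE are hypothetical names used only for illustration.

```python
def parse_tuples(text):
    """Split 'Attribute: value' lines into (attribute, value) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue                      # skip blank or malformed lines
        name, value = line.split(":", 1)  # split on the first colon only
        pairs.append((name.strip(), value.strip()))
    return pairs


# A few of the tuples listed above, used as sample input.
SAMPLE = """\
GlueCEInfoLRMSType: pbs
GlueCEInfoLRMSVersion: OpenPBS_2.4
GlueCEInfoTotalCPUs: 4
GlueCEStateStatus: Production
"""

if __name__ == "__main__":
    for name, value in parse_tuples(SAMPLE):
        print(name, "=", value)
```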

Example jdl file

    Executable = "data.pl";
    Arguments = "none";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"data.pl","data.dat"};
    OutputSandbox = {"std.out","std.err"};
    Requirements = other.GlueCEInfoLRMSVersion == "OpenPBS_2.4";

The data.pl file

    #!/usr/bin/perl
    use strict;
    use warnings;
    # Print a separator, then the checksum and size of the input data file.
    print "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\n";
    print "Now the checksum and the size of the data file are being tested\n";
    print `cksum "data.dat"`;

Example rsl file (for globusrun)

    &(executable= "/bin/echo")
     (arguments= "Result string is: Hello, globusrun!")
     (stdout= x-gass-cache://$(GLOBUS_GRAM_JOB_CONTACT)stdout anExtraTag)
     (stderr= x-gass-cache://$(GLOBUS_GRAM_JOB_CONTACT)stderr anExtraTag)
     (maxCpuTime=10)

Example sub file (for CondorG)

    executable = /bin/hostname
    globusscheduler = lxshare0290:2119/jobmanager-pbs
    universe = globus
    output = JobCondorG0.out
    log = JobCondorG0.log
    error= JobCondorG0.err
    requirement= GlueCEHostingCluster == "lxshare0277.cern.ch"
    queue

Part of the log file (edg-job-submit)

    ===============================NEW JOB ========================
    Executable ="data.pl";
    Arguments = "none";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"data.pl","data.dat"};
    OutputSandbox = {"std.out","std.err"};
    Requirements = other.GlueCEHostingCluster == "lxshare0277.cern.ch";
    Now jdl file is checking by Resource Broker...
    Connecting to host lxshare0234.cern.ch, port 7772
    ***************************************************************************
    COMPUTING ELEMENT IDs LIST
    The following CE(s) matching your job requirements have been found:
    *CEId*
    lxshare0277.cern.ch:2119/jobmanager-pbs-infinite
    lxshare0277.cern.ch:2119/jobmanager-pbs-long
    lxshare0277.cern.ch:2119/jobmanager-pbs-medium
    lxshare0277.cern.ch:2119/jobmanager-pbs-short
    Now trying to submit our job.....
    Logging to host lxshare0234.cern.ch, port 9002

Part of the log file (globusrun)

    ================= NEW JOB=================
    GlueCEInfoTotalCPUs=="2"
    Now trying to submit our job.....
    globus_gram_client_callback_allow successful
    GRAM Job submission successful
    https://lxshare0290.cern.ch:20002/27207/1057931614/
    GLOBUS_GRAM_PROTOCOL_JOB_STATE_ACTIVE
    ================= NEW JOB==================
    GlueCEInfoLRMSType=="pbs"
    https://lxshare0290.cern.ch:20003/27251/1057931625/
    Checking the job status.....https://lxshare0290.cern.ch:20002/27207/1057931614/
    DONE
    The job https://lxshare0290.cern.ch:20002/27207/1057931614/ is finishing successfull
    Checking the job status.....https://lxshare0290.cern.ch:20003/27251/1057931625/
    The job https://lxshare0290.cern.ch:20003/27251/1057931625/ is finishing successfull

Part of the log file (CondorG command)

    =============== NEW JOB =================
    GlueCEHostingCluster == "lxshare0277.cern.ch"
    Now trying to submit our job.....
    Submitting job(s).
    Logging submit event(s).
    1 job(s) submitted to cluster 384
    GlueCEName == "infinite"
    1 job(s) submitted to cluster 385.
    Checking the job status.....384.
    -- Submitter: lxshare0276.cern.ch : <137.138.145.63:32827> : lxshare0276.cern.ch
     ID      OWNER     SUBMITTED    RUN_TIME    ST PRI SIZE CMD
     384.0   lspitsky  7/11 21:25   0+00:00:02  R  0   0.0  hostname
     384.0   lspitsky  7/11 21:25   0+00:00:10  C  0   0.0  hostname
    The job 384. is finishing successfull
    Checking the job status.....385.
    The job 385. is finishing successfull
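In the queue listing above, the ST column moves from R (running) to C (completed) for job 384.0. A status check could read that column along the following lines. This Python sketch is an assumption about how such a check might work, not the logic of the test suite; final_state and QUEUE_LINES are hypothetical names, and the field positions are taken from the rows shown in the log.

```python
def final_state(queue_lines, job_id):
    """Return the ST column of the last row listed for job_id, or None if absent."""
    state = None
    for line in queue_lines:
        fields = line.split()
        # Row layout in the log: ID OWNER SUBMITTED(date time) RUN_TIME ST PRI SIZE CMD
        if len(fields) >= 9 and fields[0] == job_id:
            state = fields[5]
    return state


# Rows copied from the log excerpt above.
QUEUE_LINES = [
    "ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD",
    "384.0 lspitsky 7/11 21:25 0+00:00:02 R 0 0.0 hostname",
    "384.0 lspitsky 7/11 21:25 0+00:00:10 C 0 0.0 hostname",
]

if __name__ == "__main__":
    print(final_state(QUEUE_LINES, "384.0"))
```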

To be done
1. Present the test suite's results in a common LCG fashion (HTML results array).
2. For R-GMA the information can be obtained in two ways:
    - via the port directly (using an LDAP query)
    - via the R-GMA API or CLI
   The results of the two should be compared for consistency.
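The planned consistency comparison could take a shape like the one below. This is a Python sketch under stated assumptions: each access path is assumed to have already been reduced to a dictionary of attribute names and values (how the LDAP query or the R-GMA API/CLI is called is not shown in the slides), and compare_sources is a hypothetical helper.

```python
def compare_sources(ldap_data, rgma_data):
    """Return attributes whose values differ, or that one source is missing."""
    diffs = {}
    for name in sorted(set(ldap_data) | set(rgma_data)):
        a = ldap_data.get(name)   # None when the attribute is absent
        b = rgma_data.get(name)
        if a != b:
            diffs[name] = (a, b)
    return diffs


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    via_ldap = {"GlueCEInfoTotalCPUs": "4", "GlueCEStateStatus": "Production"}
    via_api = {"GlueCEInfoTotalCPUs": "2", "GlueCEStateStatus": "Production"}
    for name, (a, b) in compare_sources(via_ldap, via_api).items():
        print(name, a, b)
```

An empty result means the two paths agree on every attribute they publish.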

Acknowledgments
I am grateful to our Grid Deployment Group for their hospitality. My gratitude to Ian Bird for discussions about the real direction of the work. Many thanks to Marco Serra, Di Qing, Piera Bettini and Louis Poncet for fruitful discussions. Special thanks to Zdenek Sekera, Gilbert Grosdidier and Frederique Chollet for multiple and very useful discussions and for collaborative work. The work was supported by INTAS, grant INTAS-CERN 00-440.