Development of test suites for the certification of EGEE-II Grid middleware
Task 2: The development of testing procedures focused on special details of various software features
Task 4: Creating the specialized testbed for developing test suites
Task 5: Preparing intermediate and final reports
PNPI – Yu. Ryabov, N. Klopov
Plans for the second year
1. Development of stress and performance tests for the WMS and CE in accordance with requests from the developers and/or the certification team
2. Installation of the new gLite 3.1 middleware on the testbed
Requirements for the test
- Submit a large number of jobs simultaneously.
- Submit jobs from one or many users.
- Monitor the load on the CE and WMS during testing.
- Monitor job status (the passage of jobs through the system's components) on the CE and WMS during testing.
- Store status information for all submitted jobs.
- Allow express visual analysis of the results.
Functional schema of the test
[Diagram components: UI, Job submitter, Data collector, parametric jobs, WMS, CE, monitoring on the WMS and CE, monitoring data, jobs logging info]
Notes: The Job submitter launches several scripts in the background, each of which submits and monitors a parametric job on behalf of a specific user. Before testing, a monitor is started on the WMS and CE to track the load average and the active processes. After the test completes, the Data collector retrieves the monitoring data from the CE and WMS and also requests, with the glite-wms-job-logging-info command, status information for all subjobs of all parametric jobs. This information can be passed to the web site for express visual analysis.
P.S. The program executed on the worker nodes is a simple bash script that sleeps for a given number of seconds (mention if asked).
Job submission
The job submission program (several scripts) has the following input parameters:
- u: the number of users
- x: path to the directory with the users' proxy certificates (x1: path to a single user proxy certificate)
- n: the number of subjobs from each user
- s: time interval between job status requests
- t: maximum time of the test execution
- a: the time each subjob will execute on the WN
- l: path to the logfile
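The parameter list above could be parsed as in the following minimal sketch. It is not the original submission program (which consists of several scripts): the attribute names, the default values, and the assumption that s, t, and a are given in seconds are all hypothetical, and the x1 variant is left out for brevity.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical command-line interface mirroring the slide's parameter list."""
    parser = argparse.ArgumentParser(description="Sketch of a job submitter CLI")
    parser.add_argument("-u", dest="users", type=int, default=1,
                        help="number of users")
    parser.add_argument("-x", dest="proxy_dir",
                        help="directory with the users' proxy certificates")
    parser.add_argument("-n", dest="subjobs", type=int, default=1,
                        help="number of subjobs from each user")
    # The units below (seconds) are an assumption, not stated in the slides
    # except for the subjob sleep time.
    parser.add_argument("-s", dest="status_interval", type=int, default=60,
                        help="interval between job status requests")
    parser.add_argument("-t", dest="max_test_time", type=int, default=3600,
                        help="maximum time of the test execution")
    parser.add_argument("-a", dest="subjob_time", type=int, default=10,
                        help="time each subjob executes on the WN")
    parser.add_argument("-l", dest="logfile", default="test.log",
                        help="path to the logfile")
    return parser


# Example invocation: 3 users, 1000 subjobs each (paths are placeholders).
args = build_parser().parse_args(
    ["-u", "3", "-x", "/tmp/proxies", "-n", "1000", "-a", "30", "-l", "run.log"]
)
total_subjobs = args.users * args.subjobs
print(total_subjobs)
```

With 3 users and 1000 subjobs per user, the test submits 3000 subjobs in total, which matches the larger of the two configurations used at PNPI.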
Monitoring
These scripts run on the CE and WMS and collect and save information about the load average and the names of system processes. The script runs with the following parameters:
- t: polling interval
- l: request the load average
- p: request process names
Load average: approximately the number of active processes (as reported by UNIX).
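As an illustration of the load-average part of the monitoring, here is a minimal sketch that samples the 1-minute load average at a fixed polling interval. It is not the original monitoring script; the function name and the record layout are hypothetical, and process-name collection is omitted. `os.getloadavg()` is available on UNIX-like systems only.

```python
import os
import time
from typing import List, Tuple


def sample_load(poll_seconds: float, samples: int) -> List[Tuple[float, float]]:
    """Return (timestamp, 1-minute load average) pairs taken every poll_seconds."""
    records = []
    for i in range(samples):
        one_min, _five_min, _fifteen_min = os.getloadavg()
        records.append((time.time(), one_min))
        if i < samples - 1:
            time.sleep(poll_seconds)  # wait for the next polling point
    return records


# Short demonstration run: three samples, 0.1 s apart.
records = sample_load(0.1, 3)
for timestamp, load in records:
    print(f"{timestamp:.1f} {load:.2f}")
```

Storing the timestamp next to each value is what later allows the load curve to be plotted as a function of time.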
Data collector
The Data collector script is executed after all jobs have finished and does the following:
- copies the monitoring data from the WMS and CE;
- requests the event time information for each subjob, using the glite-wms-job-logging-info command;
- performs preliminary data processing (formatting).
Notes: Data processing means formatting the data so that it is convenient to process on the web site in CGI scripts.
Parametric job
Parametric job functionality was used to solve the problem of submitting a large number of jobs to the CE simultaneously. A parametric job is a set of jobs (subjobs) whose descriptions are identical apart from the values of the parametric attributes.
    JobType = "Parametric";
    Executable = "tst.sh";
    InputSandbox = {"tst.sh", "input_PARAM_.txt"};
    StdOutput = "out_PARAM_.txt";
    StdError = "err_PARAM_.txt";
    OutputSandbox = {"out_PARAM_.txt", "err_PARAM_.txt"};
    Parameters = 1000;
    ParameterStart = 0;
    ParameterStep = 1;
Parametric attributes take values from 0 to 999. The WMS creates an individual subjob for each parameter value, so N = (Parameters - ParameterStart) / ParameterStep subjobs are created. Both the main parametric job and its subjobs have unique IDs.
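The expansion described on this slide can be illustrated with a short sketch. It only mimics the arithmetic and the _PARAM_ substitution and is not WMS code; the helper names are hypothetical, and Parameters is treated as an exclusive upper bound, consistent with the statement that the values run from 0 to 999.

```python
from typing import List


def parameter_values(parameters: int, start: int, step: int) -> List[int]:
    """Values taken by the parametric attribute (upper bound exclusive)."""
    return list(range(start, parameters, step))


def expand(template: str, value: int) -> str:
    """Replace the _PARAM_ placeholder with a concrete parameter value."""
    return template.replace("_PARAM_", str(value))


values = parameter_values(parameters=1000, start=0, step=1)
n_subjobs = (1000 - 0) // 1  # N = (Parameters - ParameterStart) / ParameterStep

print(n_subjobs, values[0], values[-1])
print(expand("input_PARAM_.txt", values[-1]))
```

For the JDL above this gives 1000 subjobs, and the last subjob reads input999.txt and writes out999.txt and err999.txt.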
Testbed
gLite 3.1 middleware was installed on the testbed.
[Diagram: testbed nodes UI, WMS+LB+BDII, CE, and four WNs]
Test usage
Measurement of the load average as a function of time under the following condition: N jobs from each of K users.
Test usage at PNPI: 1000 jobs from each user (1 user and 3 users) for the "old" and "new" versions of gLite:
- Old: the version in use until January 2008
- New (with marshal patches): the version in use since January 2008
The new version with marshal patches was released to production on 10 April 2008 (gLite update 23).
The marshal patches were developed by A. Kiryanov (PNPI).
Marshal patches for LCG CE
The aim is to improve the behavior of LCG CEs under load by regulating requests from job managers (hence the term "marshal"):
- Eliminate the need to recompile heavy Perl code on every job manager invocation
- Memory-persistent daemons handle the requests
- Control the number of simultaneously running job manager queries
- Decrease the load on the file system and batch system
- Prevent CE overload by WMSes
- Reduce system losses
- Jobs complete faster, which is especially visible with a large number of short jobs
CE monitoring
Notes: For 1 user there is no significant difference between the old and new versions.
CE monitoring
Notes: For 3 users (1000 subjobs each) there is a noticeable difference in the load average and in the execution time of all subjobs.
WMS monitoring
Notes: The load on the WMS is low in both cases.
Express visual analysis (WEB viewer)
Each job passes through the different WMS components, and the corresponding events are generated and stored in the LB. Examples of these events: "RegJob,NetworkServer", "Match,WorkloadManager", ..., "Done,LogMonitor". This makes it possible to evaluate the performance of the WMS components.
The WEB viewer was developed to provide a visual representation of the event timestamps for jobs running through the different components. The viewer provides the following functions:
- to choose the event type by whose timestamp value the jobs will be sorted;
- to choose the data file with the logging info data;
- to get the graph of the event time since job registration in the WMS for each job;
- to choose an additional event type (it will be shown on the same graph);
- to get and store the graph data as a text file for future analysis;
- to get the IDs and logging info data for the subjobs that are missing the chosen events;
- to view the monitoring data.
Notes: Data processing means formatting the data so that it is convenient to process on the web site in CGI scripts.
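The two core viewer functions, the event time since registration and the list of subjobs missing a chosen event, can be sketched as follows. This is not the viewer's CGI code: the in-memory layout (subjob ID mapped to event name mapped to timestamp in seconds), the function names, and the sample values are all hypothetical; only the event names come from the slide.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: subjob ID -> {event name -> timestamp in seconds}
Events = Dict[str, float]

REGISTRATION = "RegJob,NetworkServer"


def delays_since_registration(jobs: Dict[str, Events],
                              event: str) -> List[Tuple[str, float]]:
    """Time from registration to the chosen event, sorted by that delay."""
    delays = [
        (job_id, events[event] - events[REGISTRATION])
        for job_id, events in jobs.items()
        if event in events and REGISTRATION in events
    ]
    return sorted(delays, key=lambda item: item[1])


def jobs_missing_event(jobs: Dict[str, Events], event: str) -> List[str]:
    """IDs of the subjobs for which the chosen event was never logged."""
    return sorted(job_id for job_id, events in jobs.items() if event not in events)


# Made-up sample data for three subjobs.
jobs = {
    "subjob-1": {REGISTRATION: 0.0, "Match,WorkloadManager": 4.0, "Done,LogMonitor": 60.0},
    "subjob-2": {REGISTRATION: 1.0, "Match,WorkloadManager": 3.5},
    "subjob-3": {REGISTRATION: 2.0, "Match,WorkloadManager": 9.0, "Done,LogMonitor": 50.0},
}

match_delays = delays_since_registration(jobs, "Match,WorkloadManager")
missing_done = jobs_missing_event(jobs, "Done,LogMonitor")
print(match_delays)
print(missing_done)
```

Sorting by delay gives the monotone curve that is convenient to plot, and the missing-event list points directly at the subjobs whose logging info deserves a closer look.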
Express visual analysis
[Graph series: Transfer (source: LogMonitor, destination: LRMS); Accepted (source: LogMonitor)]
Notes: The site consists of HTML pages and CGI scripts. The picture shows how the Running event from LogMonitor lags behind the Running event.
P.S. This slide has animation.
Express visual analysis
The monitoring data from the CE and WMS can be viewed.
Link: http://biod.pnpi.spb.ru/totem/test/ctest.html
Notes: A slide about the new version is needed after this slide.
Summary
- A testbed running gLite 3.1 was created.
- A complex test was developed that provides the following:
  - submission of a large number of jobs from many users;
  - load average monitoring on the WMS and CE;
  - acquisition of the test result data.
- The developed test has been used with concrete sets of input parameters.
- An HTML viewer was developed for the presentation of the test results.
Summary (First year of the grant)
- A set of WMS tests (functionality checks) was developed at the request of the gLite certification team for the following job types: parametric, interactive, checkpointable, partitionable.
- A stress test with long and complex JDL was developed (to estimate the critical file size).
- Some of the tests were included in the certification SAM framework.
- Five bugs were found and reported in Savannah.
Conclusion
- Task 2 (PNPI): done
- Task 4 (PNPI): done
- Task 5: under preparation (together with collaborating teams)