Large-scale Deployment of two-color QCD on the FKPPL VO using Ganga
Soonwook Hwang, Hangi Kim (KISTI); Seyong Kim (Sejong University)
XQCD Workshop 2009, August 3-5, 2009
Outline
- FKPPL VO (Virtual Organization) Grid
- Ganga: a high-level Grid job submission and management tool
- Deployment of QCD simulations on the Grid
FKPPL (France-Korea Particle Physics Laboratory)
- An International Associated Laboratory between France and Korea
- Promotes joint cooperative activities (research projects) under a scientific research program in the areas of particle physics (LHC, ILC), e-Science, bioinformatics, and Grid computing
4
FKPPL Scientific Projects Project nameCoordinatorsPartners ILC calorimeter (particle physics) Yongmann Yang, Ewha Jean-Claude Brient, LLR Ewha Womans Univ., Kangnung Nat. U niv.,LPC, LLR ILC electronics (particle physics) Jongseo Chai, SKKU Christoph de la Taille, LAL Sung Kyun Kwan Univ., Korea Institute of Radiological and medical Sciencces, Pohang Accel. Lab. LAL, LLR Grid computingS. Hwang, KISTI D. Boutigny, CC-IN2P3 KISTI, CC-IN2P3 WISDOM (in silico drug discovery) Doman Kim, CNU V. Breton, LPC Chonnam Nat. Univ., KISTI, Kangnung Nat. Univ., LPC ALICE ( heavy ions physics) Yongwook Baek, KNU Pascal Dupieux, LPC Kangnung Nat. Univ. LPC CDF (particle physics) Kihyeon Cho, KISTI Aurore Savoy-Navarro, LPNH E KISTI, LPNHE
FKPPL VO Grid
Objectives
- Provide the computing facilities needed to foster FKPPL scientific applications and experiments
- Provide researchers and scientists in Korea and France with a production Grid environment
FKPPL VO Grid Testbed

Service | Host | Site
UI | kenobi.kisti.re.kr | KISTI
VOMS | palpatine.kisti.re.kr | KISTI
WMS/LB | snow.kisti.re.kr | KISTI
SE | ccsrm02.in2p3.fr (0.5 TB) | CC-IN2P3
SE | hansolo.kisti.re.kr (1.5 TB) | KISTI
CE | cclcgceli03.in2p3.fr (8000 CPU cores) | CC-IN2P3
CE | darthvader.kisti.re.kr (64 CPU cores) | KISTI

(Diagram: the FKPPL VO spanning KISTI and IN2P3, with VOMS, WMS, LFC, UI, CE and SE services and a wiki.)
gLite Grid Services on FKPPL VO
- User Interface (UI): the place where users log on to access the Grid
- Workload Management System (WMS): matches the user requirements with the available resources on the Grid
- File and Replica Catalog: records the location of Grid files and their replicas
- Computing Element (CE): a batch queue on a site's computers where the user's job is executed
- Storage Element (SE): provides (large-scale) storage for files
Job Submission Example (diagram): the user logs on to the User Interface and creates a proxy from the VO Management Service (the database of VO users); the job (executable + small inputs) is submitted to the WMS, which queries the Information System, where sites publish their state, and dispatches the job to a Computing Element (e.g., at IN2P3); input and output files are read from and written to a Storage Element and registered in the File and Replica Catalog; the Logging and Bookkeeping service records the job status; finally, the user retrieves the status and the (small) output files.
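To make this flow concrete, here is a minimal sketch of a submission from the UI using the standard gLite command-line tools, wrapped in Python. The JDL contents, file names and the VO name fkppl.kisti.re.kr are illustrative assumptions, not taken from the slides.

```python
import subprocess

# Hypothetical JDL describing a job: executable, small inputs and outputs.
JDL = """\
Executable    = "my_app.sh";
Arguments     = "input.dat";
StdOutput     = "std.out";
StdError      = "std.err";
InputSandbox  = {"my_app.sh", "input.dat"};
OutputSandbox = {"std.out", "std.err", "result.dat"};
"""
with open('job.jdl', 'w') as f:
    f.write(JDL)

# 1. Create a VOMS proxy for the FKPPL VO (12-hour lifetime by default).
subprocess.check_call(['voms-proxy-init', '--voms', 'fkppl.kisti.re.kr'])

# 2. Submit the job description to the WMS (automatic proxy delegation).
out = subprocess.check_output(['glite-wms-job-submit', '-a', 'job.jdl']).decode()
# The WMS prints the job identifier as an https URL (parsing simplified here).
job_id = [l.strip() for l in out.splitlines() if l.strip().startswith('https://')][0]

# 3. Query the job status, tracked by the Logging and Bookkeeping service.
subprocess.check_call(['glite-wms-job-status', job_id])

# 4. Retrieve the (small) output files once the job has completed.
subprocess.check_call(['glite-wms-job-output', '--dir', 'job_output', job_id])
```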
Utilization – CE Services

Month | # of jobs | CPU used (SI2K hours) | Elapsed time (SI2K hours)
Oct 2008 | 34 | 150 | -
Nov 2008 | 1690 | 48,250 | -
Dec 2008 | 1721 | 46,410 | -
Jan 2009 | 304 | 1,859,640 | 2,092,550
Feb 2009 | 1514 | 7,167,350 | 7,732,280
Mar 2009 | 866 | 4,937,920 | 5,068,990
Apr 2009 | 2465 | 5,130,240 | 5,666,100
May 2009 | 34 | 45,210 | 50,560
Jun 2009 | 1839 | 3,269,440 | 3,593,100
User Support
- FKPPL VO wiki site: http://anakin.kisti.re.kr/mediawiki/index.php/FKPPL_VO
- User accounts on the UI: 20 user accounts have been created
- FKPPL VO membership registration: 7 users have been registered as FKPPL VO members
Application Support on FKPPL VO
- Deployment of Geant4 applications (cancer treatment planning), in collaboration with Dr. Jungwook Shin of the National Cancer Center in Korea
- Deployment of two-color QCD simulations, in collaboration with Prof. Seyong Kim of Sejong University
Introduction to Ganga
Ganga
- An easy-to-use user interface for job submission and management
- Supports specification, submission, bookkeeping and post-processing of computational tasks on a wide set of distributed resources
- Provides a homogeneous environment for processing data on heterogeneous resources

(Diagram: applications such as Executable, ROOT, GAUDI and Athena are run through Ganga, which talks to Local, PBS/SGE batch and LCG/gLite backends via the corresponding commands or libraries.)
Overall Architecture
- The user interacts with the Ganga Public Interface via a GUI, the CLIP command line, or scripts
- Plugins are provided for different execution backends (e.g., Grid, batch and local) and applications (e.g., ROOT, GAUDI and Athena)
- Ganga can be easily extended and customized to meet the needs of different user communities
Uniform Interface to Heterogeneous Resources
For the user, running a job interactively is no different from running it on the Grid.
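As an illustration (not taken from the slides), a minimal sketch of this in Ganga's Python interface: the same job definition is steered to different resources simply by swapping the backend object. It assumes it is typed inside a Ganga session, where Job, Executable, Local and LCG are predefined.

```python
# Minimal Ganga (GPI) sketch: identical job, different execution backends.
j = Job()
j.application = Executable(exe='/bin/echo', args=['Hello from the FKPPL VO'])

j.backend = Local()      # run on the local machine, as if interactively
j.submit()

j2 = j.copy()            # identical application and inputs...
j2.backend = LCG()       # ...now targeted at the LCG/gLite Grid
j2.submit()
```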
Deployment of QCD Simulations on the FKPPL VO
Our Two-color QCD Applications (1/2)
- Large-scale: a large number of simulation jobs must be run over a wide range of parameters. In our case, we plan to run a total of 360 different QCD jobs, one per parameter set:
  beta = [0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.75, 1.8, 1.85, 1.9, 1.93, 1.95, 1.98, 2.0, 2.1, 2.2, 2.3] (20 values)
  J = [0.04, 0.05, 0.06] (3 values)
  mu = [0.0, 0.575, 0.65, 0.85, 0.9, 1] (6 values)
- Independent: each job runs independently of the others
- Long-duration: each QCD job takes 200 steps to complete, each step taking an average of 1 hour, so each job takes an average of 200 hours
Our Two-color QCD Applications (2/2)
- We need a computing facility to run a large number of jobs: the FKPPL VO provides computing resources sufficient to run all 360 QCD jobs concurrently
- We need a Grid tool to effectively maintain such a large set of jobs on the Grid without having to know the details of the underlying Grid: Ganga appears to be an appropriate tool for managing such a large number of jobs (a sketch of generating this parameter sweep with Ganga follows)
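A minimal sketch of how such a sweep could be generated with Ganga's Python interface, assuming a Ganga session where Job, Executable, File and LCG are predefined. The executable name su2.x (taken from the overview diagram later in the talk) and its command-line argument convention are assumptions; the real code may read its parameters from an input file instead.

```python
from itertools import product

betas = [0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.75,
         1.8, 1.85, 1.9, 1.93, 1.95, 1.98, 2.0, 2.1, 2.2, 2.3]
Js   = [0.04, 0.05, 0.06]
mus  = [0.0, 0.575, 0.65, 0.85, 0.9, 1.0]

for beta, J, mu in product(betas, Js, mus):   # 20 * 3 * 6 = 360 combinations
    j = Job()
    j.name = 'qcd_b%g_J%g_mu%g' % (beta, J, mu)
    # Argument convention is hypothetical; parameters might instead go in a file.
    j.application = Executable(exe=File('su2.x'),
                               args=[str(beta), str(J), str(mu)])
    j.backend = LCG()                         # dispatch to the FKPPL VO sites
    j.submit()
```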
Issues relating to long-running jobs on the Grid
- Long-running jobs often fail to complete on the Grid; it is not straightforward to get a long-duration job like our two-color QCD simulation done successfully
- A Grid proxy certificate expires before the job's completion (by default, the proxy has a lifetime of 12 hours)
- Each Grid site has its own operational policy, such as the maximum CPU time a job is allowed to run at a time; even the longest queues below allow only about 80 hours, well short of the ~200 hours a full QCD run needs

Site | CE queue | MaxCPUTime (min)
CC-IN2P3 | jobmanager-bqs-short | 16
CC-IN2P3 | jobmanager-bqs-medium | 751
CC-IN2P3 | jobmanager-bqs-long | 4753
KISTI | jobmanager-lcgpbs-fkppl | 4880
Application-level Checkpointing/Restarting
- We have modified the original two-color QCD simulation code to support an application-level checkpointing scheme
- The two-color QCD code takes 200 steps to complete; once a QCD job is launched successfully on the Grid, an intermediate result is generated at each step and saved to the checkpoint server
- When a QCD job is detected to have stopped for some reason, Ganga restarts it from where it left off by resubmitting it along with the latest intermediate result (see the sketch below)
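A minimal sketch of this restart logic as a Ganga monitoring script, run inside a Ganga session (where jobs and File are predefined). The helper latest_checkpoint() and the checkpoint-file handling are hypothetical placeholders for the real heartbeat-monitor and checkpoint-server machinery described above.

```python
import time

TOTAL_STEPS = 200

def latest_checkpoint(job_name):
    # Placeholder: ask the checkpoint server for the newest intermediate
    # result saved for this run; here we pretend nothing has been saved yet.
    return 0, None

def resubmit_from_checkpoint(old_job):
    step, ckpt = latest_checkpoint(old_job.name)
    if step >= TOTAL_STEPS:
        return                                 # run already completed all steps
    j = old_job.copy()
    if ckpt is not None:
        j.inputsandbox.append(File(ckpt))      # ship the latest intermediate result
    j.submit()                                 # restart from where the run left off

while True:
    for j in jobs:                             # Ganga's job repository
        if j.status in ('failed', 'killed'):   # job stopped for some reason
            resubmit_from_checkpoint(j)
    time.sleep(600)                            # poll every 10 minutes
```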
Overview of QCD Simulation Runs on the Grid (diagram): the su2.x executable plus small input files are (re)submitted as QCD jobs through the WMS to Computing Elements at IN2P3 and KISTI; as each job runs, it sends its intermediate result at every step to the checkpoint server; a heartbeat monitor checks job status and intermediate results, and on failure the job is resubmitted together with the latest intermediate result retrieved from the checkpoint server; output files go to the Storage Element, and the user retrieves the status and (small) output files.
Distribution of QCD Jobs on the FKPPL VO
Two-color QCD Production
- ~8.2 CPU years: 360 runs * 200 steps * 1 hour = 72,000 CPU hours
- ~360 concurrent QCD runs on the FKPPL VO
- As of now (Aug. 3), 51.70% of the total 72,000 steps has been completed
Summary
- The FKPPL VO Grid provides a production-level Grid infrastructure for scientists to carry out relatively large-scale simulation runs
- Ganga combined with application-level checkpointing makes it straightforward to run long-running jobs on the Grid at a large scale
Thank you