
1 Large-scale Deployment of two-color QCD on the FKPPL VO using Ganga
Soonwook Hwang, Hangi Kim (KISTI); Seyong Kim (Sejong University)
XQCD Workshop 2009, August 3-5, 2009

2 Outline
- FKPPL VO (Virtual Organization) Grid
- Ganga: a high-level Grid job submission and management tool
- Deployment of QCD simulations on the Grid

3 FKPPL (France Korea Particle Physics Laboratory)
- An International Associated Laboratory between France and Korea
- Promotes joint cooperative activities (research projects) under a scientific research program in the areas of:
  - Particle physics: LHC, ILC
  - e-Science: bioinformatics, Grid computing

4 FKPPL Scientific Projects

| Project name | Coordinators (Korea; France) | Partners |
|---|---|---|
| ILC calorimeter (particle physics) | Yongmann Yang, Ewha; Jean-Claude Brient, LLR | Ewha Womans Univ., Kangnung Nat. Univ., LPC, LLR |
| ILC electronics (particle physics) | Jongseo Chai, SKKU; Christoph de la Taille, LAL | Sung Kyun Kwan Univ., Korea Institute of Radiological and Medical Sciences, Pohang Accel. Lab., LAL, LLR |
| Grid computing | S. Hwang, KISTI; D. Boutigny, CC-IN2P3 | KISTI, CC-IN2P3 |
| WISDOM (in silico drug discovery) | Doman Kim, CNU; V. Breton, LPC | Chonnam Nat. Univ., KISTI, Kangnung Nat. Univ., LPC |
| ALICE (heavy-ion physics) | Yongwook Baek, KNU; Pascal Dupieux, LPC | Kangnung Nat. Univ., LPC |
| CDF (particle physics) | Kihyeon Cho, KISTI; Aurore Savoy-Navarro, LPNHE | KISTI, LPNHE |

5 FKPPL VO Grid

6 Objectives
- Provide the computing facilities needed to foster FKPPL scientific applications and experiments
- Provide researchers and scientists in Korea and France with a production Grid environment

7 FKPPL VO Grid Testbed

| Service | Host | Site |
|---|---|---|
| UI | kenobi.kisti.re.kr | KISTI |
| VOMS | palpatine.kisti.re.kr | KISTI |
| WMS/LB | snow.kisti.re.kr | KISTI |
| SE | ccsrm02.in2p3.fr (0.5 TB) | CC-IN2P3 |
| SE | hansolo.kisti.re.kr (1.5 TB) | KISTI |
| CE | cclcgceli03.in2p3.fr (8000 CPU cores) | CC-IN2P3 |
| CE | darthvader.kisti.re.kr (64 CPU cores) | KISTI |

[Diagram: FKPPL VO services (UI, VOMS, WMS, LFC, Wiki, and CE/SE pairs) spanning KISTI and IN2P3]

8 gLite Grid Services on FKPPL VO
- User Interface (UI): the place where users log on to access the Grid
- Workload Management System (WMS): matches the user's requirements with the available resources on the Grid
- File and replica catalog: records the location of Grid files and their replicas
- Computing Element (CE): a batch queue on a site's computers where the user's job is executed
- Storage Element (SE): provides (large-scale) storage for files
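To make the WMS's "matching" role concrete, here is a toy matchmaker. It is a minimal sketch, not gLite's actual matchmaking: the classes, the idle-core figures and the "more idle cores first" ranking are hypothetical, and the CPU-time limits are borrowed from the long queues in the MaxCPUTime table later in the talk.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    host: str
    free_cores: int        # hypothetical snapshot of idle cores
    max_cpu_minutes: int   # site policy limit (cf. MaxCPUTime)


@dataclass
class JobRequirements:
    cpu_minutes: int       # CPU time the job expects to need


def match(job, ces):
    """Return the CEs that can run `job`, best-ranked first.

    Requirement: at least one idle core and a CPU-time limit covering the job.
    Rank: more idle cores first (a toy stand-in for a WMS rank expression).
    """
    eligible = [ce for ce in ces
                if ce.free_cores > 0 and ce.max_cpu_minutes >= job.cpu_minutes]
    return sorted(eligible, key=lambda ce: ce.free_cores, reverse=True)


ces = [
    ComputingElement("cclcgceli03.in2p3.fr", free_cores=120, max_cpu_minutes=4753),
    ComputingElement("darthvader.kisti.re.kr", free_cores=10, max_cpu_minutes=4880),
]

print([ce.host for ce in match(JobRequirements(cpu_minutes=3000), ces)])
# ['cclcgceli03.in2p3.fr', 'darthvader.kisti.re.kr']
print([ce.host for ce in match(JobRequirements(cpu_minutes=4800), ces)])
# ['darthvader.kisti.re.kr']
print([ce.host for ce in match(JobRequirements(cpu_minutes=200 * 60), ces)])
# []
```

The last call already hints at the problem discussed later: a 200-hour QCD job matches no queue at all.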

9 Job Submission Example
[Diagram: User Interface, VO Management Service (DB of VO users), WMS, Information System, File and Replica Catalog, Logging and Bookkeeping, and a Computing Element and Storage Element at IN2P3; flows: create proxy, submit job (executable + small inputs), query, publish state, process, input/output file(s), register file, job status/logging, retrieve status & (small) output files, retrieve output]
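The Logging and Bookkeeping service in this flow tracks each job's status from submission to output retrieval. The sketch below is a simplified, partly assumed view of gLite job states: real gLite has more states and transitions (for example resubmission), and `replay` is a hypothetical helper.

```python
# Simplified view of the job states reported by gLite Logging and Bookkeeping.
NEXT_STATES = {
    "SUBMITTED": {"WAITING", "ABORTED", "CANCELLED"},
    "WAITING":   {"READY", "ABORTED", "CANCELLED"},
    "READY":     {"SCHEDULED", "ABORTED", "CANCELLED"},
    "SCHEDULED": {"RUNNING", "ABORTED", "CANCELLED"},
    "RUNNING":   {"DONE", "ABORTED", "CANCELLED"},
    "DONE":      {"CLEARED"},  # CLEARED once the user has retrieved the output
    "ABORTED":   set(),
    "CANCELLED": set(),
    "CLEARED":   set(),
}


def replay(events, state="SUBMITTED"):
    """Apply a sequence of status updates, rejecting impossible jumps."""
    for new_state in events:
        if new_state not in NEXT_STATES[state]:
            raise ValueError(f"illegal transition {state} -> {new_state}")
        state = new_state
    return state


happy_path = ["WAITING", "READY", "SCHEDULED", "RUNNING", "DONE", "CLEARED"]
print(replay(happy_path))  # CLEARED
```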

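10 Utilization – CE Services

| Month | # of jobs | CPU used (SI2K hours) | Elapsed time (SI2K hours) |
|---|---|---|---|
| Oct 2008 | 34 | 150 | - |
| Nov 2008 | 1690 | 48,250 | - |
| Dec 2008 | 1721 | 46,410 | - |
| Jan 2009 | 304 | 1,859,640 | 2,092,550 |
| Feb 2009 | 1514 | 7,167,350 | 7,732,280 |
| Mar 2009 | 866 | 4,937,920 | 5,068,990 |
| Apr 2009 | 2465 | 5,130,240 | 5,666,100 |
| May 2009 | 34 | 45,210 | 50,560 |
| Jun 2009 | 1839 | 3,269,440 | 3,593,100 |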
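One way to read the utilization table is CPU efficiency, the ratio of CPU time used to elapsed time. A minimal sketch for the large-volume months of 2009 (January to April and June), where both figures are reported; the dictionary and function names are illustrative.

```python
# (CPU used, elapsed time) in SI2K hours, copied from the utilization table.
usage = {
    "Jan 2009": (1_859_640, 2_092_550),
    "Feb 2009": (7_167_350, 7_732_280),
    "Mar 2009": (4_937_920, 5_068_990),
    "Apr 2009": (5_130_240, 5_666_100),
    "Jun 2009": (3_269_440, 3_593_100),
}


def cpu_efficiency(cpu: float, elapsed: float) -> float:
    """Fraction of wall-clock (elapsed) time that was spent on the CPU."""
    return cpu / elapsed


per_month = {month: cpu_efficiency(cpu, el) for month, (cpu, el) in usage.items()}
overall = cpu_efficiency(sum(c for c, _ in usage.values()),
                         sum(e for _, e in usage.values()))

for month, eff in per_month.items():
    print(f"{month}: {eff:.1%}")
print(f"overall: {overall:.1%}")
```

All five months come out between about 89% and 97%, about 92.6% overall.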

11 User Support
- FKPPL VO Wiki site: http://anakin.kisti.re.kr/mediawiki/index.php/FKPPL_VO
- User accounts on the UI: 20 user accounts have been created
- FKPPL VO membership registration: 7 users have been registered as FKPPL VO members

12 Application Support on FKPPL VO
- Deployment of Geant4 applications
  - Cancer treatment planning, in collaboration with Dr. Jungwook Shin of the National Cancer Center in Korea
- Deployment of two-color QCD simulations
  - In collaboration with Prof. Seyong Kim of Sejong University

13 Introduction to Ganga

14 Ganga
- An easy-to-use user interface for job submission and management
  - Specification, submission, bookkeeping, and post-processing of computational tasks on a wide set of distributed resources
- Provides a homogeneous environment for processing data on heterogeneous resources
[Diagram: Ganga with three backends, LCG/gLite (Grid), PBS or SGE (Batch) and Local (localhost), each driven through its own commands or libraries; applications: Athena, GAUDI, ROOT, Executable]

15 Overall Architecture
- The user interacts with the Ganga Public Interface via:
  - a GUI
  - CLIP (Command Line Interface in Python)
  - scripts
- Plugins are provided for different execution backends (e.g., Grid, Batch and Local) and applications (e.g., ROOT, GAUDI and Athena)
- Ganga can easily be extended and customized to meet the needs of different user communities

16 Uniform Interface to Heterogeneous Resources
- For the user, running a job interactively is no different from running it on the Grid
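The idea behind this uniform interface is to separate *what* runs (the application) from *where* it runs (the backend). The sketch below is loosely modeled on that split. It is not Ganga's API: `App`, `Job`, `LocalBackend` and the Grid stand-in `RecordingGridBackend` are hypothetical classes, and the Grid stand-in only queues requests.

```python
import subprocess
import sys
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class App:
    """What to run: an executable plus its arguments."""
    exe: str
    args: list = field(default_factory=list)


class Backend(ABC):
    """Where to run it. Each backend hides its own submission mechanics."""

    @abstractmethod
    def submit(self, app: App) -> str: ...


class LocalBackend(Backend):
    """Runs the application right away on this machine and returns its stdout."""

    def submit(self, app: App) -> str:
        done = subprocess.run([app.exe, *app.args],
                              capture_output=True, text=True, check=True)
        return done.stdout


class RecordingGridBackend(Backend):
    """Hypothetical stand-in for a Grid backend: it only queues the request."""

    def __init__(self) -> None:
        self.queue: list = []

    def submit(self, app: App) -> str:
        self.queue.append(app)
        return f"queued job {len(self.queue)}"


@dataclass
class Job:
    application: App
    backend: Backend

    def submit(self) -> str:
        # The job never inspects the backend type: swapping backends is the
        # only change needed to move from a local test run to the Grid.
        return self.backend.submit(self.application)


app = App(sys.executable, ["-c", "print('step done')"])
print(Job(app, LocalBackend()).submit().strip())  # step done
print(Job(app, RecordingGridBackend()).submit())  # queued job 1
```

Adding a new resource type means writing one more `Backend` subclass. Existing job definitions stay untouched, which is the property the slide describes.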

17 Deployment of QCD Simulations on the FKPPL VO

18 Our Two-color QCD Applications (1/2)
- Large-scale
  - A large number of simulation jobs must be run over a wide range of parameters
  - In our case, we plan to run a total of 360 QCD jobs, each with a different parameter set (20 × 3 × 6 = 360):
    - beta = [0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.75, 1.8, 1.85, 1.9, 1.93, 1.95, 1.98, 2.0, 2.1, 2.2, 2.3] (20 values)
    - J = [0.04, 0.05, 0.06] (3 values)
    - mu = [0.0, 0.575, 0.65, 0.85, 0.9, 1.0] (6 values)
- Independent
  - Each job runs independently
- Long-duration
  - Each QCD job goes through 200 steps to complete, with each step taking an average of 1 hour, so each QCD job takes an average of 200 hours
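The 360 jobs are the Cartesian product of the three parameter lists. A minimal sketch of generating one parameter set per job; the variable names are illustrative, and the talk does not say how the authors' own scripts enumerated the parameters.

```python
from itertools import product

betas = [0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.75,
         1.8, 1.85, 1.9, 1.93, 1.95, 1.98, 2.0, 2.1, 2.2, 2.3]  # 20 values
couplings_J = [0.04, 0.05, 0.06]                                # 3 values
mus = [0.0, 0.575, 0.65, 0.85, 0.9, 1.0]                        # 6 values

# One independent job per (beta, J, mu) combination.
param_sets = [{"beta": b, "J": j, "mu": m}
              for b, j, m in product(betas, couplings_J, mus)]

print(len(param_sets))  # 360
print(param_sets[0])    # {'beta': 0.9, 'J': 0.04, 'mu': 0.0}
```

Because the jobs are independent, each dictionary can be turned into a separate Grid submission with no coordination between them.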

19 Our Two-color QCD Applications (2/2)
- We need a computing facility able to run a large number of jobs
  - The FKPPL VO provides enough computing resources to run all 360 QCD jobs concurrently
- We need a Grid tool to manage such a large number of jobs effectively, without having to know the details of the underlying Grid
  - Ganga appears to be an appropriate tool for managing this many jobs on the Grid

20 Issues Relating to Long-running Jobs on the Grid
- Long-running jobs often fail to complete on the Grid
  - It is not straightforward to complete a long-duration job such as our two-color QCD simulation on the Grid
- A Grid proxy certificate may expire before the job completes
  - By default, the proxy has a lifetime of 12 hours
- Each Grid site has its own operational policy, such as the maximum CPU time a job is allowed to run at a time:

| Site | CE queue | MaxCPUTime (min) |
|---|---|---|
| CC-IN2P3 | jobmanager-bqs-short | 16 |
| CC-IN2P3 | jobmanager-bqs-medium | 751 |
| CC-IN2P3 | jobmanager-bqs-long | 4753 |
| KISTI | jobmanager-lcgpbs-fkppl | 4880 |
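A quick calculation shows why these limits matter. One job needs 200 × 60 = 12,000 CPU minutes, well above even the longest queue. The sketch below assumes every step takes exactly the 1-hour average and is CPU-bound. Under that assumption, it computes how many whole steps fit in one run on each queue and the minimum number of submissions to finish. The helper names are hypothetical.

```python
import math

TOTAL_STEPS = 200
MINUTES_PER_STEP = 60  # assumption: every step takes the 1-hour average and is CPU-bound

# MaxCPUTime per queue, in minutes, as listed on the slide.
MAX_CPU_MINUTES = {
    "CC-IN2P3 jobmanager-bqs-short": 16,
    "CC-IN2P3 jobmanager-bqs-medium": 751,
    "CC-IN2P3 jobmanager-bqs-long": 4753,
    "KISTI jobmanager-lcgpbs-fkppl": 4880,
}


def steps_per_submission(max_cpu_minutes: int) -> int:
    """Whole QCD steps that fit inside one run before the queue kills the job."""
    return max_cpu_minutes // MINUTES_PER_STEP


def submissions_needed(max_cpu_minutes: int, steps: int = TOTAL_STEPS):
    """Minimum number of (re)submissions to finish, or None if no step fits."""
    per_run = steps_per_submission(max_cpu_minutes)
    if per_run == 0:
        return None
    return math.ceil(steps / per_run)


print(f"one job needs {TOTAL_STEPS * MINUTES_PER_STEP} CPU minutes")
for queue, limit in MAX_CPU_MINUTES.items():
    print(queue, steps_per_submission(limit), submissions_needed(limit))
```

Even on the long queues, each job needs at least three runs. The 12-hour default proxy lifetime alone would stop a run after about 12 steps. Both limits point to restarting from saved intermediate results.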

21 Application-level Checkpointing/Restarting
- We have modified the original two-color QCD simulation code to support an application-level checkpointing scheme
  - The two-color QCD code takes 200 steps to complete
  - Once a QCD job is launched successfully on the Grid, an intermediate result is generated at each step and saved to the checkpoint server
- When a QCD job is detected to have stopped for some reason, Ganga restarts it from where it left off by resubmitting it along with the latest intermediate result
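The checkpoint/restart scheme can be sketched as follows, assuming an in-memory dictionary in place of the real checkpoint server and a counter in place of the physics. `CheckpointServer`, `run_attempt` and the job name are hypothetical. Each attempt resumes from the latest saved step and saves after every step, so no completed step is lost or repeated when an attempt is killed.

```python
TOTAL_STEPS = 200


class CheckpointServer:
    """Hypothetical in-memory stand-in for the checkpoint server."""

    def __init__(self):
        self._latest = {}  # job id -> (last completed step, intermediate result)

    def save(self, job_id, step, result):
        self._latest[job_id] = (step, result)

    def latest(self, job_id):
        # A job with no checkpoint yet starts from step 0.
        return self._latest.get(job_id, (0, 0.0))


def run_attempt(job_id, server, fail_after=None):
    """One Grid attempt: resume from the latest checkpoint, save after every step.

    `fail_after` simulates the attempt being killed (CPU limit, proxy expiry)
    after that many steps. Returns True once all steps are done.
    """
    step, result = server.latest(job_id)
    steps_this_attempt = 0
    while step < TOTAL_STEPS:
        if fail_after is not None and steps_this_attempt == fail_after:
            return False
        result += 1.0  # placeholder for one real simulation step
        step += 1
        server.save(job_id, step, result)
        steps_this_attempt += 1
    return True


server = CheckpointServer()
job_id = "qcd-beta1.9-J0.05-mu0.9"  # hypothetical job name
attempts = 1
# 79 one-hour steps is roughly what fits in the 4753-minute long queue.
while not run_attempt(job_id, server, fail_after=79):
    attempts += 1  # resubmit; the next attempt resumes from the checkpoint
print(attempts, server.latest(job_id))  # 3 (200, 200.0)
```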

22 Overview of QCD Simulation Runs on the Grid
[Diagram: WMS, Computing Element and Storage Element (IN2P3), executable Su2.x, Heartbeat Monitor, Checkpoint Server; flows: (re)submit QCD job (executable + small inputs), send intermediate result, check status & intermediate result, retrieve the latest intermediate result, retrieve status & (small) output files, retrieve output, input/output file(s)]
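The heartbeat monitor's job is to decide which runs have stopped. A minimal sketch, assuming that each intermediate result doubles as a heartbeat and that two hours of silence (about two average steps) means a job has stopped. The timeout value, `JobRecord` and `stalled_jobs` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: a healthy job reports an intermediate result about once an hour,
# so two hours of silence is treated as "stopped".
HEARTBEAT_TIMEOUT_S = 2 * 3600


@dataclass
class JobRecord:
    job_id: str
    last_result_at: float  # time of the latest intermediate result (seconds)
    finished: bool = False


def stalled_jobs(jobs, now, timeout=HEARTBEAT_TIMEOUT_S):
    """IDs of unfinished jobs whose last intermediate result is older than `timeout`."""
    return [j.job_id for j in jobs
            if not j.finished and now - j.last_result_at > timeout]


jobs = [
    JobRecord("run-001", last_result_at=9_000),          # reported 1000 s ago
    JobRecord("run-002", last_result_at=1_000),          # silent for 9000 s
    JobRecord("run-003", last_result_at=0, finished=True),
]
print(stalled_jobs(jobs, now=10_000))  # ['run-002']
```

Each returned ID would then be resubmitted together with its latest intermediate result from the checkpoint server.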

23 Distribution of QCD Jobs on the FKPPL VO

24 Two-color QCD Production
- ~8.2 CPU years: 360 runs × 200 steps × 1 hour = 72,000 hours
- ~360 concurrent QCD runs on the FKPPL VO
- As of now (Aug. 3), 51.70% of the total of 72,000 steps has been completed
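The production figures follow from the per-job numbers on the earlier slides, taking one step as one hour and a year as 8,760 hours:

```latex
\begin{aligned}
360~\text{runs} \times 200~\text{steps} \times 1~\text{h} &= 72{,}000~\text{h} \\
\frac{72{,}000~\text{h}}{24 \times 365~\text{h/yr}} = \frac{72{,}000}{8{,}760}~\text{yr} &\approx 8.2~\text{CPU years} \\
0.5170 \times 72{,}000~\text{steps} &= 37{,}224~\text{steps completed}
\end{aligned}
```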

25 Summary
- FKPPL VO Grid
  - Provides a production-level Grid infrastructure for scientists to carry out relatively large-scale simulation runs
- Ganga/application-level checkpointing
  - Makes it straightforward to run long-running jobs on the Grid on a large scale

26 Thank you

