Published by Joanna Ellis. Modified over 8 years ago.
1
Geant4 GRID production
Sangwan Kim, Vu Trong Hieu, AD
At KISTI
2
Outline
- Status of KISTI integration with Geant4 resources
- The production system
- Some more details on DIANE
For more information see: Andrea Dotti 2012 J. Phys.: Conf. Ser. 396 032033
3
Status
As of February 18, 2013:
- Jobs are running at KISTI via darthvader
- All nodes occupied at 100%
- Full production performed in about 48 hrs
What has been done:
- Installed missing libraries
- Performed simple testing (starting the application locally)
- Performed remote testing (small scale): start one job at a time remotely from CERN (using the full infrastructure)
- Performed full production test (queue of 2200 jobs): submit the maximum number of jobs, monitor that the cluster is 100% occupied for several hours, check the output
4
Results: from production monitoring
[Monitoring screenshots: jobs configurations; jobs queue (total 2.4M events); output at CERN repository]
- Failures due to misconfiguration (my fault)
- Stable production mode: no problems observed over several hours
- Rate of produced events strongly depends on configuration; expect to simulate all events in 48 hrs
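As a rough sanity check on the numbers quoted on the slides, the average production rate can be worked out directly. This is only back-of-the-envelope arithmetic: it assumes the 2.4M events and the 2200-job queue refer to the same 48-hour campaign, which the slides suggest but do not state explicitly, and it ignores the dependence on job configuration noted above.

```python
# Figures quoted on the slides.
TOTAL_EVENTS = 2_400_000   # "Total 2.4M events"
WALL_HOURS = 48            # "all events in 48 hrs"
QUEUED_JOBS = 2200         # queue size of the full production test
                           # (assumed here to be the same campaign)

events_per_hour = TOTAL_EVENTS / WALL_HOURS
events_per_second = events_per_hour / 3600
events_per_job = TOTAL_EVENTS / QUEUED_JOBS

print(f"{events_per_hour:.0f} events/hour on average")
print(f"{events_per_second:.1f} events/second on average")
print(f"{events_per_job:.0f} events/job on average")
```

These are cluster-wide averages; individual jobs may differ widely because the event rate depends strongly on the configuration.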
5
Production system
System based on four components:
1. CernVM-FS: to distribute (read-only) the Geant4 software
2. DIANE/GANGA: to submit jobs to the grid and retrieve the output
3. SimplifiedCalorimeter: Geant4 application (LHC calorimeters) to extensively test all aspects of physics simulations
4. Results DataBase: to store summaries from 3. and logging information on job status; includes a web application to produce plots
6
Architecture
[Diagram: software stack, layers in the order given on the slide]
- Python wrapper
- Application
- DIANE and GANGA
- OS / GRID middleware
- CernVM-FS
Annotations on the diagram:
- Recognized as the most critical (DIANE is no longer supported)
- Includes interaction with DB and analysis of results (not discussed here)
7
Deployment
[Diagram labels: GANGA session; DIANE; CERN Repo; Node; Squid HTTP proxy (two); Failover; KISTI]
- Job: “connect to DIANE server and get work”
- Download: work config
- Upload: results
- Communication: CORBA
8
DIANE master
- Python application
- It defines a queue of tasks
- A task is defined by:
  - Command line to execute
  - Command line arguments
  - Input and output files (if any)
User provides a “run” function that defines tasks:

    # tell DIANE that we are just running executables
    # the ExecutableApplication module is a standard DIANE test application
    from diane_test_applications import ExecutableApplication as application

    # the run function is called when the master is started
    # input.data stands for run parameters
    def run(input,config):
        d = input.data.task_defaults # this is just a convenience shortcut
        # all tasks will share the default parameters (unless set otherwise in individual task)
        d.input_files = ['hello.sh']
        d.output_files = ['message.out']
        d.executable = 'hello'
        # here are tasks differing by arguments to the executable
        for i in range(20):
            t = input.data.newTask()
            t.args = [str(i)]

hello.sh:

    #!/bin/bash
    echo $1 > message.out
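To see what task queue such a "run" function produces without a DIANE installation, it can be called on a small stand-in object. The sketch below is an illustration only: `RunData`, `RunInput` and `resolved_tasks` are hypothetical names invented here and are not part of the DIANE API. They mimic just the two attributes the slide uses (`task_defaults` and `newTask()`), and the merging of defaults with per-task settings is an assumption about how "all tasks will share the default parameters" works.

```python
from types import SimpleNamespace


class RunData:
    """Hypothetical stand-in for input.data (not the real DIANE class)."""

    def __init__(self):
        self.task_defaults = SimpleNamespace()
        self.tasks = []

    def newTask(self):
        # Register an empty task; the caller fills in per-task attributes.
        task = SimpleNamespace()
        self.tasks.append(task)
        return task

    def resolved_tasks(self):
        # Assumed behaviour: per-task attributes override the shared defaults.
        return [{**vars(self.task_defaults), **vars(t)} for t in self.tasks]


class RunInput:
    """Hypothetical stand-in for the 'input' argument of run()."""

    def __init__(self):
        self.data = RunData()


def run(input, config):
    # Same body as the run function on the slide.
    d = input.data.task_defaults
    d.input_files = ['hello.sh']
    d.output_files = ['message.out']
    d.executable = 'hello'
    for i in range(20):
        t = input.data.newTask()
        t.args = [str(i)]


fake_input = RunInput()
run(fake_input, config=None)
queue = fake_input.data.resolved_tasks()
print(len(queue), "tasks defined")
print(queue[3])
```

Each entry in `queue` carries the shared input file, output file and executable, and differs only in `args`, which is the pattern the slide describes.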
9
DIANE master and workers
[Diagram labels: tasks T1, T2, T3, T4; Diane-master; Diane-worker; corba]
Diane-worker is a second small Python application. It needs the CORBA IOR address of the master.
1. Get a task (i.e. command line and parameters to execute)
2. Get input files (G4 macro file, analysis support files, execution script)
3. Execute task
4. Send results (ROOT files, log files)
5. Repeat if more tasks exist
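The five worker steps above amount to a pull loop: the worker keeps asking the master for work until nothing is left. A minimal sketch of that pattern follows. It is not DIANE code: `InProcessMaster`, `get_task`, `put_result` and `worker_loop` are hypothetical names, the master lives in the same process instead of being reached over CORBA, and the transfer of input files is left out.

```python
import subprocess
import sys
from collections import deque


class InProcessMaster:
    """Hypothetical stand-in for the master; a real worker talks to it over CORBA."""

    def __init__(self, tasks):
        self._pending = deque(tasks)
        self.results = {}

    def get_task(self):
        # Hand out the next task, or None when the queue is empty.
        return self._pending.popleft() if self._pending else None

    def put_result(self, task_id, output):
        self.results[task_id] = output


def worker_loop(master):
    """Pull and execute tasks until the master has none left."""
    executed = 0
    while True:
        task = master.get_task()                 # 1. get a task
        if task is None:
            break                                # 5. stop when no tasks remain
        # 2. input files would be fetched here (omitted in this sketch)
        proc = subprocess.run(                   # 3. execute the task
            task["command"], capture_output=True, text=True, check=True
        )
        master.put_result(task["id"], proc.stdout.strip())  # 4. send results
        executed += 1
    return executed


# Four toy tasks; each runs a tiny Python command in place of a Geant4 job.
tasks = [
    {"id": i, "command": [sys.executable, "-c", f"print({i} * {i})"]}
    for i in range(4)
]
master = InProcessMaster(tasks)
executed = worker_loop(master)
print(executed, "tasks executed:", master.results)
```

Because the worker pulls work rather than having it pushed, fast nodes simply take more tasks, and a worker can be started by any mechanism that can launch a process.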
10
Some notes
- Diane-worker is not a GRID job
- We use GANGA to start the diane-workers on remote sites
- But we can use SSH / QSUB / whatever
- To start a worker the only information needed is the CORBA address of the master
- CORBA (omniORB) is used to create a point-to-point communication channel between master and workers
- The machine where the master runs needs some ports open
- Multiple diane-masters are allowed as long as each one listens on its own port
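Since workers connect to the master directly, a quick way to diagnose a firewall problem is to test from a worker node whether the master's port accepts TCP connections. The sketch below uses only the Python standard library and is a generic TCP check, not anything specific to DIANE or omniORB; `port_is_reachable` is a hypothetical helper, and the host and port to test would have to be taken from the actual master configuration. For demonstration it checks a throwaway local listener.

```python
import socket


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Self-check against a temporary local listener standing in for a master's port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(1)
open_port = listener.getsockname()[1]

reachable_while_open = port_is_reachable("127.0.0.1", open_port)
listener.close()
reachable_after_close = port_is_reachable("127.0.0.1", open_port)

print("while listening:", reachable_while_open)
print("after closing:  ", reachable_after_close)
```

A successful connection only shows that the port is open; it does not verify that a master is actually answering CORBA requests on it.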
11
Possible work-plan
A possible work activity: develop an alternative solution to DIANE
Requirements:
- Should retrieve output and store results in a central repository. Output size 10-100 GB / month
- Should allow several users to use the system at the same time
- Should be possible to use a GRID submission system (e.g. GANGA) to submit jobs
- Should integrate with LCG resources and OSG
- Support for batch submission and direct SSH
- What about clouds?