Jakub T. Moscicki (KUBA) CERN Ganga Tutorial Jakub T. Moscicki (KUBA) CERN
Agenda Part I: Ganga introduction Part II: Ganga hands-on Part III: More about Ganga EGEE - SEE Tutorial, Budapest
Part I: Ganga Overview
The Ganga Project DIRAC Batch Grids Panda “configure once, run anywhere” user applications resources GUI Ganga Interface Localhost Batch Condor Grids DIRAC Panda cmd line shell scripting jobs Interface helping users to configure applications and submit jobs to the Grid The primary focus is on the Atlas and LHCb applications. The project was initiated by HEP collaborations because they appreciated the need of a consistent and easy to use environment which would allow users to focus on their tasks and hide all technicalities of the underlying systems. Ganga is an open system which interfaces to generic resources and middleware (Batch FARMS, Grid MIDDLEWARE, Condor...) as well as the experiment specific systems (DIRAC, Panda). user may move between different resources and systems easily tool is adapted to the needs of the users and their different working styles and habits: GUI shell scripts Ganga provides access to auxiliary services such as: data management monitoring Data Job Parameters Monitoring
Introduction Goals: provide a simple and consistent way of preparing, organising and executing jobs on different computing infrastructures provide a clean interface which can be used: interactively (CLI / python interpreter) as a Python API for scripting through a GUI Make it easy and integrated with application environment Allow quick transition between local PC, cluster, Grid... Organize work, keep history of jobs,... EGEE - SEE Tutorial, Budapest
Motivation FULL RUN TEST DEBUG In practice users deal with multiple computing backends FULL RUN PANDA TEST PB S SG E LS F Local PC DEBUG EGEE - SEE Tutorial, Budapest
Motivation FAQ: running applications on multiple computing backends I must learn many interfaces PB S LocalP C LS F SG E PANDA How to configure my applications? Do I get a consistent view on all my jobs? EGEE - SEE Tutorial, Budapest 7
Ganga Ganga: Job Management Tool a utility which you download to your computer or it is already installed in your institute in a shared area for example: /nfs/sw/ganga/install/4.3.2 it is an add-on to installed software comes with a set of plugins for some applications open - other applications and backend may be easily added even by users Ganga Application Software LSF Client LCG UI GangaFramework .... Backend Plugins Application Plugins EGEE - SEE Tutorial, Budapest 8
Why not portal? Many (not all) scientific applications are: developed/modified/extended on local machine use local resources (files, environment) are scripted (e.g. higher-level logic is built around them) What user mobility (desktop vs laptop)? Ganga may use remote repository (and workspace) So user may see his jobs submitted from a different machine and interact with them (resubmit, kill)... Ganga approach: lean and neat utility with interface in Python + GUI if you like + scripting if you like more EGEE - SEE Tutorial, Budapest 9
User interfaces CLIP GUI GPI & Scripting *** Welcome to Ganga *** Version: Ganga-4-2-8 Documentation and support: http://cern.ch/ganga Type help() or help('index') for online help. In [1]: jobs Out[1]: Statistics: 1 jobs -------------- # id status name subjobs application backend backend.actualCE # 1 completed Executable LCG lcg-compute.hpc.unimelb.edu.au:2119/jobmanage CLIP GUI #!/usr/bin/env ganga #-*-python-*- import time j = Job() j.backend = LCG() j.submit() while not j.status in [‘completed’,’failed’]: print(‘job still running’) time.sleep(30) ./myjob.exec ganga ./myjob.exec In [1]:execfile(“myjob.exec”) GPI & Scripting We will focus on the Ganga CLIP (Command Line Interface for Python) EGEE - SEE Tutorial, Budapest
What Ganga can do for you help configure applications if you teach it about your application organize work job history: keep track of what user did save job outputs in a consistent way reuse configuration of previously submitted jobs you do not have to learn many interfaces one set of (simple) commands which are always the same localy and on the Grid Ganga will not hide anything from you you still have access to automatically generated jdls, job id etc. ganga will tell you what it does to submit jobs Works with no change on Grid production systems and GILDA, ... EGEE - SEE Tutorial, Budapest 11
GANGA User Communities More than 900 users use Ganga Garfield HARP EGEE - SEE Tutorial, Budapest
The Ganga development team Ganga is supported by HEP Support for development work Core team: F.Brochu (Cambridge), U.Egede (Imperial), J. Elmsheuser (Munich), K.Harrison (Cambridge), H.C.Lee (ASGC Taipei), D.Liko (CERN), A.Maier (CERN), J.T.Moscicki (CERN), A.Muraru (Bucharest), W.Reece (Imperial), A.Soroko (Oxford), CL.Tan (Birmingham) EGEE - SEE Tutorial, Budapest
Part I: Practical Ganga
Download, Install, First launch wget http://cern.ch/ganga/download/ganga-install python ganga-install \ --prefix=/usr/local/ganga/prefix \ --extern=GangaGUI,GangaPlotter \ 4.3.2 Download & Install download installer installation prefix Installation of external modules Ganga version export PATH=$HOME/opt/ganga/install/4.3.2/bin:$PATH $ ganga *** Welcome to Ganga *** Version: Ganga-4-3-2 Documentation and support: http://cern.ch/ganga Type help() or help('index') for online help. In [1]: Do you really want to exit ([y]/n)? First Launch start Ganga with inline configurations Ganga CLIP <ctrl>-D to exit Ganga CLIP EGEE - SEE Tutorial, Budapest
Where the Ganga journey starts … Ganga Job Where the Ganga journey starts … Mandatory Executable EGEE Optional The first thing to talk is about the Ganga job Ganga adopts the traditional way the scientists run their application: everything starts from “Job” But what’s different is that this time Ganga puts on top the physical job a layer of abstraction. The abstraction allows users to decided how their job should be executed. There are 6 building blocks of the job abstraction EGEE - SEE Tutorial, Budapest
Get your hands dirty ... submit and run a test job on local machine: Job().submit() submit and run a a test job on EGEE/LCG: Job(backend=LCG()).submit() browse the created jobs (job history): jobs get the first job from the job history: j = jobs[0] print the details of the job and see what you can set for a job: j make a copy of the job and submit the new job: j.copy().submit() see what you can do with the job: j.<tab> get interactive help: help EGEE - SEE Tutorial, Budapest
Python ConfigParser standard Configurations [Configuration] TextShell = IPython ... ... [LCG] EDG_ENABLE = True Syntax Python ConfigParser standard Hard-coded default configurations export GANGA_CONFIG_PATH=/usr/local/ganga/prefix/install/etc/Gilda.ini:GangaTutorial/Tutorial.ini ganga --config-path= =/usr/local/ganga/prefix/install/etc/Gilda.ini:GangaTutorial/Tutorial.ini ~/.gangarc ganga -o How to set configurations release config site config user config user config > site config > release config Override sequence EGEE - SEE Tutorial, Budapest
`gangadir` gangadir folder is created at the first launch within $HOME directory To locate it in different directory: [DefaultJobRepository] local_root = /alternative/gangadir/repository [FileWorkspace] topdir = /alternative/gangadir/workspace Job Repository may also be stored remotely in a database Metadata of jobs Data of jobs Possible to use the same Ganga instance to maintain multiple repositories (quite useful to separate project jobs) EGEE - SEE Tutorial, Budapest
Some handy functions <tab> completion <page up/down> for cmd history system command integration Job template In[1]: plugins() plugins(‘backends’) In[2]: help() etc. In[1]: j = jobs[1] In[2]: cat $j.outputdir/stdout Hello World In[1]: t = JobTemplate(name=’lcg_simple’) In[2]: t.backend = LCG(middleware=’EDG’) In[3]: templates Out[3]: Statistics: 1 templates -------------- # id status name subjobs application backend backend.actualCE # 3 template lcg_simple Executable LCG In[4]: j = Job(templates[3]) In[5]: j.submit() EGEE - SEE Tutorial, Budapest
Behind the scenes ... User has access to functionality of GANGA components through GUI, CLI or batch scripts storage and recovery of job information in a local or remote DB storage of sandbox files What happens when user types some job operation commands on the Ganga user interface. application configuration Data Management input data selection, output location specification job submission: selection between Grid and local system EGEE - SEE Tutorial, Budapest
What's next? NOT FOR THIS TUTORIAL Changes in Ganga 4.4.2 release jobs(id) instead of jobs[id] Ganga 5.0 many useful improvements and some interface changes EGEE - SEE Tutorial, Budapest
Part II: Ganga hands-on
Step 0: launch Ganga CLIP https://twiki.cern.ch/twiki/bin/view/ArdaGrid/EGEETutorialPackage Skip the installation step Start your Ganga CLIP session: shell> ganga EGEE - SEE Tutorial, Budapest
Step 1: Your first Ganga job - an arbitrary shell script In [1]: !pico myscript.sh In [2]: !chmod +x myscript.sh In [2]: j = Job() In [3]: j.application = Executable() In [4]: j.application.exe = File(‘myscript.sh’) In [5]: j.application.args = [‘Budapest’] In [6]: j.backend = Interactive() In [7]: j.submit() In [8]: jobs #!/bin/sh echo “Hello ${1} !” echo $HOSTNAME cat /proc/cpuinfo | grep 'model name’ cat /proc/meminfo | grep 'MemTotal‘ echo “Run on `date`” EGEE - SEE Tutorial, Budapest
Step 2: Your first Ganga job - an arbitrary shell script In [9]: j = j.copy() In [10]: j.backend = Local() In [11]: j.submit() In [12]: jobs In [13]: j.peek() In [14]:cat $j.outputdir/stdout ./myscript.sh Budapest EGEE - SEE Tutorial, Budapest
Step 3: your first Ganga job on the Grid In [15]:j = j.copy() In [16]:j.backend = LCG() In [17]:j.application.args = [‘Somewhere in the world...’] In [18]:j.submit() In [19]:j In [20]:cat $j.backend.loginfo(verbosity=1) In [21]:jobs EGEE - SEE Tutorial, Budapest
Exercise: Prime factorization Please follow the instructions from: https://twiki.cern.ch/twiki/bin/view/ArdaGrid/EGEETutorialPackage#Exercise_Prime_number_factorizat Table lookup EGEE - SEE Tutorial, Budapest
Part III: More
GANGA User Communities Garfield HARP EGEE - SEE Tutorial, Budapest
More than 900 unique Users ~ till June: 650 different users, ~120 users weekly the monitoring started end 2006 ~60% Atlas ~25% LHCb ~15% others - This plot shows the establishment of the Ganga user community. - ATLAS and LHCb users dominated the user community, the users coming from other communities are growing up - on average, we have roughly 40-50 unique users using Ganga everyday . Based on the trend, the number is growing. - Overall 300 unique users start using Ganga in recent 2 months EGEE - SEE Tutorial, Budapest
Ganga usage over 50 local sites CLIP and scripts most popular EGEE - SEE Tutorial, Budapest
Real Application: the ATLAS data analysis application $ ganga athena \ --inDS myInputDataset.txt\ --outputdata myOutput.root \ --split 3 \ --maxevt 100 \ --lsf \ jobOptions.py Scripting mode quick mix j = Job() j.application=Athena() j.application.prepare() j.application.option_file='jobOptions.py‘ j.inputdata=DQ2Dataset() j.inputdata.type='DQ2_LOCAL' j.inputdata.dataset=“myInputDataset.txt” j.outputdata=DQ2OutputDataset() j.outputdata.outputdata=[‘myOutput.root'] j.splitter = AthenaSplitterJob(numsubjobs=3) j.merger = AthenaOutputMerger() j.backend = LSF() j.submit() j2 = j.copy() j2.backend=LCG( CE=’ce102.cern.ch:2119/jobmanager-lcglsf-grid_2nh_atlas’ ) j2.submit() CLIP mode application inputdata flexible How a typical “real application” job looks like. Very easy to translate it to “scripting mode”, which immediately give users another way to run their applications outputdata Splitter & Merger EGEE - SEE Tutorial, Budapest
More than job submission: Monitoring & Accounting This is application level monitoring and accounting … how many jobs are submitted by whom, how many dataset has been analyzed … EGEE - SEE Tutorial, Budapest
Dashboard monitoring analysis and simulations jobs run in ATLAS one month (mid-March to mid-April 2007) 20K analysis jobs, 10K simulation jobs only LCG jobs, others not shown data collected by a ARDA Dashboard sensor integrated with Ganga EGEE - SEE Tutorial, Budapest
Integration with frameworks 200K short jobs in 12 hours (500CPUh) - Diane is an grid job optimization layer (application level scheduling, pull-mode job scheduling …) which has been used for application tasks should be done in a short deadline. The weakest part of DIANE is that it doesn’t have the information about the Grid jobs … it has just the working information … so for some statistic analysis, the DIANE worker has to be instrumented to collect the Grid job information … instead of doing that, we simply delegate the Grid job management work to Ganga since Ganga has implemented a quite nice mechanism to monitor and to keep track the progress of the Grid jobs. At the end, we could simply do some statistic analysis using Ganga and its job repository. parallelism on task level (beyond DAG) customized failure recovery semi-interactivity EGEE - SEE Tutorial, Budapest
Web portal for biologists Interface created by biologists (Model-View-Controller design pattern) The Model makes use of Ganga as a submission tool and DIANE to better handle docking jobs on the Grid The Controller organizes a set of actions to perform the virtual screening pipeline; The View represents biological aspects EGEE - SEE Tutorial, Budapest
specific backend binding Extending Ganga Splitter Merger Datasets Application plugin generic logic specific backend binding Detailed guide available at: https://twiki.cern.ch/twiki/bin/view/ArdaGrid/HowToPlugInNewApplicationType A good example: python/GangaTutorial/Lib/*.py EGEE - SEE Tutorial, Budapest
More info. Ganga Home: http://cern.ch/ganga Official Ganga User’s Guide: http://cern.ch/ganga/user/html/GangaIntroduction/ GangaTutorial GPI Reference Manual : http://ganga.web.cern.ch/ganga/release/4.3.2/reports/html/Manuals/GangaTutorialManual.html Looking for help: project-ganga-developers@cern.ch EGEE - SEE Tutorial, Budapest