Jakub T. Moscicki (KUBA) CERN


1 Jakub T. Moscicki (KUBA) CERN
Ganga Tutorial Jakub T. Moscicki (KUBA) CERN

2 Agenda
Part I: Ganga introduction
Part II: Ganga hands-on
Part III: More about Ganga
EGEE - SEE Tutorial, Budapest

3 Part I: Ganga Overview

4 The Ganga Project
“configure once, run anywhere”
[Diagram: user applications and jobs reach the resources (Localhost, Batch, Condor, Grids, DIRAC, Panda) through the Ganga interface, driven from the GUI, the command line shell or scripts. Data, job parameters and monitoring pass through Ganga.]
Ganga is an interface that helps users configure applications and submit jobs to the Grid.
The primary focus is on the ATLAS and LHCb applications.
The project was initiated by HEP collaborations, which saw the need for a consistent and easy-to-use environment that lets users focus on their tasks and hides the technicalities of the underlying systems.
Ganga is an open system which interfaces to generic resources and middleware (batch farms, Grid middleware, Condor...) as well as to experiment-specific systems (DIRAC, Panda).
Users may move easily between different resources and systems.
The tool adapts to the needs of users and their different working styles and habits: GUI, shell, scripts.
Ganga provides access to auxiliary services such as data management and monitoring.

5 Introduction
Goals:
provide a simple and consistent way of preparing, organising and executing jobs on different computing infrastructures
provide a clean interface which can be used:
  interactively (CLI / Python interpreter)
  as a Python API for scripting
  through a GUI
make it easy to use and integrated with the application environment
allow quick transition between local PC, cluster, Grid...
organize work, keep a history of jobs,...

6 Motivation
In practice users deal with multiple computing backends.
[Diagram: DEBUG on the local PC; TEST on batch systems (PBS, SGE, LSF); FULL RUN on PANDA]

7 Motivation
FAQ: running applications on multiple computing backends
I must learn many interfaces (Local PC, PBS, SGE, LSF, PANDA)
How to configure my applications?
Do I get a consistent view on all my jobs?

8 Ganga: Job Management Tool
a utility which you download to your computer, or which is already installed at your institute in a shared area, for example: /nfs/sw/ganga/install/4.3.2
it is an add-on to installed software
it comes with a set of plugins for some applications
it is open: other applications and backends may be easily added, even by users
[Diagram: the Ganga framework with its application plugins and backend plugins sits on top of the installed application software and clients (LSF client, LCG UI, ...)]

9 Why not a portal?
Many (not all) scientific applications are:
  developed/modified/extended on a local machine
  dependent on local resources (files, environment)
  scripted (e.g. higher-level logic is built around them)
What about user mobility (desktop vs laptop)?
  Ganga may use a remote repository (and workspace)
  so users may see jobs they submitted from a different machine and interact with them (resubmit, kill)...
Ganga approach: a lean and neat utility with an interface in Python
  + GUI if you like
  + scripting if you like more

10 User interfaces: CLIP, GUI, GPI & Scripting
CLIP:
  *** Welcome to Ganga ***
  Version: Ganga-4-2-8
  Documentation and support:
  Type help() or help('index') for online help.
  In [1]: jobs
  Out[1]: Statistics: 1 jobs
  # id status name subjobs application backend backend.actualCE
  # completed Executable LCG lcg-compute.hpc.unimelb.edu.au:2119/jobmanage
GPI & Scripting:
  #!/usr/bin/env ganga
  #-*-python-*-
  import time
  j = Job()
  j.backend = LCG()
  j.submit()
  while j.status not in ['completed', 'failed']:
      print('job still running')
      time.sleep(30)
Ways to run the script:
  ./myjob.exec
  ganga ./myjob.exec
  In [1]: execfile("myjob.exec")
We will focus on the Ganga CLIP (Command Line Interface for Python).
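The polling loop in the script can be tried without a Grid connection. The sketch below is plain Python and uses a hypothetical stand-in class, FakeJob, which is not part of Ganga; it only mimics a status attribute that changes over time. The helper wait_for and its poll_interval parameter are illustrative names as well.

```python
import time


class FakeJob:
    """Hypothetical stand-in for a Ganga job (not the real Job class)."""

    def __init__(self, states):
        # Statuses the job will report, one per poll after submission.
        self._states = list(states)
        self.status = "new"

    def submit(self):
        self.status = "submitted"

    def refresh(self):
        # Pretend that monitoring updated the job status.
        if self._states:
            self.status = self._states.pop(0)


def wait_for(job, poll_interval=0.0, final=("completed", "failed")):
    """Poll until the job reaches a final status; return the poll count."""
    polls = 0
    while job.status not in final:
        print("job still running")
        time.sleep(poll_interval)
        job.refresh()
        polls += 1
    return polls


j = FakeJob(["running", "running", "completed"])
j.submit()
polls = wait_for(j)
print("final status:", j.status, "after", polls, "polls")
```

Testing for membership in a tuple of final states keeps the loop from spinning forever when a job fails instead of completing.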

11 What Ganga can do for you
help configure applications, if you teach it about your application
organize work
  job history: keep track of what the user did
  save job outputs in a consistent way
  reuse the configuration of previously submitted jobs
you do not have to learn many interfaces
  one set of (simple) commands which are always the same, locally and on the Grid
Ganga will not hide anything from you
  you still have access to the automatically generated JDLs, job ids etc.
  Ganga will tell you what it does to submit jobs
Works with no change on Grid production systems and GILDA, ...

12 GANGA User Communities
More than 900 users use Ganga
[Community logos, including Garfield and HARP]

13 The Ganga development team
Ganga is supported by HEP
Support for development work
Core team: F. Brochu (Cambridge), U. Egede (Imperial), J. Elmsheuser (Munich), K. Harrison (Cambridge), H.C. Lee (ASGC Taipei), D. Liko (CERN), A. Maier (CERN), J.T. Moscicki (CERN), A. Muraru (Bucharest), W. Reece (Imperial), A. Soroko (Oxford), C.L. Tan (Birmingham)

14 Part I: Practical Ganga

15 Download, Install, First launch
Download & Install:
  wget
  python ganga-install \
    --prefix=/usr/local/ganga/prefix \
    --extern=GangaGUI,GangaPlotter \
    4.3.2
  (wget: download the installer; --prefix: installation prefix; --extern: installation of external modules; 4.3.2: Ganga version)
First Launch:
  export PATH=$HOME/opt/ganga/install/4.3.2/bin:$PATH
  $ ganga
  *** Welcome to Ganga ***
  Version: Ganga-4-3-2
  Documentation and support:
  Type help() or help('index') for online help.
  In [1]:
  Do you really want to exit ([y]/n)?
  (start Ganga with inline configurations; In [1]: is the Ganga CLIP prompt; <ctrl>-D to exit the Ganga CLIP)

16 Where the Ganga journey starts …
[Diagram: the Ganga Job and its building blocks, marked Mandatory or Optional; the labels shown include Executable and EGEE]
The first thing to talk about is the Ganga job.
Ganga adopts the traditional way scientists run their applications: everything starts from a “Job”.
What is different is that Ganga puts a layer of abstraction on top of the physical job. The abstraction allows users to decide how their job should be executed.
There are 6 building blocks in the job abstraction.

17 Get your hands dirty ...
submit and run a test job on the local machine: Job().submit()
submit and run a test job on EGEE/LCG: Job(backend=LCG()).submit()
browse the created jobs (job history): jobs
get the first job from the job history: j = jobs[0]
print the details of the job and see what you can set for a job: j
make a copy of the job and submit the new job: j.copy().submit()
see what you can do with the job: j.<tab>
get interactive help: help

18 Configurations
Syntax: Python ConfigParser standard
  [Configuration]
  TextShell = IPython
  [LCG]
  EDG_ENABLE = True
How to set configurations:
  release config: hard-coded default configurations
  site config:
    export GANGA_CONFIG_PATH=/usr/local/ganga/prefix/install/etc/Gilda.ini:GangaTutorial/Tutorial.ini
    ganga --config-path=/usr/local/ganga/prefix/install/etc/Gilda.ini:GangaTutorial/Tutorial.ini
  user config: ~/.gangarc
  user config: ganga -o
Override sequence: user config > site config > release config
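The override sequence can be illustrated with Python's standard configparser module (named ConfigParser in Python 2). This is only a sketch of layered configuration, not Ganga's own loading code: the three strings stand in for the release defaults, a site .ini file and ~/.gangarc, and the values EDG_ENABLE = False and TextShell = Console are made up to make the overriding visible. The function name load_layers is hypothetical.

```python
import configparser

# Made-up file contents standing in for the three configuration levels.
RELEASE = """
[Configuration]
TextShell = IPython
[LCG]
EDG_ENABLE = False
"""
SITE = """
[LCG]
EDG_ENABLE = True
"""
USER = """
[Configuration]
TextShell = Console
"""


def load_layers(*layers):
    """Read the layers in order; later layers override earlier ones."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names case-sensitive
    for text in layers:
        cfg.read_string(text)
    return cfg


# Lowest priority first: release, then site, then user.
cfg = load_layers(RELEASE, SITE, USER)
print(cfg["Configuration"]["TextShell"])
print(cfg.getboolean("LCG", "EDG_ENABLE"))
```

Reading the lowest-priority layer first means every later read only replaces the options it actually sets, so untouched defaults survive.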

19 `gangadir`
The gangadir folder is created at the first launch within the $HOME directory.
To locate it in a different directory:
  [DefaultJobRepository]
  local_root = /alternative/gangadir/repository
  [FileWorkspace]
  topdir = /alternative/gangadir/workspace
The repository holds the metadata of jobs; the workspace holds the data of jobs.
The job repository may also be stored remotely in a database.
It is possible to use the same Ganga instance to maintain multiple repositories (quite useful to separate project jobs).

20 Some handy functions
<tab> completion
<page up/down> for cmd history
etc.:
  In[1]: plugins()
  plugins('backends')
  In[2]: help()
system command integration:
  In[1]: j = jobs[1]
  In[2]: cat $j.outputdir/stdout
  Hello World
Job template:
  In[1]: t = JobTemplate(name='lcg_simple')
  In[2]: t.backend = LCG(middleware='EDG')
  In[3]: templates
  Out[3]: Statistics: 1 templates
  # id status name subjobs application backend backend.actualCE
  # template lcg_simple Executable LCG
  In[4]: j = Job(templates[3])
  In[5]: j.submit()

21 Behind the scenes ...
What happens when the user types job operation commands in the Ganga user interface:
The user has access to the functionality of the Ganga components through the GUI, the CLI or batch scripts
application configuration
data management: input data selection, output location specification
job submission: selection between Grid and local system
storage and recovery of job information in a local or remote DB
storage of sandbox files

22 What's next? NOT FOR THIS TUTORIAL
Changes in the Ganga 4.4.2 release:
  jobs(id) instead of jobs[id]
Ganga 5.0:
  many useful improvements and some interface changes
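One way to picture the jobs(id) form is a registry in which calling looks a job up by its id while square brackets index by position in the list. The sketch below is an assumption made for illustration only: the class JobRegistry and its methods are hypothetical, the slide does not say what jobs[...] means after the change, and none of this is Ganga's implementation.

```python
class JobRegistry:
    """Hypothetical registry sketch; not Ganga's implementation."""

    def __init__(self):
        self._jobs = []  # (job_id, name) pairs in creation order

    def add(self, job_id, name):
        self._jobs.append((job_id, name))

    def remove(self, job_id):
        self._jobs = [entry for entry in self._jobs if entry[0] != job_id]

    def __getitem__(self, position):
        # Square brackets: position in the current list (assumed meaning).
        return self._jobs[position]

    def __call__(self, job_id):
        # Call syntax: look up by job id, independent of position.
        for entry in self._jobs:
            if entry[0] == job_id:
                return entry
        raise KeyError(job_id)


jobs = JobRegistry()
for job_id, name in [(0, "first"), (1, "second"), (2, "third")]:
    jobs.add(job_id, name)

jobs.remove(0)           # ids and positions now differ
by_position = jobs[0]    # the first remaining job
by_id = jobs(2)          # the job whose id is 2
print(by_position, by_id)
```

Once a job has been removed, ids and positions no longer coincide, which is why the two access forms are worth keeping apart.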

23 Part II: Ganga hands-on

24 Step 0: launch Ganga CLIP
Skip the installation step. Start your Ganga CLIP session: shell> ganga

25 Step 1: Your first Ganga job - an arbitrary shell script
In [1]: !pico myscript.sh
In [2]: !chmod +x myscript.sh
In [2]: j = Job()
In [3]: j.application = Executable()
In [4]: j.application.exe = File('myscript.sh')
In [5]: j.application.args = ['Budapest']
In [6]: j.backend = Interactive()
In [7]: j.submit()
In [8]: jobs
myscript.sh:
  #!/bin/sh
  echo "Hello ${1} !"
  echo $HOSTNAME
  cat /proc/cpuinfo | grep 'model name'
  cat /proc/meminfo | grep 'MemTotal'
  echo "Run on `date`"

26 Step 2: Your first Ganga job - an arbitrary shell script
In [9]: j = j.copy()
In [10]: j.backend = Local()
In [11]: j.submit()
In [12]: jobs
In [13]: j.peek()
In [14]: cat $j.outputdir/stdout
The job runs: ./myscript.sh Budapest

27 Step 3: your first Ganga job on the Grid
In [15]: j = j.copy()
In [16]: j.backend = LCG()
In [17]: j.application.args = ['Somewhere in the world...']
In [18]: j.submit()
In [19]: j
In [20]: cat $j.backend.loginfo(verbosity=1)
In [21]: jobs

28 Exercise: Prime factorization
Please follow the instructions from: Table lookup

29 Part III: More

30 GANGA User Communities
[Community logos, including Garfield and HARP]

31 More than 900 unique users
~ till June: 650 different users, ~120 users weekly
the monitoring started at the end of 2006
~60% ATLAS, ~25% LHCb, ~15% others
 - This plot shows the establishment of the Ganga user community.
 - ATLAS and LHCb users dominate the user community; the number of users coming from other communities is growing.
 - Based on the trend, the number of unique users using Ganga every day is growing.
 - Overall, 300 unique users started using Ganga in the last 2 months.

32 Ganga usage
over 50 local sites
CLIP and scripts most popular

33 Real Application: the ATLAS data analysis application
Scripting mode (quick):
  $ ganga athena \
    --inDS myInputDataset.txt \
    --outputdata myOutput.root \
    --split 3 \
    --maxevt 100 \
    --lsf \
    jobOptions.py
CLIP mode (flexible):
  j = Job()
  j.application = Athena()                        (application)
  j.application.prepare()
  j.application.option_file = 'jobOptions.py'
  j.inputdata = DQ2Dataset()                      (inputdata)
  j.inputdata.type = 'DQ2_LOCAL'
  j.inputdata.dataset = "myInputDataset.txt"
  j.outputdata = DQ2OutputDataset()               (outputdata)
  j.outputdata.outputdata = ['myOutput.root']
  j.splitter = AthenaSplitterJob(numsubjobs=3)    (Splitter & Merger)
  j.merger = AthenaOutputMerger()
  j.backend = LSF()
  j.submit()
  j2 = j.copy()
  j2.backend = LCG(CE='ce102.cern.ch:2119/jobmanager-lcglsf-grid_2nh_atlas')
  j2.submit()
This is what a typical “real application” job looks like. It is very easy to translate it to “scripting mode”, which immediately gives users another way to run their applications.
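The splitter and merger roles can be pictured with a small plain-Python sketch. It is an illustration of the split/run/merge idea only and does not reproduce AthenaSplitterJob or AthenaOutputMerger: the functions split, run_subjob and merge are hypothetical names, and the "analysis" is a placeholder that counts events and sums their values.

```python
def split(items, numsubjobs):
    """Divide items into numsubjobs contiguous, nearly equal chunks."""
    size, extra = divmod(len(items), numsubjobs)
    chunks, start = [], 0
    for i in range(numsubjobs):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_subjob(chunk):
    """Placeholder analysis: count the events and sum their values."""
    return {"events": len(chunk), "total": sum(chunk)}


def merge(outputs):
    """Combine the per-subjob outputs into a single result."""
    merged = {"events": 0, "total": 0}
    for out in outputs:
        for key in merged:
            merged[key] += out[key]
    return merged


events = list(range(100))          # stand-in for an input dataset
chunks = split(events, 3)          # like numsubjobs=3 on the slide
outputs = [run_subjob(chunk) for chunk in chunks]
merged = merge(outputs)
print([len(chunk) for chunk in chunks], merged)
```

Because each subjob sees only its own chunk, the subjobs can run on any backend independently, and the merge step is the only place where their outputs meet.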

34 More than job submission: Monitoring & Accounting
This is application-level monitoring and accounting: how many jobs are submitted and by whom, how many datasets have been analyzed …

35 Dashboard monitoring
analysis and simulation jobs run in ATLAS
one month (mid-March to mid-April 2007)
20K analysis jobs, 10K simulation jobs
only LCG jobs, others not shown
data collected by an ARDA Dashboard sensor integrated with Ganga

36 Integration with frameworks
200K short jobs in 12 hours (500 CPUh)
parallelism on task level (beyond DAG)
customized failure recovery
semi-interactivity
 - DIANE is a Grid job optimization layer (application-level scheduling, pull-mode job scheduling …) which has been used for application tasks that must be completed within a short deadline. The weakest part of DIANE is that it has no information about the Grid jobs; it has only its own working information. For statistical analysis, the DIANE worker would therefore have to be instrumented to collect the Grid job information. Instead of doing that, we simply delegate the Grid job management to Ganga, since Ganga already implements a good mechanism to monitor and keep track of the progress of Grid jobs. In the end, the statistical analysis can be done using Ganga and its job repository.

37 Web portal for biologists
Interface created by biologists (Model-View-Controller design pattern)
The Model makes use of Ganga as a submission tool and of DIANE to better handle docking jobs on the Grid
The Controller organizes the set of actions that perform the virtual screening pipeline
The View represents the biological aspects

38 Extending Ganga
Splitter, Merger, Datasets
Application plugin: generic logic, specific backend binding
Detailed guide available at:
A good example: python/GangaTutorial/Lib/*.py

39 More info
Ganga Home: http://cern.ch/ganga
Official Ganga User’s Guide:
GangaTutorial
GPI Reference Manual:
Looking for help:

