Distributed Analysis using GANGA Farida Fassi CCIN2P3/CNRS, Lyon, France
Outline
- Ganga Overview
- Ganga Architecture
- How to use Ganga
- More on Ganga usage
Ganga Overview
Ganga is a Gaudi/Athena and Grid Alliance project jointly developed by the ATLAS and LHCb experiments.
Ganga is an easy-to-use front-end for job definition and management, enabling a user to: Configure – Prepare – Monitor – Submit
Ganga tries to answer the question: how to minimize the user’s effort in running applications?
Ganga Overview
The naive approach to submitting jobs to the Grid assumes the following steps:
- Prepare the “Job Description Language” file for job configuration
- Find a suitable Athena software application
- Locate the datasets on different storage elements
- Handle job splitting, monitoring and book-keeping
Ganga combines these components to provide a front-end client for interacting with Grid infrastructures.
Ganga Overview
Ganga allows simple switching between testing on a local batch system and large-scale data processing on distributed Grid resources.
Jobs look the same whether they run locally or on the Grid: configure once, run anywhere.
[Diagram: GANGA submits to the local machine, to local batch systems (LSF, PBS) and to the Grid (EGEE, OSG, NorduGrid)]
Architecture
The Job object is where the Ganga journey starts. A job in Ganga is constructed from a set of building blocks, some mandatory and some optional, so not all are required for every job.
Ganga adopts the traditional way scientists run their applications: everything starts from a “Job”. What is different is that Ganga puts a layer of abstraction on top of the physical job. The abstraction lets users decide how their job should be executed. The job abstraction has 6 building blocks.
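The building-block idea can be sketched in plain Python. This is only a toy model and not Ganga's Job class: JobSketch and its method are hypothetical names. It also assumes that the six blocks are the ones used in the Athena example later in this talk (application, backend, inputdata, outputdata, splitter, merger) and that application and backend are the mandatory ones, which this slide does not state explicitly.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class JobSketch:
    """Toy stand-in for a job description; NOT Ganga's Job class."""
    application: str                   # assumed mandatory: what to run
    backend: str                       # assumed mandatory: where to run
    inputdata: Optional[str] = None    # optional building blocks
    outputdata: Optional[str] = None
    splitter: Optional[str] = None
    merger: Optional[str] = None

    def optional_blocks_in_use(self):
        """Return the names of the optional blocks that have been set."""
        names = ("inputdata", "outputdata", "splitter", "merger")
        return [n for n in names if getattr(self, n) is not None]


# A quick local test needs only the two mandatory blocks.
local_test = JobSketch(application="Athena", backend="Local")

# The same description is reused for the Grid: only the backend changes,
# and optional blocks are added where they are needed.
grid_run = replace(local_test, backend="LCG",
                   inputdata="DQ2Dataset", splitter="AthenaSplitterJob")

print(local_test.optional_blocks_in_use())
print(grid_run.optional_blocks_in_use())
```

The point of the sketch is that the job description stays the same object shape for every backend, which is what makes "configure once, run anywhere" possible.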
Architecture
Customized applications and a plug-in based design ease job creation.
Incremental analysis development, switching between different technologies:
- First test on a local machine
- Intermediate sample analyzed on batch
- Full sample run using Grid backends
[Diagram: a common interface on top of specific implementations]
The common part is handled by ordinary objects; the specific part is based on a “schema”, a Python mechanism that exposes each specific implementation through its abstraction.
User interfaces
Ganga offers three user interfaces: CLIP, GUI, and GPI & Scripting.

CLIP:
    *** Welcome to Ganga ***
    Version: Ganga-4-4-2
    Documentation and support: http://cern.ch/ganga
    Type help() or help('index') for online help.

    In [1]: jobs
    Out[1]: Statistics: 1 jobs
    --------------
    # id  status     name                subjobs  application  backend  backend.actualCE
    # 27  completed  TestGroupArea-ific           Athena       LCG      ce01.ific.uv.es:2119/jobmanager-pbs-short

GPI & Scripting (file myjob.exec):
    #!/usr/bin/env ganga
    #-*-python-*-
    import time
    j = Job()
    j.backend = LCG()
    j.submit()
    # Poll every 30 seconds until the job reaches a final state
    while j.status not in ['completed', 'failed']:
        print('job still running')
        time.sleep(30)

The script can be run in any of these ways:
    ./myjob.exec
    ganga ./myjob.exec
    In [1]: execfile("myjob.exec")

We will focus on the Ganga CLIP (Command Line Interface for Python).
How to use Ganga
Configurations
Syntax: Python ConfigParser standard
    [configuration]
    TextShell = IPython
    ...
    [LCG]
    VirtualOrganisation = atlas
    [athena]
    LCGOutputLocation = srm://lsrm.ific.uv.es/lustre/ific.uv.es/grid/atlas/dq2/users/
    LocalOutputLocation = srm://lsrm.ific.uv.es/lustre/ific.uv.es/grid/atlas/dq2/users/
    ATLAS_SOFTWARE = /opt/exp_software/atlas/prod/releases/rel_12-0_2
    ...
How to set configurations:
- release config: hardcoded configurations
- site config:
    setenv GANGA_CONFIG_PATH GangaAtlas/Atlas.ini
    set path = (/afs/ific.uv.es/project/atlas/software/ganga/install/4.4.2/bin/ $path)
- user config: ~/.gangarc (ganga -g)
Sequence: user config > site config > release config
Configurations
Ganga processes, in the order they are specified, any configuration files pointed to by the environment variable GANGA_CONFIG_PATH, and then processes the “.gangarc” configuration file.
This makes it possible to use group configuration files while still allowing settings to be overridden by the user config.
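The override order can be illustrated with Python's standard configparser module. This is a minimal sketch of layered configuration in general, not Ganga's own loader, which may work differently. The three strings stand in for the release, site and user files; the values "dteam" and "Console" are made up for illustration, and load_layers is a hypothetical helper.

```python
import configparser

# Stand-ins for the three configuration layers (contents are illustrative).
RELEASE_CONFIG = """
[configuration]
TextShell = IPython
[LCG]
VirtualOrganisation = dteam
"""

SITE_CONFIG = """
[LCG]
VirtualOrganisation = atlas
[athena]
ATLAS_SOFTWARE = /opt/exp_software/atlas/prod/releases/rel_12-0_2
"""

USER_CONFIG = """
[configuration]
TextShell = Console
"""


def load_layers(*layers):
    """Read the layers in order; a later layer overrides an earlier one."""
    parser = configparser.ConfigParser()
    for text in layers:
        parser.read_string(text)
    return parser


# Lowest precedence first, highest precedence last.
config = load_layers(RELEASE_CONFIG, SITE_CONFIG, USER_CONFIG)

print(config["LCG"]["VirtualOrganisation"])   # set by the site layer
print(config["configuration"]["TextShell"])   # set by the user layer
print(config["athena"]["ATLAS_SOFTWARE"])     # only defined by the site layer
```

Reading the lowest-precedence layer first and the user layer last gives the sequence user config > site config > release config: a group can ship shared defaults, and each user overrides only the options they care about.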
Ganga Workspace
At first launch, Ganga creates a directory gangadir in your home directory and uses it to store job-related files and information:
- Metadata of jobs
- Data of jobs
An alternative location can be set in the configuration:
    [DefaultJobRepository]
    local_root = /alternative/gangadir
It is possible to use the same Ganga instance to maintain multiple repositories (quite useful for keeping the jobs of different projects separate).
“Hello World” example: CLIP
From a Ganga CLIP session, a job that writes “Hello World” can be created and submitted to LCG as follows:
    app = Executable()
    app.exe = "/bin/echo"
    app.env = {}
    app.args = ["Hello World"]
    # Property values set above are in fact the defaults
    # for the Executable application
    j = Job(application=app, backend=LCG())
    j.submit()
    # Check on job progress
    jobs
    # When the job has completed, check the output
    j.peek("stdout")
ATLAS Analysis Job
See Santi’s talk for the ATLAS Analysis Model and Data Format.
ATLAS applications: Athena and AthenaMC
Data input:
- DQ2Dataset: all DQ2 dataset handling in the client, LFC/SE interaction on the worker node, used by all backends
- ATLASDataset: LFC file access
- ATLASLocalDataset: local file system, Local/Batch backend
Data output:
- DQ2OutputDataset: stores files on a Grid SE, registration in DQ2
- AtlasOutputDataset: multipurpose for Grid and Local output
Athena example: CLIP
This assumes you are in the ATLAS VO, have your cmt area set up, and have checked out and built your package in a work area (see the demo next).
    # Application
    j = Job()
    j.name = 'Test-AthenaJob-IFIC'
    j.application = Athena()
    j.application.exclude_from_user_area = ["*.o", "*.root.*", "*.exe"]
    j.application.prepare(athena_compile=False)
    j.application.option_file = '$HOME/AthenaTerstArea/12.0.6/PhysicsAnalysis/AnalysisCommon/UserAnalysis/UserAnalysis-00-09-10/run/AnalysisSkeleton_topOptions.py'
    j.application.atlas_release = '12.0.6'
    j.application.max_events = '10'
    # InputData (create the dataset object before setting its properties)
    j.inputdata = DQ2Dataset()
    j.inputdata.type = 'DQ2_LOCAL'
    j.inputdata.dataset = "trig1_misal1_mc12.005186.PythiaZmumu_pt100_fixed.recon.AOD.v12000601_tid005906"
    # Splitter & Merger
    j.splitter = AthenaSplitterJob(numsubjobs=2)
    j.merger = AthenaOutputMerger()
    # OutputData
    j.outputdata = DQ2OutputDataset()
    j.outputdata.outputdata = ['AnalysisSkeleton.aan.root']
    # Submission
    j.backend = LCG()
    j.backend.CE = 'ce01.ific.uv.es:2119/jobmanager-pbs-short'
    j.submit()
This is what a typical “real application” job looks like. It is very easy to translate into “scripting mode”, which immediately gives users another way to run their applications.
Ganga CLIP commands (1)
Useful commands:
    list_plugins("type")      # List plugins of specified type:
                              # "applications", "backends", etc
    j1 = Job(backend=LSF())   # Create a new job for LSF
    a1 = Executable()         # Create Executable application
    j1.application = a1       # Set value for job's application
    j1.backend = LCG()        # Change job's backend to LCG
    export(j1, "myJob.py")    # Write job to specified file
    load("myJob.py")          # Load job(s) from specified file
    j2 = j1.copy()            # Create j2 as a copy of job j1
    jobs                      # List jobs
    jobs[i].subjobs           # List subjobs for split job i
Ganga CLIP commands (2)
When a job j has been defined, the following methods can be used:
    j.submit()   # Submit the job
    j.kill()     # Kill the job (if running)
    j.remove()   # Kill the job and delete associated files
    j.peek()     # List files in job's output directory
Once a job has been submitted, it can no longer be modified or resubmitted, but it can be copied, and the copy can be modified and submitted.
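The rule that a submitted job is frozen while its copy is editable can be modelled in a few lines of plain Python. This is a toy model of the behaviour described above, not Ganga's implementation: ToyJob, its status strings and its error messages are assumptions made for illustration.

```python
class ToyJob:
    """Toy model of the submit/copy rule; NOT Ganga's Job class."""

    def __init__(self, backend):
        # Bypass the guard below while the object is being built.
        object.__setattr__(self, "status", "new")
        self.backend = backend

    def __setattr__(self, name, value):
        # Attributes may only change while the job has not been submitted.
        if self.status != "new":
            raise AttributeError("job already submitted; copy it to make changes")
        object.__setattr__(self, name, value)

    def submit(self):
        if self.status != "new":
            raise RuntimeError("job cannot be resubmitted; submit a copy instead")
        object.__setattr__(self, "status", "submitted")

    def copy(self):
        # The copy starts again in the editable "new" state.
        return ToyJob(self.backend)


j1 = ToyJob(backend="LSF")
j1.submit()

try:
    j1.backend = "LCG"        # rejected: j1 has already been submitted
except AttributeError as err:
    print(err)

j2 = j1.copy()                # the copy is a fresh, editable job
j2.backend = "LCG"
j2.submit()
print(j1.backend, j1.status, j2.backend, j2.status)
```

In a real CLIP session the same workflow corresponds to j2 = j1.copy(), changing j2, and then calling j2.submit().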
Hands-on: using Ganga CLIP
The hands-on session will be based on the following URL: https://twiki.ific.uv.es/twiki/bin/view/Atlas/AtlasTier2
More about Ganga
Ganga beyond ATLAS and LHCb
GANGA Activities Main Users Other activities Garfield HARP
Ganga Activities About 50 domains
More than 300 Ganga Users
- This plot shows the establishment of the Ganga user community.
- ATLAS and LHCb users dominate the user community; the number of users coming from other communities is growing.
- On average, roughly 40-50 unique users use Ganga every day, and the trend shows this number is growing.
- Overall, 300 unique users started using Ganga in the last 2 months.
More info
Ganga home: http://cern.ch/ganga
Official Ganga User’s Guide: http://ganga.web.cern.ch/ganga/user/html/GangaIntroduction/
Tutorial for ATLAS data analysis using Ganga: https://twiki.cern.ch/twiki/bin/view/Atlas/DistributedAnalysisUsingGanga
Getting help:
- ATLAS user support: hn-atlasGANGAUserDeveloper@cern.ch
- Direct support from developers: project-ganga-developers@cern.ch