Introduction to Distributed Analysis


1 Introduction to Distributed Analysis
Dietrich Liko

2 Overview
Introduction to Grid Computing
Three grid flavors in ATLAS: EGEE, OSG, Nordugrid
Distributed Analysis activities: GANGA/LCG, PANDA/OSG, other tools
How to find your data?
Where is the data stored?
Which data is really available?

3 Evolution of CERN computing
1958: Ferranti Mercury – 2 years to build, 3 months to install, 320 kBytes storage, less computing power than today's calculators
1967: CDC 6400
1976: IBM 370/168
1988: IBM 3090, DEC VAX, Cray X-MP
2001: PC Farm
The scope and complexity of particle-physics experiments have increased in parallel with increases in computing power
Massive upsurge in computing requirements in going from LEP to LHC

4 Strategy for processing LHC data
Majority of data processing (reconstruction/simulation/analysis) for the LEP experiments was performed at CERN; about 50% of physics analyses ran at collaborating institutes
A similar approach might have been possible for LHC: increase the data-processing capacity at CERN, taking advantage of Moore's Law increases in CPU power and storage
The LHC Computing Review (CERN/LHCC/ ) discouraged the LEP-type approach:
Rules out access to funding not available to CERN
Makes poor use of expertise and resources at collaborating institutes
Requires a solution for managing distributed data and CPUs: Grid computing → Project for the LHC Computing Grid (LCG) started 2002

5 Grid Computing
Ideas behind Grid computing have been around since the 1970s, but became very fashionable around the turn of the century
"A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities." – Ian Foster and Carl Kesselman, The Grid: Blueprint for a New Computing Infrastructure (1998)
First release of the Globus Toolkit for Grid infrastructures made in 1998
World Wide Web commercially attractive by late 1990s; e-Everything suddenly in vogue: e-Commerce, e-Science, the dot-com bubble
Grid proposed as an evolution of the World Wide Web: access to resources as well as to information
Many projects: EGEE, OSG, Nordugrid, GridPP, INFN Grid, D-Grid

6 Distributed Analysis
Data Analysis: AOD & ESD analysis, TAG based analysis – pathena/PANDA, GANGA/LCG
User Production: Prodsys, LJSF, GANGA (DQ2 integration)

7 EGEE
Job submission via the LCG Resource Broker; the new gLite RB is on its way …
LFC file catalog
CondorG submission is also possible, but requires some expertise and has no support from the service provider
A new approach using Condor glideins is under investigation (Cronus)

8 Resource Broker Model
(Diagram: Resource Brokers (RB) dispatching jobs to Computing Elements (CE))

9 OSG/Panda
PANDA is an integrated production and distributed analysis system
Pilot-job based, similar to DIRAC & AliEn
Simple file catalogs at sites
Will be supported by GANGA in release 4.3

10 Three grids …
ATLAS is using three large infrastructures: EGEE, OSG, Nordugrid
The grids have different middleware: different software to submit jobs, different catalogs to store the data
We have to aim to hide these differences from the ATLAS user

11 Panda Model
(Diagram: central Task queue connected to the Computing Elements (CE))

12 Nordugrid
ARC middleware for job submission – powerful and simple
RLS file catalog
Will be supported by GANGA in release 4.3

13 ARC Model
(Diagram: Computing Elements (CE) in the ARC model)

14 How can we live with that?
A data management layer to hide these differences – Don Quixote 2 (DQ2)
Tools that aim to hide the difficulties of submitting jobs: pathena/PANDA on OSG, GANGA on LCG
In the future, better interoperability: at the level of the ATLAS tools and at the level of the middleware

15 pathena/PANDA
Lightweight client, integrated into the Athena release
Very nice work
A lot of work has been done to better support user jobs: short queues, multitasking pilots, etc.
A large set of data is available
Available for some time now

16 GANGA/LCG
Text UI & GUI; a pathena-like interface is available
Multiple backends: LCG/EGEE, LSF (works also with CAT queues), PBS, PANDA & Nordugrid for 4.3, and others (see the sketch below)
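A minimal sketch of the backend switch in a GANGA CLIP session (Job, Executable, LSF and LCG are provided by CLIP; the same job definition is reused and only the backend changes – attribute names follow the CLIP example later in these slides):
# GANGA CLIP session: same application, two different backends
j = Job()
j.application = Executable()
j.application.exe = '/bin/echo'
j.application.args = ['Hello World']
j.backend = LSF()      # test locally on the LSF batch system
j.submit()
j2 = j.copy()          # identical job, now sent to the Grid
j2.backend = LCG()
j2.submit()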

17 Dashboard Monitoring
We are setting up a framework to monitor distributed analysis jobs
Sources: MonALISA based (OSG, LCG), R-GMA, Imperial College DB, Production system
GANGA has been instrumented to understand its usage

18 Since September 1st …

19 Dataset distribution
In principle the data should be everywhere
AOD & ESD during this year: ~30 TB max
Three steps:
Not all data can be consolidated (other grids, Tier-2s)
Distribution between Tier-1s is not yet perfect
Distribution to Tier-2s can only be the next step

20 Latest numbers by Alexei – Feb 27
(Table of per-site transfer statistics: files requested/copied, transfers waiting(*), transferred in 7 days – for ASGC, BNL, CERN, CNAF, FZK, LYON, NDGF, NIKHEF, PIC, RAL, TRIUMF)
The mileage varies between 33.6% and 98.2%

21 Monitoring of transfers

22 Why can I not send the jobs to the data automatically?
I will advise you to send jobs to selected sites
This is not the final word, it is just a way to address the current situation
ATLAS uses a dataset concept:
Datasets have a content
Datasets have one or more locations
Datasets can be complete or incomplete at a location
Only complete datasets can be used in a dataset-based brokering process (see the sketch below)
We are currently trying to understand how much data is available as complete datasets, and whether we can do file-based brokering for incomplete datasets
We have made big progress in the last months, but not everything is working as we would like yet
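To make the complete/incomplete distinction concrete, a small illustrative sketch in plain Python (not the actual ATLAS brokering code; site names and data structures are invented for the example):
# Illustrative only: a dataset can be used for dataset-based brokering at a site
# only if the copy there is complete; incomplete copies need file-based brokering
dataset_locations = {
    'SITE_A': {'complete': True},
    'SITE_B': {'complete': False},   # incomplete copy: excluded from brokering
    'SITE_C': {'complete': True},
}

def broker_sites(locations):
    """Return the sites holding a complete copy of the dataset."""
    return sorted(site for site, info in locations.items() if info['complete'])

print(broker_sites(dataset_locations))   # ['SITE_A', 'SITE_C']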

23 How to find out which data exists
AMI metadata
Prodsys database
Dataset browser

24 How to access data?
Download with dq2_get and analyze locally: works (sometimes), but is not scalable
Data is distributed over the sites, and jobs are sent to the sites to analyze the data – DA is promoting this way of working (a sketch follows below)
The process of finding the data will be fully automated in due time
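A hedged sketch of the "send the job to the data" style in GANGA, assuming the DQ2Dataset input-dataset plugin from the DQ2 integration; the dataset name and options file are placeholders, and attribute names follow the Athena schema shown on the plugins slide (exact types may differ):
# GANGA CLIP session: declare the input dataset and let the Grid backend
# run the job where a copy of the data is available
j = Job()
j.application = Athena()
j.application.option_file = 'AnalysisSkeleton_topOptions.py'   # placeholder job options
j.inputdata = DQ2Dataset()
j.inputdata.dataset = 'some.dataset.name.AOD'                  # placeholder dataset
j.backend = LCG()
j.submit()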

25 POSIX-like IO
DA wants to read data directly from the SE (Prodsys, in contrast, downloads the data using gridftp)
Use rfio, dcap, GFAL, xrootd – we want POSIX-like IO
Reasons: the size of the local disk available to the job, and that we do not need the full event or all events
As of today ATLAS AOD jobs read data at ~2 MB/sec (see the sketch below)
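An illustration of POSIX-like reading through a ROOT IO plugin (PyROOT sketch; the rfio URL and the CollectionTree name are assumptions for the example, not taken from the slides):
# Open an AOD file directly on the SE; the URL prefix (rfio, dcap, root, ...)
# selects the corresponding ROOT IO plugin
import ROOT

f = ROOT.TFile.Open('rfio:///dpm/example.org/home/atlas/some_aod.pool.root')
if f and not f.IsZombie():
    tree = f.Get('CollectionTree')            # read only the branches/events needed
    print('events available:', tree.GetEntries())
    f.Close()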

26 Analysis jobs
Today one job reads 10 to 100 AOD files, 130 MB each
One year of LHC running: 150 TB AOD according to the ATLAS computing model; with a filesize of 10 GB this is still of the order of 15,000 files
Backnavigation reduces IO, but increases the load on the SE due to more "open" calls

27 Some measurements
Standard analysis example, 10 files of 130 MB each:
Local: 14:02 min
DPM using rfio: 16:30 min
Castor-2: :29 min
At this rate, 150 TB would take about 1000 days (see the worked estimate below)
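A back-of-the-envelope check of the "about 1000 days" figure, assuming a single sequential job stream at roughly the measured local rate:
# 10 files of 130 MB read in ~14 minutes locally -> ~1.5 MB/s
files, file_size_mb = 10, 130
minutes = 14 + 2 / 60.0
rate_mb_per_s = files * file_size_mb / (minutes * 60)     # ~1.5 MB/s
seconds = 150e6 / rate_mb_per_s                            # 150 TB expressed in MB
print('rate ~ %.1f MB/s, full sample ~ %.0f days' % (rate_mb_per_s, seconds / 86400))
# -> roughly 1100 days, consistent with the slide's "about 1000 days"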

28 DPM in Glasgow

29 Athena jobs
Athena uses POOL/ROOT
Many issues concerning plugins and the current configuration – see the Wiki page

30 Highlights
dCache: wrong dCache library shipped (except at BNL)
DPM: need to provide a symbolic link (libdpm.so -> libshift.so); broken RFIO plugin; DPM URLs not supported
Castor: new Castor syntax not supported; no files larger than 2 GB
Some issues will go away with v13, but the RFIO plugin will still be outdated (new rfio library not yet released)
We need to do systematic tests, as proposed by Stephane

31 Backporting the ROOT RFIO plugin
Advantages: new syntax à la Castor-2, large files > 2 GB
Problems with DPM: a different URL format, some problems querying the file attributes
Several patches required to make it work
A security context is required, but since last week the Grid UI clashes with Athena due to the Python version
A new RFIO plugin is under development inside ROOT
In general, new ROOT IO plugins should be backported to agreed ROOT versions

32 Short queues
Distributed Analysis competes with Production; short queues can be used to speed up the analysis
There is a lot of discussion going on about how useful short queues are; empirically, I prefer to send jobs to short queues
Selecting the queues is the easy part; selecting the dataset location is the complicated aspect (fully automatic for complete datasets)

33 Summary
Several tools are available to perform Distributed Analysis, integrated with DQ2
Data is being collected and also distributed; still a lot of work in front of us
We are learning how to access data everywhere: how to find data and how to read it – not fully automatic yet, but we aim for that
We are learning how to handle user jobs: job priorities on LCG, short queues

34 Next steps
Increase the number of sites: we have to push getting the data to all Tier-1s – they are the backbone of the ATLAS data distribution
Interoperability will certainly be an issue this year: GANGA will send jobs to other sites, PANDA will run on LCG, and Cronus wants to bridge all resources

35 GANGA Introduction

36 Who is ATLAS GANGA?
GANGA Core: Ulrik Egede, Karl Harrison, Jakub Moscicki, A. Soroko, V. Romanovsky, Adrina Murao
GANGA GUI: Chun Lik Tan
Athena AOD analysis: Johannes Elmsheuser
Tag Navigator: Mike Kenyon, Caitherina Nicholson
User production: Fredric Brochu
EGEE/LCG: Hurng-Chun Lee, Dietrich Liko
Nordugrid: Katarina Pajchel, Bjoern Hallvard
PANDA: Dietrich Liko + support from PANDA
Cronus: Rod Walker
AMI integration: Farida Fassi, Chun Lik Tan + support from AMI
MonALISA monitoring: Benjamin Gaidioz, Jae Yu, Tummalapalli Reddy

37 What is GANGA?
Ganga is an easy-to-use frontend for job definition and management
Allows simple switching between testing on a local batch system and large-scale data processing on distributed resources (Grid)
Developed in the context of ATLAS and LHCb
For ATLAS: Athena framework, JobTransformations, DQ2 data-management system, EGEE/LCG; for release 4.3: AMI, PANDA/OSG, Nordugrid, Cronus
Component architecture readily allows extension
Implemented in Python

38 Users

39 Domains

40 GANGA Job Abstraction
What to run: Application
Where to run: Backend
Data read by application: Input Dataset
Data written by application: Output Dataset
Rule for dividing into subjobs: Splitter
Rule for combining outputs: Merger
(These parts together make up a Job; see the sketch below)
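The abstraction maps one-to-one onto attributes of a GANGA Job; a hedged sketch in CLIP (the output-dataset, splitter and merger class names are illustrative, and the dataset name is a placeholder):
# One job carrying all parts of the abstraction (class names illustrative)
j = Job()
j.application = Athena()                  # what to run
j.backend     = LCG()                     # where to run
j.inputdata   = DQ2Dataset()              # data read by the application
j.inputdata.dataset = 'some.dataset.AOD'  # placeholder dataset name
j.outputdata  = ATLASOutputDataset()      # data written by the application (name illustrative)
j.splitter    = AthenaSplitterJob()       # rule for dividing into subjobs (name illustrative)
j.merger      = RootMerger()              # rule for combining outputs (name illustrative)
j.submit()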

41 Framework for plugins
All plugins derive from GangaObject
Interfaces: IApplication, ISplitter, IDataset, IMerger, IBackend
Example plugins and schemas (with user-settable and system-set attributes):
Athena (IApplication): atlas_release, max_events, options, option_file, user_setupfile, user_area
LCG (IBackend): CE, requirements, jobtype, middleware, id, status, reason, actualCE, exitcode
(A conceptual sketch of the plugin idea follows below)
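A conceptual sketch of the plugin idea in plain Python – this is not the real Ganga plugin API, only an illustration of interfaces plus per-plugin schemas; the attribute names are taken from the LCG example on this slide:
# Conceptual illustration only (not the actual Ganga framework code)
class GangaObject(object):
    _schema = {}           # every plugin publishes its configurable attributes

class IBackend(GangaObject):
    def submit(self, job):
        raise NotImplementedError

class LCG(IBackend):
    _schema = {'CE': '', 'requirements': '', 'jobtype': '', 'middleware': '',
               'id': None, 'status': None, 'reason': None,
               'actualCE': None, 'exitcode': None}
    def submit(self, job):
        print('submitting %s via the LCG Resource Broker' % job)

LCG().submit('job 1')      # -> submitting job 1 via the LCG Resource Broker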

42 Backends and Applications
Applications: Gauss/Boole/Brunel/DaVinci (LHCb simulation/digitisation/reconstruction/analysis), AthenaMC (production), Athena (simulation/digitisation/reconstruction/analysis), Executable
Backends: PBS, LSF, OSG PANDA, LHCb WMS, US-ATLAS WMS
(Diagram marks each application–backend combination as implemented or coming soon)

43 Status
Current version 4.2.11: AOD analysis, TAG based analysis, MonALISA based monitoring, LCG/EGEE, batch handlers
Upcoming version 4.3: Tag Navigator, AMI integration, PANDA, Nordugrid, Cronus

44 How do the elements work together?
Ganga has built-in support for ATLAS and LHCb; the component architecture allows customisation for other user groups
(Diagram: the GANGA user interface for job definition and management sits between the applications (ATLAS, LHCb, other), the metadata catalogues and data storage/retrieval tools, the local and remote job repositories and archives, the monitoring loop, and the processing backends – experiment-specific workload-management systems, local batch systems and distributed (Grid) systems)

45 Different working styles
Command Line Interface in Python (CLIP) provides interactive job definition and submission from an enhanced Python shell (IPython) – especially good for trying things out and seeing how the system works
Scripts, which may contain any Python/IPython or CLIP commands, allow automation of repetitive tasks; scripts included in the distribution enable the kind of approach traditionally used when submitting jobs to a local batch system
Graphical User Interface (GUI) allows job management based on mouse selections and field completion, with lots of configuration possibilities

46 Scripts provide a pathena-like interface
ganga athena --inDS trig1_misal1_csc Jimmy_jetsJ4.recon.AOD.v --outputdata AnalysisSkeleton.aan.root --split 3 --maxevt 100 --lcg --ce ce102.cern.ch:2119/jobmanager-lcglsf-grid_2nh_atlas AnalysisSkeleton_topOptions.py
Monitor the job status, for example using the GUI or the CLI

47 IPython / CLIP
IPython: a comfortable Python shell with many useful extensions
CLIP: the GANGA command line interface
How to define a job:
j = Job()
j.application = Executable()
j.application.exe = '/bin/echo'
j.application.args = ['Hello World']
j.backend = LCG()
j.submit()
Other commands: jobs, jobs[20].kill(), jobs[20].copy()

48 GUI

49 Exercises
A subset adapted for today
Current tutorial that explains more features
FAQ
User support using HyperNews

