ANALYSIS TOOLS FOR THE LHC EXPERIMENTS
Dietrich Liko / CERN IT

Acknowledgements
- I would like to thank my colleagues from the four experiments for their helpful discussions
- I have added references to relevant presentations at the conference
- I am trying to give an overview, so I had to shorten some arguments
- Possibly biased by my own opinions

Introduction
- The experiments are planning their analysis according to their computing models
- Data is being distributed to the Tier-1/Tier-2 centers for analysis
- The experiments have developed tools that allow users to analyze the data using resources on the grid
- I want to take a close look at these tools
  - Is the grid already a tool for everybody?
- In addition there is also a need for more interactive analysis
  - Strong connection with the analysis framework and the data model chosen by the experiment

Transfer Tier-0 to Tier-1
- Data is available in a distributed manner
- Transfers to the higher Tiers are often still problematic
- [Plot: throughput in MB/s during the ATLAS M4 data taking]

Reminder on Data Formats
- RAW
  - Has to be recorded on permanent storage
  - Only a small fraction can be analyzed directly
- Event Summary Data (ESD)
  - Output of the reconstruction
  - Often large; difficult to analyze the full set
- Analysis Object Data (AOD)
  - Quantities particularly relevant for physics analysis
  - Main input for analysis, distributed to many sites
- Analysis-specific data
  - Sometimes called N-tuples or DPD

CMS Analysis Environment
- Support multiple analysis strategies
  - Let the users choose
- The data model needs to support direct ROOT access
  - No persistent/transient separation
- Strategies
  - Full Framework – grid and local
  - FWLite – ROOT
  - TFWLiteSelector – ROOT and PROOF
  - ‘Bare’ ROOT
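
The FWLite strategy above means CMS data files can be opened and looped over directly from ROOT without the full framework. As an illustration, a minimal sketch using the Python binding of FWLite: the DataFormats.FWLite module and the Events/Handle classes come from the CMSSW FWLite interface, while the file name and the "muons" product label are placeholders.

# Minimal FWLite-style loop (sketch; assumes a CMSSW environment providing DataFormats.FWLite)
from DataFormats.FWLite import Events, Handle

events = Events("reco_sample.root")            # placeholder file name
muons  = Handle("std::vector<reco::Muon>")     # C++ type of the product to read

for event in events:
    event.getByLabel("muons", muons)           # placeholder product label
    print("event has", muons.product().size(), "muons")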

ATLAS Analysis Model [#83]
- Basic principle: smaller data can be read faster
  - Skimming – keep interesting events
  - Thinning – keep interesting objects in events
  - Slimming – keep interesting info in objects
  - Reduction – build higher-level data
- Derived Physics Data
  - Share the schema with objects in the AOD/ESD
  - Can be analyzed interactively

LHCb Analysis Model
- Stripping is a centrally managed analysis run as a production
- Reduces analysis datasets to a size of 10^6 to 10^7 events per year
- All stripped data will be disk resident and replicated to all Tier-1 sites
- LHCb distributed analysis jobs are run on the Grid

ALICE Analysis Model
- All data in ROOT format
- Emphasis is being put on being able to do the same analysis interactively and batch-based
- Scheduled analysis is also an important issue
  - Organized analysis based on a train model

Batch vs Interactive
- There is a need for both batch-type and interactive analysis tools
- For many groups the emphasis is on having a solid batch-type solution
  - They use the “traditional” grid infrastructures with their specific tools and applications
- The need for more interactive tools is recognized
  - ROOT/PROOF is the main activity here, driven by the ALICE effort
  - To profit from this development, an evolution of the event model is often required
- Different groups put different emphasis on certain design criteria
  - For example, transient/persistent separation

GRID Analysis Tools
- CMS: CRAB together with the gLite WMS and CondorG
- LHCb: GANGA together with DIRAC
- ATLAS: pathena/PANDA; GANGA together with the gLite WMS and ARC
- ALICE: AliEn2, PROOF

CRAB Features [#176]
- CRAB: CMS Remote Analysis Builder
- User-oriented tool for grid submission and handling of analysis jobs
- Support for the WMS and CondorG
- Command-line oriented tool
- Allows the user to create and submit jobs, query their status and retrieve the output
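
A minimal sketch of that create/submit/status/retrieve cycle, driven from Python purely for illustration. It assumes a prepared crab.cfg in the working directory and the crab executable on the PATH; the option names shown (-create, -submit, -status, -getoutput) reflect the CRAB command-line interface of that era and should be checked against the CRAB documentation.

import subprocess

def crab(*args):
    """Run one CRAB command and fail loudly if it returns an error."""
    subprocess.run(["crab", *args], check=True)

# Assumes a crab.cfg (dataset, CMSSW parameter set, output destination) in the current directory
crab("-create")      # split the analysis task into grid jobs
crab("-submit")      # submit the jobs via the gLite WMS or CondorG
crab("-status")      # query the status of all jobs in the task
crab("-getoutput")   # retrieve the output of the finished jobs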

CRAB Workflow
- [Workflow diagram]

CRAB Usage from the ARDA Dashboard
- Mid-July to mid-August 2007: 645K jobs (20K jobs/day), 89% grid success rate

CRAB Future
- Evolution of the system to a client-server architecture
  - Keeps the same interface
- Aims to provide a “service” for the user
  - Reduce the load on humans
  - Improve scalability and reliability

LHCb [#176]
- Use GANGA for job preparation and submission
  - Tool being developed together with ATLAS
  - Discussed in the following together with ATLAS
- Use DIRAC as the workload management system
  - Uses a pull model
  - Puts a layer on top of the EGEE infrastructure
  - Accounting, prioritization, fair share
  - VO policy can be applied at a central location

DIRAC
- The DIRAC API provides a transparent and secure way to submit jobs
- LCG jobs are the pilot jobs that pull the real jobs from a central task queue
- Workarounds for many problems on the grid
  - Black holes
  - Downtimes
  - Incorrect configurations
  - Pre-staging of data
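
A minimal sketch of job submission through the DIRAC API of that era. The module path and the Job/Dirac classes with setApplication() follow contemporary DIRAC examples; the sandbox and input-data setters, the DaVinci version and the logical file name are illustrative assumptions.

from DIRAC.Client.Dirac import *   # DIRAC API as used by LHCb at the time

dirac = Dirac()

job = Job()
job.setApplication('DaVinci', 'v19r0')                    # application and version (illustrative)
job.setInputSandbox(['DaVinci.opts'])                     # local files shipped with the job
job.setInputData(['LFN:/lhcb/some/dataset/file_1.dst'])   # placeholder logical file name
job.setOutputSandbox(['DVNtuple.root'])                   # files returned to the user

jobid = dirac.submit(job)   # enters the central task queue; pilot jobs on LCG pull it from there
print('Submitted job', jobid)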

ATLAS Strategy [#287, #167 (demo at the exhibition)]
- On the EGEE and Nordugrid infrastructures ATLAS uses direct submission to the middleware via GANGA
  - EGEE: LCG RB and gLite WMS
  - Nordugrid: ARC middleware
- On OSG: the PANDA system
  - Pilot-based system
  - Also available at some EGEE sites

GANGA Features
- User-friendly job submission tool
- Extensible due to its plugin system
- Support for several applications
  - Athena, AthenaMC (ATLAS)
  - Gaudi, DaVinci (LHCb)
  - Others …
- Support for several backends
  - LSF, PBS, SGE, etc.
  - gLite WMS, Nordugrid, Condor
  - DIRAC, PANDA (under development)

GANGA cont.
- Supports several modes of working
  - Command line
  - IPython prompt
  - GUI
- [Figure: building blocks of a GANGA job]
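
As an illustration of those building blocks, a minimal sketch of a session at the GANGA IPython prompt, where Job, the application and backend plugins, and the jobs registry are pre-loaded by GANGA itself; the specific plugin names used here mirror the lists on the surrounding slides and should be checked against the GANGA documentation.

# At the ganga prompt (IPython): Job, Athena, LCG, jobs, ... are provided by GANGA itself
j = Job()
j.name        = 'my-analysis'
j.application = Athena()     # ATLAS analysis application plugin (Gaudi/DaVinci for LHCb)
j.backend     = LCG()        # swap for Batch(), a Nordugrid or DIRAC backend, etc.
j.submit()

jobs                         # the registry: list all jobs and their current status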

For ATLAS
- Applications
  - Athena (analysis)
  - AthenaMC (user production)
- Backends
  - Batch
  - LCG
  - Nordugrid
  - PANDA (under development)

GANGA Usage
- In total ~842 persons have tried GANGA this year (more than 500 in ATLAS)
- Per week ~275 users (~150 for ATLAS)
- [Chart: usage split between ATLAS, LHCb and others]

PANDA Features
- PANDA is the ATLAS production and analysis system on OSG
  - Also under evaluation in some EGEE clouds
- For analysis, pathena, a command-line tool, assembles the job
- The status of the jobs can be monitored using web pages
- A large number of users are sending many jobs
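
A minimal sketch of how an analysis job is handed to pathena, wrapped in Python only for illustration; the job options file, the dataset names and the --inDS/--outDS option names are assumptions modelled on typical pathena usage and should be checked against the PANDA documentation.

import subprocess

# pathena is run from a configured Athena release area; everything below is a placeholder
cmd = [
    "pathena", "AnalysisSkeleton_topOptions.py",   # Athena job options defining the analysis
    "--inDS",  "some.input.AOD.dataset",           # input dataset known to the data management system
    "--outDS", "user.someuser.test.analysis",      # output dataset name
]
subprocess.run(cmd, check=True)                    # assembles the job and submits it to PANDA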

PANDA System
- [System architecture diagram]

ATLAS future developments
- For our users interoperability is evidently important
- On the one hand
  - PANDA jobs on some EGEE sites
- On the other hand
  - PANDA as an additional backend for GANGA
- The positive aspect is that it gives ATLAS choices on how to evolve

ALICE Strategy [#444, #307]
- Batch analysis
  - AliEn provides the single entry point into the system
- Interactive analysis
  - PROOF
  - Used by ALICE at the CAF (CERN Analysis Facility)

AliEn2
- GRID middleware
  - Developed as the single entry point to the GRID for ALICE
  - Also used by other VOs
- All the components necessary to build a GRID and interact with other GRIDs
  - File system with metadata
  - Authorization, authentication, job optimization and execution, storage management
  - Audit, quotas, monitoring
  - Interfaces to various GRID implementations
- User interface integrated into the shell and into ROOT
  - File catalogue as a virtual file system
  - Task queue as a virtual batch system
- Used since:
  - 2002 for centrally managed productions
  - 2006 for user analysis
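
To illustrate the ROOT integration mentioned above, a minimal sketch (via PyROOT, to keep a single language for the examples) of connecting to AliEn and opening a file through the catalogue. TGrid::Connect and alien:// URLs are the standard ROOT entry points, while the catalogue path is a placeholder and a valid AliEn token plus an AliEn-enabled ROOT build are assumed.

import ROOT

# Connect to the AliEn grid services (assumes AliEn-enabled ROOT and a valid token)
grid = ROOT.TGrid.Connect("alien://")
if not grid:
    raise RuntimeError("could not connect to AliEn")

# Files registered in the AliEn file catalogue are opened through their logical name
f = ROOT.TFile.Open("alien:///alice/sim/2007/SomeProduction/AliESDs.root")  # placeholder path
if f and not f.IsZombie():
    f.ls()   # list the file contents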

AliEn2 – Job Execution Model
- [Diagram: central services (TaskQueue, Job Broker, Job Manager; job optimizers for splitting, priorities, merging, expired jobs and zombies; file catalogue with LFN, GUID and metadata) connected to site services (CE, Job Agent, SE, PackMan, MonALISA, FTD) at Sites A and B]

PROOF
- [Diagram: PROOF cluster with master, scheduler, file catalogue, storage and CPUs; a query consists of a data file list plus mySelector.C, and the user gets back real-time feedback and the merged final output]
- The cluster is perceived as an extension of the local PC
- Same macro and syntax as in a local session
- More dynamic use of resources
- Real-time feedback
- Automatic splitting and merging
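
A minimal sketch of the "same macro and syntax as in a local session" point, again via PyROOT: the same TChain::Process() call on the mySelector.C named in the diagram either runs locally or, once a PROOF session is open, is farmed out to the cluster. The master URL, tree name and file names are placeholders.

import ROOT

# Build the dataset exactly as in a local ROOT session (placeholder tree and file names)
chain = ROOT.TChain("esdTree")
chain.Add("AliESDs_1.root")
chain.Add("AliESDs_2.root")

# A local run would be: chain.Process("mySelector.C+")

# With PROOF the macro and syntax stay the same, only the session changes
proof = ROOT.TProof.Open("proofmaster.example.org")  # placeholder master URL
chain.SetProof()                  # route Process() through the open PROOF session
chain.Process("mySelector.C+")    # work is split over the workers and the output merged back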

PROOF at the CAF
- Aims to provide prompt and pilot analysis, calibration/alignment, fast simulation and reconstruction
- Test setup in place since May 2006
  - 40 “standard” machines, 2 CPUs each, 250 GB disk
  - 2 partitions: development (5 machines), production (35 machines)
- The cluster is an xrootd pool
  - Disk cache
  - 1 machine: PROOF master and xrootd redirector
  - Other machines: PROOF workers and xrootd disk servers with access to their local disks
  - Advantage of processing local data

User Experiences on the Grid
- Many users have been exposed to the grid
- Work is getting done
- A simple user interface is essential to simplify the usage
  - But experts are required to understand the problems
- Sometimes users have the impression that they are debugging the grid

User Experiences
- Role of the sites
  - Not only running the middleware
  - Data availability, software configurations
  - Often significant investment required by the site
- User support
  - First-line support by experts in the experiments: provide fast feedback to users, debugging of problems
  - Identify application problems: feedback to the developers inside the experiment
  - Identify grid problems: feedback to GGUS

Conclusions
- We have seen an increasing number of physicists using the grid
  - Hundreds of users are using the various tools
  - Large numbers of jobs are being submitted
- Often the grid is still complex
  - Data distribution and storage is a very important issue
  - Strong user support is required to address the complications
- Rising interest in interactive analysis
  - Leads to evolution of the data formats
  - Slightly different strategies by the experiments