Status of Software & Computing – Wouter Verkerke, NIKHEF


Software context: CSC notes
ATLAS is currently producing new samples for CSC notes
– Last major physics production round was for the 2005 Rome workshop (simulation in release 9, reconstruction in release 10)
Plan for CSC notes
– Event generation in and
– Simulation in release
– Reconstruction in release
  → Some samples need new reconstruction in release 13 (e.g. H → γγ needs improved egamma reco with brem recovery that will only be available in release 13)
Major new things
– New physics samples
– Increased level of realism in simulation
– Improved reconstruction and full trigger information for trigger-aware analysis

Physics Coordination 17/01/07 – Status of software
Validation of for simulation production completed
– Bulk simulation underway
– Initial problems reading EvGen data in (effectively a forward compatibility requirement). Needed because contains relevant updates of the MC generators (e.g. fix in AlpGen MLM matching)
– Fixed in (8), which can read both EvGen and EvGen data
– New: another fix (9) was required as (8) did not work on all OSG sites
Processing of the validation sample for reconstruction did not go very smoothly
– Release (1) built around the time of the last PhysCoord meeting
– A very large number of tags went in at the last minute to make reco/trigger work
– Managed to get patch (2) with fixes built and into production just before the holiday shutdown
– Release (2) is the first patch in which code was patched in addition to scripts. Used this mechanism to fix (among others) a Muon software problem

Physics Coordination 17/01/07 – Status of software
– Note that the strategy of code patching has important implications: official production output will be different from private running over the kit or AFS release if the patch is not explicitly applied. This presents issues for people doing private production. Will not go into details now because the point has become moot for the moment
– The production team managed to run part of the validation sample over the holiday, but the success rate was low (30%)
Main problems in (2) production
– ESDs were too large because, among smaller problems, too many (trigger) track collections were written out. This was not intentional and caused the ESD size to inflate to 2.6 MB/evt on average. This caused reco production jobs (at 1000 events each) to crash because the ESD output files exceed the 2 GB limit. Problem sidestepped by lowering the event count from 1000 to 250 (see the arithmetic below)
– Also a large number of (masked = ignored) ERROR messages in jobs (mostly from muon software)
– But despite the low success rate, managed to have a first look at the pilot sample for validation
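A rough check of the numbers quoted above:
– 1000 evt × 2.6 MB/evt ≈ 2.6 GB > 2 GB file limit → job fails
– 250 evt × 2.6 MB/evt ≈ 0.65 GB < 2 GB file limit → job OK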

Physics Coordination 17/01/07 – Status of software
Validation of the (2) sample revealed a couple of problems
– Most notably it turned out that the fix for MuonBoy applied in the code patch had not worked → issue traced back to the use of static libraries → thus it was decided to abandon (3) in favor of a cold build of a release
– CBNT(AA) not produced in (2). This was traced back to an issue in the job transformation scripts and was fixed for the next release
– Note that old-style CBNT has been abandoned in favor of the new Athena-Aware CBNT starting with
– Also quite a few issues with trigger code were identified (collections not filled, clashes between trigger slices when run all together)
– Otherwise most things looked OK in validation (e.g. Jet/ETmiss, EM clusters, muon efficiencies, various trigger slices with post fixes)

Physics Coordination 17/01/07 – Software status
Now building release
– Build in phases, started yesterday (17/1/07) and will complete by the end of the week
– Again quite a bit of new code (>50 collected tags, all corresponding to bug reports, with a stringent acceptance policy)
– As soon as the build is complete, will deploy on the grid, resubmit the validation sample and restart the validation cycle
– Note that the reco job size has been permanently lowered from 1000 events to 500 events due to the increased ESD size
– Best case scenario (i.e. zero problems in (1)): ready for bulk submission around Feb 1st
Latest news:
– Build of the (1) kit delayed until today due to a collection of tags to fix essential problems in the trigger, and last night's power outage
– Updated best case scenario: bulk submission around Feb 6th

Computing context – Exercising distributed computing
We mostly analyze data now by retrieving it, either from Castor or through DQ2, and running over it locally (sketched below)
– Copy AOD dataset with dq2_get to local disk
– Make ntuple of AOD dataset on local machine
– Analyze ntuples on local machine
This model does not scale to very large numbers of events (e.g. 1M events to be analyzed)
ATLAS Computing does not support this mode of analysis for large numbers of events
– Need to exercise the distributed computing model
– Make ntuples of AOD dataset on the GRID
– Analyze ntuples on local machine
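A minimal sketch of the current "local" workflow described above, for concreteness. The dataset name is a hypothetical placeholder, the exact dq2_get options are omitted, and the job options / ntuple file names are taken from the Ganga example later in these slides; treat this as an illustration of the three steps, not a verified recipe.

import subprocess, glob

dataset = "some.dataset.name.AOD"   # hypothetical placeholder, not a real dataset

# Step 1: copy the AOD dataset to local disk with dq2_get
# (assumes dq2_get is on the PATH and takes the dataset name as argument)
subprocess.run(["dq2_get", dataset], check=True)

# Step 2: make an ntuple from the local AOD files with a local Athena job
subprocess.run(["athena.py", "AnalysisSkeleton_jobOptions.py"], check=True)

# Step 3: the resulting ntuple is then analyzed interactively on the local machine
print(glob.glob("*.aan.root"))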

R.Jones PC Dec 2006 – Facilities at CERN
Tier-0:
– Prompt first-pass processing on express/calibration & physics streams with old calibrations - calibration, monitoring
– Calibration tasks on prompt data
– 24-48 hours later, process full physics data streams with reasonable calibrations
→ Implies large data movement from T0 → T1s
CERN Analysis Facility:
– Access to ESD and RAW/calibration data on demand
– Essential for early calibration
– Detector optimisation/algorithmic development

R.Jones PC Dec 2006 – Facilities Away from CERN
Tier-1:
– Reprocess 1-2 months after arrival with better calibrations
– Reprocess all resident RAW at year end with improved calibration and software
→ Implies large data movement from T1 ↔ T1 and T1 → T2
~30 Tier 2 Centres distributed worldwide: Monte Carlo simulation, producing ESD, AOD → Tier 1 centres
– On-demand user physics analysis of shared datasets
– Limited access to ESD and RAW data sets
– Simulation → implies ESD, AOD → Tier 1 centres
Tier 3 Centres distributed worldwide:
– Physics analysis
– Data private and local - summary datasets

R.Jones PC Dec 2006 – Headline Comments (2)
The Tier 1s and Tier 2s are collective - if the data is on disk, you (T2) or your group (T1) can run on it
For any substantial data access, jobs go to the data
– Users currently think data goes to the job! Cannot be sustained
When data is found to be needed later, volumes above ~10 GB need to be requested, not grabbed
– ESD can be delivered in a few hours
– RAW on tape may take ~a week
Data for Tier 3s should be pulled from Tier 2s using ATLAS tools
– Tier 3s need to ensure adequate networking

R.Jones PC Dec 2006 – Analysis computing model
Analysis model broken into two components
Tier 1: Scheduled central production of augmented AOD, tuples & TAG collections from ESD
→ Derived files moved to other T1s and to T2s
Tier 2: On-demand user analysis of augmented AOD streams, tuples, new selections etc., and individual user simulation and CPU-bound tasks matching the official MC production
→ Modest job traffic between T2s
→ Tier 2 files are not private, but may be for small sub-groups in physics/detector groups
→ Limited individual space, copy to Tier 3s

R.Jones PC Dec 2006 – Group Analysis
Group analysis will produce
– Deep copies of subsets
– Dataset definitions
– TAG selections
Characterised by access to full ESD and perhaps RAW
– This is resource intensive
– Must be a scheduled activity
– Can back-navigate from AOD to ESD at the same site
– Can harvest small samples of ESD (and some RAW) to be sent to Tier 2s
– Must be agreed by physics and detector groups
Big Trains etc.
– Efficiency and scheduling gains if analyses are blocked into a 'big train'; some form of co-ordination is needed
– Idea has been around for a while, already used in e.g. heavy ions
– Each wagon (group) has a wagon master (production manager), who must ensure it will not derail the train
– Train must run often enough (every ~2 weeks?)
– Trains can also harvest ESD and RAW samples for Tier 2s (but we should try to anticipate and place these subsets)

R.Jones PC Dec 2006 – T1 Data
Tier 1 cloud (10 sites of very different size) contains:
– 10% of RAW on disk, the rest on tape
– 2 full copies of the current ESD on disk and 1 copy of the previous
– A full AOD/TAG at each Tier 1
– A full set of group DPD
Access is scheduled, through groups

R.Jones PC Dec 2006 – On-demand Analysis
Restricted Tier 2s and CAF
– Can specialise some Tier 2s for some groups
– ALL Tier 2s are for ATLAS-wide usage
Most ATLAS Tier 2 data should be 'placed', with lifetime ~ months
– Job must go to the data
– This means Tier 2 bandwidth is vastly lower than if you pull data to the job
– Back-navigation requires AOD/ESD to be co-located
Role- and group-based quotas are essential
– Quotas to be determined per group, not per user
Data selection
– Over small samples with the Tier-2 file-based TAG and the AMI dataset selector
– TAG queries over larger samples by batch job to the database TAG at Tier-1s/large Tier 2s
What data?
– Group-derived formats
– Subsets of ESD and RAW, pre-selected or selected via a Big Train run by the working group
– No back-navigation between sites, so formats should be co-located

R.Jones PC Dec 2006 – T2 Disk
Tier 2 cloud (~30 sites of very, very different size) contains:
There is some ESD and RAW
– In 2007: 10% of RAW and 30% of ESD in the Tier 2 cloud
– In 2008: 30% of RAW and 150% of ESD in the Tier 2 cloud
– In 2009 and after: 10% of RAW and 30% of ESD in the Tier 2 cloud
Additional access to ESD and RAW in the CAF
– 1/18 of RAW and 10% of ESD
10 copies of the full AOD on disk
A full set of official group DPD
Lots of small group DPD and user data
Access is 'on demand'

R.Jones PC Dec 2006 – Optimised Access
RAW, ESD and AOD will be streamed to optimise access
The selection of and direct access to individual events is via a TAG database
– TAG is a keyed list of variables per event (illustrated below)
– Overhead of file opens is acceptable in many scenarios
– Works very well with pre-streamed data
Two roles
– Direct access to an event in a file via a pointer
– Data collection definition function
Two formats, file and database
– Now believe large queries require the full database: a multi-TB relational database, which restricts it to Tier 1s and large Tier 2s/CAF
– File-based TAG allows direct access to events in files (pointers); ordinary Tier 2s hold the file-based primary TAG corresponding to locally-held datasets
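To make the "keyed list of variables per event" idea concrete, a minimal PyROOT sketch of a file-based TAG selection is shown below. The file, tree and branch names (myTagFile.root, POOLCollectionTree, Token, NLooseElectron, MissingET) are illustrative assumptions, not taken from the slides.

import ROOT

tag_file = ROOT.TFile.Open("myTagFile.root")   # hypothetical TAG file
tag_tree = tag_file.Get("POOLCollectionTree")  # assumed tree name

selected_tokens = []
for entry in tag_tree:
    # Event-level selection on TAG variables (variable names are assumptions)
    if entry.NLooseElectron >= 2 and entry.MissingET > 20000.0:
        # The token is the pointer back to the event in the AOD/ESD file,
        # so only the selected events need to be read later
        selected_tokens.append(str(entry.Token))

print("Selected %d events out of %d" % (len(selected_tokens), tag_tree.GetEntries()))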

R.Jones PC Dec 2006 – Closing Comment
We desperately need to exercise the analysis model
– With real distributed analysis
– With streamed ('impure') datasets
– With the data realistically placed (several copies available, not being pulled locally)

Exercising Computing Model at NIKHEF
Two major aspects we need to learn
– Sending jobs to the data
– Working with streamed data
Sending jobs to the data the official way: Ganga
– Ganga is a job submission frontend to the grid
– Replaces low-level tools like edg-job-submit
– Ganga knows where the data is → jobs are automatically sent to the data
AOD replication tests to all T1 sites are about to start
– Need to learn Ganga soon
– Tutorials exist on the wiki
– Propose to have a collective session at NIKHEF soon

Ganga basics
Ganga is an easy-to-use frontend for job definition and management
– Allows simple switching between testing on a local batch system and large-scale data processing on distributed resources (Grid)
– Developed in the context of ATLAS and LHCb
For ATLAS, have built-in support for applications based on the Athena framework, for JobTransforms, and for the DQ2 data-management system
– Component architecture readily allows extension
– Implemented in Python
Strong development team, meaning strong user support
– F.Brochu (Cambridge), U.Egede (Imperial), J.Elmsheuser (München), K.Harrison (Cambridge), H.C.Lee (ASCC), D.Liko (CERN), A.Maier (CERN), J.T.Moscicki (CERN), A.Muraru (Bucharest), V.Romanovsky (IHEP), A.Soroko (Oxford), C.L.Tan (Birmingham)
– Contributions past and present from many others

Ganga job abstraction
A job in Ganga is constructed from a set of building blocks, not all required for every job (see the sketch below):
– Application: what to run
– Backend: where to run
– Input Dataset: data read by the application
– Output Dataset: data written by the application
– Splitter: rule for dividing the job into subjobs
– Merger: rule for combining outputs
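A minimal sketch of how these building blocks combine, using the generic Executable application from the example later in these slides; the splitter and merger class names (ArgSplitter, TextMerger) and their attributes are assumptions about the Ganga release of the time, and input/output datasets are omitted since this toy job reads and writes no data.

# To be typed at the Ganga/Python prompt; class names partly assumed
j = Job()

j.application = Executable()              # What to run
j.application.exe = 'echo'

j.backend = Local()                       # Where to run (local machine)

j.splitter = ArgSplitter()                # Rule for dividing into subjobs
j.splitter.args = [['hello'], ['world']]  # one subjob per argument list

j.merger = TextMerger()                   # Rule for combining outputs
j.merger.files = ['stdout']

j.submit()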

Applications and Backends
Running of a particular Application on a given Backend is enabled by implementing an appropriate adapter component, or Runtime Handler (a hypothetical sketch of the idea follows below)
– Can often use the same Runtime Handler for several Backends: less coding
[The slide shows a matrix of applications versus backends, marking each combination as implemented or work in progress]
– Backends: Local, LSF, PBS, OSG, NorduGrid, PANDA, US-ATLAS WMS, LHCb WMS
– Applications: Executable (experiment neutral); Athena (Simulation/Digitisation/Reconstruction/Analysis) and AthenaMC (Production) for ATLAS; Gauss/Boole/Brunel/DaVinci (Simulation/Digitisation/Reconstruction/Analysis) for LHCb
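To illustrate the adapter idea only (this is not Ganga's actual internal API; the class and method below are hypothetical), a runtime handler translates a given application into something a whole family of backends can run, so one handler covers several backends:

# Hypothetical sketch of a Runtime Handler (adapter): it prepares an
# application for grid-like backends, so one handler serves several of them
class AthenaGridRuntimeHandler:
    """Adapts an Athena-style application for grid-like backends (e.g. LCG, OSG, NorduGrid)."""

    def prepare(self, application, backend):
        # Build a backend-neutral description of what to run; the backend
        # then only needs to know how to ship and launch this description
        return {
            'executable': 'athena.py',
            'arguments': [application.option_file],
            'input_sandbox': [application.option_file],
            'backend': type(backend).__name__,
        }

# The same handler could then be registered for several backends, e.g.
# runtime_handlers[('Athena', 'LCG')] = AthenaGridRuntimeHandler()
# runtime_handlers[('Athena', 'NorduGrid')] = AthenaGridRuntimeHandler()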

Ganga: how the pieces fit together
[The slide shows an architecture diagram connecting the components listed below through the Ganga user interface]
– User interface for job definition and management
– Applications: ATLAS applications, LHCb applications, other applications
– Processing systems (backends): experiment-specific workload-management systems, local batch systems, distributed (Grid) systems
– Tools for data management: metadata catalogues, file catalogues, data storage and retrieval
– Ganga job archives: local repository, remote repository
– Ganga monitoring loop
Ganga has built-in support for ATLAS and LHCb
Component architecture allows customisation for other user groups

Ganga & Athena
Ganga has a Python frontend to manage job submission, organisation, etc.:

j1 = Job( backend = LSF() )   # Create a new job for LSF
a1 = Executable()             # Create Executable application
j1.application = a1           # Set value for job's application
j1.backend = LCG()            # Change job's backend to LCG
export( j1, "myJob.py" )      # Write job to specified file
load( "myJob.py" )            # Load job(s) from specified file
j2 = j1.copy()                # Create j2 as a copy of job j1
jobs                          # List jobs
j.submit()                    # Submit the job
j.kill()                      # Kill the job (if running)
j.remove()                    # Kill the job and delete associated files
!ls j.outputdir               # List files in job's output directory
t = JobTemplate()             # Create template
templates                     # List templates
j3 = Job( templates[ i ] )    # Create job from template

Ganga & Athena
Also accessible from the command line
Once it is all set up, submitting a standard Athena job on the grid is supposed to be this simple
Most recent web tutorial:

ganga athena --inDS csc PythiaH130zz4l.recon.AOD.v --outputdata AnalysisSkeleton.aan.root --split 2 --lcg ../share/AnalysisSkeleton_jobOptions.py
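For reference, the same job configured interactively at the Ganga prompt might look roughly as follows. The ATLAS-specific classes and attributes (Athena, DQ2Dataset, DQ2OutputDataset, AthenaSplitterJob, option_file, numsubjobs) are assumptions based on the Ganga ATLAS plugins of that period, and the dataset name is the truncated one from the command above, left as-is.

# Hedged sketch: interactive equivalent of the 'ganga athena' one-liner above
# (class and attribute names are assumptions, not verified against this release)
j = Job()
j.application = Athena()
j.application.option_file = '../share/AnalysisSkeleton_jobOptions.py'
j.inputdata = DQ2Dataset()
j.inputdata.dataset = 'csc PythiaH130zz4l.recon.AOD.v'   # truncated name as on the slide
j.outputdata = DQ2OutputDataset()
j.outputdata.outputdata = ['AnalysisSkeleton.aan.root']
j.splitter = AthenaSplitterJob()
j.splitter.numsubjobs = 2
j.backend = LCG()
j.submit()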