Analysis vs Storage 4 the LHC Experiments (…ATLAS-biased view)
Alessandro Di Girolamo, CERN/INFN
14 May 2009

Outline
– Overview of LHCb limitations for data access by analysis jobs
– Overview of data access for CMS analysis jobs (dCache vs Lustre)
– ATLAS user analysis stress tests

LHCb: What and How
Goal: understand the current limitations for data access by user analysis jobs.
– Select a user with working analysis code and the need to run over a large data sample.
– Job submission and analysis of the output; repeat the submission with different samples periodically (2–3 times a week).
– Each submission consists of ~600 jobs, each reading 100 different files with 500 events per file.
– Average file size: 200 MB.
– 1 job = 100 files = 50,000 events ~ 20 GB ~ 1–2 hours ~ 2–3 MB/s.
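The per-job figures are easy to cross-check; a quick back-of-envelope in plain Python, using only the values quoted on the slide:

```python
# Sanity check of the per-job figures quoted above (slide values only).
files_per_job = 100
events_per_file = 500
file_size_mb = 200.0
runtime_s = 2 * 3600                     # upper end of the quoted 1-2 hours

events = files_per_job * events_per_file              # 50,000 events
data_gb = files_per_job * file_size_mb / 1024         # ~19.5 GB, i.e. ~20 GB
rate_mb_s = files_per_job * file_size_mb / runtime_s  # sustained read rate

print(events, round(data_gb, 1), round(rate_mb_s, 1))  # 50000 19.5 2.8
```

At the two-hour end of the quoted runtime this gives ~2.8 MB/s, consistent with the slide's 2–3 MB/s.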

LHCb: some results
[Plots: total number of successful jobs; CPU/wall-clock ratio for successful jobs; daily amount of data read, by site; daily number of jobs, by site]

LHCb Tier 1 performance
– CNAF: 0.11 s/evt, … s/file
– IN2P3: 0.18 s/evt, … s/file
[Plot: distribution of wall-clock time to process 100 consecutive events (black line), and the same distribution including a new file opening (red line). The difference of the means gives the file-opening time.]

CMS analysis jobs
To find the limits that data access puts on the number of concurrent analysis jobs, the test used real CMS jobs that read the input file but do no computation. Under these conditions the I/O required by a single job is about 10 MB/s, which appears to be the upper bandwidth limit: it is set by the unzipping of ROOT objects (on a 2.00 GHz Xeon). The plots show the CPU time and the overall time spent in I/O operations for a CMS analysis job, as a function of the number of concurrent jobs, for both Lustre and dCache.
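The measurement idea — stream the input, compute nothing, record the rate — is simple to sketch. This is illustrative Python, not the actual CMS test code (which ran real CMSSW jobs), and the file path is hypothetical:

```python
# Stream a file sequentially with no computation and report MB/s -- the
# same idea as the read-only CMS jobs above (illustrative sketch only).
import os
import time

def read_throughput(path, block_size=1024 * 1024):
    """Read `path` in 1 MB blocks, discard the data, return MB/s."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    start = time.time()
    with open(path, "rb") as f:
        while f.read(block_size):
            pass
    return size_mb / (time.time() - start)

# Hypothetical input file on a Lustre mount:
# print(read_throughput("/lustre/cms/store/data/some_input.root"))
```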

CMS analysis jobs
[Plots: execution time (min) versus number of concurrent jobs, for Lustre and dCache]

What the ATLAS user wants (would like!)
My laptop ~= GRID (but the GRID must be faster!)
– "Classic" analyses with Athena & AthenaROOTAccess: Monte Carlo processing, cosmics, reprocessed data; various sizes of input data (AOD, DPD, ESD); TAG analyses for direct data access.
– Calibration & alignment: RAW data and remote database access.
– Small MC sample production: transformations.
– ROOT: generic ROOT applications, also with DQ2 access.
– Generic executables: for everything else.

ATLAS: jobs go to the data
Example storage layout for a 200 TB Tier 2, managed with space tokens:
– Detector data (DATA): 70 TB; RAW, ESD, AOD, DPD; centrally managed
– Simulated data (MC): 80 TB; RAW, ESD, AOD, DPD; centrally managed
– Physics group data: 20 TB; DnPD, ntuples, histograms, …; group managed
– User scratch data (SCRATCH): 20 TB; user data; transient
– Buffers, spare: 10 TB
– Local storage (non-pledged): user data; locally managed
The CPUs and analysis tools sit at the site, next to the data.

ATLAS Distributed Analysis
[Diagram: two submission paths. Push: jobs are submitted via the WMS to worker nodes at site A. Pull: pathena submits jobs to PanDA, and pilots on the worker nodes at site B pull them.]

PanDA
[Diagram: an end-user or ProdSys submits jobs via https to the Panda server; a scheduler sends pilots through Condor-G/gLite to the worker nodes at sites A and B; the pilots pull jobs from the server and run them.]
The scheduler sends pilots to the batch systems and the Grid:
– Condor-G scheduler: used for most US ATLAS OSG sites.
– Local schedulers: BNL (Condor) and UTA (PBS); very efficient and robust.
– Generic scheduler: also supports non-ATLAS OSG VOs and LCG.
Pilot submission is being moved from a global submission point to a site-local pilot factory.

How pilots work
– The pilot sends several parameters to the Panda server for job matching: CPU speed, available memory on the WN, and the list of ATLAS releases available at the site.
– It runs the job immediately (all input files should already be available at the site) and sends a heartbeat every 30 minutes.
– It copies the output files to the local SE and registers them in the catalogue.
– Analysis jobs run under a production proxy unless gLExec is implemented in identity-switching mode: gLExec-based identity change on the WN to the submitter identity for user jobs is under testing (proxy management done by MyProxy). The security issues have been investigated and clarified; for ATLAS, gLExec is considered mature.
A minimal sketch of this lifecycle follows.
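A minimal sketch of that lifecycle, assuming invented endpoint and field names throughout (the real PanDA pilot is far more elaborate):

```python
# Illustrative pilot lifecycle -- NOT the real PanDA pilot. The server URL,
# endpoints, field names and payload handling are all hypothetical.
import subprocess
import time
import requests

PANDA_SERVER = "https://pandaserver.example.org"   # hypothetical URL

def run_pilot(site):
    # 1. Advertise WN capabilities so the server can match a suitable job.
    job = requests.post(PANDA_SERVER + "/getJob", data={
        "site": site,
        "cpu_speed_mhz": 2000,                 # measured on this WN
        "available_mem_mb": 2048,
        "atlas_releases": "14.2.20,15.0.0",    # installed at the site
    }).json()
    if not job:
        return                                 # no matching job; pilot exits

    # 2. Run the payload immediately: inputs are assumed to already be at
    #    the site, so there is no stage-in wait.
    proc = subprocess.Popen(job["command"], shell=True)
    while proc.poll() is None:                 # heartbeat every 30 minutes
        requests.post(PANDA_SERVER + "/heartbeat", data={"jobid": job["id"]})
        time.sleep(30 * 60)

    # 3. Stage-out (copy outputs to the local SE and register them in the
    #    catalogue) is site-specific and omitted from this sketch.
    print("job", job["id"], "finished with code", proc.returncode)
```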

GANGA
Ganga is a Grid user interface for HEP experiments and a key piece of the distributed-analysis systems for ATLAS and LHCb. It manages large-scale scientific applications on the Grid:
– configuring the applications;
– switching between testing on a local batch system and large-scale processing on the Grid;
– keeping track of results;
– discovering dataset locations by interfacing directly to metadata and file catalogues.
It is portable Python code, and it can submit jobs both to the WMS and to PanDA, as the sketch below illustrates.
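A minimal sketch of a Ganga session from that era, to be run inside the ganga shell where these classes are predefined. Option names varied between Ganga releases, and the job-options file and dataset name here are placeholders:

```python
# Sketched Ganga 5-style ATLAS analysis job; treat all names as illustrative.
j = Job()
j.application = Athena()                      # the user's Athena analysis
j.application.option_file = 'MyAnalysis_jobOptions.py'   # placeholder
j.inputdata = DQ2Dataset()
j.inputdata.dataset = 'mc08.SOME.DATASET.AOD' # placeholder dataset name
j.splitter = DQ2JobSplitter()                 # one subjob per chunk of files
j.backend = Panda()                           # or LCG() to go via the WMS
j.submit()
```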

Testing the DA infrastructure
– Functional tests: GangaRobot. Vital to verify site configurations.
– Stress tests: HammerCloud, "the" way to simulate chaotic user analysis (creators and main developers: J. Elmsheuser & D. van der Ster).
In fall 2008 there was interest in testing Tier 2s under load. The first tests were in Italy, and were manual: 2–5 users submitting ~200 jobs each at the same time, with the results merged and analyzed hours later. These Italian tests saturated the 1 Gbps networks at the Tier 2s: <3 Hz per job.

How does HammerCloud work?
An operator defines the tests:
– What: a Ganga job template, specifying the input datasets and including an input sandbox tar.gz (the Athena analysis code).
– Where: the list of sites to test and the number of jobs.
– When: start and end times.
– How: the input data I/O mode (Posix I/O == DQ2Local, or FileStager).
Each job runs Athena over an entire input dataset. The test is defined with a dataset pattern (e.g. mc08.*.AOD.*), and HC generates one job per dataset. It tries to run with the same datasets at all sites, but there are not always enough replicas!
HammerCloud then runs the tests (see the sketch below):
1. Generate the appropriate jobs for each site.
2. Submit the jobs (to LCG and NorduGrid, and now also Panda).
3. Poll their statuses, writing incremental results to the HC DB.
4. Read the HC DB to plot results on the web.
5. Clean up leftovers and kill jobs that are still incomplete.
When many tests are running, each stage handles the tests sequentially, which limits the number of tests that can run at once (work in progress).
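That lifecycle in schematic Python — a sketch with invented helper and attribute names, not the real HammerCloud code:

```python
# Schematic HammerCloud test loop; make_job(), datasets_matching() and
# db.record() are invented helpers standing in for the real machinery.
import time

def run_test(test, db):
    # 1. One job per matching input dataset, per site under test.
    jobs = [make_job(test.template, site, ds)
            for site in test.sites
            for ds in datasets_matching(test.input_pattern)]

    # 2. Submit (LCG, NorduGrid, and now Panda backends).
    for j in jobs:
        j.submit()

    # 3./4. Poll until the scheduled end time, writing incremental results
    # to the HC DB; the web frontend plots from the same DB.
    while time.time() < test.end_time:
        for j in jobs:
            db.record(test.id, j.id, j.status())
        time.sleep(600)

    # 5. Cleanup: kill anything still incomplete.
    for j in jobs:
        if j.status() not in ("completed", "failed"):
            j.kill()
```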

HammerCloud: the tests
HammerCloud tests real analyses:
– An AOD analysis, based on the Athena UserAnalysis package, analyzing mainly muons. Input data: muon AOD datasets, or other AODs if muons are not available.
– A reprocessed-DPD analysis, intended to test the remote conditions database (at the local Tier 1).
HammerCloud metrics:
– Exit status and log files.
– CPU/wall-clock ratio and events per second.
– Job timing: queue, input sandbox stage-in, Athena/CMT setup, LFC lookup, Athena execution, output storage.
– Number of events and files processed (versus what was expected).
– Some local statistics (e.g. network and storage rates) are only available in the site-level monitoring, so site contacts are very important!

HammerCloud: the tests (2)
The key HammerCloud variable (up until now) is the data access mode:
– Posix I/O with the local protocol: to tune rfio, dcap, gsidcap, StoRM, Lustre, etc., testing with read-ahead buffers on or off; large, small, or tweaked.
– Copy/stream the files locally: but disk space is limited, and restarting Athena causes overhead.
– The Athena FileStager plugin: uses a background thread to copy the input files from storage. Startup, copy f1; process f1 & copy f2; process f2 & copy f3; etc. (sketched below).
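The FileStager pipelining pattern in miniature — a generic sketch, not the actual Athena FileStager code; the copy and event-loop functions are stand-ins:

```python
# Prefetch the next input file in a background thread while processing the
# current one: the FileStager idea, reduced to a generic sketch.
import os
import shutil
import threading

def stage(src, dst):
    shutil.copy(src, dst)            # stand-in for the real grid copy tool

def process(local_path):
    pass                             # stand-in for the Athena event loop

def run(input_files, scratch="/tmp"):
    local = lambda path: os.path.join(scratch, os.path.basename(path))
    stage(input_files[0], local(input_files[0]))      # startup: copy f1
    for i, f in enumerate(input_files):
        copier = None
        if i + 1 < len(input_files):                  # copy f(i+1) ...
            nxt = input_files[i + 1]
            copier = threading.Thread(target=stage, args=(nxt, local(nxt)))
            copier.start()
        process(local(f))                             # ... while processing f(i)
        if copier is not None:
            copier.join()            # make sure the next file has arrived
```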

HammerCloud: some results…
Example I/O rates from a classic Athena AOD analysis:
– A fully loaded CPU can read events at ~20 Hz (at this rate the CPU, not the file I/O, is the bottleneck).
– 20 Hz × 0.2 MB per event = 4 MB/s per CPU.
– A site with 200 CPUs could therefore consume data at 800 MB/s. This requires a 10 Gbps network, and a storage system that can handle such a load.
– Conversely, a 200-CPU cluster on a 1 Gbps network will be limited to ~3 Hz per analysis job. :-(
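The same numbers as a back-of-envelope in plain Python, using only the values from the slide:

```python
# Reproduce the slide's I/O arithmetic.
event_rate_hz = 20          # events/s per fully loaded CPU (AOD analysis)
event_size_mb = 0.2         # MB per AOD event
cpus = 200

per_cpu = event_rate_hz * event_size_mb        # 4 MB/s per CPU
site = per_cpu * cpus                          # 800 MB/s (~6.4 Gbps)
gige_hz = (125.0 / cpus) / event_size_mb       # 1 Gbps ~= 125 MB/s, shared
print(per_cpu, site, round(gige_hz, 1))        # 4.0 800.0 3.1 -> ~3 Hz/job
```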

Example of HC results
[Four slides of example HammerCloud result plots]

Overall HammerCloud statistics
Throughout the history of HammerCloud:
– 74 sites tested; nearly 200 tests; the top sites tested >25 times.
– ~50,000 jobs in total, with an average runtime of 2.2 hours.
– 2.7 billion events processed, in 10.5 million files.
Success rate: 29 sites have a >80% success rate; 9 sites >90%.
Across all tests:
– CPU utilisation: 27 sites >50%; 8 sites >70%.
– Event rate: 19 sites >10 Hz; 7 sites >15 Hz.
With the FileStager data access mode:
– CPU utilisation: 36 sites >50%; 24 sites >70%.
– Event rate: 33 sites >10 Hz; 20 sites >15 Hz; 4 sites >20 Hz.
Full statistics available at:
NOTE: these are overall summaries without a quality cut, i.e. the numbers include old tests without tuned data access.

Lessons learned so far…
The expected benefits:
– Most sites are not optimized to start out: HC can find the weaknesses, and sites rely on large quantities of jobs to tune their networks and storage.
– HammerCloud is a benchmark for the sites: site admins can change their configuration and then request a test to see how it affects performance.
– We are building a knowledge base of the optimal data access modes at the sites: there is no magic solution w.r.t. Posix I/O vs. FileStager, and it is essential for the DA tools to use this information about the sites.

… and also…
Unexpected benefits:
– Unexpected storage bottlenecks (the hot-dataset problem): data were not well distributed across all storage pools, so one pool was overloaded while the others sat idle. We need to understand how to balance the pools.
– Misunderstood behaviour of the distributed data management tools: DB-access jobs require a large SQLite database (fetched with dq2-get before the start), and dq2-get did not retrieve it from the different storage areas of the site. A large test could have brought systems down, but this was caught before the test thanks to a friendly user. Ganga's download of the SQLite DB was changed (as was dq2-get's behaviour).
– Found an Athena I/O bug/misunderstanding: HC found discrepancies between the number of files intended to be processed and the number actually processed. Athena returned exit code 0 if a file open() timed out: "success"! The behaviour was subsequently changed in Athena.

The Challenge: support of user activities
Difficult to simulate: real life will provide new challenges and opportunities.

Questions?
Thanks to all (…those whom I took these slides from)!