PSM, 16.06.03: Database requirements for POOL (File catalog performance requirements). Maria Girone, IT-DB.

Presentation transcript:

Slide 1: Database requirements for POOL (File catalog performance requirements)
PSM, 16.06.03. Maria Girone, IT-DB
Strongly based on input from experiments: subject to modifications!
Disclaimer: to be considered as VERY PRELIMINARY!

Slide 2: Introduction
– POOL project: common persistency framework for physics applications at LHC
– Provides persistency for C++ transient objects and transparent navigation to single objects, integrated with a Grid-aware File Catalog
– POOL has chosen the Local Replica Catalog (LRC) and Replica Metadata Catalog (RMC) as the basis of the Grid catalog implementation
– Pre-production service based on Oracle (from IT/DB), RLSTEST, already in use for POOL V1.0 (May 13th)
– POOL will provide a production release in June, to be used for LCG-1

Slide 3: Inputs and assumptions
Calculate the registration and lookup frequency for the EDG-RLS service for:
– LCG-1: based on CMS PCP and DC04 documents
– 2008
Assumptions:
– LCG-1: CPU power of a PIII, 1 GHz = 400 SI2k
– 2008: CMS CPU power: one T1 ~2M SI2k, 5 × T1 ~10M SI2k, all T2 = all T1; world wide 20M SI2k [4]

Slide 4: Processing time
LCG-1:
– Simul: 1 job reads 1 kine file (100 events) and writes 1 output file. Takes 12 hours (400 s/event, or 160k SI2k s/event)
– Digi: 1 job reads 10 simul files (1000 events) and writes 1 output file. Takes 6 hours (20 s/event, or 8k SI2k s/event)
– Reco: 1 job reads 1 digi file (1000 events) and writes 1 output file. Takes 9 hours (30 s/event, or 12k SI2k s/event)
2008:
– Analysis: 100 SI2k s/event [5]
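The per-event figures above can be cross-checked against the quoted wall-clock times. A minimal Python sketch (not part of the original slides), assuming the 400 SI2k PIII from slide 3; it reproduces the numbers above up to the rounding used on the slide:

```python
# Sanity check of the per-event processing costs quoted above,
# assuming a 1 GHz PIII rated at 400 SI2k (slide 3).
CPU_SI2K = 400

jobs = {
    # job type: (events per job, wall-clock hours per job)
    "Simul": (100, 12),
    "Digi": (1000, 6),
    "Reco": (1000, 9),
}

for name, (events, hours) in jobs.items():
    sec_per_event = hours * 3600 / events
    si2k_s_per_event = sec_per_event * CPU_SI2K
    print(f"{name}: {sec_per_event:.0f} s/event, "
          f"{si2k_s_per_event / 1000:.0f}k SI2k s/event")
```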

Slide 5: Requirements for LCG-1
Take CMS PCP as an example. Total number of events to produce (kine, Simul, Digi and half Reco): 50M [1]. Figures are given for July and November:
– Fraction of expected LCG-1 production [2]: 10% (July), 75% (November)
– Number of input files: 50k (July), 375k (November)
– Number of output files: 50k (July), 450k (November)
– Input file size: 20 MB * (July), GB (November)
– Output file size: 200 MB (July), GB (November)
– Total file size: 10 TB (July), 140 TB (November)

Slide 6: Database requirements for LCG-1
Assuming 25% loss. Figures are given for July and November:
– Average number of jobs/day [2] in LCG
– If all would be in LCG-1:
  – File lookup frequency: 0.05 Hz (July), 0.2 Hz (November)
  – File registration frequency: 0.05 Hz (July), 0.1 Hz (November)
  – Total interaction rate: 0.1 Hz (July), 0.3 Hz (November), i.e. 1 interaction every 10 s (July) / every 3 s (November)
Current tests show performance well in excess of these requirements!
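The per-day job counts from [2] are missing above, so only the shape of the calculation can be illustrated. A minimal sketch, not from the original slides, showing how the lookup and registration frequencies follow from jobs/day and the number of files each job reads and writes; the arguments in the example call are hypothetical placeholders, not the values used by the author:

```python
# How catalog interaction rates follow from the job rate and the
# files touched per job. All example numbers are placeholders.
SECONDS_PER_DAY = 86_400

def catalog_rates(jobs_per_day, input_files_per_job, output_files_per_job):
    """Return (lookup_hz, registration_hz, total_hz)."""
    lookups = jobs_per_day * input_files_per_job / SECONDS_PER_DAY
    registrations = jobs_per_day * output_files_per_job / SECONDS_PER_DAY
    return lookups, registrations, lookups + registrations

look, reg, tot = catalog_rates(jobs_per_day=4000,       # placeholder
                               input_files_per_job=1,   # placeholder
                               output_files_per_job=1)  # placeholder
print(f"lookup {look:.2f} Hz, registration {reg:.2f} Hz, total {tot:.2f} Hz")
```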

Slide 7: Predictions for 2008
– In CMS one event is 100 kB and is analyzed in ~100 SI2k s [5]
– The rate capability world wide is 20M SI2k / 100 SI2k s = 200k events/s
– The data rate capability world wide is 20 GB/s
– The useful data in 1 file is guessed to be: file size * fraction of data used = 2 GB * 0.1 = 0.2 GB
– Max number of file openings in CMS world wide = 20 GB/s / 0.2 GB = 100 Hz
– If CERN delivers 50 GB/s [6], the lookup frequency is 250 Hz; it decreases as the useful data size per file increases
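The chain of estimates on this slide is easy to reproduce. A short Python sketch (not part of the original slides) that recomputes the worldwide 2008 numbers from the quoted inputs:

```python
# Worldwide 2008 estimate from the figures above ([4], [5], [6]).
WORLDWIDE_CPU_SI2K = 20e6      # 20M SI2k available world wide (slide 3)
CPU_COST_SI2K_S = 100          # ~100 SI2k s to analyse one event [5]
EVENT_SIZE_GB = 100e3 / 1e9    # 100 kB per event [5]

events_per_s = WORLDWIDE_CPU_SI2K / CPU_COST_SI2K_S   # 200k events/s
data_rate_gb_s = events_per_s * EVENT_SIZE_GB          # 20 GB/s

FILE_SIZE_GB = 2.0
USED_FRACTION = 0.1            # guess: 10% of each file is actually read
useful_gb_per_file = FILE_SIZE_GB * USED_FRACTION      # 0.2 GB

open_rate_hz = data_rate_gb_s / useful_gb_per_file     # 100 Hz world wide
cern_open_rate_hz = 50.0 / useful_gb_per_file          # 250 Hz if CERN serves 50 GB/s [6]

print(events_per_s, data_rate_gb_s, open_rate_hz, cern_open_rate_hz)
```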

Slide 8: Predictions for 2008
Considering analysis, simulation, reprocessing and reconstruction [8] at CERN:
– ALICE needs 7.4M SI2k
– ATLAS needs 6.3M SI2k
– CMS needs 7.4M SI2k
– LHCb needs 2.0M SI2k
– Total: ~23M SI2k
Assuming that all tasks require on average the same process time as CMS analysis (0.1k SI2k s/event) and an event size of 100 kB, a throughput of 23 GB/s is needed.
Assuming a useful file size of 2 GB * 10% = 0.2 GB, the lookup frequency at CERN is 120 Hz, i.e. 1 lookup every 8 ms.
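The same calculation restricted to CERN, as a sketch that is not part of the original slides: summing the four per-experiment CPU figures gives ~23.1M SI2k and a lookup rate of ~116 Hz, which the slide rounds to 120 Hz (one lookup every ~8 ms):

```python
# CERN-only version of the estimate, using the 2008 CPU figures from [8].
cpu_si2k = {"ALICE": 7.4e6, "ATLAS": 6.3e6, "CMS": 7.4e6, "LHCb": 2.0e6}
total_si2k = sum(cpu_si2k.values())            # ~23.1M SI2k

events_per_s = total_si2k / 100                # 0.1k SI2k s per event
throughput_gb_s = events_per_s * 100e3 / 1e9   # 100 kB per event -> ~23 GB/s
lookup_hz = throughput_gb_s / (2.0 * 0.1)      # useful file size 0.2 GB

print(f"{total_si2k / 1e6:.1f}M SI2k, {throughput_gb_s:.0f} GB/s, "
      f"{lookup_hz:.0f} Hz (one lookup every {1000 / lookup_hz:.1f} ms)")
```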

Slide 9: Conclusions
– The estimated lookup and registration frequencies for LCG-1 are based on the CMS PCP and DC04 figures. The unknown is the fraction of the data production that goes via LCG-1: if all of it does, the average rate of registrations and lookups is 300 mHz (1 every 3 s).
– The projection of these numbers to 2008 is based on the following assumptions:
  – Available CPU power for the experiments: 23M SI2k [8]
  – Average process time: 0.1k SI2k s/event [5] (230k events/s)
  – Average event size in analysis: 0.1 MB [5] (throughput 23 GB/s)
  – Useful file size: 2 GB * 10% = 200 MB
– From the above, the maximum peak lookup frequency (throughput / useful file size) is ~120 Hz.

Slide 10: Consistency check
– In 2008 a dual-CPU box will correspond to an average of 5k SI2k [7]
– 1 CPU will therefore be able to process 2.5 MB/s
– To reach 25 GB/s, 10k CPUs (i.e. 5k dual-CPU boxes) are needed, and will be available [9]
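A small sketch of the same consistency check (not from the slides), assuming the 5k SI2k dual-CPU box rating from [7] and the per-event cost and size used on the previous slides:

```python
# Consistency check: boxes needed to sustain the estimated throughput.
BOX_SI2K = 5_000           # assumed 2008 dual-CPU box rating [7]
CPU_SI2K = BOX_SI2K / 2    # 2.5k SI2k per CPU
COST_SI2K_S = 100          # 0.1k SI2k s per event
EVENT_MB = 0.1             # 100 kB per event

mb_per_s_per_cpu = CPU_SI2K / COST_SI2K_S * EVENT_MB   # 2.5 MB/s per CPU
target_gb_s = 25
cpus_needed = target_gb_s * 1000 / mb_per_s_per_cpu     # 10,000 CPUs

print(f"{mb_per_s_per_cpu} MB/s per CPU -> {cpus_needed:.0f} CPUs "
      f"({cpus_needed / 2:.0f} dual-CPU boxes) for {target_gb_s} GB/s")
```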

Slide 11: References
CMS Pre-DC04:
– [1] DC04.xml (Claudio Grandi)
– [2] Performance Metrics (Tony Wildish)
– [3] Derived from [1-2], considering the average data read and written per job and the number of jobs/day
CMS Analysis:
– [4] Table A3.6 of the CERN/LHCC/ CERN/RRB-D (SI95 = 9 SI2000)
– [5] P. Capiluppi, "World Wide Computing", LHCC Comprehensive Review of CMS SW and Computing
– [6] Bernd Panzer-Steindel, "LAN and disk I/O predictions", Draft 1.0
– [7] Table A3.13 of the CERN/LHCC/ CERN/RRB-D (SI95 = 9 SI2000)
– [8] Table A3.9 of the CERN/LHCC/ CERN/RRB-D (SI95 = 9 SI2000)
– [9] End of Section of the CERN/LHCC/ CERN/RRB-D