CMS Data Challenge Experience on LCG-2


1 CMS Data Challenge Experience on LCG-2
Claudio Grandi (INFN Bologna), INFNGrid Workshop, Bari, October 26th 2004

2 Outline
Definition of Data Challenge 2004 (DC04)
Pre-Challenge Production (PCP)
DC04 experience
What's next
Conclusions

3 CMS Data Challenge 2004
Aim of DC04:
- Reconstruct data at a sustained rate of 25 Hz in the Tier-0 farm (25% of the target conditions for LHC startup)
- Register data and metadata in a catalogue
- Transfer the reconstructed data to all Tier-1 centers
- Analyze the reconstructed data at the Tier-1s as they arrive
- Make the data produced at the Tier-1s available to the community
- Monitor and archive performance metrics of the ensemble of activities for debugging and post-mortem analysis
Not a CPU challenge, but a full-chain demonstration!
Pre-challenge production in 2003/04:
- 90M Monte Carlo events (40M Geant4) simulated and 85M events digitized to date
- 15M digitized in time for DC04, another 10M during DC04
- Classic and grid (CMS/LCG-0, LCG-1, Grid3) productions
- Now continuous production
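The 25 Hz target can be sanity-checked with simple arithmetic; everything beyond the 25 Hz goal and the events processed during the challenge is derived here, not quoted from the slides:

```python
# Back-of-the-envelope check of the DC04 target rate.
RATE_HZ = 25
events_per_day = RATE_HZ * 24 * 3600          # 2,160,000 events/day at 25 Hz
# The ~25M events processed during DC04 correspond to roughly 12 days
# of fully sustained 25 Hz running, spread over the challenge period:
days_equivalent = 25_000_000 / events_per_day
print(events_per_day, round(days_equivalent, 1))
```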

4 Pre-Challenge Production layout
[Diagram: the Production Manager defines assignments in RefDB; a Site Manager starts an assignment. McRunjob plus the CMSProd plug-in builds the jobs, with data-level queries against the dataset metadata in RefDB and job-level queries against the BOSS DB. Jobs run either on a local computer farm (shell scripts via the Local Batch Manager), on the LCG grid (JDL to the LCG-0/1 Scheduler, with job metadata and the RLS), or on Grid3 (DAG jobs via DAGMan/MOP, with the Chimera VDL Virtual Data Catalogue and Planner). Arrows distinguish pushed data/info from pulled info.]
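The assignment flow in the diagram can be sketched as a minimal state machine. The class and state names below are illustrative, not the real RefDB or BOSS schemas:

```python
# Minimal sketch of the PCP assignment flow: the Production Manager
# defines an assignment (RefDB role), a Site Manager starts it, the
# splitting into jobs stands in for McRunjob, and per-job state updates
# stand in for BOSS bookkeeping. All names here are illustrative.

class Assignment:
    def __init__(self, dataset, n_events, events_per_job):
        self.dataset = dataset
        self.n_events = n_events
        self.events_per_job = events_per_job
        self.jobs = []          # filled in when a site starts the assignment
        self.state = "defined"  # RefDB-side bookkeeping

    def start(self):
        """Site Manager starts the assignment: split into jobs (McRunjob role)."""
        n_jobs = -(-self.n_events // self.events_per_job)  # ceiling division
        self.jobs = [{"id": i, "state": "submitted"} for i in range(n_jobs)]
        self.state = "running"

    def update(self, job_id, state):
        """BOSS-style job-level bookkeeping."""
        self.jobs[job_id]["state"] = state
        if all(j["state"] == "done" for j in self.jobs):
            self.state = "done"

a = Assignment("example_dataset", n_events=1000, events_per_job=250)
a.start()
for j in a.jobs:
    a.update(j["id"], "done")
print(a.state, len(a.jobs))  # done 4
```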

5 PCP statistics
850K jobs
4500 KSI2000 months
850K files
100 TB of data
(plots: DC04 Simulation and DC04 Digitization rates)
Now running continuous productions

6 PCP on grid: CMS-LCG
CMS-LCG Regional Center:
- Mevts "heavy" pythia: ~2000 jobs, ~10 KSI2000 months
- Mevts cmsim+oscar: ~8500 jobs, ~130 KSI2000 months, ~2 TB of data
DC04 Gen+Sim on LCG: CMS/LCG-0 (eff. 70%-90%), LCG-1 (eff. ~60%)
CMS/LCG-0:
- Joint CMS-LCG-EDT project, based on the LCG pilot distribution, including GLUE, VOMS, GridICE, RLS
- About 170 CPUs and 4 TB of disk
- Sites: Bari, Bologna, Bristol, Brunel, CERN, CNAF, Ecole Polytechnique, Imperial College, ISLAMABAD-NCP, Legnaro, Milano, NCU-Taiwan, Padova, U.Iowa
[Diagram: McRunjob + ImpalaLite builds JDL, submitted from a UI through the RB (bdII) to CEs and WNs; BOSS records job metadata; output goes to SEs; RefDB holds the dataset metadata and the RLS the file catalogue. Arrows distinguish pushed data/info from pulled info.]

7 PCP (and beyond) on Grid: Grid3
US MOP Regional Center:
- Mevts pythia: ~30000 jobs, ~0.7 KSI2000 months
- 19 Mevts cmsim+oscar: ~19000 jobs, ~1000 KSI2000 months, ~13 TB of data
- ~70% overall job efficiency; failures mainly not grid-related
Grid3: US grid projects + US LHC experiments, over 2000 CPUs in 25 sites
MOP: DAGMan and Condor-G for specification and submission; a Condor-based match-making process selects resources
[Diagram: DC04 simulation on Grid3 with the MOP System: MCRunJob and mop_submitter at the Master Site feed DAGMan/Condor-G, which dispatch to the batch queues of remote sites 1..N; GridFTP moves the output back.]

8 Data Challenge 2004 layout
[Diagram: at the Tier-0, a fake on-line process feeds Castor through the IB; ORCA RECO jobs, driven by RefDB, reconstruct the data, register files in the POOL RLS catalogue and write to the Export Buffer (EB). Data-distribution agents, coordinated through the TMDB and the GDB, move data over LCG-2 services to the Tier-1s, each with an agent, T1 storage, an MSS and ORCA analysis jobs submitted as grid jobs; Tier-2s provide T2 storage where physicists run local ORCA jobs.]

9 DC04 setup
CERN Tier-0: RB, bdII, RLS, RefDB, MonALISA, VOMS; LCG tools and CMS tools; Castor SE, TMDB, SRM and SRB services feeding three distribution chains: the LCG-2 RM chain, the SRM chain and the SRB chain (with MCAT)
Tier-1s: CNAF (Italy, with GridICE and an RB), PIC (Spain, with an RB), FNAL (USA), RAL (UK), GridKA (Germany), IN2P3 (France); Castor SEs at CNAF and PIC, SRM/Enstore at FNAL, and SRB front-ends to Castor, Tivoli and HPSS at the others; each site has a local farm (UI, CE, SE)
Tier-2s: Legnaro, CIEMAT, UFL, Caltech, with local farms (CE, SE, UI; SRM at the US sites)

10 DC04 Processing Rate
Processed more than 25M events
Generally kept up with data transfers (CNAF, PIC, FNAL)
Got above 25 Hz on many short occasions, but only one full day above 25 Hz with the full system (overloaded RLS, …)

11 DC04: RLS
RLS used as a global POOL catalogue, with full file metadata
- LRC: global file catalogue (GUID → PFNs); 570K LFNs registered in RLS, each with ~5-10 PFNs
- RMC: global metadata catalogue (GUID → metadata); 9 metadata attributes per file (up to ~1 KB of metadata per file)
Performance:
- LRC fast enough: C++ API programs (sec/file), POOL CLI with GUID (secs/file); needs optimization for bulk queries
- RMC too slow: secs/file when everything goes well, hours to get the info for a collection
(plot: time to register 16 files, the output of one reconstruction job, against the 25 Hz target)
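The split between the LRC (replica locations) and the RMC (per-file attributes) can be modelled with two mappings. This is a toy illustration of the catalogue roles only, not the RLS API; the PFNs and attribute names are made up:

```python
# Toy model of the RLS used as a POOL catalogue:
#   LRC: GUID -> list of PFNs (replica locations)
#   RMC: GUID -> metadata attributes (~9 per file in DC04)
# Illustration only: not the real RLS interfaces, and the PFNs below
# are invented examples.
import uuid

lrc = {}        # Local Replica Catalogue: guid -> [pfn, ...]
rmc = {}        # Replica Metadata Catalogue: guid -> {attr: value}
lfn_index = {}  # LFN -> guid (POOL's logical-name view)

def register_file(lfn, pfn, metadata):
    guid = str(uuid.uuid4())
    lfn_index[lfn] = guid
    lrc[guid] = [pfn]
    rmc[guid] = metadata
    return guid

def add_replica(guid, pfn):
    lrc[guid].append(pfn)

guid = register_file(
    "reco/DST/file0001.root",
    "castor://t0.example/cms/file0001.root",
    {"dataset": "example", "type": "DST", "size": 2_000_000},
)
add_replica(guid, "rfio://t1.example/cms/file0001.root")
print(len(lrc[guid]))  # 2 replicas known for this GUID
```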

12 DC04 LCG-2 data transfer chain (1/2)
Tier-0 Export Buffer: classic SE, 3 SE machines with 1 TB of disk each
Tier-1s: Castor SE receiving the data, but with different underlying MSS hardware solutions
Performance:
- CNAF: the Replica Manager CLI copies a file and inherently registers it in the RLS, with file-size info stored in the LRC; overhead introduced by the CLI's Java processes, but safer against failed replicas
- PIC: globus-url-copy + the LRC C++ API copy a file and register it in the RLS later, with no file-size check; faster, but no quality check of the replica operations
[Diagram: a data-distribution agent, driven by the TMDB and the RLS, uses the RM to replicate files from the CERN Castor disk SE (EB) to the Tier-1 Castor SE, where the Tier-1 agent migrates them to CASTOR and on to Tier-2 disk SEs.]
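The key difference between the two approaches is what happens when a copy fails. The sketch below simulates that behaviour; the function names and the in-memory catalogue are illustrative, not the real Replica Manager or LRC APIs:

```python
# Sketch of the two DC04 replication strategies and their failure
# behaviour (simulated transport; names are illustrative, not the
# real Replica Manager / LRC C++ API calls).

catalogue = {}   # stands in for the RLS LRC: pfn -> registered size

def copy_and_register(src, dest, size, copy_ok=True):
    """CNAF style: copy + register as one step, with a size check.
    Slower (Java CLI start-up) but never registers a broken replica."""
    if not copy_ok:
        raise IOError(f"copy of {src} failed, nothing registered")
    catalogue[dest] = size

def copy_then_register(src, dest, size, copy_ok=True):
    """PIC style: fast copy first, registration decoupled, no size check.
    Faster, but a failed copy can leave a stale catalogue entry."""
    catalogue[dest] = size   # registered even if the transfer was bad

copy_and_register("t0/f1", "t1a/f1", 2_000_000)
try:
    copy_and_register("t0/f2", "t1a/f2", 2_000_000, copy_ok=False)
except IOError:
    pass                     # catalogue stays consistent
copy_then_register("t0/f2", "t1b/f2", 2_000_000, copy_ok=False)
print(sorted(catalogue))     # 't1b/f2' is registered despite the failed copy
```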

13 DC04 LCG-2 data transfer chain (2/2)
Both the CNAF and PIC approaches achieved good performance
- T1 agents were robust and kept pace with the data available at the EB
Dealing with too many small files (which affected all distribution chains) is "bad" for:
→ efficient use of bandwidth
→ scalability of MSS systems
Network 'stress test' at the end of DC04 with 'big' files: typical transfer rates >30 MB/s; CNAF sustained >42 MB/s (~340 Mbps eth I/O) for some hours, >3k files and >750 GB transferred
(plots: CERN Tier-0 SE-EB of the LCG chain; CNAF Tier-1 network monitoring for the Castor SE and the classic disk SE)
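The small-file penalty is easy to quantify: with a fixed per-file cost (authentication, catalogue round-trips, tape handling), effective bandwidth collapses as files shrink. The overhead and file sizes below are illustrative; the slide only gives the qualitative effect and the >30 MB/s rates reached with 'big' files:

```python
# Effective throughput with a fixed per-file overhead (illustrative
# numbers, not measurements from DC04).

def effective_rate_mbps(file_mb, link_mbps, overhead_s):
    transfer_s = file_mb / link_mbps
    return file_mb / (transfer_s + overhead_s)

# Assume 2 s of per-file overhead on a 40 MB/s link:
small = effective_rate_mbps(file_mb=10, link_mbps=40, overhead_s=2.0)
big = effective_rate_mbps(file_mb=1000, link_mbps=40, overhead_s=2.0)
print(round(small, 1), round(big, 1))  # 4.4 vs 37.0 MB/s
```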

14 DC04 Real-time Data Analysis
Submit analysis jobs automatically as new data arrive on the Tier-1 and Tier-2 SEs
- Job submission via the LCG-2 Resource Broker; jobs sent close to the data
- File access on the SE via rfio
- Output data registered in RLS
- Job monitoring with BOSS
Kept up with the rate of data coming from CERN:
- > analysis jobs in ~2 weeks
- ~40 Hz maximum event frequency
- 90-95% grid efficiency
- ~20 minutes delay from data at the Tier-0 to analysis at the Tier-1
Real-time analysis was done only in the LCG environment (Italy & Spain)
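The "submit automatically as new data arrive" logic is essentially a polling loop over the catalogue of published files. A minimal sketch, with the submission stubbed out (in DC04 it went through the LCG-2 Resource Broker and was tracked with BOSS):

```python
# Sketch of the real-time analysis driver: poll for newly published
# files and submit one analysis job per new file. Submission is a
# stub; file names are invented examples.

def poll_and_submit(known, published, submit):
    """Submit a job for every file published since the last poll."""
    new = [f for f in published if f not in known]
    for f in new:
        submit(f)
    known.update(new)
    return len(new)

submitted = []
known = set()
poll_and_submit(known, ["dst_001.root", "dst_002.root"], submitted.append)
poll_and_submit(known, ["dst_001.root", "dst_002.root", "dst_003.root"],
                submitted.append)
print(submitted)  # only dst_003.root is new on the second poll
```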

15 What’s next: End-User Analysis
Now focus in providing an End-to-End analysis system PhySh (Physics Shell) is an “end-user” environment Based on Clarens Web services A “glue” interface among different services User’s interface modeled as a virtual file system PhySh Virtual File System Interface XML RPC SOAP or Python Software Env. SCRAM(local) Job submission (local) Clarens Web Services Other XML RPC or SOAP client Shell client Web client Pyhton client Dataset Catalogue (PubDB/RefDB) Data transfer service (PheDex) Software Environment (SCRAM) Job Submission INFNGrid Workshop - Bari October 26th 2004

16 What’s next: Data Access
Data Location service CMS specific DataSet catalogue: RefDB + PubDB Allow Dataset discovery and location of Dataset catalogues Data transfer service PheDEX (Physics Experiment Data Export) Evolution of TMDB system used in DC04 Allows request management and file transfer based on data subscriptions Dataset discovery RefDB Publication discovery (PubDB-URLs) PubDB (INFN) PubDB (CERN) Local catalogues Local catalogues INFNGrid Workshop - Bari October 26th 2004

17 What’s next: analysis job submission
Data Location Interface File-based data location UI PhySh Resource Broker (RB) node RLS Logical File Name location (URL) Workload Management System dataset-based data location The end-user inputs: Datasets (runs,#event,…) + private code Logical Dataset DataSet Catalogue (PubDB/RefDB) PhySh location (URL) Several tools for job preparation, splitting, submission are under development integrated with: LCG (GROSS,GRAPE) Grid3 (Runjob) EGEE (gLite prototype) Monitoring services and jobs MonaLisa, GridICE, BOSS Storage Element Inform. Service Computing Element INFNGrid Workshop - Bari October 26th 2004

18 Conclusions
CMS distributed production based on grid middleware is used in the official CMS production system
- Grid3: a reliable and scalable system for massive production
- LCG: large-scale productions proved; consistent configuration and control of the distributed sites is very important
LCG-2 used in the CMS Data Challenge
- The LCG environment provides the functionalities for distributed computing
- The full chain (except the Tier-0 reconstruction) was run on the LCG-2 system
- The catalogues are an issue!
- Grid point-to-point file transfer tools and infrastructure for data analysis
- The LCG data distribution and data analysis chain successfully met the data-challenge goals of large-scale scheduled distribution to a set of Tier-1/2 sites and subsequent analysis
A new end-to-end analysis system is in development
- Components by several grid projects: LCG, EGEE (gLite), Grid3, …
- PhySh, a virtual file system, is the "glue" between the different services: a consistent interface to the physicist, a flexible application framework, a set of back-end services

19 Backup slides

20 SRM Interaction Diagram

21 DC04 SRM data transfer chain (1/2)
Tier-0: SRM/dCache-based DRM serving as an EB; files are staged out of Castor to the dCache pool disk and pinned until transferred
Tier-1: SRM/dCache/Enstore-based HRM acting as an Import Buffer, with an SRM interface providing access to Enstore via dCache
SRM transactions to receive TURLs from the EB; transfers via GridFTP
[Diagram: the SRM client on the T1 agent machine issues SRM-COPY; SRM-GET requests (one file at a time) make the CERN T0 SRM stage and pin the file in its dCache pool and return a TURL; the FNAL T1 SRM performs space reservation and the write into its own dCache pool, backed by Enstore; the network transfer is a GridFTP GET in pull mode.]
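The pull-mode sequence in the diagram can be sketched as a simple loop: for each file, ask the Tier-0 SRM for a TURL (stage + pin), then fetch it and store it at the Tier-1. This is a simulation of the control flow only; the hostnames and callables stand in for the real SRM/GridFTP clients:

```python
# Sketch of the DC04 SRM pull-mode chain (simulated; the real
# components are srm-copy/srm-get clients, dCache pools and Enstore,
# and the TURL host below is invented).

def srm_pull(files, t0_stage, t1_write, log):
    """T1 agent's srm-copy loop: files are pulled one at a time."""
    for lfn in files:
        turl = t0_stage(lfn)       # srm-get: T0 stages+pins, returns a TURL
        log.append(("gridftp-get", turl))
        t1_write(lfn)              # T1 SRM reserves space, writes to dCache

events = []
srm_pull(
    ["f1", "f2"],
    t0_stage=lambda f: f"gsiftp://t0-eb.example/{f}",
    t1_write=lambda f: events.append(("stored", f)),
    log=events,
)
print(len(events))  # 4 events: one get and one store per file
```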

22 DC04 SRM data transfer chain (2/2)
In general quite robust tools, e.g. SRM for error checking/retrying, dCache for automatic migration to tape, …
Stressed a few software/hardware components to the breaking point, e.g. monitoring was not implemented to catch service failures, forcing manual interventions
Again: problems from the high number/small size of the DC files
- use of multiple streams, with multiple files in each stream, reduced the overhead of the authentication process
MSS optimization necessary to handle the challenge load
- inefficient use of tapes forced more tape allocations + deployment of a larger namespace service
Relevant improvements during ongoing DC operations, e.g. a reduction of the delegated proxy's modulus size in SRM yielded a factor-3.5 speed-up of the interaction between SRM client and server
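Batching several files per stream helps because the authentication cost is paid per connection, so it is amortized over the batch. The timings below are illustrative assumptions (the slide only quotes the ~3.5x speed-up from the smaller proxy modulus):

```python
# Why multiple files per stream helped: the per-connection
# authentication cost is amortized over the files in the batch.
# auth_s and transfer_s are illustrative assumptions.

def time_per_file(auth_s, transfer_s, files_per_connection):
    return transfer_s + auth_s / files_per_connection

one_at_a_time = time_per_file(auth_s=3.0, transfer_s=1.0, files_per_connection=1)
batched = time_per_file(auth_s=3.0, transfer_s=1.0, files_per_connection=10)
print(one_at_a_time, batched)  # 4.0 vs 1.3 seconds per file
```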

23 PhySh
Data are organized as files in a directory structure
Actions are expressed in terms of copying, linking and creating files
Multiple clients satisfy different needs

24 PhySh
cp /Applications/ORCA/ORCA_8_4_0 /Tiers/FNAL/ → installs ORCA_8_4_0 at FNAL
cp /RefDB/eg03_foo/DST /Tiers/PIC → copies the DST for dataset eg03_foo to PIC
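The idea behind these two commands is that a dispatcher maps the (source, destination) path pair onto a back-end service call. A minimal sketch of that routing, with made-up rule logic and return values (the real PhySh runs over Clarens Web services):

```python
# Sketch of PhySh-style dispatch: a virtual-filesystem copy is mapped
# onto a back-end action. The routing rules and return tuples are
# illustrative, not the actual PhySh implementation.

def physh_cp(src, dest):
    """Map a virtual-filesystem copy onto a back-end action."""
    site = dest.rstrip("/").split("/")[-1]
    if src.startswith("/Applications/"):
        release = src.rstrip("/").split("/")[-1]
        return ("install-software", release, site)       # SCRAM-backed
    if src.startswith("/RefDB/"):
        _, _, dataset, tier = src.split("/")
        return ("transfer-data", f"{dataset}/{tier}", site)  # PhEDEx-backed
    raise ValueError(f"no rule for {src}")

print(physh_cp("/Applications/ORCA/ORCA_8_4_0", "/Tiers/FNAL/"))
print(physh_cp("/RefDB/eg03_foo/DST", "/Tiers/PIC"))
```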

